---
title: "June, 2024"
date: 2024-06-03T14:14:00+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2024-06-03

- Working on IFPRI datasets
  - I noticed the licenses were missing from Nilam's original file so I found a way to check [Dataverse's API for a persistent identifier](https://guides.dataverse.org/en/latest/api/native-api.html#export-metadata-of-a-dataset-in-various-formats)
  - We have both Handles and DOIs for these datasets, both from Harvard's Dataverse
  - I used this GREL in OpenRefine to create a new column of API URLs based on the DOI (uppercasing the DOI for Dataverse):

```
"https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:" + value.split('https://doi.org/')[-1].toUppercase()
```

- Then I was able to extract the license text from the JSON response using:

```
value.parseJson()['datasetVersion']['termsOfUse']
```

- Similar for the Handle...

## 2024-06-04

- Some Dataverse entries have the license in `['datasetVersion']['license']` instead...
- I finalized cleaning the 722 IFPRI datasets and uploaded them to CGSpace

## 2024-06-14

- Minor cleanups on IFPRI's 2016–2019 batch migration file
  - I will start with duplicates on unique identifiers like DOIs

## 2024-06-18

- Merge and upload metadata for duplicates in IFPRI's 2016–2019 set:
  - 144 exact matches on CGSpace via DOI, type, and date
  - 32 with CGSpace handles
- I also spent some time converting the `ilri/post_bitstreams.py` script to use the DSpace 7 REST API via dspace-rest-client
- There are 28 PDFs specified for these 176 duplicates, and a handful of them do not already exist on CGSpace so I will upload them
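
For reference, the Dataverse license lookup from 2024-06-03 and 2024-06-04 could also be sketched outside OpenRefine in Python. This is only a rough equivalent of the GREL, not the workflow actually used: the function names are hypothetical, and the handling of `license` as either a plain string or an object with `name` and `uri` keys is an assumption about how different Dataverse versions serialize it.

```python
import json
from urllib.request import urlopen

EXPORT_URL = "https://dataverse.harvard.edu/api/datasets/export"


def build_export_url(doi_url):
    """Build the dataverse_json export URL from a https://doi.org/... link."""
    # Mirror the GREL: keep what follows the resolver prefix and uppercase it
    doi = doi_url.split("https://doi.org/")[-1].upper()
    return f"{EXPORT_URL}?exporter=dataverse_json&persistentId=doi:{doi}"


def extract_license(metadata):
    """Return the license text from an exported dataset, or None if absent."""
    version = metadata.get("datasetVersion", {})

    # Most entries carry the license text in termsOfUse
    terms = version.get("termsOfUse")
    if terms:
        return terms

    # Some entries use the license field instead (assumed: string or object)
    license_field = version.get("license")
    if isinstance(license_field, dict):
        return license_field.get("name") or license_field.get("uri")
    return license_field


def fetch_license(doi_url):
    """Fetch the export for one DOI over the network and extract the license."""
    with urlopen(build_export_url(doi_url)) as response:
        return extract_license(json.load(response))
```

Keeping the URL building and the JSON parsing separate from the network call means both can be checked against saved responses before running the lookup on the full list of datasets.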