2024-06-03
- Working on IFPRI datasets
- I used this GREL in OpenRefine to create a new column of Dataverse export URLs from the DOI (uppercasing the DOI for Dataverse):
"https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:" + value.split('https://doi.org/')[-1].toUppercase()
- Then I was able to extract the license text from the JSON response using:
value.parseJson()['datasetVersion']['termsOfUse']
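For use outside OpenRefine, the two GREL expressions above translate to Python roughly as follows. This is a minimal sketch: the function names are hypothetical, and it only builds the URL and parses a response that has already been fetched.

```python
import json

# Export endpoint copied from the GREL expression above
DATAVERSE_EXPORT = (
    "https://dataverse.harvard.edu/api/datasets/export"
    "?exporter=dataverse_json&persistentId=doi:"
)


def dataverse_export_url(doi_url):
    """Build the Dataverse JSON export URL from a https://doi.org/ URL."""
    # Same idea as: value.split('https://doi.org/')[-1].toUppercase()
    doi = doi_url.split("https://doi.org/")[-1].upper()
    return DATAVERSE_EXPORT + doi


def terms_of_use(response_text):
    """Return the license text from an already-fetched JSON export."""
    # Same idea as: value.parseJson()['datasetVersion']['termsOfUse']
    return json.loads(response_text)["datasetVersion"]["termsOfUse"]
```

Calling `dataverse_export_url()` on a lowercase DOI URL yields the same endpoint with the DOI uppercased. `terms_of_use()` raises a `KeyError` when the key is absent.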
2024-06-04
- Some Dataverse entries have the license in ['datasetVersion']['license'] instead of ['datasetVersion']['termsOfUse']…
- I finalized cleaning the 722 IFPRI datasets and uploaded them to CGSpace
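Since the license can sit under either key, a lookup that tries both is one way to handle it. This is a minimal sketch: `extract_license` is a hypothetical helper, and it returns whatever it finds under the key without assuming a shape (string or object).

```python
import json


def extract_license(response_text):
    """Return the license from a Dataverse JSON export, or None if absent."""
    version = json.loads(response_text).get("datasetVersion", {})
    # Try termsOfUse first, then fall back to license
    for key in ("termsOfUse", "license"):
        value = version.get(key)
        if value:
            return value
    return None
```

An empty `termsOfUse` counts as missing here, so the lookup falls through to `license`.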
2024-06-14
- Minor cleanups on IFPRI’s 2016–2019 batch migration file
- I will start with duplicates on unique identifiers like DOIs
2024-06-18
- Merge and upload metadata for duplicates in IFPRI’s 2016–2019 set:
  - 144 exact matches on CGSpace via DOI, type, and date
  - 32 with CGSpace handles
- I also spent some time converting the ilri/post_bitstreams.py script to use the DSpace 7 REST API via dspace-rest-client
- There are 28 PDFs specified for these 176 duplicates, and a handful of them do not already exist on CGSpace, so I will upload them
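An exact match on DOI, type, and date can be sketched as a lookup on a normalized key. This is an illustration only, not necessarily how the matching was done here: the field names `doi`, `type`, and `date` and both function names are assumptions.

```python
def match_key(record):
    """Normalized (doi, type, date) tuple used to compare records."""
    # Strip any resolver prefix and lowercase the DOI so variants compare equal
    doi = record["doi"].strip().lower().split("doi.org/")[-1]
    return (doi, record["type"].strip().lower(), record["date"].strip())


def find_exact_matches(batch, existing):
    """Pair each batch record with the existing record sharing its key."""
    index = {match_key(r): r for r in existing}
    matches = []
    for record in batch:
        key = match_key(record)
        if key in index:
            matches.append((record, index[key]))
    return matches
```

Records with no counterpart are left out of the result, so they can be reviewed separately.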