title | date | author | categories | |
---|---|---|---|---|
June, 2024 | 2024-06-03T14:14:00+03:00 | Alan Orth | | |
2024-06-03
- Working on IFPRI datasets
- I noticed the licenses were missing from Nilam's original file, so I found a way to query Dataverse's API by persistent identifier
- We have both Handles and DOIs for these datasets, both from Harvard's Dataverse
- I used this GREL in OpenRefine to create a new column based on URLs using the DOI (uppercasing the DOI for Dataverse):
"https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:" + value.split('https://doi.org/')[-1].toUppercase()
- Then I was able to extract the license text from the JSON response using:
value.parseJson()['datasetVersion']['termsOfUse']
- Similar for the Handle...
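For reference, the two GREL steps above could be sketched in Python roughly as follows. This is only an illustration, not what I ran in OpenRefine: the function names are made up, the DOI in the usage note is a placeholder, and fetching the URL is left out so the sketch runs without network access.

```python
import json

# Export endpoint taken from the GREL expression above.
DATAVERSE_EXPORT = "https://dataverse.harvard.edu/api/datasets/export"


def dataverse_export_url(doi_url: str) -> str:
    """Build the Dataverse JSON export URL from a https://doi.org/ link.

    Mirrors the GREL: keep what follows 'https://doi.org/' and uppercase it.
    (Hypothetical helper name, not part of any library.)
    """
    doi = doi_url.split("https://doi.org/")[-1].upper()
    return f"{DATAVERSE_EXPORT}?exporter=dataverse_json&persistentId=doi:{doi}"


def terms_of_use(response_text: str):
    """Return ['datasetVersion']['termsOfUse'] from the JSON body, or None if absent."""
    return json.loads(response_text)["datasetVersion"].get("termsOfUse")
```

Calling `dataverse_export_url("https://doi.org/10.7910/dvn/abc123")` with that placeholder DOI gives a URL ending in `persistentId=doi:10.7910/DVN/ABC123`, and the response body from that URL would then be passed to `terms_of_use()`.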
2024-06-04
- Some Dataverse entries have the license in ['datasetVersion']['license'] instead...
- I finalized cleaning the 722 IFPRI datasets and uploaded them to CGSpace
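A minimal Python sketch of handling both locations for the license, assuming the same JSON export as above. The function name is hypothetical, and the precedence (termsOfUse first, then license) is my assumption rather than anything Dataverse defines. The value is returned as-is, whatever shape it has in the response.

```python
import json


def extract_license(response_text: str):
    """Return the license from a Dataverse JSON export.

    Checks ['datasetVersion']['termsOfUse'] first and falls back to
    ['datasetVersion']['license']; returns None when neither is set.
    (Hypothetical helper; the ordering is an assumption.)
    """
    version = json.loads(response_text).get("datasetVersion", {})
    return version.get("termsOfUse") or version.get("license")
```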
2024-06-14
- Minor cleanups on IFPRI's 2016–2019 batch migration file
- I will start with duplicates on unique identifiers like DOIs
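One way to check for duplicate DOIs could look like the Python sketch below. It is an illustration only: the `doi` column name and the normalization rules (lowercasing, stripping the resolver prefix) are assumptions, not the actual layout of the migration file.

```python
from collections import defaultdict


def normalize_doi(value: str) -> str:
    """Lowercase a DOI and strip common prefixes so variants compare equal."""
    doi = value.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def find_duplicate_dois(rows, column="doi"):
    """Map each DOI that appears more than once to the row indexes using it.

    `rows` is a list of dicts (e.g. from csv.DictReader); `column` is a
    hypothetical column name. Rows with an empty DOI are ignored.
    """
    seen = defaultdict(list)
    for index, row in enumerate(rows):
        raw = row.get(column) or ""
        if raw.strip():
            seen[normalize_doi(raw)].append(index)
    return {doi: indexes for doi, indexes in seen.items() if len(indexes) > 1}
```

The rows could come from `csv.DictReader` over the batch file, and the returned indexes point back at the rows to review by hand.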