cgspace-notes/content/posts/2024-06.md

| title | date | author | categories |
|---|---|---|---|
| June, 2024 | 2024-06-03T14:14:00+03:00 | Alan Orth | Notes |

2024-06-03

  • Working on IFPRI datasets
  • I used this GREL in OpenRefine to create a new column by fetching URLs built from the DOI (uppercasing the DOI for Dataverse):
"https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:" + value.split('https://doi.org/')[-1].toUppercase()
  • Then I was able to extract the license text from the JSON response using:
value.parseJson()['datasetVersion']['termsOfUse']
  • Similar for the Handle...
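
The two GREL expressions above can be approximated outside OpenRefine. This is a minimal Python sketch, assuming the export response has the shape implied by the GREL (a top-level `datasetVersion` object). The function names `dataverse_export_url` and `terms_of_use` are hypothetical, and fetching the URL is left out:

```python
import json

# Same base URL as in the GREL expression
DATAVERSE_EXPORT = (
    "https://dataverse.harvard.edu/api/datasets/export"
    "?exporter=dataverse_json&persistentId=doi:"
)


def dataverse_export_url(doi_url: str) -> str:
    """Build the Dataverse JSON export URL from a https://doi.org/ URL."""
    # Mirror the GREL: keep what follows the resolver prefix, then uppercase it
    doi = doi_url.split("https://doi.org/")[-1].upper()
    return DATAVERSE_EXPORT + doi


def terms_of_use(response_text: str):
    """Return datasetVersion.termsOfUse from an export response, or None."""
    data = json.loads(response_text)
    return data.get("datasetVersion", {}).get("termsOfUse")
```

Unlike the GREL, `terms_of_use` returns `None` instead of raising an error when the key is absent, which makes it easier to spot records that need a different lookup.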

2024-06-04

  • Some Dataverse entries have the license in ['datasetVersion']['license'] instead...
  • I finalized cleaning the 722 IFPRI datasets and uploaded them to CGSpace
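
For entries that keep the license under `license` rather than `termsOfUse`, a fallback lookup might look like the sketch below. The helper name `extract_license` is hypothetical. The value is returned as found, since the notes do not say whether `license` is a string or a nested object:

```python
def extract_license(export: dict):
    """Return termsOfUse if present and non-empty, else the license value."""
    version = export.get("datasetVersion", {})
    # Prefer termsOfUse; fall back to license when it is missing or empty
    return version.get("termsOfUse") or version.get("license")
```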

2024-06-14

  • Minor cleanups on IFPRI's 2016-2019 batch migration file
    • I will start with duplicates on unique identifiers like DOIs
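
One way to look for duplicates on DOIs is to normalize each DOI and group rows by the result. This is a rough sketch and not necessarily the method used here. The `doi` column name and both function names are assumptions:

```python
from collections import defaultdict

# Common DOI prefixes to strip before comparing (an illustrative list)
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value: str) -> str:
    """Lowercase a DOI and strip a leading resolver prefix, if any."""
    value = value.strip().lower()
    for prefix in DOI_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def find_duplicate_dois(rows, column="doi"):
    """Map each normalized DOI seen more than once to its row indexes."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        doi = normalize_doi(row.get(column) or "")
        if doi:  # skip rows without a DOI
            groups[doi].append(index)
    return {doi: indexes for doi, indexes in groups.items() if len(indexes) > 1}
```

DOIs are case-insensitive, so comparing lowercased values avoids missing duplicates that differ only in case.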

2024-06-18

  • Merge and upload metadata for duplicates in IFPRI's 2016-2019 set:
    • 144 exact matches on CGSpace via DOI, type, and date
    • 32 with CGSpace handles
    • I also spent some time converting the ilri/post_bitstreams.py script to use the DSpace 7 REST API via dspace-rest-client
    • There are 28 PDFs specified for these 176 duplicates, and a handful of them do not already exist on CGSpace so I will upload them
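
Matching on DOI, type, and date can be sketched as a lookup on a composite key. The field names `doi`, `type`, and `date` and the two helpers are hypothetical, and the sketch assumes both sides have already been exported as lists of dictionaries:

```python
def match_key(record: dict) -> tuple:
    """Build a comparison key from DOI, type, and date."""
    return (
        record["doi"].strip().lower(),
        record["type"].strip().lower(),
        record["date"].strip(),
    )


def split_matches(batch, existing):
    """Split batch records into those already in existing and the rest."""
    known = {match_key(record) for record in existing}
    matched = [r for r in batch if match_key(r) in known]
    unmatched = [r for r in batch if match_key(r) not in known]
    return matched, unmatched
```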