---
title: "June, 2024"
date: 2024-06-03T14:14:00+03:00
author: "Alan Orth"
categories: ["Notes"]
---
## 2024-06-03
- Working on IFPRI datasets
- I noticed the licenses were missing from Nilam's original file, so I found a way to check [Dataverse's API for a persistent identifier](https://guides.dataverse.org/en/latest/api/native-api.html#export-metadata-of-a-dataset-in-various-formats)
- We have both Handles and DOIs for these datasets, both from Harvard's Dataverse
<!--more-->
- I used this GREL in OpenRefine to create a new column of Dataverse export URLs based on the DOI (uppercasing the DOI for Dataverse):
```
"https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:" + value.split('https://doi.org/')[-1].toUppercase()
```
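The same URL construction works outside OpenRefine for spot-checking a few rows. This is a rough Python equivalent of the GREL above. It assumes the DOI column always holds `https://doi.org/` URLs, and the function name is hypothetical:

```python
def export_url_from_doi(doi_url: str) -> str:
    """Build a dataverse_json export URL from a https://doi.org/ URL (hypothetical helper)."""
    # Keep whatever follows the resolver prefix, like value.split(...)[-1] in GREL
    doi = doi_url.split("https://doi.org/")[-1]
    # Uppercase the DOI for Dataverse, mirroring toUppercase() in the GREL
    return (
        "https://dataverse.harvard.edu/api/datasets/export"
        "?exporter=dataverse_json&persistentId=doi:" + doi.upper()
    )
```

For a made-up DOI like `https://doi.org/10.7910/dvn/xyz123`, this returns the same string the GREL would produce.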
- Then I was able to extract the license text from the JSON response using:
```
value.parseJson()['datasetVersion']['termsOfUse']
```
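Outside OpenRefine, the fetch and extraction steps can be approximated in Python. This is a minimal sketch using only the standard library, with hypothetical function names. Unlike the GREL, it returns `None` instead of erroring when `termsOfUse` is missing:

```python
import json
import urllib.request


def fetch_export(url: str) -> str:
    """Download one dataverse_json export as text (makes a network request)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def terms_of_use(response_text: str):
    """Return datasetVersion.termsOfUse from a dataverse_json export, or None if absent."""
    record = json.loads(response_text)
    # Same path as parseJson()['datasetVersion']['termsOfUse'], but missing keys give None
    return record.get("datasetVersion", {}).get("termsOfUse")
```

Usage would be something like `terms_of_use(fetch_export(url))` for each URL in the new column.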
- I did something similar for the datasets with Handles
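The notes don't record the exact Handle expression, so the following is only a guess at it in Python. It assumes that Dataverse accepts Handle-based datasets as `persistentId=hdl:<handle>` and that the column holds `https://hdl.handle.net/` URLs. The function name is hypothetical, and unlike the DOI version it does not uppercase anything:

```python
def export_url_from_handle(handle_url: str) -> str:
    """Build a dataverse_json export URL from a Handle URL (hypothetical helper)."""
    # Assumes Handle URLs of the form https://hdl.handle.net/<prefix>/<suffix>
    handle = handle_url.split("https://hdl.handle.net/")[-1]
    # Assumption: Dataverse identifies Handle-based datasets with the hdl: prefix
    return (
        "https://dataverse.harvard.edu/api/datasets/export"
        "?exporter=dataverse_json&persistentId=hdl:" + handle
    )
```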
## 2024-06-04
- Some Dataverse entries have the license in `['datasetVersion']['license']` instead of `['datasetVersion']['termsOfUse']`
- I finalized cleaning the 722 IFPRI datasets and uploaded them to CGSpace
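Checking both locations means trying `termsOfUse` first and falling back to `license`. Below is a minimal Python sketch of that fallback on an already-parsed export. The notes don't say what the `license` value looks like, so the sketch assumes it may be either a plain string or an object with a `name` key. The helper name is hypothetical:

```python
def dataset_license(record: dict):
    """Return license text from a parsed dataverse_json export, or None (hypothetical helper)."""
    version = record.get("datasetVersion", {})
    # Some entries carry the license as free text in termsOfUse
    terms = version.get("termsOfUse")
    if terms:
        return terms
    # Others use "license" instead; assume it is either a string or an object with a "name"
    license_value = version.get("license")
    if isinstance(license_value, dict):
        return license_value.get("name")
    return license_value
```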
## 2024-06-14
- Minor cleanups on IFPRI's 2016–2019 batch migration file
- I will start with duplicates on unique identifiers like DOIs
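DOIs only work as a duplicate key if their different spellings compare equal: resolver prefix or not, upper or lower case. Below is a minimal Python sketch under that idea. The rows are assumed to be dicts, for example from `csv.DictReader` on the migration file and a CGSpace export. The `cg.identifier.doi` column name and both function names are assumptions for illustration:

```python
import re


def normalize_doi(value: str) -> str:
    """Lowercase a DOI and strip common resolver prefixes so variants compare equal."""
    value = value.strip().lower()
    # Remove a leading https://doi.org/, http://dx.doi.org/, or doi: prefix
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", value)


def find_doi_duplicates(migration_rows, cgspace_rows, column="cg.identifier.doi"):
    """Yield migration rows whose DOI already appears in the CGSpace rows."""
    # Collect the normalized DOIs already on CGSpace, skipping empty values
    known = {normalize_doi(row[column]) for row in cgspace_rows if row.get(column)}
    for row in migration_rows:
        doi = row.get(column)
        if doi and normalize_doi(doi) in known:
            yield row
```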
## 2024-06-18
- Merged and uploaded metadata for duplicates in IFPRI's 2016–2019 set:
  - 144 exact matches on CGSpace via DOI, type, and date
  - 32 with CGSpace handles
- I also spent some time converting the `ilri/post_bitstreams.py` script to use the DSpace 7 REST API via dspace-rest-client
- There are 28 PDFs specified for these 176 duplicates, and a handful of them do not already exist on CGSpace so I will upload them
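An "exact match via DOI, type, and date" can be read as a composite key that must agree on all three fields. Below is a minimal Python sketch under that reading. The DOI is normalized, while type and date are compared as exact strings. The column names (`cg.identifier.doi`, `dcterms.type`, `dcterms.issued`) and the function names are assumptions for illustration:

```python
import re


def match_key(row, doi_col="cg.identifier.doi", type_col="dcterms.type", date_col="dcterms.issued"):
    """Return a (DOI, type, date) tuple, or None if any part is empty (hypothetical helper)."""
    doi = (row.get(doi_col) or "").strip().lower()
    # Strip resolver prefixes so https://doi.org/10.x and 10.x compare equal
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    item_type = (row.get(type_col) or "").strip()
    date = (row.get(date_col) or "").strip()
    if not (doi and item_type and date):
        return None
    return (doi, item_type, date)


def exact_matches(migration_rows, cgspace_rows):
    """Pair each migration row with the CGSpace row that shares its (DOI, type, date) key."""
    index = {}
    for row in cgspace_rows:
        key = match_key(row)
        if key is not None:
            index[key] = row
    matches = []
    for row in migration_rows:
        key = match_key(row)
        if key is not None and key in index:
            matches.append((row, index[key]))
    return matches
```

Because the date is compared as a plain string, an item issued as `2018` will not match one issued as `2018-05`. That seems appropriate for an exact match, and near misses can be reviewed by hand.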
<!-- vim: set sw=2 ts=2: -->