---
title: "June, 2024"
date: 2024-06-03T14:14:00+03:00
author: "Alan Orth"
categories: ["Notes"]
---
## 2024-06-03
- Working on IFPRI datasets
- I noticed the licenses were missing from Nilam's original file, so I found a way to [query Dataverse's API by persistent identifier](https://guides.dataverse.org/en/latest/api/native-api.html#export-metadata-of-a-dataset-in-various-formats)
- We have both Handles and DOIs for these datasets, all issued by Harvard's Dataverse
<!--more-->
- I used this GREL in OpenRefine to create a new column of Dataverse API export URLs built from each DOI (uppercasing the DOI for Dataverse):
```
"https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:" + value.split('https://doi.org/')[-1].toUppercase()
```
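
To check a few values outside OpenRefine, you can build the same URL in Python. This is a minimal sketch that mirrors the GREL expression above. The function name `dataverse_export_url` and the DOI in the example are hypothetical. Like the GREL, it does not URL-encode the DOI.

```python
# Hypothetical helper mirroring the GREL expression above
EXPORT_BASE = "https://dataverse.harvard.edu/api/datasets/export"


def dataverse_export_url(doi_url: str) -> str:
    """Build a Dataverse dataverse_json export URL from a https://doi.org/ link."""
    # Same as GREL's value.split('https://doi.org/')[-1]: keep what follows the prefix
    doi = doi_url.split("https://doi.org/")[-1]
    # Uppercase the DOI, as toUppercase() does in the GREL (no URL-encoding)
    return f"{EXPORT_BASE}?exporter=dataverse_json&persistentId=doi:{doi.upper()}"


# Hypothetical DOI, for illustration only
print(dataverse_export_url("https://doi.org/10.7910/DVN/example"))
# → https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.7910/DVN/EXAMPLE
```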
- Then I was able to extract the license text from the JSON response using:
```
value.parseJson()['datasetVersion']['termsOfUse']
```
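
Here is the same extraction in Python, as a sketch that assumes the response body is available as a string. The function name `terms_of_use` and the sample JSON are hypothetical. Because the sketch uses `.get()`, a record without the field yields `None` instead of raising an error:

```python
import json


def terms_of_use(response_text: str):
    """Return datasetVersion.termsOfUse from a dataverse_json export, or None."""
    data = json.loads(response_text)
    # `or {}` guards against a missing or null datasetVersion; .get() avoids KeyError
    return (data.get("datasetVersion") or {}).get("termsOfUse")


# Hypothetical, heavily trimmed response for illustration
sample = '{"datasetVersion": {"termsOfUse": "CC0 Waiver"}}'
print(terms_of_use(sample))  # → CC0 Waiver
```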
- I did something similar for the datasets with Handles
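
For the Handles, here is a sketch of the equivalent URL. It assumes two things: that the Handle links have the form `https://hdl.handle.net/<prefix>/<suffix>`, and that Dataverse accepts them with an `hdl:` prefix in `persistentId`. The function name and the example Handle are hypothetical. Unlike the DOI version, this sketch leaves the case of the Handle unchanged:

```python
# Hypothetical helper; the hdl.handle.net URL form and hdl: prefix are assumptions
EXPORT_BASE = "https://dataverse.harvard.edu/api/datasets/export"


def dataverse_export_url_hdl(handle_url: str) -> str:
    """Build a dataverse_json export URL from a https://hdl.handle.net/ link."""
    # Keep only the "<prefix>/<suffix>" part of the Handle link
    handle = handle_url.split("https://hdl.handle.net/")[-1]
    return f"{EXPORT_BASE}?exporter=dataverse_json&persistentId=hdl:{handle}"


# Hypothetical Handle, for illustration only
print(dataverse_export_url_hdl("https://hdl.handle.net/1902.1/00000"))
# → https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=hdl:1902.1/00000
```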
## 2024-06-04
- Some Dataverse entries have the license in `['datasetVersion']['license']` instead of `['datasetVersion']['termsOfUse']`
- I finalized cleaning the 722 IFPRI datasets and uploaded them to CGSpace
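
Here is a sketch of that fallback in Python. It checks `termsOfUse` first and then `license`. The function name and sample records are hypothetical. The sketch also assumes, without having verified it against these records, that `license` may be either a plain string or an object with a `name` key:

```python
import json


def extract_license(response_text: str):
    """Return the license from a dataverse_json export: termsOfUse, else license."""
    version = json.loads(response_text).get("datasetVersion") or {}
    terms = version.get("termsOfUse")
    if terms:
        return terms
    lic = version.get("license")
    # Assumption: license may be an object like {"name": ..., "uri": ...}
    if isinstance(lic, dict):
        return lic.get("name")
    return lic  # a plain string, or None if neither field is present


# Hypothetical, heavily trimmed records for illustration
print(extract_license('{"datasetVersion": {"termsOfUse": "CC0 Waiver"}}'))  # → CC0 Waiver
print(extract_license('{"datasetVersion": {"license": {"name": "CC BY 4.0"}}}'))  # → CC BY 4.0
```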
## 2024-06-14
- Minor cleanups on IFPRI's 2016–2019 batch migration file
- I will start by checking for duplicates on unique identifiers like DOIs
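
Here is a minimal sketch of such a duplicate check in Python. It assumes the DOIs are available as a list of strings; the function names and example values are hypothetical. DOIs are case-insensitive, so the sketch lowercases them and strips common resolver prefixes before counting:

```python
from collections import Counter

# Resolver prefixes to strip before comparing (not exhaustive)
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(value: str) -> str:
    """Lowercase a DOI and remove a leading resolver prefix, if any."""
    v = value.strip().lower()
    for prefix in PREFIXES:
        if v.startswith(prefix):
            return v[len(prefix):]
    return v


def duplicate_dois(values):
    """Return the sorted normalized DOIs that occur more than once, skipping blanks."""
    counts = Counter(normalize_doi(v) for v in values if v and v.strip())
    return sorted(doi for doi, n in counts.items() if n > 1)


# Hypothetical values for illustration
print(duplicate_dois(["https://doi.org/10.1000/ABC", "10.1000/abc", "https://doi.org/10.1000/xyz", ""]))
# → ['10.1000/abc']
```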
<!-- vim: set sw=2 ts=2: -->