---
title: "June, 2024"
date: 2024-06-03T14:14:00+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2024-06-03

- Working on IFPRI datasets
- I noticed the licenses were missing from Nilam's original file so I found a way to check [Dataverse's API for a persistent identifier](https://guides.dataverse.org/en/latest/api/native-api.html#export-metadata-of-a-dataset-in-various-formats)
- We have both Handles and DOIs for these datasets, both from Harvard's Dataverse

<!--more-->

- I used this GREL in OpenRefine to create a new column by fetching URLs built from the DOI (uppercasing the DOI for Dataverse):

```
"https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi:" + value.split('https://doi.org/')[-1].toUppercase()
```

- Then I was able to extract the license text from the JSON response using:

```
value.parseJson()['datasetVersion']['termsOfUse']
```

- Similar for the Handle...
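
For use outside OpenRefine, the two GREL expressions above translate to a few lines of Python. This is a minimal sketch rather than the method used for the upload: the function names are hypothetical, the DOI suffix and the response body are placeholders, and it assumes the export response keeps the license text under `datasetVersion` and `termsOfUse` as described above.

```python
import json

EXPORT_BASE = "https://dataverse.harvard.edu/api/datasets/export"


def dataverse_export_url(doi_url: str) -> str:
    """Build the dataverse_json export URL for a https://doi.org/ link."""
    # Same steps as the GREL: keep the part after the resolver prefix, then uppercase it
    doi = doi_url.split("https://doi.org/")[-1].upper()
    return f"{EXPORT_BASE}?exporter=dataverse_json&persistentId=doi:{doi}"


def terms_of_use(response_text: str) -> str:
    """Pull the license text out of an export response body."""
    return json.loads(response_text)["datasetVersion"]["termsOfUse"]


# Placeholder DOI and a hand-written response body, not real data
print(dataverse_export_url("https://doi.org/10.7910/dvn/example"))
print(terms_of_use('{"datasetVersion": {"termsOfUse": "CC0 1.0"}}'))
```

Fetching the URL itself is left out so the sketch runs without network access.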

## 2024-06-04

- Some Dataverse entries have the license in `['datasetVersion']['license']` instead...
- I finalized cleaning the 722 IFPRI datasets and uploaded them to CGSpace
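
The two locations can be checked in order with a small helper. The sketch below is an illustration under assumptions, not the script that was used: `extract_license` is a hypothetical name, and it guesses that `license` may be either a plain string or an object with a `name` key.

```python
from typing import Optional


def extract_license(metadata: dict) -> Optional[str]:
    """Return the license text from a parsed Dataverse export, if any."""
    version = metadata.get("datasetVersion", {})

    # First choice: the free-text terms used by most entries
    terms = version.get("termsOfUse")
    if terms:
        return terms

    # Fallback: the separate license field used by some entries
    license_value = version.get("license")
    # Assumption: the field is either a string or an object with a "name" key
    if isinstance(license_value, dict):
        return license_value.get("name")
    return license_value


# Hand-written sample input, not real data
print(extract_license({"datasetVersion": {"license": {"name": "CC BY 4.0"}}}))
```

Returning `None` when neither field is present makes the rows that still need manual checking easy to filter.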

## 2024-06-14

- Minor cleanups on IFPRI's 2016–2019 batch migration file
- I will start with duplicates on unique identifiers like DOIs
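
One way to find duplicates on DOIs is to normalize each value and group rows by the result. The following is a rough sketch under stated assumptions: the `doi` column name and the sample rows are invented, and the list of prefixes to strip is a guess at what the migration file might contain.

```python
from collections import defaultdict

# Assumed prefixes; adjust to whatever forms appear in the real file
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value: str) -> str:
    """Reduce a DOI or DOI URL to a bare, lowercase identifier."""
    doi = value.strip().lower()  # DOIs are case-insensitive
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def find_duplicate_dois(rows, field="doi"):
    """Map each DOI seen more than once to the indexes of its rows."""
    seen = defaultdict(list)
    for index, row in enumerate(rows):
        value = (row.get(field) or "").strip()
        if value:  # skip rows without a DOI
            seen[normalize_doi(value)].append(index)
    return {doi: indexes for doi, indexes in seen.items() if len(indexes) > 1}


# Invented sample rows, not real data
rows = [
    {"doi": "https://doi.org/10.1234/ABC"},
    {"doi": "doi:10.1234/abc"},
    {"doi": ""},
    {"doi": "10.5678/xyz"},
]
print(find_duplicate_dois(rows))
```

Rows loaded with `csv.DictReader` would fit the same `rows` argument, since each row only needs a `get` method.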

## 2024-06-18

- Merge and upload metadata for duplicates in IFPRI's 2016–2019 set:
  - 144 exact matches on CGSpace via DOI, type, and date
  - 32 with CGSpace handles
- I also spent some time converting the `ilri/post_bitstreams.py` script to use the DSpace 7 REST API via dspace-rest-client
- There are 28 PDFs specified for these 176 duplicates, and a handful of them do not already exist on CGSpace so I will upload them

<!-- vim: set sw=2 ts=2: -->