- Then re-export the UN M.49 countries to a clean list because the one I did yesterday somehow has errors:
```console
$ csvcut -d ';' -c 'ISO-alpha2 Code,Country or Area' ~/Downloads/UNSD\ \ Methodology.csv | sed -e '1s/ISO-alpha2 Code/alpha2/' -e '1s/Country or Area/UN M.49 Name/' > ~/Downloads/un-countries.csv
```
- Check the number of lines in each file:
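- A minimal sketch of that check (the CLARISA filename is an assumption):
```console
$ wc -l ~/Downloads/un-countries.csv ~/Downloads/clarisa-countries.csv
```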

```console
SELECT ds6_item2itemhandle(dspace_object_id) AS handle FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item WHERE NOT discoverable) AND metadata_field_id=28 AND text_value LIKE 'Submitted by Alliance TIP Submit%';
```
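- That query lists the handles of non-discoverable items whose provenance says they were submitted by the Alliance TIP submit account; a sketch of running it non-interactively from the shell (host, database, and user names are assumptions):
```console
$ psql -h localhost -U dspace -d dspace -c "SELECT ds6_item2itemhandle(dspace_object_id) AS handle FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item WHERE NOT discoverable) AND metadata_field_id=28 AND text_value LIKE 'Submitted by Alliance TIP Submit%';"
```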
## 2024-03-14
- Looking into reports of rate limiting of Altmetric's bot on CGSpace
- I don't see any HTTP 429 responses for their user agents in any of our logs (see the log check sketched below)...
- I tried it myself on an item page and never hit a limit...
```console
$ for num in {1..60}; do echo -n "Request ${num}: "; curl -s -o /dev/null -w "%{http_code}" https://dspace7test.ilri.org/items/c9b8999d-3001-42ba-a267-14f4bfa90b53 && echo; done
Request 1: 200
Request 2: 200
Request 3: 200
Request 4: 200
...
Request 60: 200
```
- All responses were HTTP 200...
- In any case, I whitelisted their production IPs and told them to try again
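- For reference, a sketch of the kind of log check I mean above, counting the status codes of requests that mention Altmetric (the nginx log path and the default combined log format are assumptions):
```console
$ zcat --force /var/log/nginx/access.log* | grep -i altmetric | awk '{print $9}' | sort | uniq -c
```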
- I imported 468 of IFPRI's 2023 records that were confirmed not to be duplicates into CGSpace
- I also spent some time merging metadata from 415 of the remaining 432 duplicates with the metadata for the existing items on CGSpace
- This was a bit of dirty work using csvkit, xsv, and OpenRefine
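- A rough sketch of the kind of join involved, using xsv (the filenames and the join column here are placeholders, not the actual ones):
```console
$ xsv join 'dc.title[en_US]' /tmp/ifpri-duplicates.csv 'dc.title[en_US]' /tmp/cgspace-export.csv > /tmp/ifpri-merged.csv
```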
## 2024-03-17
- There are 17 records remaining from the 432 in IFPRI's 2023 batch that I identified as already being on CGSpace
- These are different in that they are duplicates on CGSpace as well, so the csvjoin failed and the metadata got messed up in my migration
- I looked closer and whittled this down to 14 actual records, and spent some time working on them
- I isolated 12 of these items that existed on CGSpace and added publication ranks, project identifiers, and provenance links
- Now only two confusing records about the Inkomati catchment remain
## 2024-03-18
- Checking to see how many IFPRI records we have migrated so far:
```console
$ csvgrep -c 'dc.description.provenance[en_US]' -m 'Original URL from IFPRI CONTENTdm' cgspace.csv \
| csvcut -c 'id,dc.title[en_US],dc.identifier.uri[en_US],dc.description.provenance[en_US],dcterms.type[en_US]' \
| tee /tmp/ifpri-records.csv \
| csvstat --count
898
```
- I finalized the remaining two Inkomati catchment records and now we are at 900!
<!-- vim: set sw=2 ts=2: -->