- Then re-export the UN M.49 countries to a clean list because the one I did yesterday somehow has errors:

```console
$ csvcut -d ';' -c 'ISO-alpha2 Code,Country or Area' ~/Downloads/UNSD\ —\ Methodology.csv | sed -e '1s/ISO-alpha2 Code/alpha2/' -e '1s/Country or Area/UN M.49 Name/' > ~/Downloads/un-countries.csv
```

- Check the number of lines in each file:
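
A minimal way to do that check is just `wc -l` on both files; here the CLARISA filename is an assumption, and un-countries.csv is the list exported above:

```console
# Hypothetical check: clarisa-countries.csv is an assumed filename
$ wc -l ~/Downloads/clarisa-countries.csv ~/Downloads/un-countries.csv
```
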
```console
SELECT ds6_item2itemhandle(dspace_object_id) AS handle FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item WHERE NOT discoverable) AND metadata_field_id=28 AND text_value LIKE 'Submitted by Alliance TIP Submit%';
```

## 2024-03-14

- Looking into reports of rate limiting of Altmetric's bot on CGSpace
- I don't see any HTTP 429 responses for their user agents in any of our logs (a rough version of that log check is sketched below)...
- I tried myself on an item page and never hit a limit...
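
A rough version of that log check, assuming nginx access logs in the default combined format under /var/log/nginx and "Altmetric" as the user agent substring (all assumptions, not necessarily what CGSpace uses), would be:

```console
# Hypothetical sketch: count 429 responses for requests whose user agent mentions Altmetric
$ zcat -f /var/log/nginx/access.log* | grep -i altmetric | awk '$9 == 429' | wc -l
```
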
```console
$ for num in {1..60}; do echo -n "Request ${num}: "; curl -s -o /dev/null -w "%{http_code}" https://dspace7test.ilri.org/items/c9b8999d-3001-42ba-a267-14f4bfa90b53 && echo; done
Request 1: 200
Request 2: 200
Request 3: 200
Request 4: 200
...
Request 60: 200
```

- All responses were HTTP 200...
- In any case, I whitelisted their production IPs and told them to try again
- I imported 468 of IFPRI's 2023 records that were confirmed to not be duplicates to CGSpace
- I also spent some time merging metadata from 415 of the remaining 432 duplicates with the metadata for the existing items on CGSpace
- This was a bit of dirty work using csvkit, xsv, and OpenRefine (a rough sketch of the join step is below)
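
The kind of csvkit join involved is sketched below; the filenames and the join column are placeholders, not the actual ones used:

```console
# Hypothetical sketch: line up the duplicate records with the matching CGSpace export on a
# shared key so the extra metadata can be merged into the existing items afterwards
$ csvjoin -c 'dc.identifier.uri[en_US]' cgspace-export.csv ifpri-duplicates.csv > /tmp/merged.csv
```
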
## 2024-03-17

- There are 17 records remaining from IFPRI's 2023 batch, out of the 432 that I identified as already being on CGSpace
- These are different in that they are duplicates on CGSpace as well, so the csvjoin failed and the metadata got messed up in my migration (see the duplicate-key check sketched after this list)
- I looked closer and whittled this down to 14 actual records, and spent some time working on them
- I isolated 12 of these items that existed on CGSpace and added publication ranks, project identifiers, and provenance links
- Now only two confusing records about the Inkomati catchment remain
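
A quick way to spot the problem keys, i.e. join values that occur more than once and make the csvjoin output ambiguous (the filename and column below are placeholders):

```console
# Hypothetical sketch: print join-key values that appear more than once in the file
$ csvcut -c 'dc.identifier.uri[en_US]' ifpri-duplicates.csv | sort | uniq -d
```
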
## 2024-03-18

- Checking to see how many IFPRI records we have migrated so far:

```console
$ csvgrep -c 'dc.description.provenance[en_US]' -m 'Original URL from IFPRI CONTENTdm' cgspace.csv \
| csvcut -c 'id,dc.title[en_US],dc.identifier.uri[en_US],dc.description.provenance[en_US],dcterms.type[en_US]' \
| tee /tmp/ifpri-records.csv \
| csvstat --count
898
```

- I finalized the remaining two on the Inkomati catchment and now we are at 900!

<!-- vim: set sw=2 ts=2: -->