mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-08-09
@ -178,5 +178,37 @@ $ csvcut -c 'id,cg.issn,cg.issn[],cg.issn[en],cg.issn[en_US],cg.isbn,cg.isbn[],c
- Then in OpenRefine I merged all null, blank, and en fields into the `en_US` one for each, removed all spaces, fixed invalid multi-value separators, and removed everything other than the ISSNs/ISBNs themselves
- In total it was a few thousand metadata entries, so I had to split the CSV with `xsv split` in order to process it
- I was reminded again how DSpace 6 is very fucking slow when it comes to any database-related operations, as it takes over an hour to process 200 metadata changes...
- In total it was 1,195 changes to ISSN and ISBN metadata fields
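
The chunking step can be sketched in Python as well. This is only an illustration of the idea behind `xsv split`, not a replacement for it: the function name, the chunk size, and the sample rows are made up for the example, and each chunk repeats the header row so that it can be processed on its own.

```python
import csv
import io
from itertools import islice


def split_csv_rows(reader, chunk_size):
    """Yield lists of rows, each one starting with the header row."""
    header = next(reader)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield [header] + chunk


# Hypothetical sample standing in for the exported metadata CSV
sample = "id,cg.issn[en_US]\n1,1234-5678\n2,2345-6789\n3,3456-7890\n"
chunks = list(split_csv_rows(csv.reader(io.StringIO(sample)), chunk_size=2))
```

With a real export the chunk size would be much larger, and each chunk would be written to its own file with `csv.writer`.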
## 2021-08-09
- Extract all unique ISSNs to look up on Sherpa Romeo and Crossref
```console
$ csvcut -c 'cg.issn[en_US]' ~/Downloads/2021-08-08-CGSpace-ISBN-ISSN.csv | csvgrep -c 1 -r '^[0-9]{4}' | sed 1d | sort | uniq > /tmp/2021-08-09-issns.txt
$ ./ilri/sherpa-issn-lookup.py -a mehhhhhhhhhhhhh -i /tmp/2021-08-09-issns.txt -o /tmp/2021-08-09-journals-sherpa-romeo.csv
$ ./ilri/crossref-issn-lookup.py -e me@cgiar.org -i /tmp/2021-08-09-issns.txt -o /tmp/2021-08-09-journals-crossref.csv
```
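
For reference, the first command (the `csvcut | csvgrep | sed | sort | uniq` pipeline) could be approximated in Python roughly as follows. This is a sketch under assumptions: the function name and the sample rows are hypothetical, and like the `csvgrep` pattern it only checks that a value starts with four digits, so it does not validate ISSNs.

```python
import csv
import io
import re

# Same pattern as the csvgrep call: value must start with four digits
ISSN_PREFIX = re.compile(r"^[0-9]{4}")


def unique_issns(csv_file, column="cg.issn[en_US]"):
    """Return the sorted, de-duplicated values of one column that look like ISSNs."""
    reader = csv.DictReader(csv_file)
    values = {row[column] for row in reader if ISSN_PREFIX.search(row[column] or "")}
    return sorted(values)


# Hypothetical sample standing in for the OpenRefine export
sample = io.StringIO(
    "id,cg.issn[en_US]\n"
    "1,1234-5678\n"
    "2,\n"
    "3,0378-4290\n"
    "4,1234-5678\n"
    "5,n/a\n"
)
issns = unique_issns(sample)
```

Blank and non-numeric values are dropped and duplicates collapse to one entry, which is what the `sort | uniq` at the end of the pipeline does.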
- Then I updated the CSV headers for each and joined the CSVs on the issn column:
```console
$ sed -i '1s/journal title/sherpa romeo journal title/' /tmp/2021-08-09-journals-sherpa-romeo.csv
$ sed -i '1s/journal title/crossref journal title/' /tmp/2021-08-09-journals-crossref.csv
$ csvjoin -c issn /tmp/2021-08-09-journals-sherpa-romeo.csv /tmp/2021-08-09-journals-crossref.csv > /tmp/2021-08-09-journals-all.csv
```
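
The rename-then-join step above can also be sketched in Python. The helper names and the sample journals are hypothetical; the only things taken from the commands are the `issn` and `journal title` column names. As far as I know `csvjoin` performs an inner join unless told otherwise, so the sketch does the same and keeps only ISSNs present in both files.

```python
import csv
import io


def rename_column(rows, old, new):
    """Return copies of the rows with one column renamed."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def inner_join(left_rows, right_rows, key):
    """Join two lists of dicts on a shared key, keeping only matching rows."""
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[key], []).append(row)
    joined = []
    for left in left_rows:
        for right in right_index.get(left[key], []):
            merged = dict(left)
            merged.update({k: v for k, v in right.items() if k != key})
            joined.append(merged)
    return joined


# Hypothetical lookup results from the two services
sherpa = list(csv.DictReader(io.StringIO(
    "issn,journal title\n1234-5678,Journal A\n2345-6789,Journal B\n")))
crossref = list(csv.DictReader(io.StringIO(
    "issn,journal title\n2345-6789,Journal B (Online)\n3456-7890,Journal C\n")))

sherpa = rename_column(sherpa, "journal title", "sherpa romeo journal title")
crossref = rename_column(crossref, "journal title", "crossref journal title")
joined = inner_join(sherpa, crossref, "issn")
```

Renaming the title columns first matters because both files use the same `journal title` header, and the two values would otherwise collide in the merged row.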
- In OpenRefine I faceted by blank in each column and copied the values from the other, then created a new column to indicate whether the values were the same with this GREL:
```console
if(cells['sherpa romeo journal title'].value == cells['crossref journal title'].value,"same","different")
```
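
Outside OpenRefine, the same check is a one-liner in Python. The sketch below mirrors the exact string comparison of the GREL expression; the optional `normalize` flag is my own addition (not something from the notes) and shows how differences in case or spacing alone could be ignored if they turn out to dominate the "different" list.

```python
def compare_titles(sherpa_title, crossref_title, normalize=False):
    """Return "same" or "different", like the GREL expression above."""
    if normalize:
        # Assumption: collapse whitespace and ignore case before comparing
        sherpa_title = " ".join(sherpa_title.split()).casefold()
        crossref_title = " ".join(crossref_title.split()).casefold()
    return "same" if sherpa_title == crossref_title else "different"


# Hypothetical titles for illustration
exact = compare_titles("Journal of Examples", "Journal of examples")
relaxed = compare_titles("Journal of Examples", "Journal of  examples", normalize=True)
```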
- Then I exported the list of journals that differ and sent it to Peter for comments and corrections
- I want to build an updated controlled vocabulary so I can update CGSpace and reconcile our existing metadata against it
- Convert my `generate-thumbnails.py` script to use libvips instead of GraphicsMagick
- It is faster and uses less memory than GraphicsMagick (and ImageMagick), and produces nice thumbnails from PDFs
- One drawback is that libvips uses Poppler to render PDFs instead of GraphicsMagick, which apparently means that it can't work in CMYK
- I tested one item (10568/51999) that uses CMYK and the thumbnail looked OK (closer to the original than GraphicsMagick), so I'm not sure...
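
A minimal sketch of generating a PDF thumbnail with libvips through the `pyvips` binding is below. It is not the actual `generate-thumbnails.py` code: the function names, the 600 pixel width, the JPEG quality, and the `<filename>.jpg` output naming are all assumptions made for the example, and it needs a libvips build with PDF support.

```python
from pathlib import Path


def thumbnail_name(pdf_path):
    """Return the output name, e.g. report.pdf -> report.pdf.jpg (assumed convention)."""
    return Path(pdf_path).name + ".jpg"


def make_thumbnail(pdf_path, out_dir, width=600):
    """Render the first page of a PDF to a JPEG thumbnail and return its path."""
    # Imported lazily so the naming helper works without libvips installed
    import pyvips

    # thumbnail() loads the first page of a PDF by default and shrinks it
    # to fit inside a width x width box
    thumb = pyvips.Image.thumbnail(str(pdf_path), width)
    out_path = Path(out_dir) / thumbnail_name(pdf_path)
    # Q is the JPEG quality passed through to the saver
    thumb.write_to_file(str(out_path), Q=85)
    return out_path
```

Calling `make_thumbnail("/tmp/example.pdf", "/tmp")` on a CMYK PDF and comparing the result with the GraphicsMagick output would be one way to repeat the test on more items.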
<!-- vim: set sw=2 ts=2: -->