Add notes for 2022-02-14

2022-02-14 16:43:12 +03:00
parent e3109b7483
commit 9b4498de04
26 changed files with 107 additions and 31 deletions

@@ -374,4 +374,42 @@ sys 3m2.459s
- Start a full harvest on AReS
## 2022-02-14
- Last week Gaia sent me her notes on the second batch of TAC/ICW documents (items 201 to 400 in the spreadsheet)
- I created a filter in LibreOffice and selected the IDs for items with the action "delete", then I created a custom text facet in OpenRefine with this GREL:
```
or(
isNotNull(value.match('201')),
isNotNull(value.match('203')),
isNotNull(value.match('209')),
isNotNull(value.match('209')),
isNotNull(value.match('215')),
isNotNull(value.match('220')),
isNotNull(value.match('225')),
isNotNull(value.match('226')),
isNotNull(value.match('227')),
...
isNotNull(value.match('396'))
)
```
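- A minimal sketch of how an expression like the one above could be generated from a list of IDs instead of being typed by hand. The `build_grel_facet` helper is hypothetical, and the `ids` list is only a sample of the values visible above, not the full "delete" list:

```python
def build_grel_facet(ids):
    """Build a GREL or(...) expression with one isNotNull(value.match('ID')) clause per ID."""
    # Drop repeated IDs while keeping their original order
    unique_ids = list(dict.fromkeys(str(i) for i in ids))
    clauses = [f"isNotNull(value.match('{i}'))" for i in unique_ids]
    # Join the clauses with commas and close the or( ... ) call
    return "or(\n" + ",\n".join(clauses) + "\n)"


# Sample IDs only; the real list would come from the LibreOffice filter
ids = [201, 203, 209, 209, 215]
print(build_grel_facet(ids))
```

- The printed text can be pasted into OpenRefine's custom text facet dialog; the helper always emits the closing parenthesis and skips repeated IDs.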
- Then I flagged all matching records and exported a CSV to use with SAFBuilder
- Then I imported the SAF bundle on DSpace Test:
```console
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" dspace import --add --eperson=fuuu@umm.com --source /tmp/SimpleArchiveFormat --mapfile=./2022-02-14-tac-batch2-201to400.map
```
- Export the next batch from OpenRefine (items with ID 401 to 700), check duplicates, and then join with the file names:
```console
$ csvcut -c id,dc.title,dcterms.issued,dcterms.type ~/Downloads/2022-01-21-CGSpace-TAC-ICW-batch3-401to700.csv > /tmp/tac3.csv
$ ./ilri/check-duplicates.py -i /tmp/tac3.csv -db dspacetest -u dspacetest -p 'dom@in34sniper' -o /tmp/2022-02-14-tac-batch3-401-700.csv
$ csvcut -c id,filename ~/Downloads/2022-01-21-CGSpace-TAC-ICW-batch3-401to700.csv > /tmp/tac3-filenames.csv
$ csvjoin -c id /tmp/2022-02-14-tac-batch3-401-700.csv /tmp/tac3-filenames.csv > /tmp/2022-02-14-tac-batch3-401-700-filenames.csv
```
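- For reference, a rough Python equivalent of the `csvjoin -c id` step above. It assumes each `id` appears at most once in the file names CSV and that an inner join is wanted (csvjoin's default). The `join_on_key` helper and the sample rows are made up for illustration:

```python
import csv
import io


def join_on_key(left_csv, right_csv, key="id"):
    """Inner-join two CSV texts on a shared key column, keeping the left row order."""
    left_rows = list(csv.DictReader(io.StringIO(left_csv)))
    # Index the right-hand rows by key (assumes keys are unique on this side)
    right_by_key = {row[key]: row for row in csv.DictReader(io.StringIO(right_csv))}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # inner join: skip rows that have no partner
        merged = dict(row)
        # Append the right-hand columns, without repeating the key column
        merged.update({k: v for k, v in match.items() if k != key})
        joined.append(merged)
    return joined


# Hypothetical sample data standing in for the two /tmp files
results_csv = "id,dc.title\n401,Report A\n402,Report B\n"
filenames_csv = "id,filename\n401,a.pdf\n403,c.pdf\n"
for row in join_on_key(results_csv, filenames_csv):
    print(row)
```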
- I sent these 300 items to Gaia...
<!-- vim: set sw=2 ts=2: -->