Add notes for 2022-01-30

2022-01-31 09:00:59 +03:00
parent 673f718ef3
commit ed9fb3fe99
26 changed files with 124 additions and 32 deletions


@ -188,5 +188,37 @@ $ grep -E '^2022-01*' /var/log/postgresql/postgresql-10-main.log | grep -c 'stil
- I included the id because I will need a unique field to join the resulting list of non-duplicates with the original CSV, which contains the rest of the metadata and the filenames
- Since these items are not in DSpace yet, I generated simple numeric IDs in OpenRefine using this GREL transform: `row.index + 1`
- Then I ran `check-duplicates.py` on items 1200 and sent the resulting CSV to Gaia
- Delete one duplicate item I saw in IITA's Journal Articles collection that had been uploaded earlier to WLE
- Also do some general cleanup on IITA's Journal Articles collection in OpenRefine
- Delete one duplicate item I saw in ILRI's Journal Articles collection
- Also do some general cleanup on ILRI's Journal Articles collection in OpenRefine and csv-metadata-quality
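
The id-plus-join approach can be sketched in Python with the standard `csv` module. This is a minimal, hypothetical sketch, not the actual workflow (which used OpenRefine and `check-duplicates.py`):

- The titles, filenames, and list of non-duplicate ids are invented.
- The ids are numbered the same way as the GREL `row.index + 1` transform.
- Only the original rows whose id survived the duplicate check are kept.

```python
import csv
import io


def add_ids(rows):
    """Number rows starting at 1, mirroring OpenRefine's `row.index + 1`."""
    return [{"id": str(index + 1), **row} for index, row in enumerate(rows)]


def join_non_duplicates(original_rows, non_duplicate_ids):
    """Keep only the original rows whose id is in the non-duplicate list."""
    wanted = set(non_duplicate_ids)
    return [row for row in original_rows if row["id"] in wanted]


# Hypothetical original CSV holding the full metadata and filenames
original_csv = """title,filename
Soil health in Kenya,soil.pdf
Maize yields in Nigeria,maize.pdf
Cassava value chains,cassava.pdf
"""

original = add_ids(list(csv.DictReader(io.StringIO(original_csv))))
# Hypothetical ids of items that were NOT flagged as duplicates
kept = join_non_duplicates(original, ["1", "3"])
for row in kept:
    print(row["id"], row["filename"])
```

Because the generated id is unique, matching the non-duplicates back to the original rows is a simple set lookup.
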
## 2022-01-29
- I did some more cleanup on ILRI's Journal Articles collection
- I added missing journal titles for items that had ISSNs
- Then I added pages for items that had them in the citation
- First, I faceted the citation field based on whether or not the item had something like ": 232-234" present:
```console
value.contains(/:\s?\d+(-|)\d+/)
```
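
To sanity-check which citations the facet expression above selects, the same pattern can be tried outside OpenRefine. This is a rough Python approximation. It assumes that GREL's `contains()` with a regex behaves like a search anywhere in the string (Python's `re.search`), and the citations are invented examples. Because `(-|)` can also match an empty string, a single page number with two or more digits counts as a match too.

```python
import re

# Same pattern as the GREL facet: a colon, an optional space, then digits
# with an optional hyphen between them (e.g. ": 232-234")
PAGES = re.compile(r":\s?\d+(-|)\d+")


def has_pages(citation: str) -> bool:
    """Return True if the citation looks like it contains page information."""
    return PAGES.search(citation) is not None


citations = [
    "Journal of Blah 16(1): 232-234",  # page range
    "Journal of Blah 16(1): 232",  # single page: also matches
    "Journal of Blah 16(1)",  # no pages
]
for citation in citations:
    print(has_pages(citation), citation)
```
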
- Then I faceted by blank on `dcterms.extent` and ran a transform on it to extract the page information from the citation for over 1,000 items!
```console
'p. ' +
cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*:\s?(\d+)(-|)(\d+).*/)[0] +
'-' +
cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*:\s?(\d+)(-|)(\d+).*/)[2]
```
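
The transform above can be mirrored in Python to see what it produces for a given citation. This is a sketch under two assumptions:

- GREL's `match()` must match the whole string and returns a zero-based array of capture groups, so Python's `re.fullmatch()` plus `groups()` is the closest equivalent.
- The function name `extract_extent` is hypothetical.

```python
import re
from typing import Optional

# Same pattern as the GREL transform; the leading and trailing .* are there
# because the whole citation has to match
PAGE_RANGE = re.compile(r".*:\s?(\d+)(-|)(\d+).*")


def extract_extent(citation: str) -> Optional[str]:
    """Build a dcterms.extent value like "p. 232-234" from a citation."""
    match = PAGE_RANGE.fullmatch(citation)
    if match is None:
        # No page information found in the citation
        return None
    groups = match.groups()  # zero-based: first page, hyphen, last page
    return "p. " + groups[0] + "-" + groups[2]


print(extract_extent("Journal of Blah 16(1): 232-234"))  # → p. 232-234
print(extract_extent("Journal of Blah 16(1): 232"))  # → p. 23-2
```

The second example shows an edge case of the pattern as written: with no hyphen, the two `(\d+)` groups split a single page number between them. The facet above also selects such rows, so they may be worth spot-checking before import.
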
- Then I did something similar for `cg.volume` and `cg.issue`, also based on the citation; for example, to extract the "16" from "Journal of Blah 16(1)", where "16" is the second capture group (index `[1]` in GREL's zero-based array of groups):
```console
cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*( |;)(\d+)\((\d+)\).*/)[1]
```
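
The same regex yields both values: index `[1]` is the volume and index `[2]` is the issue. Here is a Python approximation, assuming `re.fullmatch()` behaves like GREL's whole-string `match()` (the function name is hypothetical):

```python
import re
from typing import Optional, Tuple

# Same pattern as the GREL transform: a space or semicolon, then
# "volume(issue)", e.g. " 16(1)"
VOLUME_ISSUE = re.compile(r".*( |;)(\d+)\((\d+)\).*")


def extract_volume_issue(citation: str) -> Optional[Tuple[str, str]]:
    """Return (volume, issue) from a citation, or None if not found."""
    match = VOLUME_ISSUE.fullmatch(citation)
    if match is None:
        return None
    groups = match.groups()  # (separator, volume, issue), zero-based like GREL
    return groups[1], groups[2]


print(extract_volume_issue("Journal of Blah 16(1): 232-234"))  # → ('16', '1')
```

The leading `.*` is greedy, so if a citation contained more than one "number(number)" sequence, the last one would be picked.
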
- This was 3,000 items, so I imported the changes into CGSpace 1,000 at a time...
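
The notes don't say how the batches were made. One option is to split the exported CSV into files of at most 1,000 rows, each repeating the header row so it can be imported on its own. This is a hypothetical Python sketch: the function name and the sample columns are assumptions.

```python
import csv
import io
from typing import List


def split_csv(text: str, batch_size: int = 1000) -> List[str]:
    """Split CSV text into chunks of at most batch_size rows, each with the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    chunks = []
    for start in range(0, len(rows), batch_size):
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(header)  # repeat the header in every batch
        writer.writerows(rows[start:start + batch_size])
        chunks.append(out.getvalue())
    return chunks


# Hypothetical export with 3,000 items
lines = ["id,dcterms.extent"] + [f"{i},p. 1-10" for i in range(1, 3001)]
chunks = split_csv("\n".join(lines) + "\n")
print(len(chunks))  # → 3
```
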
<!-- vim: set sw=2 ts=2: -->