Add notes for 2021-10-09

2021-10-09 22:00:59 +03:00
parent 23d6a808fc
commit ab8cb272ea
26 changed files with 79 additions and 31 deletions

@@ -248,5 +248,32 @@ if(cells['dcterms.subject[en_US]'].value == cells['dcterms.subject[en_Fu]'].valu
- I starred these rows, then blanked out the original field so DSpace would see it as a removal, and added the new column
- After these are uploaded I will normalize the `text_lang` fields in PostgreSQL again
- I did the same for CIAT but there were over 7,000 duplicate metadata values! Hard to believe:
```console
$ grep -c 'Removing duplicate value' /tmp/out.log
7720
```
- I applied these to the CIAT community, so in total that's over 8,000 duplicate metadata values removed in a handful of fields...
## 2021-10-09
- I did similar metadata cleanups for CCAFS and IITA too, but there were only a few hundred duplicates there
- Also of note, there were other kinds of fixes too, for example in IITA's community:
```console
$ grep -c -E '(Fixing|Removing) (duplicate|excessive|invalid)' /tmp/out.log
249
```
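The combined count above can also be broken down per kind of fix. This is a minimal sketch, assuming the log phrases match the same pattern used in the `grep` command; the `count_fixes` function name is hypothetical. Note that `grep -o` counts matches rather than lines, so the totals could differ slightly from `grep -c` if a single line contains more than one match.

```shell
# Tally each "<verb> <problem>" pair found in a log file, most frequent first.
# Hypothetical helper; the pattern is the one from the grep command above.
count_fixes() {
  grep -oE '(Fixing|Removing) (duplicate|excessive|invalid)' "$1" \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{print $1, $2, $3}'  # normalize the padding that uniq -c adds
}
```

Running `count_fixes /tmp/out.log` would print one line per category, with the count first.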
- I ran a full Discovery re-indexing on CGSpace
- Then I exported all of CGSpace and extracted the ISSNs and ISBNs:
```console
$ csvcut -c 'id,cg.issn[en_US],dc.identifier.issn[en_US],cg.isbn[en_US],dc.identifier.isbn[en_US]' /tmp/cgspace.csv > /tmp/cgspace-issn-isbn.csv
```
- I did cleanups on about seventy items with invalid and mixed ISSNs/ISBNs
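
These notes do not say how the invalid ISSNs were identified. One possible check is the standard ISSN check digit: the first seven digits are weighted from 8 down to 2, and the sum determines the eighth character. The following is a minimal sketch of that check, not the method actually used here; the `is_valid_issn` function name is hypothetical.

```shell
# Return 0 if the argument is a well-formed ISSN with a correct check digit.
# Hypothetical helper, not part of the original workflow.
is_valid_issn() {
  # Require the NNNN-NNNC layout, where C is a digit or X
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9X]) ;;
    *) return 1 ;;
  esac
  digits=$(printf '%s' "$1" | tr -d '-')
  sum=0
  weight=8
  # Weight the first seven digits from 8 down to 2
  for d in $(printf '%s' "$digits" | cut -c1-7 | fold -w1); do
    sum=$((sum + d * weight))
    weight=$((weight - 1))
  done
  # A check value of 10 is written as X
  check=$(( (11 - sum % 11) % 11 ))
  if [ "$check" -eq 10 ]; then
    check=X
  fi
  [ "$(printf '%s' "$digits" | cut -c8)" = "$check" ]
}
```

For example, `is_valid_issn 0378-5955 && echo valid` prints `valid`, while a value without the hyphen or with a wrong final character returns a non-zero status. ISBNs use different checksum rules and are not covered by this sketch.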
<!-- vim: set sw=2 ts=2: -->