Add notes for 2019-09-19

2025-01-27 05:49:12 +01:00 · 2019-09-19 18:20:04 +03:00
parent 9d5c1a6e13
commit 63a28eff29
77 changed files with 262 additions and 83 deletions
--- a/content/posts/2019-09.md
+++ b/content/posts/2019-09.md
@@ -152,4 +152,86 @@ dspace.log.2019-09-15:808
 - I restarted Tomcat and the item views came back, but then the Solr statistics cores didn't all load properly
  - After restarting Tomcat once again, both the item views and the Solr statistics cores all came back OK

+## 2019-09-19
+
+- For some reason my podman PostgreSQL container isn't working so I had to use Docker to re-create it for my testing work today:
+
+```
+# docker pull docker.io/library/postgres:9.6-alpine
+# docker volume create dspacedb_data
+# docker run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
+$ createuser -h localhost -U postgres --pwprompt dspacetest
+$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
+$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
+$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2019-08-31.backup
+$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
+$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
+```
+
+- Elizabeth from CIAT sent me a list of sixteen authors who need to have their ORCID identifiers tagged with their publications
+  - I manually checked the ORCID profile links to make sure they matched the names
+  - Then I created an input file to use with my `add-orcid-identifiers-csv.py` script:
+
+```
+dc.contributor.author,cg.creator.id
+"Kihara, Job","Job Kihara: 0000-0002-4394-9553"
+"Twyman, Jennifer","Jennifer Twyman: 0000-0002-8581-5668"
+"Ishitani, Manabu","Manabu Ishitani: 0000-0002-6950-4018"
+"Arango, Jacobo","Jacobo Arango: 0000-0002-4828-9398"
+"Chavarriaga Aguirre, Paul","Paul Chavarriaga-Aguirre: 0000-0001-7579-3250"
+"Paul, Birthe","Birthe Paul: 0000-0002-5994-5354"
+"Eitzinger, Anton","Anton Eitzinger: 0000-0001-7317-3381"
+"Hoek, Rein van der","Rein van der Hoek: 0000-0003-4528-7669"
+"Aranzales Rondón, Ericson","Ericson Aranzales Rondon: 0000-0001-7487-9909"
+"Staiger-Rivas, Simone","Simone Staiger: 0000-0002-3539-0817"
+"de Haan, Stef","Stef de Haan: 0000-0001-8690-1886"
+"Pulleman, Mirjam","Mirjam Pulleman: 0000-0001-9950-0176"
+"Abera, Wuletawu","Wuletawu Abera: 0000-0002-3657-5223"
+"Tamene, Lulseged","Lulseged Tamene: 0000-0002-3806-8890"
+"Andrieu, Nadine","Nadine Andrieu: 0000-0001-9558-9302"
+"Ramírez-Villegas, Julián","Julian Ramirez-Villegas: 0000-0002-8044-583X"
+```
+
+- I tested the file on my local development machine with the following invocation:
+
+```
+$ ./add-orcid-identifiers-csv.py -i 2019-09-19-ciat-orcids.csv -db dspace -u dspace -p 'fuuu'
+```
+
+- In my test environment this added 390 ORCID identifiers
+- I ran the same updates on CGSpace and DSpace Test and then started a Discovery re-index to force the search index to update
+- Update the PostgreSQL JDBC driver to version 42.2.8 in our [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public)
+  - There is only [one minor fix to a use case we aren't using](https://github.com/pgjdbc/pgjdbc/issues/1567) so I will deploy this on the servers the next time I do updates
+- Run system updates on DSpace Test (linode19) and reboot it
+- Start looking at IITA's latest round of batch updates that Sisay had [uploaded to DSpace Test](https://dspacetest.cgiar.org/handle/10568/105486) earlier this month
+  - For posterity, IITA's original input file was 20196th.xls and Sisay uploaded it as "IITA_Sep_06" to DSpace Test
+  - Sisay said he ran the csv-metadata-quality script on the records, but I assume he didn't run the unsafe fixes or AGROVOC checks because I still see unnecessary Unicode, excessive whitespace, one invalid ISBN, missing dates, and a few invalid AGROVOC fields
+  - In addition, a few records were missing authorship type
+  - I deleted two invalid AGROVOC terms because they were ambiguous
+  - Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine:
+    - `$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id`
+    - I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL expression: `if(cell.recon.matched, cell.recon.match.name, value)`
+  - I also looked through the IITA subjects to normalize some values
+- Follow up with Marissa again about the CCAFS phase II project tags
+- Generate a list of the top 1500 authors on CGSpace:
+
+```
+dspace=# \copy (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = (SELECT metadata_field_id FROM metadatafieldregistry WHERE element = 'contributor' AND qualifier = 'author') AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-09-19-top-1500-authors.csv WITH CSV HEADER;
+```
+
+- Then I used `csvcut` to select the column of author names, stripped the header and quote characters, and saved the sorted file:
+
+```
+$ csvcut -c text_value /tmp/2019-09-19-top-1500-authors.csv | grep -v text_value | sed 's/"//g' | sort > dspace/config/controlled-vocabularies/dc-contributor-author.xml
+```
+
+- After adding the XML formatting back to the file, I formatted it using XML tidy:
+
+```
+$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-contributor-author.xml
+```
+
+- I created and merged [a pull request for the updates](https://github.com/ilri/DSpace/pull/433)
+  - This is the first time we've updated this controlled vocabulary since 2018-09
+
 <!-- vim: set sw=2 ts=2: -->