Add notes for 2019-09-19

2025-01-27 05:49:12 +01:00 · 2019-09-19 18:20:04 +03:00
parent 9d5c1a6e13
commit 63a28eff29
77 changed files with 262 additions and 83 deletions
--- a/docs/2019-09/index.html
+++ b/docs/2019-09/index.html
@@ -40,7 +40,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-09/" />
 <meta property="article:published_time" content="2019-09-01T10:17:51+03:00" />
-<meta property="article:modified_time" content="2019-09-12T18:21:43+03:00" />
+<meta property="article:modified_time" content="2019-09-15T17:29:48+03:00" />

 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="September, 2019"/>
@@ -75,7 +75,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
 9124 45.5.186.2

 "/>
-<meta name="generator" content="Hugo 0.58.1" />
+<meta name="generator" content="Hugo 0.58.2" />


    
@@ -85,9 +85,9 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
  "@type": "BlogPosting",
  "headline": "September, 2019",
  "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-09\/",
-  "wordCount": "1000",
+  "wordCount": "1636",
  "datePublished": "2019-09-01T10:17:51\x2b03:00",
-  "dateModified": "2019-09-12T18:21:43\x2b03:00",
+  "dateModified": "2019-09-15T17:29:48\x2b03:00",
  "author": {
    "@type": "Person",
    "name": "Alan Orth"
@@ -338,6 +338,103 @@ dspace.log.2019-09-15:808
 </ul></li>
 </ul>

+<h2 id="2019-09-19">2019-09-19</h2>
+
+<ul>
+<li><p>For some reason my podman PostgreSQL container isn&rsquo;t working so I had to use Docker to re-create it for my testing work today:</p>
+
+<pre><code># docker pull docker.io/library/postgres:9.6-alpine
+# docker volume create dspacedb_data
+# docker run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
+$ createuser -h localhost -U postgres --pwprompt dspacetest
+$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
+$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
+$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2019-08-31.backup
+$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
+$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
+</code></pre></li>
+
+<li><p>Elizabeth from CIAT sent me a list of sixteen authors who need to have their ORCID identifiers tagged with their publications</p>
+
+<ul>
+<li>I manually checked the ORCID profile links to make sure they matched the names</li>
+
+<li><p>Then I created an input file to use with my <code>add-orcid-identifiers-csv.py</code> script:</p>
+
+<pre><code>dc.contributor.author,cg.creator.id
+&quot;Kihara, Job&quot;,&quot;Job Kihara: 0000-0002-4394-9553&quot;
+&quot;Twyman, Jennifer&quot;,&quot;Jennifer Twyman: 0000-0002-8581-5668&quot;
+&quot;Ishitani, Manabu&quot;,&quot;Manabu Ishitani: 0000-0002-6950-4018&quot;
+&quot;Arango, Jacobo&quot;,&quot;Jacobo Arango: 0000-0002-4828-9398&quot;
+&quot;Chavarriaga Aguirre, Paul&quot;,&quot;Paul Chavarriaga-Aguirre: 0000-0001-7579-3250&quot;
+&quot;Paul, Birthe&quot;,&quot;Birthe Paul: 0000-0002-5994-5354&quot;
+&quot;Eitzinger, Anton&quot;,&quot;Anton Eitzinger: 0000-0001-7317-3381&quot;
+&quot;Hoek, Rein van der&quot;,&quot;Rein van der Hoek: 0000-0003-4528-7669&quot;
+&quot;Aranzales Rondón, Ericson&quot;,&quot;Ericson Aranzales Rondon: 0000-0001-7487-9909&quot;
+&quot;Staiger-Rivas, Simone&quot;,&quot;Simone Staiger: 0000-0002-3539-0817&quot;
+&quot;de Haan, Stef&quot;,&quot;Stef de Haan: 0000-0001-8690-1886&quot;
+&quot;Pulleman, Mirjam&quot;,&quot;Mirjam Pulleman: 0000-0001-9950-0176&quot;
+&quot;Abera, Wuletawu&quot;,&quot;Wuletawu Abera: 0000-0002-3657-5223&quot;
+&quot;Tamene, Lulseged&quot;,&quot;Lulseged Tamene: 0000-0002-3806-8890&quot;
+&quot;Andrieu, Nadine&quot;,&quot;Nadine Andrieu: 0000-0001-9558-9302&quot;
+&quot;Ramírez-Villegas, Julián&quot;,&quot;Julian Ramirez-Villegas: 0000-0002-8044-583X&quot;
+</code></pre></li>
+</ul></li>
+
+<li><p>I tested the file on my local development machine with the following invocation:</p>
+
+<pre><code>$ ./add-orcid-identifiers-csv.py -i 2019-09-19-ciat-orcids.csv -db dspace -u dspace -p 'fuuu'
+</code></pre></li>
+
+<li><p>In my test environment this added 390 ORCID identifiers</p></li>
+
+<li><p>I ran the same updates on CGSpace and DSpace Test and then started a Discovery re-index to force the search index to update</p></li>
+
+<li><p>Update the PostgreSQL JDBC driver to version 42.2.8 in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></p>
+
+<ul>
+<li>There is only <a href="https://github.com/pgjdbc/pgjdbc/issues/1567">one minor fix to a use case we aren&rsquo;t using</a> so I will deploy this on the servers the next time I do updates</li>
+</ul></li>
+
+<li><p>Run system updates on DSpace Test (linode19) and reboot it</p></li>
+
+<li><p>Start looking at IITA&rsquo;s latest round of batch updates that Sisay had <a href="https://dspacetest.cgiar.org/handle/10568/105486">uploaded to DSpace Test</a> earlier this month</p>
+
+<ul>
+<li>For posterity, IITA&rsquo;s original input file was 20196th.xls and Sisay uploaded it as &ldquo;IITA_Sep_06&rdquo; to DSpace Test</li>
+<li>Sisay said he ran the csv-metadata-quality script on the records, but I assume he didn&rsquo;t run the unsafe fixes or AGROVOC checks because I still see unnecessary Unicode, excessive whitespace, one invalid ISBN, missing dates and a few invalid AGROVOC fields</li>
+<li>In addition, a few records were missing authorship type</li>
+<li>I deleted two invalid AGROVOC terms because they were ambiguous</li>
+<li>Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine: <code>$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id</code></li>
+<li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL expression: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li>
+<li>I also looked through the IITA subjects to normalize some values</li>
+</ul></li>
+
+<li><p>Follow up with Marissa again about the CCAFS phase II project tags</p></li>
+
+<li><p>Generate a list of the top 1500 authors on CGSpace:</p>
+
+<pre><code>dspace=# \copy (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = (SELECT metadata_field_id FROM metadatafieldregistry WHERE element = 'contributor' AND qualifier = 'author') AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-09-19-top-1500-authors.csv WITH CSV HEADER;
+</code></pre></li>
+
+<li><p>Then I used <code>csvcut</code> to select the column of author names, strip the header and quote characters, and saved the sorted file:</p>
+
+<pre><code>$ csvcut -c text_value /tmp/2019-09-19-top-1500-authors.csv | grep -v text_value | sed 's/&quot;//g' | sort &gt; dspace/config/controlled-vocabularies/dc-contributor-author.xml
+</code></pre></li>
+
+<li><p>After adding the XML formatting back to the file I formatted it using XML tidy:</p>
+
+<pre><code>$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-contributor-author.xml
+</code></pre></li>
+
+<li><p>I created and merged <a href="https://github.com/ilri/DSpace/pull/433">a pull request for the updates</a></p>
+
+<ul>
+<li>This is the first time we&rsquo;ve updated this controlled vocabulary since 2018-09</li>
+</ul></li>
+</ul>
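The manual check of the ORCID profile links above could be complemented by a mechanical check of the identifiers themselves before running the tagging script. This is a minimal sketch, not part of `add-orcid-identifiers-csv.py`: it assumes a plain (not HTML-escaped) CSV with the same two columns as the input file, and the helper names `orcid_check_digit`, `is_valid_orcid` and `find_invalid_rows` are hypothetical. It verifies the final character of each ORCID iD using the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers.

```python
import csv
import io
import re

# Shape of an ORCID iD: four groups of four, last character may be X
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has the right shape and a matching check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def find_invalid_rows(csv_file):
    """Return the author names whose cg.creator.id does not end in a valid ORCID iD."""
    bad = []
    for row in csv.DictReader(csv_file):
        # Values look like "Job Kihara: 0000-0002-4394-9553"
        _, _, orcid = row["cg.creator.id"].rpartition(": ")
        if not is_valid_orcid(orcid.strip()):
            bad.append(row["dc.contributor.author"])
    return bad


# Two rows copied from the input file above, used only as sample data
SAMPLE = '''dc.contributor.author,cg.creator.id
"Kihara, Job","Job Kihara: 0000-0002-4394-9553"
"Ramírez-Villegas, Julián","Julian Ramirez-Villegas: 0000-0002-8044-583X"
'''

if __name__ == "__main__":
    print(find_invalid_rows(io.StringIO(SAMPLE)))
```

A checksum match only catches typos in the identifier; it does not confirm that the iD belongs to the named author, so the manual profile check is still needed.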
+
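The step of "adding the XML formatting back" to the author list could in principle be scripted instead of done by hand. The following is only a sketch under stated assumptions: it assumes the exported CSV has a `text_value` column (as in the `\copy` query above) and that the vocabulary file uses the nested `node` / `isComposedBy` layout of DSpace controlled vocabularies. The function name `authors_to_vocabulary`, the top-level label `Authors`, and the choice to reuse the author name as the node `id` are hypothetical, not taken from the actual file.

```python
import csv
import io
from xml.sax.saxutils import quoteattr


def authors_to_vocabulary(csv_file, vocab_id="dc-contributor-author", label="Authors"):
    """Build a controlled-vocabulary XML string from a CSV with a text_value column."""
    reader = csv.DictReader(csv_file)
    # De-duplicate and sort; note this sorts by code point, which can differ
    # from the locale-aware ordering of the shell `sort` command
    names = sorted({row["text_value"] for row in reader})

    lines = [
        f"<node id={quoteattr(vocab_id)} label={quoteattr(label)}>",
        "  <isComposedBy>",
    ]
    for name in names:
        # quoteattr escapes &, < and > and picks safe quote characters
        attr = quoteattr(name)
        lines.append(f"    <node id={attr} label={attr}/>")
    lines += ["  </isComposedBy>", "</node>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample rows with made-up counts, only to show the output shape
    sample = io.StringIO('text_value,count\n"Tamene, Lulseged",10\n"Abera, Wuletawu",7\n')
    print(authors_to_vocabulary(sample))
```

The output could still be passed through `tidy` as above to normalize indentation.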
 <!-- vim: set sw=2 ts=2: -->