Add notes for 2020-02-11

2020-02-11 12:52:11 +02:00
parent 4bd85f9323
commit dc637490f1
89 changed files with 187 additions and 96 deletions


@ -35,7 +35,7 @@ The code finally builds and runs with a fresh install
"/>
-<meta name="generator" content="Hugo 0.64.0" />
+<meta name="generator" content="Hugo 0.64.1" />
@ -45,7 +45,7 @@ The code finally builds and runs with a fresh install
"@type": "BlogPosting",
"headline": "February, 2020",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2020-02\/",
-"wordCount": "2551",
+"wordCount": "2811",
"datePublished": "2020-02-02T11:56:30+02:00",
"dateModified": "2020-02-09T17:34:12+02:00",
"author": {
@ -510,7 +510,51 @@ $ cat out.dspace510-1 | ../FlameGraph/stackcollapse-perf.pl | grep -E '^java' |
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
<h2 id="2020-02-10">2020-02-10</h2>
<ul>
<li>Follow up with <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=706">Atmire about DSpace 6.x upgrade</a>
<ul>
<li>I raised the issue of targeting 6.4-SNAPSHOT, as well as the Discovery indexing performance issues in 6.x</li>
</ul>
</li>
</ul>
<h2 id="2020-02-11">2020-02-11</h2>
<ul>
<li>Maria from Bioversity asked me to add some ORCID iDs to our controlled vocabulary so I combined them with our existing ones and updated the names from the ORCID API:</li>
</ul>
<pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity-orcid-ids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2020-02-11-combined-orcids.txt
$ ./resolve-orcids.py -i /tmp/2020-02-11-combined-orcids.txt -o /tmp/2020-02-11-combined-names.txt -d
# sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents)
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
</code></pre><ul>
<li>Then I noticed some author names had changed, so I captured the old and new names in a CSV file and fixed them using <code>fix-metadata-values.py</code>:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i 2020-02-11-correct-orcid-ids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -t correct -m 240 -d
</code></pre><ul>
<li>On a hunch, I decided to try adding these ORCID iDs to existing items that might not have them yet
<ul>
<li>I checked the database for likely matches to the author name and then created a CSV with the author names and ORCID iDs:</li>
</ul>
</li>
</ul>
<pre><code>dc.contributor.author,cg.creator.id
&quot;Staver, Charles&quot;,charles staver: 0000-0002-4532-6077
&quot;Staver, C.&quot;,charles staver: 0000-0002-4532-6077
&quot;Fungo, R.&quot;,Robert Fungo: 0000-0002-4264-6905
&quot;Remans, R.&quot;,Roseline Remans: 0000-0003-3659-8529
&quot;Remans, Roseline&quot;,Roseline Remans: 0000-0003-3659-8529
&quot;Rietveld A.&quot;,Anne Rietveld: 0000-0002-9400-9473
&quot;Rietveld, A.&quot;,Anne Rietveld: 0000-0002-9400-9473
&quot;Rietveld, A.M.&quot;,Anne Rietveld: 0000-0002-9400-9473
&quot;Rietveld, Anne M.&quot;,Anne Rietveld: 0000-0002-9400-9473
&quot;Fongar, A.&quot;,Andrea Fongar: 0000-0003-2084-1571
&quot;Müller, Anna&quot;,Anna Müller: 0000-0003-3120-8560
&quot;Müller, A.&quot;,Anna Müller: 0000-0003-3120-8560
</code></pre><ul>
<li>Running the <code>add-orcid-identifiers-csv.py</code> script, I added 144 ORCID iDs to items!</li>
</ul>
<pre><code>$ ./add-orcid-identifiers-csv.py -i /tmp/2020-02-11-add-orcid-ids.csv -db dspace -u dspace -p 'fuuu'
</code></pre><!-- raw HTML omitted -->
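<ul>
<li>The ORCID housekeeping above comes down to two small text transformations: pulling the unique iDs out of arbitrary text (the <code>grep -oE ... | sort | uniq</code> step) and grouping the author name variants in the author/ORCID CSV by iD. Below is a minimal Python sketch of both, assuming the CSV has the <code>dc.contributor.author</code> and <code>cg.creator.id</code> headers shown above and that each <code>cg.creator.id</code> value embeds the iD as <code>Name: iD</code>. The function names are hypothetical and the code is not taken from <code>resolve-orcids.py</code> or <code>add-orcid-identifiers-csv.py</code>.</li>
</ul>

```python
import csv
import io
import re
from collections import defaultdict

# Same loose pattern as the grep in the shell pipeline above: four groups of
# four uppercase letters or digits separated by hyphens.
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcid_ids(text):
    """Return the unique ORCID iDs in text, sorted (like grep -oE | sort | uniq)."""
    return sorted(set(ORCID_PATTERN.findall(text)))


def group_authors_by_orcid(csv_text):
    """Map each ORCID iD in the cg.creator.id column to the author names that use it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # cg.creator.id values look like "Anne Rietveld: 0000-0002-9400-9473"
        for orcid in ORCID_PATTERN.findall(row["cg.creator.id"]):
            groups[orcid].append(row["dc.contributor.author"])
    return dict(groups)


if __name__ == "__main__":
    # Sample rows taken from the author/ORCID CSV above
    sample = """dc.contributor.author,cg.creator.id
"Remans, R.",Roseline Remans: 0000-0003-3659-8529
"Remans, Roseline",Roseline Remans: 0000-0003-3659-8529
"Fungo, R.",Robert Fungo: 0000-0002-4264-6905
"""
    print(extract_orcid_ids(sample))
    # ['0000-0002-4264-6905', '0000-0003-3659-8529']
    print(group_authors_by_orcid(sample))
    # {'0000-0003-3659-8529': ['Remans, R.', 'Remans, Roseline'], '0000-0002-4264-6905': ['Fungo, R.']}
```

<ul>
<li>Grouping by iD like this is one way to check that every spelling of an author (for example <code>Rietveld A.</code> and <code>Rietveld, A.M.</code>) points at the same ORCID iD before running the real import against the database.</li>
</ul>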