Add notes for 2021-09-13

commit c05c7213c2
parent 8b487a4a77
Date: 2021-09-13 16:21:16 +03:00
109 changed files with 2627 additions and 2530 deletions


@ -44,7 +44,7 @@ During the FlywayDB migration I got an error:
"/>
<meta name="generator" content="Hugo 0.87.0" />
<meta name="generator" content="Hugo 0.88.1" />
@ -144,7 +144,7 @@ During the FlywayDB migration I got an error:
</ul>
</li>
</ul>
<pre><code>2020-10-06 21:36:04,138 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ Batch entry 0 update public.bitstreamformatregistry set description='Electronic publishing', internal='FALSE', mimetype='application/epub+zip', short_description='EPUB', support_level=1 where bitstream_format_id=78 was aborted: ERROR: duplicate key value violates unique constraint &quot;bitstreamformatregistry_short_description_key&quot;
<pre tabindex="0"><code>2020-10-06 21:36:04,138 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ Batch entry 0 update public.bitstreamformatregistry set description='Electronic publishing', internal='FALSE', mimetype='application/epub+zip', short_description='EPUB', support_level=1 where bitstream_format_id=78 was aborted: ERROR: duplicate key value violates unique constraint &quot;bitstreamformatregistry_short_description_key&quot;
Detail: Key (short_description)=(EPUB) already exists. Call getNextException to see other errors in the batch.
2020-10-06 21:36:04,138 WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ SQL Error: 0, SQLState: 23505
2020-10-06 21:36:04,138 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ ERROR: duplicate key value violates unique constraint &quot;bitstreamformatregistry_short_description_key&quot;
@ -212,7 +212,7 @@ org.hibernate.exception.ConstraintViolationException: could not execute batch
</ul>
</li>
</ul>
<pre><code>$ dspace metadata-import -f /tmp/2020-10-06-import-test.csv -e aorth@mjanja.ch
<pre tabindex="0"><code>$ dspace metadata-import -f /tmp/2020-10-06-import-test.csv -e aorth@mjanja.ch
Loading @mire database changes for module MQM
Changes have been processed
-----------------------------------------------------------
@ -259,7 +259,7 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected m
</code></pre><ul>
<li>Also, I tested Listings and Reports and there are still no hits for &ldquo;Orth, Alan&rdquo; as a contributor, despite there being dozens of items in the repository and the Solr query generated by Listings and Reports actually returning hits:</li>
</ul>
<pre><code>2020-10-06 22:23:44,116 INFO org.apache.solr.core.SolrCore @ [search] webapp=/solr path=/select params={q=*:*&amp;fl=handle,search.resourcetype,search.resourceid,search.uniqueid&amp;start=0&amp;fq=NOT(withdrawn:true)&amp;fq=NOT(discoverable:false)&amp;fq=search.resourcetype:2&amp;fq=author_keyword:Orth,\+A.+OR+author_keyword:Orth,\+Alan&amp;fq=dateIssued.year:[2013+TO+2021]&amp;rows=500&amp;wt=javabin&amp;version=2} hits=18 status=0 QTime=10
<pre tabindex="0"><code>2020-10-06 22:23:44,116 INFO org.apache.solr.core.SolrCore @ [search] webapp=/solr path=/select params={q=*:*&amp;fl=handle,search.resourcetype,search.resourceid,search.uniqueid&amp;start=0&amp;fq=NOT(withdrawn:true)&amp;fq=NOT(discoverable:false)&amp;fq=search.resourcetype:2&amp;fq=author_keyword:Orth,\+A.+OR+author_keyword:Orth,\+Alan&amp;fq=dateIssued.year:[2013+TO+2021]&amp;rows=500&amp;wt=javabin&amp;version=2} hits=18 status=0 QTime=10
</code></pre><ul>
<li>Solr returns <code>hits=18</code> for the L&amp;R query, but no results are shown in the L&amp;R UI</li>
<li>I sent all this feedback to Atmire&hellip;</li>
@ -278,16 +278,16 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected m
</ul>
</li>
</ul>
<pre><code>$ http -f POST https://dspacetest.cgiar.org/rest/login email=aorth@fuuu.com 'password=fuuuu'
<pre tabindex="0"><code>$ http -f POST https://dspacetest.cgiar.org/rest/login email=aorth@fuuu.com 'password=fuuuu'
$ http https://dspacetest.cgiar.org/rest/status Cookie:JSESSIONID=EABAC9EFF942028AA52DFDA16DBCAFDE
</code></pre><ul>
<li>Then we post an item in JSON format to <code>/rest/collections/{uuid}/items</code>:</li>
</ul>
<pre><code>$ http POST https://dspacetest.cgiar.org/rest/collections/f10ad667-2746-4705-8b16-4439abe61d22/items Cookie:JSESSIONID=EABAC9EFF942028AA52DFDA16DBCAFDE &lt; item-object.json
<pre tabindex="0"><code>$ http POST https://dspacetest.cgiar.org/rest/collections/f10ad667-2746-4705-8b16-4439abe61d22/items Cookie:JSESSIONID=EABAC9EFF942028AA52DFDA16DBCAFDE &lt; item-object.json
</code></pre><ul>
<li>The format of the JSON is:</li>
</ul>
<pre><code>{ &quot;metadata&quot;: [
<pre tabindex="0"><code>{ &quot;metadata&quot;: [
{
&quot;key&quot;: &quot;dc.title&quot;,
&quot;value&quot;: &quot;Testing REST API post&quot;,
@ -362,7 +362,7 @@ $ http https://dspacetest.cgiar.org/rest/status Cookie:JSESSIONID=EABAC9EFF94202
</ul>
</li>
</ul>
<pre><code>$ http POST http://localhost:8080/rest/login email=aorth@fuuu.com 'password=ddddd'
<pre tabindex="0"><code>$ http POST http://localhost:8080/rest/login email=aorth@fuuu.com 'password=ddddd'
$ http http://localhost:8080/rest/status rest-dspace-token:d846f138-75d3-47ba-9180-b88789a28099
$ http POST http://localhost:8080/rest/collections/1549/items rest-dspace-token:d846f138-75d3-47ba-9180-b88789a28099 &lt; item-object.json
</code></pre><ul>
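<li>(A minimal Python sketch of the same login/status/POST flow using the <code>requests</code> library; it assumes, as in DSpace 5.x, that <code>/rest/login</code> returns the token in the response body)
<pre tabindex="0"><code>import json
import requests

base = 'http://localhost:8080/rest'

# log in and capture the rest-dspace-token from the response body
token = requests.post(base + '/login', json={'email': 'aorth@fuuu.com', 'password': 'ddddd'}).text.strip()

# check the session status
print(requests.get(base + '/status', headers={'rest-dspace-token': token}).json())

# POST the item JSON to the target collection
with open('item-object.json') as f:
    item = json.load(f)

r = requests.post(base + '/collections/1549/items', headers={'rest-dspace-token': token}, json=item)
print(r.status_code, r.text)
</code></pre>
</li>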
@ -408,7 +408,7 @@ $ http POST http://localhost:8080/rest/collections/1549/items rest-dspace-token:
</ul>
</li>
</ul>
<pre><code>$ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-4380-a9d3-4c8cbbed2d21/retrieve User-Agent:&quot;RTB website BOT&quot;
<pre tabindex="0"><code>$ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-4380-a9d3-4c8cbbed2d21/retrieve User-Agent:&quot;RTB website BOT&quot;
$ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-4380-a9d3-4c8cbbed2d21/retrieve User-Agent:&quot;RTB website BOT&quot;
$ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-4380-a9d3-4c8cbbed2d21/retrieve User-Agent:&quot;RTB website BOT&quot;
$ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-4380-a9d3-4c8cbbed2d21/retrieve User-Agent:&quot;RTB website BOT&quot;
@ -438,7 +438,7 @@ $ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-438
<li>I added <code>[Ss]pider</code> to the Tomcat Crawler Session Manager Valve regex because this can catch a few more generic bots and force them to use the same Tomcat JSESSIONID</li>
<li>I added a few of the patterns from above to our local agents list and ran the <code>check-spider-hits.sh</code> on CGSpace:</li>
</ul>
<pre><code>$ ./check-spider-hits.sh -f dspace/config/spiders/agents/ilri -s statistics -u http://localhost:8083/solr -p
<pre tabindex="0"><code>$ ./check-spider-hits.sh -f dspace/config/spiders/agents/ilri -s statistics -u http://localhost:8083/solr -p
Purging 228916 hits from RTB website BOT in statistics
Purging 18707 hits from ILRI Livestock Website Publications importer BOT in statistics
Purging 2661 hits from ^Java\/[0-9]{1,2}.[0-9] in statistics
@ -472,7 +472,7 @@ Total number of bot hits purged: 3684
</li>
<li>I can update the country metadata in PostgreSQL like this:</li>
</ul>
<pre><code>dspace=&gt; BEGIN;
<pre tabindex="0"><code>dspace=&gt; BEGIN;
dspace=&gt; UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE resource_type_id=2 AND metadata_field_id=228;
UPDATE 51756
dspace=&gt; COMMIT;
@ -483,7 +483,7 @@ dspace=&gt; COMMIT;
</ul>
</li>
</ul>
<pre><code>dspace=&gt; \COPY (SELECT DISTINCT(text_value) as &quot;cg.coverage.country&quot; FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=228) TO /tmp/2020-10-13-countries.csv WITH CSV HEADER;
<pre tabindex="0"><code>dspace=&gt; \COPY (SELECT DISTINCT(text_value) as &quot;cg.coverage.country&quot; FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=228) TO /tmp/2020-10-13-countries.csv WITH CSV HEADER;
COPY 195
</code></pre><ul>
<li>Then, in OpenRefine, make a new column for corrections and use this GREL expression to convert to title case: <code>value.toTitlecase()</code>
@ -493,7 +493,7 @@ COPY 195
</li>
<li>For the input forms I found out how to do a complicated search and replace in vim:</li>
</ul>
<pre><code>:'&lt;,'&gt;s/\&lt;\(pair\|displayed\|stored\|value\|AND\)\@!\(\w\)\(\w*\|\)\&gt;/\u\2\L\3/g
<pre tabindex="0"><code>:'&lt;,'&gt;s/\&lt;\(pair\|displayed\|stored\|value\|AND\)\@!\(\w\)\(\w*\|\)\&gt;/\u\2\L\3/g
</code></pre><ul>
<li>It uses a <a href="https://jbodah.github.io/blog/2016/11/01/positivenegative-lookaheadlookbehind-vim/">negative lookahead</a> (aka &ldquo;lookaround&rdquo; in PCRE?) to match words that are <em>not</em> &ldquo;pair&rdquo;, &ldquo;displayed&rdquo;, etc., because we don&rsquo;t want to edit the XML tags themselves&hellip;
<ul>
@ -509,18 +509,18 @@ COPY 195
</ul>
</li>
</ul>
<pre><code>dspace=&gt; \COPY (SELECT DISTINCT(text_value) as &quot;cg.coverage.region&quot; FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=227) TO /tmp/2020-10-14-regions.csv WITH CSV HEADER;
<pre tabindex="0"><code>dspace=&gt; \COPY (SELECT DISTINCT(text_value) as &quot;cg.coverage.region&quot; FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=227) TO /tmp/2020-10-14-regions.csv WITH CSV HEADER;
COPY 34
</code></pre><ul>
<li>I did the same as the countries in OpenRefine for the database values and in vim for the input forms</li>
<li>After testing the replacements locally I ran them on CGSpace:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2020-10-13-CGSpace-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -t 'correct' -m 228
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2020-10-13-CGSpace-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -t 'correct' -m 228
$ ./fix-metadata-values.py -i /tmp/2020-10-14-CGSpace-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -t 'correct' -m 227
</code></pre><ul>
<li>Then I started a full re-indexing:</li>
</ul>
<pre><code>$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
<pre tabindex="0"><code>$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 88m21.678s
user 7m59.182s
@ -579,7 +579,7 @@ sys 2m22.713s
<li>I posted a message on Yammer to inform all our users about the changes to countries, regions, and AGROVOC subjects</li>
<li>I modified all AGROVOC subjects to be lower case in PostgreSQL and then exported a list of the top 1500 to update the controlled vocabulary in our submission form:</li>
</ul>
<pre><code>dspace=&gt; BEGIN;
<pre tabindex="0"><code>dspace=&gt; BEGIN;
dspace=&gt; UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE resource_type_id=2 AND metadata_field_id=57;
UPDATE 335063
dspace=&gt; COMMIT;
@ -588,7 +588,7 @@ COPY 1500
</code></pre><ul>
<li>Use my <code>agrovoc-lookup.py</code> script to validate subject terms against the AGROVOC REST API, extract matches with <code>csvgrep</code>, and then update and format the controlled vocabulary:</li>
</ul>
<pre><code>$ csvcut -c 1 /tmp/2020-10-15-top-1500-agrovoc-subject.csv | tail -n 1500 &gt; /tmp/subjects.txt
<pre tabindex="0"><code>$ csvcut -c 1 /tmp/2020-10-15-top-1500-agrovoc-subject.csv | tail -n 1500 &gt; /tmp/subjects.txt
$ ./agrovoc-lookup.py -i /tmp/subjects.txt -o /tmp/subjects.csv -d
$ csvgrep -c 4 -m 0 -i /tmp/subjects.csv | csvcut -c 1 | sed '1d' &gt; dspace/config/controlled-vocabularies/dc-subject.xml
# apply formatting in XML file
@ -596,7 +596,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.x
</code></pre><ul>
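<li>(A rough Python sketch of the kind of lookup <code>agrovoc-lookup.py</code> performs for each term; the Skosmos REST URL and the output column layout shown here are assumptions, arranged so that the <code>csvgrep -c 4 -m 0 -i</code> above keeps only terms with at least one match)
<pre tabindex="0"><code>import csv
import requests

# assumed AGROVOC Skosmos endpoint; the URL used by the real script may differ
api = 'https://agrovoc.fao.org/browse/rest/v1/search'

with open('/tmp/subjects.txt') as f, open('/tmp/subjects.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['dc.subject', 'language', 'matched', 'number of matches'])
    for subject in (line.strip() for line in f):
        r = requests.get(api, params={'query': subject, 'lang': 'en'})
        results = r.json().get('results', [])
        writer.writerow([subject, 'en', bool(results), len(results)])
</code></pre>
</li>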
<li>Then I started a full re-indexing on CGSpace:</li>
</ul>
<pre><code>$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
<pre tabindex="0"><code>$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 88m21.678s
user 7m59.182s
@ -614,7 +614,7 @@ sys 2m22.713s
<li>They are using the user agent &ldquo;CCAFS Website Publications importer BOT&rdquo; so they are getting rate limited by nginx</li>
<li>Ideally they would use the REST <code>find-by-metadata-field</code> endpoint, but it is <em>really</em> slow for large result sets (like twenty minutes!):</li>
</ul>
<pre><code>$ curl -f -H &quot;CCAFS Website Publications importer BOT&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;https://dspacetest.cgiar.org/rest/items/find-by-metadata-field?limit=100&quot; -d '{&quot;key&quot;:&quot;cg.contributor.crp&quot;, &quot;value&quot;:&quot;Climate Change, Agriculture and Food Security&quot;,&quot;language&quot;: &quot;en_US&quot;}'
<pre tabindex="0"><code>$ curl -f -H &quot;CCAFS Website Publications importer BOT&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;https://dspacetest.cgiar.org/rest/items/find-by-metadata-field?limit=100&quot; -d '{&quot;key&quot;:&quot;cg.contributor.crp&quot;, &quot;value&quot;:&quot;Climate Change, Agriculture and Food Security&quot;,&quot;language&quot;: &quot;en_US&quot;}'
</code></pre><ul>
<li>For now I will whitelist their user agent so that they can continue scraping /browse</li>
<li>I figured out that the mappings for AReS are stored in Elasticsearch
@ -624,7 +624,7 @@ sys 2m22.713s
</ul>
</li>
</ul>
<pre><code>$ curl -XPOST &quot;localhost:9200/openrxv-values/_delete_by_query&quot; -H 'Content-Type: application/json' -d'
<pre tabindex="0"><code>$ curl -XPOST &quot;localhost:9200/openrxv-values/_delete_by_query&quot; -H 'Content-Type: application/json' -d'
{
&quot;query&quot;: {
&quot;match&quot;: {
@ -635,7 +635,7 @@ sys 2m22.713s
</code></pre><ul>
<li>I added a new find/replace:</li>
</ul>
<pre><code>$ curl -XPOST &quot;localhost:9200/openrxv-values/_doc?pretty&quot; -H 'Content-Type: application/json' -d'
<pre tabindex="0"><code>$ curl -XPOST &quot;localhost:9200/openrxv-values/_doc?pretty&quot; -H 'Content-Type: application/json' -d'
{
&quot;find&quot;: &quot;ALAN1&quot;,
&quot;replace&quot;: &quot;ALAN2&quot;,
@ -645,11 +645,11 @@ sys 2m22.713s
<li>I see it in Kibana, and I can search it in Elasticsearch, but I don&rsquo;t see it in OpenRXV&rsquo;s mapping values dashboard</li>
<li>Now I deleted everything in the <code>openrxv-values</code> index:</li>
</ul>
<pre><code>$ curl -XDELETE http://localhost:9200/openrxv-values
<pre tabindex="0"><code>$ curl -XDELETE http://localhost:9200/openrxv-values
</code></pre><ul>
<li>Then I tried posting it again:</li>
</ul>
<pre><code>$ curl -XPOST &quot;localhost:9200/openrxv-values/_doc?pretty&quot; -H 'Content-Type: application/json' -d'
<pre tabindex="0"><code>$ curl -XPOST &quot;localhost:9200/openrxv-values/_doc?pretty&quot; -H 'Content-Type: application/json' -d'
{
&quot;find&quot;: &quot;ALAN1&quot;,
&quot;replace&quot;: &quot;ALAN2&quot;,
@ -682,12 +682,12 @@ sys 2m22.713s
<ul>
<li>Last night I learned how to POST mappings to Elasticsearch for AReS:</li>
</ul>
<pre><code>$ curl -XDELETE http://localhost:9200/openrxv-values
<pre tabindex="0"><code>$ curl -XDELETE http://localhost:9200/openrxv-values
$ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H &quot;Content-Type: application/json&quot; --data-binary @./mapping.json
</code></pre><ul>
<li>The JSON file looks like this, with one instruction on each line:</li>
</ul>
<pre><code>{&quot;index&quot;:{}}
<pre tabindex="0"><code>{&quot;index&quot;:{}}
{ &quot;find&quot;: &quot;CRP on Dryland Systems - DS&quot;, &quot;replace&quot;: &quot;Dryland Systems&quot; }
{&quot;index&quot;:{}}
{ &quot;find&quot;: &quot;FISH&quot;, &quot;replace&quot;: &quot;Fish&quot; }
@ -737,7 +737,7 @@ f<span style="color:#f92672">.</span>close()
<li>It filters out all upper- and lower-case strings, as well as any replacements that end in an acronym like &ldquo;- ILRI&rdquo;, reducing the number of mappings from around 4,000 to about 900</li>
<li>I deleted the existing <code>openrxv-values</code> Elasticsearch index and then POSTed it:</li>
</ul>
<pre><code>$ ./convert-mapping.py &gt; /tmp/elastic-mappings.txt
<pre tabindex="0"><code>$ ./convert-mapping.py &gt; /tmp/elastic-mappings.txt
$ curl -XDELETE http://localhost:9200/openrxv-values
$ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H &quot;Content-Type: application/json&quot; --data-binary @/tmp/elastic-mappings.txt
</code></pre><ul>
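<li>(A rough Python sketch of the filtering described above, assuming <code>mapping.json</code> is a list of find/replace objects; the actual rules in <code>convert-mapping.py</code> may differ)
<pre tabindex="0"><code>import json
import re

with open('mapping.json') as f:
    mappings = json.load(f)

for m in mappings:
    find, replace = m['find'], m['replace']
    # drop values that are entirely upper or lower case
    if find.isupper() or find.islower():
        continue
    # drop replacements that end in an acronym, for example '- ILRI'
    if re.search(r'- [A-Z]+$', replace):
        continue
    # emit the Elasticsearch bulk action/document pairs shown above
    print(json.dumps({'index': {}}))
    print(json.dumps({'find': find, 'replace': replace}))
</code></pre>
</li>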
@ -762,17 +762,17 @@ $ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H &quot;Content-T
</li>
<li>I ran the <code>dspace cleanup -v</code> process on CGSpace and got an error:</li>
</ul>
<pre><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
<pre tabindex="0"><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(192921) is still referenced from table &quot;bundle&quot;.
</code></pre><ul>
<li>The solution is, as always:</li>
</ul>
<pre><code>$ psql -d dspace -U dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (192921);'
<pre tabindex="0"><code>$ psql -d dspace -U dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (192921);'
UPDATE 1
</code></pre><ul>
<li>After looking at the CGSpace Solr stats for 2020-10 I found some hits to purge:</li>
</ul>
<pre><code>$ ./check-spider-hits.sh -f /tmp/agents -s statistics -u http://localhost:8083/solr -p
<pre tabindex="0"><code>$ ./check-spider-hits.sh -f /tmp/agents -s statistics -u http://localhost:8083/solr -p
Purging 2474 hits from ShortLinkTranslate in statistics
Purging 2568 hits from RI\/1\.0 in statistics
@ -794,7 +794,7 @@ Total number of bot hits purged: 8174
</ul>
</li>
</ul>
<pre><code>$ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-4380-a9d3-4c8cbbed2d21/retrieve User-Agent:&quot;RTB website BOT&quot;
<pre tabindex="0"><code>$ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-4380-a9d3-4c8cbbed2d21/retrieve User-Agent:&quot;RTB website BOT&quot;
$ curl -s 'http://localhost:8083/solr/statistics/update?softCommit=true'
</code></pre><ul>
<li>And I saw three hits in Solr with <code>isBot: true</code>!!!
@ -817,7 +817,7 @@ $ curl -s 'http://localhost:8083/solr/statistics/update?softCommit=true'
</ul>
</li>
</ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx2048m&quot;
<pre tabindex="0"><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx2048m&quot;
$ dspace metadata-export -f /tmp/cgspace.csv
$ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri[en_US],cg.subject.alliancebiovciat[],cg.subject.alliancebiovciat[en_US],cg.subject.bioversity[en_US],cg.subject.ccafs[],cg.subject.ccafs[en_US],cg.subject.ciat[],cg.subject.ciat[en_US],cg.subject.cip[],cg.subject.cip[en_US],cg.subject.cpwf[en_US],cg.subject.iita,cg.subject.iita[en_US],cg.subject.iwmi[en_US]' /tmp/cgspace.csv &gt; /tmp/cgspace-subjects.csv
</code></pre><ul>
@ -833,7 +833,7 @@ $ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri
<ul>
<li>Bosede was getting this error on CGSpace yesterday:</li>
</ul>
<pre><code>Authorization denied for action WORKFLOW_STEP_1 on COLLECTION:1072 by user 1759
<pre tabindex="0"><code>Authorization denied for action WORKFLOW_STEP_1 on COLLECTION:1072 by user 1759
</code></pre><ul>
<li>Collection 1072 appears to be <a href="https://cgspace.cgiar.org/handle/10568/69542">IITA Miscellaneous</a>
<ul>
@ -848,7 +848,7 @@ $ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri
</ul>
</li>
</ul>
<pre><code>$ http 'http://localhost:9200/openrxv-items-final/_search?_source_includes=affiliation&amp;size=10000&amp;q=*:*' &gt; /tmp/affiliations.json
<pre tabindex="0"><code>$ http 'http://localhost:9200/openrxv-items-final/_search?_source_includes=affiliation&amp;size=10000&amp;q=*:*' &gt; /tmp/affiliations.json
</code></pre><ul>
<li>Then I decided to try a different approach: I adjusted my <code>convert-mapping.py</code> script to reconsider some replacement patterns with acronyms from the original AReS <code>mapping.json</code> file, hoping to address some MEL-to-CGSpace mappings
<ul>
@ -893,7 +893,7 @@ $ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri
<ul>
<li>I re-installed DSpace Test with a fresh snapshot of CGSpace to test the DSpace 6 upgrade (the last time was in 2020-05, and we&rsquo;ve fixed a lot of issues since then):</li>
</ul>
<pre><code>$ cp dspace/etc/postgres/update-sequences.sql /tmp/dspace5-update-sequences.sql
<pre tabindex="0"><code>$ cp dspace/etc/postgres/update-sequences.sql /tmp/dspace5-update-sequences.sql
$ git checkout origin/6_x-dev-atmire-modules
$ chrt -b 0 mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false clean package
$ sudo su - postgres
@ -911,7 +911,7 @@ $ sudo systemctl start tomcat7
</code></pre><ul>
<li>Then I started processing the Solr stats one core at a time, in batches of 1 million records:</li>
</ul>
<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
<pre tabindex="0"><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 1000000 -i statistics
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 1000000 -i statistics
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 1000000 -i statistics
@ -920,7 +920,7 @@ $ chrt -b 0 dspace solr-upgrade-statistics-6x -n 1000000 -i statistics
</code></pre><ul>
<li>After the fifth or so run I got this error:</li>
</ul>
<pre><code>Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
<pre tabindex="0"><code>Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
@ -945,7 +945,7 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error whil
</ul>
</li>
</ul>
<pre><code>$ curl -s &quot;http://localhost:8083/solr/statistics/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)&lt;/query&gt;&lt;/delete&gt;&quot;
<pre tabindex="0"><code>$ curl -s &quot;http://localhost:8083/solr/statistics/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><ul>
<li>Then I restarted the <code>solr-upgrade-statistics-6x</code> process, which apparently had no records left to process</li>
<li>I started processing the statistics-2019 core&hellip;
@ -958,7 +958,7 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error whil
<ul>
<li>The statistics processing on the statistics-2018 core errored after 1.8 million records:</li>
</ul>
<pre><code>Exception: Java heap space
<pre tabindex="0"><code>Exception: Java heap space
java.lang.OutOfMemoryError: Java heap space
</code></pre><ul>
<li>I had the same problem when I processed the statistics-2018 core in 2020-07 and 2020-08
@ -967,7 +967,7 @@ java.lang.OutOfMemoryError: Java heap space
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8083/solr/statistics-2018/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;id:/.+-unmigrated/&lt;/query&gt;&lt;/delete&gt;&quot;
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8083/solr/statistics-2018/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;id:/.+-unmigrated/&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><ul>
<li>I restarted the process and it crashed again a few minutes later
<ul>
@ -976,7 +976,7 @@ java.lang.OutOfMemoryError: Java heap space
</ul>
</li>
</ul>
<pre><code>$ curl -s &quot;http://localhost:8083/solr/statistics-2018/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)&lt;/query&gt;&lt;/delete&gt;&quot;
<pre tabindex="0"><code>$ curl -s &quot;http://localhost:8083/solr/statistics-2018/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><ul>
<li>Then I started processing the statistics-2017 core&hellip;
<ul>
@ -984,7 +984,7 @@ java.lang.OutOfMemoryError: Java heap space
</ul>
</li>
</ul>
<pre><code>$ curl -s &quot;http://localhost:8083/solr/statistics-2017/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)&lt;/query&gt;&lt;/delete&gt;&quot;
<pre tabindex="0"><code>$ curl -s &quot;http://localhost:8083/solr/statistics-2017/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><ul>
<li>Also I purged 2.7 million unmigrated records from the statistics-2019 core</li>
<li>I filed an issue with Atmire about the duplicate values in the <code>owningComm</code> and <code>containerCommunity</code> fields in Solr: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=839">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=839</a></li>
@ -1002,7 +1002,7 @@ java.lang.OutOfMemoryError: Java heap space
</ul>
</li>
</ul>
<pre><code>$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics
<pre tabindex="0"><code>$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics
</code></pre><ul>
<li>Peter asked me to add the new preferred AGROVOC subject &ldquo;covid-19&rdquo; to all items to which we had previously added &ldquo;coronavirus disease&rdquo;, and to make sure all items with ILRI subject &ldquo;ZOONOTIC DISEASES&rdquo; have the AGROVOC subject &ldquo;zoonoses&rdquo;
<ul>
@ -1010,7 +1010,7 @@ java.lang.OutOfMemoryError: Java heap space
</ul>
</li>
</ul>
<pre><code>$ dspace metadata-export -f /tmp/cgspace.csv
<pre tabindex="0"><code>$ dspace metadata-export -f /tmp/cgspace.csv
$ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri[en_US]' /tmp/cgspace.csv &gt; /tmp/cgspace-subjects.csv
</code></pre><ul>
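<li>(The edit itself was done in OpenRefine; this is a rough pandas sketch of the same logic, using only the en_US columns from the csvcut above and DSpace&rsquo;s '||' multi-value separator; the exact column handling is an assumption)
<pre tabindex="0"><code>import pandas as pd

df = pd.read_csv('/tmp/cgspace-subjects.csv', dtype=str)

def append_term(value, term):
    # DSpace metadata CSVs separate multiple values with '||'
    if pd.isna(value):
        return term
    values = value.split('||')
    return value if term in values else value + '||' + term

covid = df['dc.subject[en_US]'].str.contains('coronavirus disease', case=False, na=False)
df.loc[covid, 'dc.subject[en_US]'] = df.loc[covid, 'dc.subject[en_US]'].apply(append_term, term='covid-19')

zoonotic = df['cg.subject.ilri[en_US]'].str.contains('ZOONOTIC DISEASES', na=False)
df.loc[zoonotic, 'dc.subject[en_US]'] = df.loc[zoonotic, 'dc.subject[en_US]'].apply(append_term, term='zoonoses')

df[covid | zoonotic].to_csv('/tmp/cgspace-subjects-updated.csv', index=False)
</code></pre>
</li>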
<li>I sanity checked the CSV in <code>csv-metadata-quality</code> after exporting from OpenRefine, then applied the changes to 453 items on CGSpace</li>
@ -1040,7 +1040,7 @@ $ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri
</ul>
</li>
</ul>
<pre><code>$ ./create-mappings.py &gt; /tmp/elasticsearch-mappings.txt
<pre tabindex="0"><code>$ ./create-mappings.py &gt; /tmp/elasticsearch-mappings.txt
$ ./convert-mapping.py &gt;&gt; /tmp/elasticsearch-mappings.txt
$ curl -XDELETE http://localhost:9200/openrxv-values
$ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H &quot;Content-Type: application/json&quot; --data-binary @/tmp/elasticsearch-mappings.txt
@ -1048,12 +1048,12 @@ $ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H &quot;Content-T
<li>After that I had to manually create and delete a fake mapping in the AReS UI so that the mappings would show up</li>
<li>I fixed a few strings in the OpenRXV admin dashboard and then re-built the frontend container:</li>
</ul>
<pre><code>$ docker-compose up --build -d angular_nginx
<pre tabindex="0"><code>$ docker-compose up --build -d angular_nginx
</code></pre><h2 id="2020-10-28">2020-10-28</h2>
<ul>
<li>Fix a handful more grammar and spelling issues in OpenRXV and then re-build the containers:</li>
</ul>
<pre><code>$ docker-compose up --build -d --force-recreate angular_nginx
<pre tabindex="0"><code>$ docker-compose up --build -d --force-recreate angular_nginx
</code></pre><ul>
<li>Also, I realized that the mysterious issue with countries getting changed to inconsistent casing like &ldquo;Burkina faso&rdquo; is due to the country formatter (see: <code>backend/src/harvester/consumers/fetch.consumer.ts</code>; a quick illustration follows below)
<ul>
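<li>(For illustration only, in Python; the real formatter is TypeScript. A plain capitalize-style formatter produces exactly this casing:)
<pre tabindex="0"><code>&gt;&gt;&gt; 'BURKINA FASO'.capitalize()
'Burkina faso'
&gt;&gt;&gt; 'Burkina Faso'.capitalize()
'Burkina faso'
</code></pre>
</li>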
@ -1079,7 +1079,7 @@ $ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H &quot;Content-T
</ul>
</li>
</ul>
<pre><code>$ cat 2020-10-28-update-regions.csv
<pre tabindex="0"><code>$ cat 2020-10-28-update-regions.csv
cg.coverage.region,correct
East Africa,Eastern Africa
West Africa,Western Africa
@ -1092,7 +1092,7 @@ $ ./fix-metadata-values.py -i 2020-10-28-update-regions.csv -db dspace -u dspace
</code></pre><ul>
<li>Then I started a full Discovery re-indexing:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
<pre tabindex="0"><code class="language-console" data-lang="console">$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 92m14.294s
user 7m59.840s
@ -1115,7 +1115,7 @@ sys 2m22.327s
</li>
<li>Send Peter a list of affiliations, authors, journals, publishers, investors, and series for correction:</li>
</ul>
<pre><code>dspace=&gt; \COPY (SELECT DISTINCT text_value as &quot;cg.contributor.affiliation&quot;, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2020-10-28-affiliations.csv WITH CSV HEADER;
<pre tabindex="0"><code>dspace=&gt; \COPY (SELECT DISTINCT text_value as &quot;cg.contributor.affiliation&quot;, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2020-10-28-affiliations.csv WITH CSV HEADER;
COPY 6357
dspace=&gt; \COPY (SELECT DISTINCT text_value as &quot;dc.description.sponsorship&quot;, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 29 GROUP BY text_value ORDER BY count DESC) to /tmp/2020-10-28-sponsors.csv WITH CSV HEADER;
COPY 730
@ -1134,7 +1134,7 @@ COPY 5598
</ul>
</li>
</ul>
<pre><code>$ grep -c '&quot;find&quot;' /tmp/elasticsearch-mappings*
<pre tabindex="0"><code>$ grep -c '&quot;find&quot;' /tmp/elasticsearch-mappings*
/tmp/elasticsearch-mappings2.txt:350
/tmp/elasticsearch-mappings.txt:1228
$ cat /tmp/elasticsearch-mappings* | grep -v '{&quot;index&quot;:{}}' | wc -l
@ -1148,7 +1148,7 @@ $ cat /tmp/elasticsearch-mappings* | grep -v '{&quot;index&quot;:{}}' | sort | u
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ cat /tmp/elasticsearch-mappings* &gt; /tmp/new-elasticsearch-mappings.txt
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat /tmp/elasticsearch-mappings* &gt; /tmp/new-elasticsearch-mappings.txt
$ curl -XDELETE http://localhost:9200/openrxv-values
$ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H &quot;Content-Type: application/json&quot; --data-binary @/tmp/new-elasticsearch-mappings.txt
</code></pre><ul>
@ -1159,14 +1159,14 @@ $ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H &quot;Content-T
</li>
<li>Lower case some straggling AGROVOC subjects on CGSpace:</li>
</ul>
<pre><code>dspace=# BEGIN;
<pre tabindex="0"><code>dspace=# BEGIN;
dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE resource_type_id=2 AND metadata_field_id=57 AND text_value ~ '[[:upper:]]';
UPDATE 123
dspace=# COMMIT;
</code></pre><ul>
<li>Move some top-level communities to the CGIAR System community for Peter:</li>
</ul>
<pre><code>$ dspace community-filiator --set --parent 10568/83389 --child 10568/1208
<pre tabindex="0"><code>$ dspace community-filiator --set --parent 10568/83389 --child 10568/1208
$ dspace community-filiator --set --parent 10568/83389 --child 10568/56924
</code></pre><h2 id="2020-10-30">2020-10-30</h2>
<ul>
@ -1187,7 +1187,7 @@ $ dspace community-filiator --set --parent 10568/83389 --child 10568/56924
</ul>
</li>
</ul>
<pre><code>or(
<pre tabindex="0"><code>or(
isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)),
@ -1198,7 +1198,7 @@ $ dspace community-filiator --set --parent 10568/83389 --child 10568/56924
</code></pre><ul>
<li>Then I did a test to apply the corrections and deletions on my local DSpace:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i 2020-10-30-fix-854-journals.csv -db dspace -u dspace -p 'fuuu' -f dc.source -t 'correct' -m 55
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2020-10-30-fix-854-journals.csv -db dspace -u dspace -p 'fuuu' -f dc.source -t 'correct' -m 55
$ ./delete-metadata-values.py -i 2020-10-30-delete-90-journals.csv -db dspace -u dspace -p 'fuuu' -f dc.source -m 55
$ ./fix-metadata-values.py -i 2020-10-30-fix-386-publishers.csv -db dspace -u dspace -p 'fuuu' -f dc.publisher -t correct -m 39
$ ./delete-metadata-values.py -i 2020-10-30-delete-10-publishers.csv -db dspace -u dspace -p 'fuuu' -f dc.publisher -m 39
@ -1214,12 +1214,12 @@ $ ./delete-metadata-values.py -i 2020-10-30-delete-10-publishers.csv -db dspace
</li>
<li>Quickly process the sponsor corrections Peter sent me a few days ago and test them locally:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i 2020-10-31-fix-82-sponsors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -t 'correct' -m 29
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2020-10-31-fix-82-sponsors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -t 'correct' -m 29
$ ./delete-metadata-values.py -i 2020-10-31-delete-74-sponsors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29
</code></pre><ul>
<li>I applied all the fixes from today and yesterday on CGSpace and then started a full Discovery re-index:</li>
</ul>
<pre><code>$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
<pre tabindex="0"><code>$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
</code></pre><!-- raw HTML omitted -->