Add notes for 2021-09-13

2021-09-13 16:21:16 +03:00
parent 8b487a4a77
commit c05c7213c2
109 changed files with 2627 additions and 2530 deletions


@@ -36,7 +36,7 @@ I looked at the top user agents and IPs in the Solr statistics for last month an
I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…
"/>
-<meta name="generator" content="Hugo 0.87.0" />
+<meta name="generator" content="Hugo 0.88.1" />
@@ -147,7 +147,7 @@ I will add the RI/1.0 pattern to our DSpace agents overload and purge them from
</ul>
</li>
</ul>
-<pre><code class="language-console" data-lang="console">193.169.254.178 - - [21/Apr/2021:01:59:01 +0200] &quot;GET /rest/collections/1179/items?limit=812&amp;expand=metadata\x22%20and%20\x2221\x22=\x2221 HTTP/1.1&quot; 400 5 &quot;-&quot; &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&quot;
+<pre tabindex="0"><code class="language-console" data-lang="console">193.169.254.178 - - [21/Apr/2021:01:59:01 +0200] &quot;GET /rest/collections/1179/items?limit=812&amp;expand=metadata\x22%20and%20\x2221\x22=\x2221 HTTP/1.1&quot; 400 5 &quot;-&quot; &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&quot;
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] &quot;GET /rest/collections/1179/items?limit=812&amp;expand=metadata-21%2B21*01 HTTP/1.1&quot; 200 458201 &quot;-&quot; &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&quot;
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] &quot;GET /rest/collections/1179/items?limit=812&amp;expand=metadata'||lower('')||' HTTP/1.1&quot; 400 5 &quot;-&quot; &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&quot;
193.169.254.178 - - [21/Apr/2021:02:02:10 +0200] &quot;GET /rest/collections/1179/items?limit=812&amp;expand=metadata'%2Brtrim('')%2B' HTTP/1.1&quot; 200 458209 &quot;-&quot; &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&quot;
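The requests above smuggle SQL into the `expand` parameter, hidden behind nginx's `\x22` escapes and URL encoding. Here is a minimal Python sketch of how such lines could be flagged. It assumes nginx's default combined log format. The pattern list is a hypothetical heuristic written for these examples, not an exhaustive SQL-injection detector; it misses the `-21%2B21*01` arithmetic probe, for instance.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote

# nginx logs some bytes of the request line as \xHH (e.g. \x22 for a double quote)
NGINX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")

# Client IP and request target from an nginx "combined" log line
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[A-Z]+ (\S+)')

# Illustrative heuristics written for the requests above; real payloads vary widely
SQLI_PATTERNS = [
    re.compile(r"\b(?:and|or)\s+[\"']?\d+[\"']?\s*=\s*[\"']?\d+", re.IGNORECASE),  # and "21"="21
    re.compile(r"'\s*(?:\|\||\+)\s*\w+\("),                                        # '||lower( or '+rtrim(
    re.compile(r"\bUTL_INADDR\b", re.IGNORECASE),                                    # Oracle lookup used in blind SQLi
    re.compile(r"\bSELECT\b.+\bFROM\b", re.IGNORECASE),
]

def decode_request(target: str) -> str:
    """Undo nginx \\xHH escaping, then URL percent-encoding."""
    unescaped = NGINX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), target)
    return unquote(unescaped)

def looks_like_sqli(line: str) -> Optional[Tuple[str, str]]:
    """Return (client IP, decoded target) if the request matches any pattern, else None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    ip, target = match.groups()
    decoded = decode_request(target)
    if any(p.search(decoded) for p in SQLI_PATTERNS):
        return ip, decoded
    return None
```

Running `looks_like_sqli()` over an access log and collecting the returned IPs would give a candidate list to review by hand before reporting or purging anything.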
@@ -155,7 +155,7 @@ I will add the RI/1.0 pattern to our DSpace agents overload and purge them from
<li>I will report the IP on abuseipdb.com and purge their hits from Solr</li>
<li>The second IP is in Colombia and is making thousands of requests for what looks like some test site:</li>
</ul>
-<pre><code class="language-console" data-lang="console">181.62.166.177 - - [20/Apr/2021:22:48:42 +0200] &quot;GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0&quot; 200 123613 &quot;http://cassavalighthousetest.org/&quot; &quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36&quot;
+<pre tabindex="0"><code class="language-console" data-lang="console">181.62.166.177 - - [20/Apr/2021:22:48:42 +0200] &quot;GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0&quot; 200 123613 &quot;http://cassavalighthousetest.org/&quot; &quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36&quot;
181.62.166.177 - - [20/Apr/2021:22:55:39 +0200] &quot;GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0&quot; 200 123613 &quot;http://cassavalighthousetest.org/&quot; &quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36&quot;
</code></pre><ul>
<li>But this site does not exist (yet?)
@@ -165,11 +165,11 @@ I will add the RI/1.0 pattern to our DSpace agents overload and purge them from
</li>
<li>The third IP is in Russia apparently, and the user agent has the <code>pl-PL</code> locale with thousands of requests like this:</li>
</ul>
-<pre><code class="language-console" data-lang="console">45.146.166.180 - - [18/Apr/2021:16:28:44 +0200] &quot;GET /bitstream/handle/10947/4153/.AAS%202014%20Annual%20Report.pdf?sequence=1%22%29%29%20AND%201691%3DUTL_INADDR.GET_HOST_ADDRESS%28CHR%28113%29%7C%7CCHR%28118%29%7C%7CCHR%28113%29%7C%7CCHR%28106%29%7C%7CCHR%28113%29%7C%7C%28SELECT%20%28CASE%20WHEN%20%281691%3D1691%29%20THEN%201%20ELSE%200%20END%29%20FROM%20DUAL%29%7C%7CCHR%28113%29%7C%7CCHR%2898%29%7C%7CCHR%28122%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%29%20AND%20%28%28%22RKbp%22%3D%22RKbp&amp;isAllowed=y HTTP/1.1&quot; 200 918998 &quot;http://cgspace.cgiar.org:80/bitstream/handle/10947/4153/.AAS 2014 Annual Report.pdf&quot; &quot;Mozilla/5.0 (Windows; U; Windows NT 5.1; pl-PL) AppleWebKit/523.15 (KHTML, like Gecko) Version/3.0 Safari/523.15&quot;
+<pre tabindex="0"><code class="language-console" data-lang="console">45.146.166.180 - - [18/Apr/2021:16:28:44 +0200] &quot;GET /bitstream/handle/10947/4153/.AAS%202014%20Annual%20Report.pdf?sequence=1%22%29%29%20AND%201691%3DUTL_INADDR.GET_HOST_ADDRESS%28CHR%28113%29%7C%7CCHR%28118%29%7C%7CCHR%28113%29%7C%7CCHR%28106%29%7C%7CCHR%28113%29%7C%7C%28SELECT%20%28CASE%20WHEN%20%281691%3D1691%29%20THEN%201%20ELSE%200%20END%29%20FROM%20DUAL%29%7C%7CCHR%28113%29%7C%7CCHR%2898%29%7C%7CCHR%28122%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%29%20AND%20%28%28%22RKbp%22%3D%22RKbp&amp;isAllowed=y HTTP/1.1&quot; 200 918998 &quot;http://cgspace.cgiar.org:80/bitstream/handle/10947/4153/.AAS 2014 Annual Report.pdf&quot; &quot;Mozilla/5.0 (Windows; U; Windows NT 5.1; pl-PL) AppleWebKit/523.15 (KHTML, like Gecko) Version/3.0 Safari/523.15&quot;
</code></pre><ul>
<li>I will purge these all with my <code>check-spider-ip-hits.sh</code> script:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
+<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
Purging 21648 hits from 193.169.254.178 in statistics
Purging 20323 hits from 181.62.166.177 in statistics
Purging 19376 hits from 45.146.166.180 in statistics
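The purge script reads a file of IP addresses. Here is a minimal sketch of building such a shortlist from an nginx access log by counting requests per client IP. The one-IP-per-line file format and the `min_hits` threshold are assumptions. Note that these counts come from the web server log, not from the Solr statistics core that was used to find the IPs above.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def top_ips(lines: Iterable[str], min_hits: int) -> List[Tuple[str, int]]:
    """Count requests per client IP (the first field of each log line), busiest first."""
    counts = Counter(line.split(" ", 1)[0] for line in lines if line.strip())
    return [(ip, n) for ip, n in counts.most_common() if n >= min_hits]

def format_ip_list(pairs: List[Tuple[str, int]]) -> str:
    """One IP per line, e.g. for a file like /tmp/ips.txt (format assumed)."""
    return "".join(ip + "\n" for ip, _ in pairs)
```

Each candidate should still be checked by hand (user agent, requested URLs) before it goes into the file, since a busy IP is not necessarily a bot.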
@@ -179,7 +179,7 @@ Total number of bot hits purged: 61347
<ul>
<li>Check the AReS Harvester indexes:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
+<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 0 0 283b 283b
yellow open openrxv-items-final ul3SKsa7Q9Cd_K7qokBY_w 1 1 103951 0 254mb 254mb
$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
@@ -195,13 +195,13 @@ $ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
</code></pre><ul>
<li>I think they look OK (<code>openrxv-items</code> is an alias of <code>openrxv-items-final</code>), but I took a backup just in case:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
+<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
</code></pre><ul>
<li>Then I started indexing in the AReS Explorer admin dashboard</li>
<li>The indexing finished, but it looks like the aliases are messed up again:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
+<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 104165 105024 487.7mb 487.7mb
yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0 2.2mb 2.2mb
</code></pre><h2 id="2021-05-05">2021-05-05</h2>
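The recurring problem is that `openrxv-items` should be an alias of `openrxv-items-final`. Here is a small sketch that checks this from the parsed JSON of `GET /_alias/`, whose response maps each index name to an object with an `aliases` key. The function names are hypothetical.

```python
from typing import Dict, List

def indices_for_alias(alias_response: Dict[str, dict], alias: str) -> List[str]:
    """Indices carrying `alias`, given the parsed JSON of GET /_alias/ ({index: {"aliases": {...}}})."""
    return sorted(index for index, info in alias_response.items()
                  if alias in info.get("aliases", {}))

def alias_is_correct(alias_response: Dict[str, dict],
                     alias: str = "openrxv-items",
                     expected: str = "openrxv-items-final") -> bool:
    """True only if the alias points at exactly the expected index."""
    return indices_for_alias(alias_response, alias) == [expected]
```

For example, `alias_is_correct(json.loads(output))`, where `output` is the text printed by `curl -s 'http://localhost:9200/_alias/'`, should return `True` after a healthy harvest and `False` in the state shown above.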
@@ -229,7 +229,7 @@ yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0
</ul>
</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ time ~/dspace64/bin/dspace index-discovery -b
+<pre tabindex="0"><code class="language-console" data-lang="console">$ time ~/dspace64/bin/dspace index-discovery -b
~/dspace64/bin/dspace index-discovery -b 4053.24s user 53.17s system 38% cpu 2:58:53.83 total
</code></pre><ul>
<li>Nope! Still slow, and still no mapped item&hellip;
@@ -244,7 +244,7 @@ yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0
</li>
<li>The indexes on AReS Explorer are messed up after last week&rsquo;s harvesting:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
+<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 104165 105024 487.7mb 487.7mb
yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0 2.2mb 2.2mb
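The `_cat/indices` rows above are easier to compare programmatically. Without `?v` or `?h=`, each row lists health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size and pri.store.size. Here is a small sketch that extracts the document counts and flags the broken state seen here, where the temp index is full and the final index is nearly empty. The function names and the heuristic are assumptions.

```python
from typing import Dict

# Default column order of GET _cat/indices when neither ?v nor ?h= is given
CAT_INDICES_COLUMNS = ["health", "status", "index", "uuid", "pri", "rep",
                       "docs.count", "docs.deleted", "store.size", "pri.store.size"]

def doc_counts(cat_output: str) -> Dict[str, int]:
    """Map index name to docs.count from plain _cat/indices output."""
    counts = {}
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) != len(CAT_INDICES_COLUMNS):
            continue  # blank lines or rows in an unexpected format
        row = dict(zip(CAT_INDICES_COLUMNS, fields))
        counts[row["index"]] = int(row["docs.count"])
    return counts

def final_looks_incomplete(counts: Dict[str, int],
                           final: str = "openrxv-items-final",
                           temp: str = "openrxv-items-temp") -> bool:
    """Heuristic for the broken state above: the temp index holds more documents than the final one."""
    return counts.get(final, 0) < counts.get(temp, 0)
```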
@@ -262,21 +262,21 @@ $ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
<li><code>openrxv-items</code> should be an alias of <code>openrxv-items-final</code>&hellip;</li>
<li>I made a backup of the temp index and then started indexing on the AReS Explorer admin dashboard:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
+<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-temp-backup
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: false}}'
</code></pre><h2 id="2021-05-10">2021-05-10</h2>
<ul>
<li>Amazing, the harvesting on AReS finished but it messed up all the indexes and now there are no items in any index!</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
+<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp 8thRX0WVRUeAzmd2hkG6TA 1 1 0 0 283b 283b
yellow open openrxv-items-temp-backup _0tyvctBTg2pjOlcoVP1LA 1 1 104165 20134 305.5mb 305.5mb
yellow open openrxv-items-final BtvV9kwVQ3yBYCZvJS1QyQ 1 1 0 0 283b 283b
</code></pre><ul>
<li>I fixed the indexes manually by re-creating them and cloning from the backup:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
+<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp-backup/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp-backup/_clone/openrxv-items-final
$ curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{&quot;actions&quot; : [{&quot;add&quot; : { &quot;index&quot; : &quot;openrxv-items-final&quot;, &quot;alias&quot; : &quot;openrxv-items&quot;}}]}'
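Here the alias was added in a separate request after the clone. Elasticsearch applies all actions in a single `_aliases` request atomically, so moving `openrxv-items` from one index to another can be done with one `remove` plus one `add`. Below is a minimal sketch that builds that request body. The helper name is hypothetical. A `remove` for an alias that is not actually on `old_index` may be rejected, so check with `GET /_alias/` first.

```python
import json

def alias_swap_body(alias: str, old_index: str, new_index: str) -> str:
    """Build a POST /_aliases body that moves `alias` from old_index to new_index in one request."""
    actions = [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]
    return json.dumps({"actions": actions})
```

The resulting JSON could be sent with the same `curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d ...` invocation used above.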
@@ -284,11 +284,11 @@ $ curl -XDELETE 'http://localhost:9200/openrxv-items-temp-backup'
</code></pre><ul>
<li>Also I ran all updates on the server and updated all Docker images, then rebooted the server (linode20):</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
+<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
</code></pre><ul>
<li>I backed up the AReS Elasticsearch data using elasticdump, then started a new harvest:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
+<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
</code></pre><ul>
<li>Discuss CGSpace statistics with the CIP team
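The `docker images | grep -v ^REPO | sed ... | cut -d: -f1,2` pipeline turns each row into `repository:tag`. Below is a Python sketch of the same parsing, for illustration. Unlike the shell version, it skips dangling `<none>` images. It also does not break on registry names that contain a port (such as `localhost:5000/app`), because it splits on whitespace instead of colons.

```python
from typing import List

def images_to_pull(docker_images_output: str) -> List[str]:
    """Python version of: grep -v ^REPO | sed 's/ \\+/:/g' | cut -d: -f1,2"""
    refs = []
    for line in docker_images_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or line.startswith("REPO"):
            continue  # skip the header row and blank lines
        repo, tag = fields[0], fields[1]
        if "<none>" in (repo, tag):
            continue  # dangling images have no pullable reference
        refs.append(f"{repo}:{tag}")
    return refs
```

Docker itself can also print `repository:tag` pairs directly with `docker images --format '{{.Repository}}:{{.Tag}}'`, which avoids the parsing altogether.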
@@ -329,7 +329,7 @@ $ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/o
</li>
<li>I checked the CLARISA list against ROR&rsquo;s April 2021 release (&ldquo;Version 9&rdquo; on figshare, though it is version 8 in the dump):</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ ./ilri/ror-lookup.py -i /tmp/clarisa-institutions.txt -r ror-data-2021-04-06.json -o /tmp/clarisa-ror-matches.csv
+<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/ror-lookup.py -i /tmp/clarisa-institutions.txt -r ror-data-2021-04-06.json -o /tmp/clarisa-ror-matches.csv
$ csvgrep -c matched -m 'true' /tmp/clarisa-ror-matches.csv | sed '1d' | wc -l
1770
</code></pre><ul>
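`csvgrep -m 'true'` does a substring match, and `sed '1d' | wc -l` counts lines, so a value such as `untrue` or a quoted field containing a newline would skew the count. Here is a Python sketch that counts records whose `matched` column is exactly `true`. Only the `matched` column name comes from the command above; the rest of the CSV layout is assumed.

```python
import csv
import io

def count_matched(csv_text: str, column: str = "matched") -> int:
    """Count CSV records whose `column` value is exactly "true" (header row excluded)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if row.get(column) == "true")
```

For example: `with open("/tmp/clarisa-ror-matches.csv", newline="") as f: print(count_matched(f.read()))`.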
@@ -341,7 +341,7 @@ $ csvgrep -c matched -m 'true' /tmp/clarisa-ror-matches.csv | sed '1d' | wc -l
<ul>
<li>Fix a few thousand IWMI URLs that are using HTTP instead of HTTPS on CGSpace:</li>
</ul>
<pre><code class="language-console" data-lang="console">localhost/dspace63= &gt; UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://www.iwmi.cgiar.org','https://www.iwmi.cgiar.org', 'g') WHERE text_value LIKE 'http://www.iwmi.cgiar.org%' AND metadata_field_id=219;
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://www.iwmi.cgiar.org','https://www.iwmi.cgiar.org', 'g') WHERE text_value LIKE 'http://www.iwmi.cgiar.org%' AND metadata_field_id=219;
UPDATE 1132
localhost/dspace63= &gt; UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://publications.iwmi.org','https://publications.iwmi.org', 'g') WHERE text_value LIKE 'http://publications.iwmi.org%' AND metadata_field_id=219;
UPDATE 1803
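In the PostgreSQL `REGEXP_REPLACE()` calls above, the dots are unescaped regex metacharacters that match any character, and the `'g'` flag replaces every occurrence in a value. Combined with the `LIKE 'http://www.iwmi.cgiar.org%'` filter, that is almost certainly harmless. Below is a Python sketch of the same rewrite with a stricter match, which could be used to check values in a CSV export before running the SQL. `upgrade_scheme()` is a hypothetical helper, and unlike the SQL it only rewrites the URL at the start of the value.

```python
import re

def upgrade_scheme(value: str, host: str) -> str:
    """Rewrite a leading http://<host> to https://<host>, leaving any other value untouched."""
    # re.escape() makes the dots literal; the lookahead stops "host.example.com" from matching
    pattern = re.compile(r"http://" + re.escape(host) + r"(?=[/:?#]|$)")
    m = pattern.match(value)
    if m is None:
        return value
    return "https://" + host + value[m.end():]
```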
@@ -367,7 +367,7 @@ UPDATE 1803
<ul>
<li>I have to fix the Elasticsearch indexes on AReS after last week&rsquo;s harvesting because, as always, the <code>openrxv-items</code> index should be an alias of <code>openrxv-items-final</code> instead of <code>openrxv-items-temp</code>:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
+<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
&quot;openrxv-items-final&quot;: {
&quot;aliases&quot;: {}
},
@@ -380,13 +380,13 @@ UPDATE 1803
</code></pre><ul>
<li>I took a backup of the <code>openrxv-items</code> index with elasticdump so I can re-create them manually before starting a new harvest tomorrow:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
+<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
</code></pre><h2 id="2021-05-16">2021-05-16</h2>
<ul>
<li>I deleted and re-created the Elasticsearch indexes on AReS:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
+<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
$ curl -XPUT 'http://localhost:9200/openrxv-items-final'
$ curl -XPUT 'http://localhost:9200/openrxv-items-temp'
@@ -394,7 +394,7 @@ $ curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application
</code></pre><ul>
<li>Then I re-imported the backup that I created with elasticdump yesterday:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ elasticdump --input=/home/aorth/openrxv-items_mapping.json --output=http://localhost:9200/openrxv-items-final --type=mapping
+<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=/home/aorth/openrxv-items_mapping.json --output=http://localhost:9200/openrxv-items-final --type=mapping
$ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhost:9200/openrxv-items-final --type=data --limit=1000
</code></pre><ul>
<li>Then I started a new harvest on AReS</li>
@@ -403,7 +403,7 @@ $ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localh
<ul>
<li>The AReS harvest finished and the Elasticsearch indexes seem OK so I shouldn&rsquo;t have to fix them next time&hellip;</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
+<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp o3ijJLcyTtGMOPeWpAJiVA 1 1 0 0 283b 283b
yellow open openrxv-items-final TrJ1Ict3QZ-vFkj-4VcAzw 1 1 104317 0 259.4mb 259.4mb
$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
@@ -423,7 +423,7 @@ $ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
</ul>
</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;cgspace-ldap@cgiarad.org&quot; -W &quot;(sAMAccountName=aorth)&quot;
+<pre tabindex="0"><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;cgspace-ldap@cgiarad.org&quot; -W &quot;(sAMAccountName=aorth)&quot;
Enter LDAP Password:
ldap_bind: Invalid credentials (49)
additional info: 80090308: LdapErr: DSID-0C090453, comment: AcceptSecurityContext error, data 532, v3839
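Active Directory reports several different failures as `Invalid credentials (49)`; the `data 532` sub-code above is what tells them apart. In commonly published lists of these sub-codes, 532 means the password has expired. Below is a minimal sketch of decoding them. The table comes from those lists, not from the CGIAR directory itself, so treat it as an assumption.

```python
import re
from typing import Optional

# Sub-codes commonly listed for the "data" field of Active Directory bind errors
# (assumed from widely published tables, not verified against this directory)
AD_BIND_SUBCODES = {
    "525": "user not found",
    "52e": "invalid credentials",
    "530": "not permitted to log on at this time",
    "531": "not permitted to log on at this workstation",
    "532": "password expired",
    "533": "account disabled",
    "701": "account expired",
    "773": "user must reset password",
    "775": "account locked out",
}

def explain_ad_bind_error(message: str) -> Optional[str]:
    """Return a description of the 'data NNN' sub-code in an AD bind error, or None if absent."""
    match = re.search(r"\bdata ([0-9a-fA-F]+),", message)
    if match is None:
        return None
    code = match.group(1).lower()
    return AD_BIND_SUBCODES.get(code, f"unknown sub-code {code}")
```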
@@ -446,11 +446,11 @@ ldap_bind: Invalid credentials (49)
</ul>
</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ xmllint --xpath '//value-pairs[@value-pairs-name=&quot;ccafsprojectpii&quot;]/pair/stored-value/node()' dspace/config/input-forms.xml
+<pre tabindex="0"><code class="language-console" data-lang="console">$ xmllint --xpath '//value-pairs[@value-pairs-name=&quot;ccafsprojectpii&quot;]/pair/stored-value/node()' dspace/config/input-forms.xml
</code></pre><ul>
<li>I formatted the input file with tidy, especially because one of the new project tags has an ampersand character&hellip; grrr:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w 0 dspace/config/input-forms.xml
+<pre tabindex="0"><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w 0 dspace/config/input-forms.xml
line 3658 column 26 - Warning: unescaped &amp; or unknown entity &quot;&amp;WA_EU-IFAD&quot;
line 3659 column 23 - Warning: unescaped &amp; or unknown entity &quot;&amp;WA_EU-IFAD&quot;
</code></pre><ul>
@@ -461,16 +461,16 @@ line 3659 column 23 - Warning: unescaped &amp; or unknown entity &quot;&amp;WA_E
<li>Paola from the Alliance emailed me some new ORCID identifiers to add to CGSpace</li>
<li>I saved the new ones to a text file, combined them with the others, extracted the ORCID iDs themselves, and updated the names using <code>resolve-orcids.py</code>:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-identifier.xml /tmp/new | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2021-05-18-combined.txt
+<pre tabindex="0"><code class="language-console" data-lang="console">$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-identifier.xml /tmp/new | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2021-05-18-combined.txt
$ ./ilri/resolve-orcids.py -i /tmp/2021-05-18-combined.txt -o /tmp/2021-05-18-combined-names.txt
</code></pre><ul>
<li>I sorted the names and added the XML formatting in vim, then ran it through tidy:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/cg-creator-identifier.xml
+<pre tabindex="0"><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/cg-creator-identifier.xml
</code></pre><ul>
<li>Tag fifty-five items from the Alliance&rsquo;s new authors with ORCID iDs using <code>add-orcid-identifiers-csv.py</code>:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ cat 2021-05-18-add-orcids.csv
+<pre tabindex="0"><code class="language-console" data-lang="console">$ cat 2021-05-18-add-orcids.csv
dc.contributor.author,cg.creator.identifier
&quot;Urioste Daza, Sergio&quot;,Sergio Alejandro Urioste Daza: 0000-0002-3208-032X
&quot;Urioste, Sergio&quot;,Sergio Alejandro Urioste Daza: 0000-0002-3208-032X
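The `grep -oE '[A-Z0-9]{4}-...'` pattern also accepts letters in any position. ORCID iDs are actually 15 digits plus a check character (a digit or `X`) computed with ISO 7064 MOD 11-2, as described in ORCID's documentation, so a stricter extraction could verify the checksum too. Below is a minimal sketch. The function names are hypothetical, and the code is not taken from `resolve-orcids.py`.

```python
import re
from typing import Iterable, List

# 15 digits in groups of four, with a final check character that may be X
ORCID_RE = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b")

def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    if not ORCID_RE.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]

def extract_orcids(texts: Iterable[str]) -> List[str]:
    """Like grep -oE ... | sort | uniq, but keep only iDs with a valid checksum."""
    found = set()
    for text in texts:
        found.update(m for m in ORCID_RE.findall(text) if is_valid_orcid(m))
    return sorted(found)
```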
@@ -504,7 +504,7 @@ $ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2021-05-18-add-orcids.csv -db dspa
</ul>
</li>
</ul>
-<pre><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
+<pre tabindex="0"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
UPDATE 47405
</code></pre><ul>
<li>That&rsquo;s interesting because we lowercased them all a few months ago, so these must all be new&hellip; wow
@@ -518,7 +518,7 @@ UPDATE 47405
<ul>
<li>Export the top 5,000 AGROVOC terms to validate them:</li>
</ul>
<pre><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id = 187 GROUP BY text_value ORDER BY count DESC LIMIT 5000) to /tmp/2021-05-20-agrovoc.csv WITH CSV HEADER;
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id = 187 GROUP BY text_value ORDER BY count DESC LIMIT 5000) to /tmp/2021-05-20-agrovoc.csv WITH CSV HEADER;
COPY 5000
$ csvcut -c 1 /tmp/2021-05-20-agrovoc.csv| sed 1d &gt; /tmp/2021-05-20-agrovoc.txt
$ ./ilri/agrovoc-lookup.py -i /tmp/2021-05-20-agrovoc.txt -o /tmp/2021-05-20-agrovoc-results.csv
@@ -545,7 +545,7 @@ $ csvgrep -c &quot;number of matches&quot; -r '^0$' /tmp/2021-05-20-agrovoc-resu
<ul>
<li>Add ORCID identifiers for missing ILRI authors and tag 550 others based on a few authors I noticed that were missing them:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ cat 2021-05-24-add-orcids.csv
+<pre tabindex="0"><code class="language-console" data-lang="console">$ cat 2021-05-24-add-orcids.csv
dc.contributor.author,cg.creator.identifier
&quot;Patel, Ekta&quot;,&quot;Ekta Patel: 0000-0001-9400-6988&quot;
&quot;Dessie, Tadelle&quot;,&quot;Tadelle Dessie: 0000-0002-1630-0417&quot;
@@ -562,7 +562,7 @@ $ ./ilri/add-orcid-identifiers-csv.py -i 2021-05-24-add-orcids.csv -db dspace -u
</code></pre><ul>
<li>A few days ago I took a backup of the Elasticsearch indexes on AReS using elasticdump:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
+<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
</code></pre><ul>
<li>The indexes look OK, so I started a harvest on AReS</li>
@@ -571,13 +571,13 @@ $ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/o
<ul>
<li>The AReS harvest got messed up somehow, as I see the number of items in the indexes is the same as before the harvest:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
+<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp o3ijJLcyTtGMOPeWpAJiVA 1 1 104373 106455 491.5mb 491.5mb
yellow open openrxv-items-final soEzAnp3TDClIGZbmVyEIw 1 1 953 0 2.3mb 2.3mb
</code></pre><ul>
<li>Update all docker images on the AReS server (linode20):</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
+<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
$ docker-compose -f docker/docker-compose.yml down
$ docker-compose -f docker/docker-compose.yml build
</code></pre><ul>
@@ -585,7 +585,7 @@ $ docker-compose -f docker/docker-compose.yml build
<li>Oh crap, I deleted everything on AReS and restored the backup and the total items are now 104317&hellip; so it was actually correct before!</li>
<li>For reference, this is how I re-created everything:</li>
</ul>
-<pre><code class="language-console" data-lang="console">curl -XDELETE 'http://localhost:9200/openrxv-items-final'
+<pre tabindex="0"><code class="language-console" data-lang="console">curl -XDELETE 'http://localhost:9200/openrxv-items-final'
curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
curl -XPUT 'http://localhost:9200/openrxv-items-final'
curl -XPUT 'http://localhost:9200/openrxv-items-temp'
@@ -605,7 +605,7 @@ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhos
</li>
<li>Looking in the DSpace log for this morning I see a big hole in the logs at that time (UTC+2 server time):</li>
</ul>
-<pre><code>2021-05-26 02:17:52,808 INFO org.dspace.curate.Curator @ Curation task: countrycodetagger performed on: 10568/70659 with status: 2. Result: '10568/70659: item has country codes, skipping'
+<pre tabindex="0"><code>2021-05-26 02:17:52,808 INFO org.dspace.curate.Curator @ Curation task: countrycodetagger performed on: 10568/70659 with status: 2. Result: '10568/70659: item has country codes, skipping'
2021-05-26 02:17:52,853 INFO org.dspace.curate.Curator @ Curation task: countrycodetagger performed on: 10568/66761 with status: 2. Result: '10568/66761: item has country codes, skipping'
2021-05-26 03:00:05,772 INFO org.dspace.statistics.SolrLoggerServiceImpl @ solr-statistics.spidersfile:null
2021-05-26 03:00:05,773 INFO org.dspace.statistics.SolrLoggerServiceImpl @ solr-statistics.server:http://localhost:8081/solr/statistics
@@ -613,7 +613,7 @@ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhos
<li>There are no logs between 02:17 and 03:00&hellip; hmmm.</li>
<li>I see a similar gap in the Solr log, though it starts at 02:15:</li>
</ul>
-<pre><code>2021-05-26 02:15:07,968 INFO org.apache.solr.core.SolrCore @ [search] webapp=/solr path=/select params={f.location.coll.facet.sort=count&amp;facet.field=location.comm&amp;facet.field=location.coll&amp;fl=handle,search.resourcetype,search.resourceid,search.uniqueid&amp;start=0&amp;fq=NOT(withdrawn:true)&amp;fq=NOT(discoverable:false)&amp;fq=search.resourcetype:2&amp;fq=NOT(discoverable:false)&amp;rows=0&amp;version=2&amp;q=*:*&amp;f.location.coll.facet.limit=-1&amp;facet.mincount=1&amp;facet=true&amp;f.location.comm.facet.sort=count&amp;wt=javabin&amp;facet.offset=0&amp;f.location.comm.facet.limit=-1} hits=90792 status=0 QTime=6
+<pre tabindex="0"><code>2021-05-26 02:15:07,968 INFO org.apache.solr.core.SolrCore @ [search] webapp=/solr path=/select params={f.location.coll.facet.sort=count&amp;facet.field=location.comm&amp;facet.field=location.coll&amp;fl=handle,search.resourcetype,search.resourceid,search.uniqueid&amp;start=0&amp;fq=NOT(withdrawn:true)&amp;fq=NOT(discoverable:false)&amp;fq=search.resourcetype:2&amp;fq=NOT(discoverable:false)&amp;rows=0&amp;version=2&amp;q=*:*&amp;f.location.coll.facet.limit=-1&amp;facet.mincount=1&amp;facet=true&amp;f.location.comm.facet.sort=count&amp;wt=javabin&amp;facet.offset=0&amp;f.location.comm.facet.limit=-1} hits=90792 status=0 QTime=6
2021-05-26 02:15:09,446 INFO org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/update params={wt=javabin&amp;version=2} status=0 QTime=1
2021-05-26 02:28:03,602 INFO org.apache.solr.update.UpdateHandler @ start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2021-05-26 02:28:03,630 INFO org.apache.solr.core.SolrCore @ SolrDeletionPolicy.onCommit: commits: num=2
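The 02:17 to 03:00 hole was spotted by eye. Below is a minimal Python sketch of finding such gaps automatically. It assumes every log entry starts with a timestamp in the `2021-05-26 02:17:52,808` format shown above. Lines without one (such as stack traces) are skipped, and the threshold is arbitrary.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Timestamp prefix used by the DSpace and Solr log lines above, e.g. "2021-05-26 02:17:52,808"
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23

def find_gaps(lines: Iterable[str], threshold: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (last entry before, first entry after) for every silence longer than threshold."""
    gaps = []
    previous: Optional[datetime] = None
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue  # lines without a leading timestamp, e.g. stack traces
        if previous is not None and ts - previous > threshold:
            gaps.append((previous, ts))
        previous = ts
    return gaps
```

For example, `find_gaps(open("dspace.log.2021-05-26"), timedelta(minutes=10))` would report the 02:17:52 to 03:00:05 silence, if the log file has that name on the server.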
@@ -626,19 +626,19 @@ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhos
</code></pre><ul>
<li>Ah, it seems to have been a <a href="https://status.linode.com/incidents/byqmt6nss9l0">Linode network issue in the Frankfurt region</a>:</li>
</ul>
-<pre><code>May 26, 2021
+<pre tabindex="0"><code>May 26, 2021
Connectivity Issue - Frankfurt
Resolved - We havent observed any additional connectivity issues in our Frankfurt data center, and will now consider this incident resolved. If you continue to experience problems, please open a Support ticket for assistance.
May 26, 02:57 UTC
</code></pre><ul>
<li>While looking in the logs I noticed an error about SMTP:</li>
</ul>
-<pre><code>2021-05-26 02:00:18,015 ERROR org.dspace.eperson.SubscribeCLITool @ Failed to send subscription to eperson_id=934cb92f-2e77-4881-89e2-6f13ad4b1378
+<pre tabindex="0"><code>2021-05-26 02:00:18,015 ERROR org.dspace.eperson.SubscribeCLITool @ Failed to send subscription to eperson_id=934cb92f-2e77-4881-89e2-6f13ad4b1378
2021-05-26 02:00:18,015 ERROR org.dspace.eperson.SubscribeCLITool @ javax.mail.SendFailedException: Send failure (javax.mail.MessagingException: Could not convert socket to TLS (javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)))
</code></pre><ul>
<li>And indeed the email seems to be broken:</li>
</ul>
-<pre><code class="language-console" data-lang="console">$ dspace test-email
+<pre tabindex="0"><code class="language-console" data-lang="console">$ dspace test-email
About to send test email:
- To: fuuuuuu