Add notes for 2021-11-08

2021-11-09 06:29:52 +02:00
parent b3df4ff58f
commit 9afe5c13f9
110 changed files with 1827 additions and 1737 deletions


@ -36,7 +36,7 @@ I looked at the top user agents and IPs in the Solr statistics for last month an
I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…
"/>
<meta name="generator" content="Hugo 0.88.1" />
<meta name="generator" content="Hugo 0.89.2" />
@ -147,17 +147,17 @@ I will add the RI/1.0 pattern to our DSpace agents overload and purge them from
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">193.169.254.178 - - [21/Apr/2021:01:59:01 +0200] &quot;GET /rest/collections/1179/items?limit=812&amp;expand=metadata\x22%20and%20\x2221\x22=\x2221 HTTP/1.1&quot; 400 5 &quot;-&quot; &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&quot;
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] &quot;GET /rest/collections/1179/items?limit=812&amp;expand=metadata-21%2B21*01 HTTP/1.1&quot; 200 458201 &quot;-&quot; &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&quot;
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] &quot;GET /rest/collections/1179/items?limit=812&amp;expand=metadata'||lower('')||' HTTP/1.1&quot; 400 5 &quot;-&quot; &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&quot;
193.169.254.178 - - [21/Apr/2021:02:02:10 +0200] &quot;GET /rest/collections/1179/items?limit=812&amp;expand=metadata'%2Brtrim('')%2B' HTTP/1.1&quot; 200 458209 &quot;-&quot; &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&quot;
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">193.169.254.178 - - [21/Apr/2021:01:59:01 +0200] &#34;GET /rest/collections/1179/items?limit=812&amp;expand=metadata\x22%20and%20\x2221\x22=\x2221 HTTP/1.1&#34; 400 5 &#34;-&#34; &#34;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&#34;
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] &#34;GET /rest/collections/1179/items?limit=812&amp;expand=metadata-21%2B21*01 HTTP/1.1&#34; 200 458201 &#34;-&#34; &#34;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&#34;
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] &#34;GET /rest/collections/1179/items?limit=812&amp;expand=metadata&#39;||lower(&#39;&#39;)||&#39; HTTP/1.1&#34; 400 5 &#34;-&#34; &#34;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&#34;
193.169.254.178 - - [21/Apr/2021:02:02:10 +0200] &#34;GET /rest/collections/1179/items?limit=812&amp;expand=metadata&#39;%2Brtrim(&#39;&#39;)%2B&#39; HTTP/1.1&#34; 200 458209 &#34;-&#34; &#34;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&#34;
</code></pre></div><ul>
<li>I will report the IP on abuseipdb.com and purge their hits from Solr</li>
<li>The second IP is in Colombia and is making thousands of requests for what looks like some test site:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">181.62.166.177 - - [20/Apr/2021:22:48:42 +0200] &quot;GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0&quot; 200 123613 &quot;http://cassavalighthousetest.org/&quot; &quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36&quot;
181.62.166.177 - - [20/Apr/2021:22:55:39 +0200] &quot;GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0&quot; 200 123613 &quot;http://cassavalighthousetest.org/&quot; &quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36&quot;
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">181.62.166.177 - - [20/Apr/2021:22:48:42 +0200] &#34;GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0&#34; 200 123613 &#34;http://cassavalighthousetest.org/&#34; &#34;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36&#34;
181.62.166.177 - - [20/Apr/2021:22:55:39 +0200] &#34;GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0&#34; 200 123613 &#34;http://cassavalighthousetest.org/&#34; &#34;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36&#34;
</code></pre></div><ul>
<li>But this site does not exist (yet?)
<ul>
<li>I will purge them from Solr</li>
@ -165,46 +165,46 @@ I will add the RI/1.0 pattern to our DSpace agents overload and purge them from
</li>
<li>The third IP is in Russia apparently, and the user agent has the <code>pl-PL</code> locale with thousands of requests like this:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">45.146.166.180 - - [18/Apr/2021:16:28:44 +0200] &quot;GET /bitstream/handle/10947/4153/.AAS%202014%20Annual%20Report.pdf?sequence=1%22%29%29%20AND%201691%3DUTL_INADDR.GET_HOST_ADDRESS%28CHR%28113%29%7C%7CCHR%28118%29%7C%7CCHR%28113%29%7C%7CCHR%28106%29%7C%7CCHR%28113%29%7C%7C%28SELECT%20%28CASE%20WHEN%20%281691%3D1691%29%20THEN%201%20ELSE%200%20END%29%20FROM%20DUAL%29%7C%7CCHR%28113%29%7C%7CCHR%2898%29%7C%7CCHR%28122%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%29%20AND%20%28%28%22RKbp%22%3D%22RKbp&amp;isAllowed=y HTTP/1.1&quot; 200 918998 &quot;http://cgspace.cgiar.org:80/bitstream/handle/10947/4153/.AAS 2014 Annual Report.pdf&quot; &quot;Mozilla/5.0 (Windows; U; Windows NT 5.1; pl-PL) AppleWebKit/523.15 (KHTML, like Gecko) Version/3.0 Safari/523.15&quot;
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">45.146.166.180 - - [18/Apr/2021:16:28:44 +0200] &#34;GET /bitstream/handle/10947/4153/.AAS%202014%20Annual%20Report.pdf?sequence=1%22%29%29%20AND%201691%3DUTL_INADDR.GET_HOST_ADDRESS%28CHR%28113%29%7C%7CCHR%28118%29%7C%7CCHR%28113%29%7C%7CCHR%28106%29%7C%7CCHR%28113%29%7C%7C%28SELECT%20%28CASE%20WHEN%20%281691%3D1691%29%20THEN%201%20ELSE%200%20END%29%20FROM%20DUAL%29%7C%7CCHR%28113%29%7C%7CCHR%2898%29%7C%7CCHR%28122%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%29%20AND%20%28%28%22RKbp%22%3D%22RKbp&amp;isAllowed=y HTTP/1.1&#34; 200 918998 &#34;http://cgspace.cgiar.org:80/bitstream/handle/10947/4153/.AAS 2014 Annual Report.pdf&#34; &#34;Mozilla/5.0 (Windows; U; Windows NT 5.1; pl-PL) AppleWebKit/523.15 (KHTML, like Gecko) Version/3.0 Safari/523.15&#34;
</code></pre></div><ul>
<li>I will purge these all with my <code>check-spider-ip-hits.sh</code> script:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
Purging 21648 hits from 193.169.254.178 in statistics
Purging 20323 hits from 181.62.166.177 in statistics
Purging 19376 hits from 45.146.166.180 in statistics
Total number of bot hits purged: 61347
</code></pre><h2 id="2021-05-02">2021-05-02</h2>
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 61347
</code></pre></div><h2 id="2021-05-02">2021-05-02</h2>
<ul>
<li>Check the AReS Harvester indexes:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 0 0 283b 283b
yellow open openrxv-items-final ul3SKsa7Q9Cd_K7qokBY_w 1 1 103951 0 254mb 254mb
$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
$ curl -s <span style="color:#e6db74">&#39;http://localhost:9200/_alias/&#39;</span> | python -m json.tool
...
&quot;openrxv-items-temp&quot;: {
&quot;aliases&quot;: {}
&#34;openrxv-items-temp&#34;: {
&#34;aliases&#34;: {}
},
&quot;openrxv-items-final&quot;: {
&quot;aliases&quot;: {
&quot;openrxv-items&quot;: {}
&#34;openrxv-items-final&#34;: {
&#34;aliases&#34;: {
&#34;openrxv-items&#34;: {}
}
},
</code></pre><ul>
</code></pre></div><ul>
<li>I think they look OK (<code>openrxv-items</code> is an alias of <code>openrxv-items-final</code>), but I took a backup just in case:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --type<span style="color:#f92672">=</span>mapping
$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --type<span style="color:#f92672">=</span>data --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>
</code></pre></div><ul>
<li>Then I started an indexing run in the AReS Explorer admin dashboard</li>
<li>The indexing finished, but it looks like the aliases are messed up again:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 104165 105024 487.7mb 487.7mb
yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0 2.2mb 2.2mb
</code></pre><h2 id="2021-05-05">2021-05-05</h2>
</code></pre></div><h2 id="2021-05-05">2021-05-05</h2>
<ul>
<li>Peter noticed that we no longer display <code>cg.link.reference</code> on the item view
<ul>
@ -229,9 +229,9 @@ yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ time ~/dspace64/bin/dspace index-discovery -b
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ time ~/dspace64/bin/dspace index-discovery -b
~/dspace64/bin/dspace index-discovery -b 4053.24s user 53.17s system 38% cpu 2:58:53.83 total
</code></pre><ul>
</code></pre></div><ul>
<li>Nope! Still slow, and still no mapped item&hellip;
<ul>
<li>I even tried unmapping it from all collections, and adding it to a single new owning collection&hellip;</li>
@ -244,53 +244,53 @@ yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0
</li>
<li>The indexes on AReS Explorer are messed up after last week&rsquo;s harvesting:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 104165 105024 487.7mb 487.7mb
yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0 2.2mb 2.2mb
$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>$ curl -s <span style="color:#e6db74">&#39;http://localhost:9200/_alias/&#39;</span> | python -m json.tool
...
&quot;openrxv-items-final&quot;: {
&quot;aliases&quot;: {}
&#34;openrxv-items-final&#34;: {
&#34;aliases&#34;: {}
},
&quot;openrxv-items-temp&quot;: {
&quot;aliases&quot;: {
&quot;openrxv-items&quot;: {}
&#34;openrxv-items-temp&#34;: {
&#34;aliases&#34;: {
&#34;openrxv-items&#34;: {}
}
}
</code></pre><ul>
</code></pre></div><ul>
<li><code>openrxv-items</code> should be an alias of <code>openrxv-items-final</code>&hellip;</li>
<li>I made a backup of the temp index and then started indexing on the AReS Explorer admin dashboard:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -X PUT <span style="color:#e6db74">&#34;localhost:9200/openrxv-items-temp/_settings&#34;</span> -H <span style="color:#e6db74">&#39;Content-Type: application/json&#39;</span> -d<span style="color:#e6db74">&#39;{&#34;settings&#34;: {&#34;index.blocks.write&#34;: true}}&#39;</span>
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-temp-backup
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: false}}'
</code></pre><h2 id="2021-05-10">2021-05-10</h2>
$ curl -X PUT <span style="color:#e6db74">&#34;localhost:9200/openrxv-items-temp/_settings&#34;</span> -H <span style="color:#e6db74">&#39;Content-Type: application/json&#39;</span> -d<span style="color:#e6db74">&#39;{&#34;settings&#34;: {&#34;index.blocks.write&#34;: false}}&#39;</span>
</code></pre></div><h2 id="2021-05-10">2021-05-10</h2>
<ul>
<li>Amazing, the harvesting on AReS finished but it messed up all the indexes and now there are no items in any index!</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp 8thRX0WVRUeAzmd2hkG6TA 1 1 0 0 283b 283b
yellow open openrxv-items-temp-backup _0tyvctBTg2pjOlcoVP1LA 1 1 104165 20134 305.5mb 305.5mb
yellow open openrxv-items-final BtvV9kwVQ3yBYCZvJS1QyQ 1 1 0 0 283b 283b
</code></pre><ul>
</code></pre></div><ul>
<li>I fixed the indexes manually by re-creating them and cloning from the backup:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp-backup/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -XDELETE <span style="color:#e6db74">&#39;http://localhost:9200/openrxv-items-final&#39;</span>
$ curl -X PUT <span style="color:#e6db74">&#34;localhost:9200/openrxv-items-temp-backup/_settings&#34;</span> -H <span style="color:#e6db74">&#39;Content-Type: application/json&#39;</span> -d<span style="color:#e6db74">&#39;{&#34;settings&#34;: {&#34;index.blocks.write&#34;: true}}&#39;</span>
$ curl -s -X POST http://localhost:9200/openrxv-items-temp-backup/_clone/openrxv-items-final
$ curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{&quot;actions&quot; : [{&quot;add&quot; : { &quot;index&quot; : &quot;openrxv-items-final&quot;, &quot;alias&quot; : &quot;openrxv-items&quot;}}]}'
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp-backup'
</code></pre><ul>
$ curl -s -X POST <span style="color:#e6db74">&#39;http://localhost:9200/_aliases&#39;</span> -H <span style="color:#e6db74">&#39;Content-Type: application/json&#39;</span> -d<span style="color:#e6db74">&#39;{&#34;actions&#34; : [{&#34;add&#34; : { &#34;index&#34; : &#34;openrxv-items-final&#34;, &#34;alias&#34; : &#34;openrxv-items&#34;}}]}&#39;</span>
$ curl -XDELETE <span style="color:#e6db74">&#39;http://localhost:9200/openrxv-items-temp-backup&#39;</span>
</code></pre></div><ul>
<li>Also I ran all updates on the server and updated all Docker images, then rebooted the server (linode20):</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">&#39;s/ \+/:/g&#39;</span> | cut -d: -f1,2 | xargs -L1 docker pull
</code></pre></div><ul>
<li>I backed up the AReS Elasticsearch data using elasticdump, then started a new harvest:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --type<span style="color:#f92672">=</span>mapping
$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --type<span style="color:#f92672">=</span>data --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>
</code></pre></div><ul>
<li>Discuss CGSpace statistics with the CIP team
<ul>
<li>They were wondering why their numbers for 2020 were so low</li>
@ -329,10 +329,10 @@ $ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/o
</li>
<li>I checked the CLARISA list against ROR&rsquo;s April 2021 release (&ldquo;Version 9&rdquo; on figshare, though it is version 8 in the dump):</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/ror-lookup.py -i /tmp/clarisa-institutions.txt -r ror-data-2021-04-06.json -o /tmp/clarisa-ror-matches.csv
$ csvgrep -c matched -m 'true' /tmp/clarisa-ror-matches.csv | sed '1d' | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/ror-lookup.py -i /tmp/clarisa-institutions.txt -r ror-data-2021-04-06.json -o /tmp/clarisa-ror-matches.csv
$ csvgrep -c matched -m <span style="color:#e6db74">&#39;true&#39;</span> /tmp/clarisa-ror-matches.csv | sed <span style="color:#e6db74">&#39;1d&#39;</span> | wc -l
1770
</code></pre><ul>
</code></pre></div><ul>
<li>With 1770 out of 6230 matched, that&rsquo;s 28.4%&hellip;</li>
<li>I sent an email to Hector Tobon to point out the issues in CLARISA again and ask him to chat</li>
<li>Meeting with GARDIAN developers about CG Core and how GARDIAN works</li>
@ -341,11 +341,11 @@ $ csvgrep -c matched -m 'true' /tmp/clarisa-ror-matches.csv | sed '1d' | wc -l
<ul>
<li>Fix a few thousand IWMI URLs that are using HTTP instead of HTTPS on CGSpace:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://www.iwmi.cgiar.org','https://www.iwmi.cgiar.org', 'g') WHERE text_value LIKE 'http://www.iwmi.cgiar.org%' AND metadata_field_id=219;
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, &#39;http://www.iwmi.cgiar.org&#39;,&#39;https://www.iwmi.cgiar.org&#39;, &#39;g&#39;) WHERE text_value LIKE &#39;http://www.iwmi.cgiar.org%&#39; AND metadata_field_id=219;
UPDATE 1132
localhost/dspace63= &gt; UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://publications.iwmi.org','https://publications.iwmi.org', 'g') WHERE text_value LIKE 'http://publications.iwmi.org%' AND metadata_field_id=219;
localhost/dspace63= &gt; UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, &#39;http://publications.iwmi.org&#39;,&#39;https://publications.iwmi.org&#39;, &#39;g&#39;) WHERE text_value LIKE &#39;http://publications.iwmi.org%&#39; AND metadata_field_id=219;
UPDATE 1803
</code></pre><ul>
</code></pre></div><ul>
<li>In the case of the latter, the HTTP links don&rsquo;t even work! The web server returns HTTP 404 unless the request is HTTPS</li>
<li>IWMI also says that their subjects are a subset of AGROVOC so they no longer want to use <code>cg.subject.iwmi</code> for their subjects
<ul>
@ -367,67 +367,67 @@ UPDATE 1803
<ul>
<li>I have to fix the Elasticsearch indexes on AReS after last week&rsquo;s harvesting because, as always, the <code>openrxv-items</code> index should be an alias of <code>openrxv-items-final</code> instead of <code>openrxv-items-temp</code>:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
&quot;openrxv-items-final&quot;: {
&quot;aliases&quot;: {}
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s <span style="color:#e6db74">&#39;http://localhost:9200/_alias/&#39;</span> | python -m json.tool
&#34;openrxv-items-final&#34;: {
&#34;aliases&#34;: {}
},
&quot;openrxv-items-temp&quot;: {
&quot;aliases&quot;: {
&quot;openrxv-items&quot;: {}
&#34;openrxv-items-temp&#34;: {
&#34;aliases&#34;: {
&#34;openrxv-items&#34;: {}
}
},
...
</code></pre><ul>
</code></pre></div><ul>
<li>I took a backup of the <code>openrxv-items</code> index with elasticdump so I can re-create them manually before starting a new harvest tomorrow:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
</code></pre><h2 id="2021-05-16">2021-05-16</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --type<span style="color:#f92672">=</span>mapping
$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --type<span style="color:#f92672">=</span>data --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>
</code></pre></div><h2 id="2021-05-16">2021-05-16</h2>
<ul>
<li>I deleted and re-created the Elasticsearch indexes on AReS:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
$ curl -XPUT 'http://localhost:9200/openrxv-items-final'
$ curl -XPUT 'http://localhost:9200/openrxv-items-temp'
$ curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{&quot;actions&quot; : [{&quot;add&quot; : { &quot;index&quot; : &quot;openrxv-items-final&quot;, &quot;alias&quot; : &quot;openrxv-items&quot;}}]}'
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -XDELETE <span style="color:#e6db74">&#39;http://localhost:9200/openrxv-items-final&#39;</span>
$ curl -XDELETE <span style="color:#e6db74">&#39;http://localhost:9200/openrxv-items-temp&#39;</span>
$ curl -XPUT <span style="color:#e6db74">&#39;http://localhost:9200/openrxv-items-final&#39;</span>
$ curl -XPUT <span style="color:#e6db74">&#39;http://localhost:9200/openrxv-items-temp&#39;</span>
$ curl -s -X POST <span style="color:#e6db74">&#39;http://localhost:9200/_aliases&#39;</span> -H <span style="color:#e6db74">&#39;Content-Type: application/json&#39;</span> -d<span style="color:#e6db74">&#39;{&#34;actions&#34; : [{&#34;add&#34; : { &#34;index&#34; : &#34;openrxv-items-final&#34;, &#34;alias&#34; : &#34;openrxv-items&#34;}}]}&#39;</span>
</code></pre></div><ul>
<li>Then I re-imported the backup that I created with elasticdump yesterday:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=/home/aorth/openrxv-items_mapping.json --output=http://localhost:9200/openrxv-items-final --type=mapping
$ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhost:9200/openrxv-items-final --type=data --limit=1000
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ elasticdump --input<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --output<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items-final --type<span style="color:#f92672">=</span>mapping
$ elasticdump --input<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --output<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items-final --type<span style="color:#f92672">=</span>data --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>
</code></pre></div><ul>
<li>Then I started a new harvest on AReS</li>
</ul>
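<p>The alias repair above can also be expressed as a single request body for the Elasticsearch <code>_aliases</code> endpoint. The following is a minimal sketch, not what was run in these notes (which delete and re-create the indexes): it assumes the parsed JSON from <code>/_alias/</code> is already in hand, and the helper names <code>indexes_with_alias</code> and <code>build_alias_fix</code> are hypothetical.</p>

```python
import json


def indexes_with_alias(alias_response, alias):
    """Return the sorted names of indexes that currently carry the alias."""
    return sorted(
        index
        for index, body in alias_response.items()
        if alias in body.get("aliases", {})
    )


def build_alias_fix(alias_response, alias, target):
    """Build an _aliases request body that leaves the alias only on target.

    Hypothetical helper: it only builds the payload and sends nothing.
    """
    actions = []
    # Remove the alias from every index other than the target.
    for index in indexes_with_alias(alias_response, alias):
        if index != target:
            actions.append({"remove": {"index": index, "alias": alias}})
    # Add the alias to the target if it is not there yet.
    if alias not in alias_response.get(target, {}).get("aliases", {}):
        actions.append({"add": {"index": target, "alias": alias}})
    return {"actions": actions}


# The broken state seen in the notes: the alias sits on the temp index.
broken = {
    "openrxv-items-final": {"aliases": {}},
    "openrxv-items-temp": {"aliases": {"openrxv-items": {}}},
}
# The healthy state: the alias sits on the final index.
healthy = {
    "openrxv-items-temp": {"aliases": {}},
    "openrxv-items-final": {"aliases": {"openrxv-items": {}}},
}

fix = build_alias_fix(broken, "openrxv-items", "openrxv-items-final")
noop = build_alias_fix(healthy, "openrxv-items", "openrxv-items-final")
print(json.dumps(fix))
print(json.dumps(noop))
```

<p>The printed body could be passed to <code>curl -d</code> against <code>_aliases</code>, in the same way as the <code>add</code> action used above. An empty <code>actions</code> list means the alias is already where it should be.</p>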
<h2 id="2021-05-17">2021-05-17</h2>
<ul>
<li>The AReS harvest finished and the Elasticsearch indexes seem OK so I shouldn&rsquo;t have to fix them next time&hellip;</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp o3ijJLcyTtGMOPeWpAJiVA 1 1 0 0 283b 283b
yellow open openrxv-items-final TrJ1Ict3QZ-vFkj-4VcAzw 1 1 104317 0 259.4mb 259.4mb
$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
&quot;openrxv-items-temp&quot;: {
&quot;aliases&quot;: {}
$ curl -s <span style="color:#e6db74">&#39;http://localhost:9200/_alias/&#39;</span> | python -m json.tool
&#34;openrxv-items-temp&#34;: {
&#34;aliases&#34;: {}
},
&quot;openrxv-items-final&quot;: {
&quot;aliases&quot;: {
&quot;openrxv-items&quot;: {}
&#34;openrxv-items-final&#34;: {
&#34;aliases&#34;: {
&#34;openrxv-items&#34;: {}
}
},
...
</code></pre><ul>
</code></pre></div><ul>
<li>Abenet said she and some others can&rsquo;t log into CGSpace
<ul>
<li>I tried to check the CGSpace LDAP account and it does not seem to be working:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;cgspace-ldap@cgiarad.org&quot; -W &quot;(sAMAccountName=aorth)&quot;
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b <span style="color:#e6db74">&#34;dc=cgiarad,dc=org&#34;</span> -D <span style="color:#e6db74">&#34;cgspace-ldap@cgiarad.org&#34;</span> -W <span style="color:#e6db74">&#34;(sAMAccountName=aorth)&#34;</span>
Enter LDAP Password:
ldap_bind: Invalid credentials (49)
additional info: 80090308: LdapErr: DSID-0C090453, comment: AcceptSecurityContext error, data 532, v3839
</code></pre><ul>
</code></pre></div><ul>
<li>I sent a message to Biruk so he can check the LDAP account</li>
<li>IWMI confirmed that they do indeed want to move all their subjects to AGROVOC, so I made the changes in the XMLUI and config (<a href="https://github.com/ilri/DSpace/pull/467">#467</a>)
<ul>
@ -446,14 +446,14 @@ ldap_bind: Invalid credentials (49)
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ xmllint --xpath '//value-pairs[@value-pairs-name=&quot;ccafsprojectpii&quot;]/pair/stored-value/node()' dspace/config/input-forms.xml
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ xmllint --xpath <span style="color:#e6db74">&#39;//value-pairs[@value-pairs-name=&#34;ccafsprojectpii&#34;]/pair/stored-value/node()&#39;</span> dspace/config/input-forms.xml
</code></pre></div><ul>
<li>I formatted the input file with tidy, especially because one of the new project tags has an ampersand character&hellip; grrr:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w 0 dspace/config/input-forms.xml
line 3658 column 26 - Warning: unescaped &amp; or unknown entity &quot;&amp;WA_EU-IFAD&quot;
line 3659 column 23 - Warning: unescaped &amp; or unknown entity &quot;&amp;WA_EU-IFAD&quot;
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w <span style="color:#ae81ff">0</span> dspace/config/input-forms.xml
line 3658 column 26 - Warning: unescaped &amp; or unknown entity &#34;&amp;WA_EU-IFAD&#34;
line 3659 column 23 - Warning: unescaped &amp; or unknown entity &#34;&amp;WA_EU-IFAD&#34;
</code></pre></div><ul>
<li>After testing whether this escaped value worked during submission, I created and merged a pull request to <code>6_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/468">#468</a>)</li>
</ul>
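<p>The tidy warning above comes from a bare <code>&amp;</code> in an XML value. As a minimal sketch of the escaping step (the tag value and the <code>stored_value_xml</code> helper are hypothetical, not taken from <code>input-forms.xml</code>), Python's standard library can escape the value and confirm that the result parses back to the original string:</p>

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical project tag containing an ampersand; the real tag is not shown here.
raw_value = "PII-EXAMPLE&WA_EU-IFAD"


def stored_value_xml(value: str) -> str:
    """Wrap a value in a <stored-value> element, escaping &, < and >."""
    return f"<stored-value>{escape(value)}</stored-value>"


snippet = stored_value_xml(raw_value)
print(snippet)

# Parsing the escaped snippet gives back the original, unescaped text.
parsed = ET.fromstring(snippet)
round_trip_ok = parsed.text == raw_value

# The unescaped form is not well-formed XML, which is what tidy warned about.
try:
    ET.fromstring(f"<stored-value>{raw_value}</stored-value>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

print(round_trip_ok, unescaped_ok)
```

<p>The same check could be run over the whole file before committing, though tidy or xmllint already cover that in practice.</p>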
<h2 id="2021-05-18">2021-05-18</h2>
@ -461,34 +461,34 @@ line 3659 column 23 - Warning: unescaped &amp; or unknown entity &quot;&amp;WA_E
<li>Paola from the Alliance emailed me some new ORCID identifiers to add to CGSpace</li>
<li>I saved the new ones to a text file, combined them with the others, extracted the ORCID iDs themselves, and updated the names using <code>resolve-orcids.py</code>:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-identifier.xml /tmp/new | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2021-05-18-combined.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-identifier.xml /tmp/new | grep -oE <span style="color:#e6db74">&#39;[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}&#39;</span> | sort | uniq &gt; /tmp/2021-05-18-combined.txt
$ ./ilri/resolve-orcids.py -i /tmp/2021-05-18-combined.txt -o /tmp/2021-05-18-combined-names.txt
</code></pre><ul>
</code></pre></div><ul>
<li>I sorted the names and added the XML formatting in vim, then ran it through tidy:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/cg-creator-identifier.xml
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w <span style="color:#ae81ff">0</span> dspace/config/controlled-vocabularies/cg-creator-identifier.xml
</code></pre></div><ul>
<li>Tag fifty-five items from the Alliance&rsquo;s new authors with ORCID iDs using <code>add-orcid-identifiers-csv.py</code>:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat 2021-05-18-add-orcids.csv
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat 2021-05-18-add-orcids.csv
dc.contributor.author,cg.creator.identifier
&quot;Urioste Daza, Sergio&quot;,Sergio Alejandro Urioste Daza: 0000-0002-3208-032X
&quot;Urioste, Sergio&quot;,Sergio Alejandro Urioste Daza: 0000-0002-3208-032X
&quot;Villegas, Daniel&quot;,Daniel M. Villegas: 0000-0001-6801-3332
&quot;Villegas, Daniel M.&quot;,Daniel M. Villegas: 0000-0001-6801-3332
&quot;Giles, James&quot;,James Giles: 0000-0003-1899-9206
&quot;Simbare, Alice&quot;,Alice Simbare: 0000-0003-2389-0969
&quot;Simbare, Alice&quot;,Alice Simbare: 0000-0003-2389-0969
&quot;Simbare, A.&quot;,Alice Simbare: 0000-0003-2389-0969
&quot;Dita Rodriguez, Miguel&quot;,Miguel Angel Dita Rodriguez: 0000-0002-0496-4267
&quot;Templer, Noel&quot;,Noel Templer: 0000-0002-3201-9043
&quot;Jalonen, R.&quot;,Riina Jalonen: 0000-0003-1669-9138
&quot;Jalonen, Riina&quot;,Riina Jalonen: 0000-0003-1669-9138
&quot;Izquierdo, Paulo&quot;,Paulo Izquierdo: 0000-0002-2153-0655
&quot;Reyes, Byron&quot;,Byron Reyes: 0000-0003-2672-9636
&quot;Reyes, Byron A.&quot;,Byron Reyes: 0000-0003-2672-9636
$ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2021-05-18-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
</code></pre><ul>
&#34;Urioste Daza, Sergio&#34;,Sergio Alejandro Urioste Daza: 0000-0002-3208-032X
&#34;Urioste, Sergio&#34;,Sergio Alejandro Urioste Daza: 0000-0002-3208-032X
&#34;Villegas, Daniel&#34;,Daniel M. Villegas: 0000-0001-6801-3332
&#34;Villegas, Daniel M.&#34;,Daniel M. Villegas: 0000-0001-6801-3332
&#34;Giles, James&#34;,James Giles: 0000-0003-1899-9206
&#34;Simbare, Alice&#34;,Alice Simbare: 0000-0003-2389-0969
&#34;Simbare, Alice&#34;,Alice Simbare: 0000-0003-2389-0969
&#34;Simbare, A.&#34;,Alice Simbare: 0000-0003-2389-0969
&#34;Dita Rodriguez, Miguel&#34;,Miguel Angel Dita Rodriguez: 0000-0002-0496-4267
&#34;Templer, Noel&#34;,Noel Templer: 0000-0002-3201-9043
&#34;Jalonen, R.&#34;,Riina Jalonen: 0000-0003-1669-9138
&#34;Jalonen, Riina&#34;,Riina Jalonen: 0000-0003-1669-9138
&#34;Izquierdo, Paulo&#34;,Paulo Izquierdo: 0000-0002-2153-0655
&#34;Reyes, Byron&#34;,Byron Reyes: 0000-0003-2672-9636
&#34;Reyes, Byron A.&#34;,Byron Reyes: 0000-0003-2672-9636
$ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2021-05-18-add-orcids.csv -db dspace -u dspace -p <span style="color:#e6db74">&#39;fuuu&#39;</span> -d
</code></pre></div><ul>
<li>I deployed the latest <code>6_x-prod</code> branch on CGSpace, ran all system updates, and rebooted the server
<ul>
<li>This included the IWMI changes, so I also migrated the <code>cg.subject.iwmi</code> metadata to <code>dcterms.subject</code> and deleted the subject term</li>
@ -504,9 +504,9 @@ $ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2021-05-18-add-orcids.csv -db dspa
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ &#39;[[:upper:]]&#39;;
UPDATE 47405
</code></pre><ul>
</code></pre></div><ul>
<li>That&rsquo;s interesting because we lowercased them all a few months ago, so these must all be new&hellip; wow
<ul>
<li>We have 405,000 total AGROVOC terms, with 20,600 of them being unique</li>
@ -518,12 +518,12 @@ UPDATE 47405
<ul>
<li>Export the top 5,000 AGROVOC terms to validate them:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id = 187 GROUP BY text_value ORDER BY count DESC LIMIT 5000) to /tmp/2021-05-20-agrovoc.csv WITH CSV HEADER;
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id = 187 GROUP BY text_value ORDER BY count DESC LIMIT 5000) to /tmp/2021-05-20-agrovoc.csv WITH CSV HEADER;
COPY 5000
$ csvcut -c 1 /tmp/2021-05-20-agrovoc.csv| sed 1d &gt; /tmp/2021-05-20-agrovoc.txt
$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-05-20-agrovoc.csv| sed 1d &gt; /tmp/2021-05-20-agrovoc.txt
$ ./ilri/agrovoc-lookup.py -i /tmp/2021-05-20-agrovoc.txt -o /tmp/2021-05-20-agrovoc-results.csv
$ csvgrep -c &quot;number of matches&quot; -r '^0$' /tmp/2021-05-20-agrovoc-results.csv &gt; /tmp/2021-05-20-agrovoc-rejected.csv
</code></pre><ul>
$ csvgrep -c <span style="color:#e6db74">&#34;number of matches&#34;</span> -r <span style="color:#e6db74">&#39;^0$&#39;</span> /tmp/2021-05-20-agrovoc-results.csv &gt; /tmp/2021-05-20-agrovoc-rejected.csv
</code></pre></div><ul>
<li>Meeting with Medha and Pythagoras about the FAIR Workflow tool
<ul>
<li>Discussed the need for such a tool, other tools being developed, etc</li>
@ -545,54 +545,54 @@ $ csvgrep -c &quot;number of matches&quot; -r '^0$' /tmp/2021-05-20-agrovoc-resu
<ul>
<li>Add ORCID identifiers for missing ILRI authors and tag 550 others based on a few authors that I noticed were missing them:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat 2021-05-24-add-orcids.csv
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat 2021-05-24-add-orcids.csv
dc.contributor.author,cg.creator.identifier
&quot;Patel, Ekta&quot;,&quot;Ekta Patel: 0000-0001-9400-6988&quot;
&quot;Dessie, Tadelle&quot;,&quot;Tadelle Dessie: 0000-0002-1630-0417&quot;
&quot;Tadelle, D.&quot;,&quot;Tadelle Dessie: 0000-0002-1630-0417&quot;
&quot;Dione, Michel M.&quot;,&quot;Michel Dione: 0000-0001-7812-5776&quot;
&quot;Kiara, Henry K.&quot;,&quot;Henry Kiara: 0000-0001-9578-1636&quot;
&quot;Naessens, Jan&quot;,&quot;Jan Naessens: 0000-0002-7075-9915&quot;
&quot;Steinaa, Lucilla&quot;,&quot;Lucilla Steinaa: 0000-0003-3691-3971&quot;
&quot;Wieland, Barbara&quot;,&quot;Barbara Wieland: 0000-0003-4020-9186&quot;
&quot;Grace, Delia&quot;,&quot;Delia Grace: 0000-0002-0195-9489&quot;
&quot;Rao, Idupulapati M.&quot;,&quot;Idupulapati M. Rao: 0000-0002-8381-9358&quot;
&quot;Cardoso Arango, Juan Andrés&quot;,&quot;Juan Andrés Cardoso Arango: 0000-0002-0252-4655&quot;
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-05-24-add-orcids.csv -db dspace -u dspace -p 'fuuu'
</code></pre><ul>
&#34;Patel, Ekta&#34;,&#34;Ekta Patel: 0000-0001-9400-6988&#34;
&#34;Dessie, Tadelle&#34;,&#34;Tadelle Dessie: 0000-0002-1630-0417&#34;
&#34;Tadelle, D.&#34;,&#34;Tadelle Dessie: 0000-0002-1630-0417&#34;
&#34;Dione, Michel M.&#34;,&#34;Michel Dione: 0000-0001-7812-5776&#34;
&#34;Kiara, Henry K.&#34;,&#34;Henry Kiara: 0000-0001-9578-1636&#34;
&#34;Naessens, Jan&#34;,&#34;Jan Naessens: 0000-0002-7075-9915&#34;
&#34;Steinaa, Lucilla&#34;,&#34;Lucilla Steinaa: 0000-0003-3691-3971&#34;
&#34;Wieland, Barbara&#34;,&#34;Barbara Wieland: 0000-0003-4020-9186&#34;
&#34;Grace, Delia&#34;,&#34;Delia Grace: 0000-0002-0195-9489&#34;
&#34;Rao, Idupulapati M.&#34;,&#34;Idupulapati M. Rao: 0000-0002-8381-9358&#34;
&#34;Cardoso Arango, Juan Andrés&#34;,&#34;Juan Andrés Cardoso Arango: 0000-0002-0252-4655&#34;
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-05-24-add-orcids.csv -db dspace -u dspace -p <span style="color:#e6db74">&#39;fuuu&#39;</span>
</code></pre></div><ul>
<li>A few days ago I took a backup of the Elasticsearch indexes on AReS using elasticdump:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --type<span style="color:#f92672">=</span>data --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>
$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --type<span style="color:#f92672">=</span>mapping
</code></pre></div><ul>
<li>The indexes look OK, so I started a harvest on AReS</li>
</ul>
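<p>The ORCID CSV earlier in this entry pairs each author name with a <code>Name: iD</code> string. A rough sketch of a sanity check that could be run on such a file before tagging follows. It assumes the two column names shown in the CSV header and reuses the same pattern as the <code>grep -oE</code> call in these notes. The <code>check_rows</code> helper is hypothetical and is not part of <code>add-orcid-identifiers-csv.py</code>:</p>

```python
import csv
import io
import re

# A few rows copied from the CSV shown in the notes.
SAMPLE = '''dc.contributor.author,cg.creator.identifier
"Patel, Ekta","Ekta Patel: 0000-0001-9400-6988"
"Dessie, Tadelle","Tadelle Dessie: 0000-0002-1630-0417"
"Tadelle, D.","Tadelle Dessie: 0000-0002-1630-0417"
'''

# Same shape as the grep -oE pattern used for ORCID iDs in these notes.
ORCID_RE = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def check_rows(text):
    """Return (valid, problems): rows with an iD and rows without one."""
    valid, problems = [], []
    reader = csv.DictReader(io.StringIO(text))
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        author = row["dc.contributor.author"]
        identifier = row["cg.creator.identifier"]
        match = ORCID_RE.search(identifier)
        if match:
            valid.append((author, match.group(0)))
        else:
            problems.append((line_no, identifier))
    return valid, problems


valid, problems = check_rows(SAMPLE)
unique_ids = sorted({orcid for _, orcid in valid})
print(f"{len(valid)} rows OK, {len(problems)} problems, {len(unique_ids)} unique iDs")
```

<p>This only checks the shape of the identifier; it does not verify the ORCID check digit or look anything up online.</p>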
<h2 id="2021-05-25">2021-05-25</h2>
<ul>
<li>The AReS harvest got messed up somehow, as I see the number of items in the indexes is the same as before the harvest:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp o3ijJLcyTtGMOPeWpAJiVA 1 1 104373 106455 491.5mb 491.5mb
yellow open openrxv-items-final soEzAnp3TDClIGZbmVyEIw 1 1 953 0 2.3mb 2.3mb
</code></pre><ul>
</code></pre></div><ul>
<li>Update all docker images on the AReS server (linode20):</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">&#39;s/ \+/:/g&#39;</span> | cut -d: -f1,2 | xargs -L1 docker pull
$ docker-compose -f docker/docker-compose.yml down
$ docker-compose -f docker/docker-compose.yml build
</code></pre><ul>
</code></pre></div><ul>
<li>Then run all system updates on the server and reboot it</li>
<li>Oh crap, I deleted everything on AReS and restored the backup, and the total number of items is now 104317&hellip; so it was actually correct before!</li>
<li>For reference, this is how I re-created everything:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">curl -XDELETE 'http://localhost:9200/openrxv-items-final'
curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
curl -XPUT 'http://localhost:9200/openrxv-items-final'
curl -XPUT 'http://localhost:9200/openrxv-items-temp'
curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{&quot;actions&quot; : [{&quot;add&quot; : { &quot;index&quot; : &quot;openrxv-items-final&quot;, &quot;alias&quot; : &quot;openrxv-items&quot;}}]}'
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">curl -XDELETE &#39;http://localhost:9200/openrxv-items-final&#39;
curl -XDELETE &#39;http://localhost:9200/openrxv-items-temp&#39;
curl -XPUT &#39;http://localhost:9200/openrxv-items-final&#39;
curl -XPUT &#39;http://localhost:9200/openrxv-items-temp&#39;
curl -s -X POST &#39;http://localhost:9200/_aliases&#39; -H &#39;Content-Type: application/json&#39; -d&#39;{&#34;actions&#34; : [{&#34;add&#34; : { &#34;index&#34; : &#34;openrxv-items-final&#34;, &#34;alias&#34; : &#34;openrxv-items&#34;}}]}&#39;
elasticdump --input=/home/aorth/openrxv-items_mapping.json --output=http://localhost:9200/openrxv-items-final --type=mapping
elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhost:9200/openrxv-items-final --type=data --limit=1000
</code></pre><ul>
</code></pre></div><ul>
<li>I will just start a new harvest&hellip; sigh</li>
</ul>
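<p>Since the confusion above came from eyeballing item counts, here is a small sketch that pulls the document counts out of <code>_cat/indices</code> output so they can be compared before and after a harvest. It assumes the default column order (health, status, index, uuid, pri, rep, docs.count, ...) and uses the two lines captured earlier in these notes as sample input; the <code>doc_counts</code> helper is hypothetical:</p>

```python
# Sample output copied from the `curl -s http://localhost:9200/_cat/indices` call above.
CAT_OUTPUT = """\
yellow open openrxv-items-temp  o3ijJLcyTtGMOPeWpAJiVA 1 1 104373 106455 491.5mb 491.5mb
yellow open openrxv-items-final soEzAnp3TDClIGZbmVyEIw 1 1    953      0   2.3mb   2.3mb
"""


def doc_counts(cat_text, prefix="openrxv-items"):
    """Map index name to docs.count for indexes whose name starts with prefix.

    Assumes the default _cat/indices columns, where the index name is the
    third field and docs.count is the seventh.
    """
    counts = {}
    for line in cat_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[2].startswith(prefix):
            continue
        counts[fields[2]] = int(fields[6])
    return counts


counts = doc_counts(CAT_OUTPUT)
for name, count in sorted(counts.items()):
    print(f"{name}: {count}")
```

<p>Saving this output before starting a harvest would make it easier to tell afterwards whether the counts really changed.</p>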
<h2 id="2021-05-26">2021-05-26</h2>
@ -638,18 +638,18 @@ May 26, 02:57 UTC
</code></pre><ul>
<li>And indeed the email seems to be broken:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ dspace test-email
About to send test email:
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ dspace test-email
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>About to send test email:
- To: fuuuuuu
- Subject: DSpace test email
- Server: smtp.office365.com
Error sending email:
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Error sending email:
- Error: javax.mail.SendFailedException: Send failure (javax.mail.MessagingException: Could not convert socket to TLS (javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)))
Please see the DSpace documentation for assistance.
</code></pre><ul>
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Please see the DSpace documentation for assistance.
</code></pre></div><ul>
<li>I saw a recent thread on the dspace-tech mailing list about this that makes me wonder if Microsoft changed something on Office 365
<ul>
<li>I added <code>mail.smtp.ssl.protocols=TLSv1.2</code> to the <code>mail.extraproperties</code> in dspace.cfg and the test email sent successfully</li>