mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-11-08
@ -36,7 +36,7 @@ I looked at the top user agents and IPs in the Solr statistics for last month an
|
||||
|
||||
I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.88.1" />
|
||||
<meta name="generator" content="Hugo 0.89.2" />
|
||||
|
||||
|
||||
|
||||
@ -147,17 +147,17 @@ I will add the RI/1.0 pattern to our DSpace agents overload and purge them from
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">193.169.254.178 - - [21/Apr/2021:01:59:01 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata\x22%20and%20\x2221\x22=\x2221 HTTP/1.1" 400 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
|
||||
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata-21%2B21*01 HTTP/1.1" 200 458201 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
|
||||
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata'||lower('')||' HTTP/1.1" 400 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
|
||||
193.169.254.178 - - [21/Apr/2021:02:02:10 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata'%2Brtrim('')%2B' HTTP/1.1" 200 458209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">193.169.254.178 - - [21/Apr/2021:01:59:01 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata\x22%20and%20\x2221\x22=\x2221 HTTP/1.1" 400 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
|
||||
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata-21%2B21*01 HTTP/1.1" 200 458201 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
|
||||
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata'||lower('')||' HTTP/1.1" 400 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
|
||||
193.169.254.178 - - [21/Apr/2021:02:02:10 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata'%2Brtrim('')%2B' HTTP/1.1" 200 458209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
|
||||
</code></pre></div><ul>
|
||||
<li>I will report the IP on abuseipdb.com and purge their hits from Solr</li>
|
||||
<li>The second IP is in Colombia and is making thousands of requests for what looks like some test site:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">181.62.166.177 - - [20/Apr/2021:22:48:42 +0200] "GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0" 200 123613 "http://cassavalighthousetest.org/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36"
|
||||
181.62.166.177 - - [20/Apr/2021:22:55:39 +0200] "GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0" 200 123613 "http://cassavalighthousetest.org/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36"
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">181.62.166.177 - - [20/Apr/2021:22:48:42 +0200] "GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0" 200 123613 "http://cassavalighthousetest.org/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36"
|
||||
181.62.166.177 - - [20/Apr/2021:22:55:39 +0200] "GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0" 200 123613 "http://cassavalighthousetest.org/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36"
|
||||
</code></pre></div><ul>
|
||||
<li>But this site does not exist (yet?)
|
||||
<ul>
|
||||
<li>I will purge them from Solr</li>
|
||||
@ -165,46 +165,46 @@ I will add the RI/1.0 pattern to our DSpace agents overload and purge them from
|
||||
</li>
|
||||
<li>The third IP is in Russia apparently, and the user agent has the <code>pl-PL</code> locale with thousands of requests like this:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">45.146.166.180 - - [18/Apr/2021:16:28:44 +0200] "GET /bitstream/handle/10947/4153/.AAS%202014%20Annual%20Report.pdf?sequence=1%22%29%29%20AND%201691%3DUTL_INADDR.GET_HOST_ADDRESS%28CHR%28113%29%7C%7CCHR%28118%29%7C%7CCHR%28113%29%7C%7CCHR%28106%29%7C%7CCHR%28113%29%7C%7C%28SELECT%20%28CASE%20WHEN%20%281691%3D1691%29%20THEN%201%20ELSE%200%20END%29%20FROM%20DUAL%29%7C%7CCHR%28113%29%7C%7CCHR%2898%29%7C%7CCHR%28122%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%29%20AND%20%28%28%22RKbp%22%3D%22RKbp&isAllowed=y HTTP/1.1" 200 918998 "http://cgspace.cgiar.org:80/bitstream/handle/10947/4153/.AAS 2014 Annual Report.pdf" "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl-PL) AppleWebKit/523.15 (KHTML, like Gecko) Version/3.0 Safari/523.15"
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">45.146.166.180 - - [18/Apr/2021:16:28:44 +0200] "GET /bitstream/handle/10947/4153/.AAS%202014%20Annual%20Report.pdf?sequence=1%22%29%29%20AND%201691%3DUTL_INADDR.GET_HOST_ADDRESS%28CHR%28113%29%7C%7CCHR%28118%29%7C%7CCHR%28113%29%7C%7CCHR%28106%29%7C%7CCHR%28113%29%7C%7C%28SELECT%20%28CASE%20WHEN%20%281691%3D1691%29%20THEN%201%20ELSE%200%20END%29%20FROM%20DUAL%29%7C%7CCHR%28113%29%7C%7CCHR%2898%29%7C%7CCHR%28122%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%29%20AND%20%28%28%22RKbp%22%3D%22RKbp&isAllowed=y HTTP/1.1" 200 918998 "http://cgspace.cgiar.org:80/bitstream/handle/10947/4153/.AAS 2014 Annual Report.pdf" "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl-PL) AppleWebKit/523.15 (KHTML, like Gecko) Version/3.0 Safari/523.15"
|
||||
</code></pre></div><ul>
|
||||
<li>I will purge these all with my <code>check-spider-ip-hits.sh</code> script:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
|
||||
Purging 21648 hits from 193.169.254.178 in statistics
|
||||
Purging 20323 hits from 181.62.166.177 in statistics
|
||||
Purging 19376 hits from 45.146.166.180 in statistics
|
||||
|
||||
Total number of bot hits purged: 61347
|
||||
</code></pre><h2 id="2021-05-02">2021-05-02</h2>
|
||||
<span style="color:#960050;background-color:#1e0010">
|
||||
</span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 61347
|
||||
</code></pre></div><h2 id="2021-05-02">2021-05-02</h2>
|
||||
<ul>
|
||||
<li>Check the AReS Harvester indexes:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 0 0 283b 283b
|
||||
yellow open openrxv-items-final ul3SKsa7Q9Cd_K7qokBY_w 1 1 103951 0 254mb 254mb
|
||||
$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
|
||||
$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool
|
||||
...
|
||||
"openrxv-items-temp": {
|
||||
"aliases": {}
|
||||
"openrxv-items-temp": {
|
||||
"aliases": {}
|
||||
},
|
||||
"openrxv-items-final": {
|
||||
"aliases": {
|
||||
"openrxv-items": {}
|
||||
"openrxv-items-final": {
|
||||
"aliases": {
|
||||
"openrxv-items": {}
|
||||
}
|
||||
},
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>I think they look OK (<code>openrxv-items</code> is an alias of <code>openrxv-items-final</code>), but I took a backup just in case:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
|
||||
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --type<span style="color:#f92672">=</span>mapping
|
||||
$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --type<span style="color:#f92672">=</span>data --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>
|
||||
</code></pre></div><ul>
|
||||
<li>Then I started an indexing in the AReS Explorer admin dashboard</li>
|
||||
<li>The indexing finished, but it looks like the aliases are messed up again:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 104165 105024 487.7mb 487.7mb
|
||||
yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0 2.2mb 2.2mb
|
||||
</code></pre><h2 id="2021-05-05">2021-05-05</h2>
|
||||
</code></pre></div><h2 id="2021-05-05">2021-05-05</h2>
|
||||
<ul>
|
||||
<li>Peter noticed that we no longer display <code>cg.link.reference</code> on the item view
|
||||
<ul>
|
||||
@ -229,9 +229,9 @@ yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ time ~/dspace64/bin/dspace index-discovery -b
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ time ~/dspace64/bin/dspace index-discovery -b
|
||||
~/dspace64/bin/dspace index-discovery -b 4053.24s user 53.17s system 38% cpu 2:58:53.83 total
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Nope! Still slow, and still no mapped item…
|
||||
<ul>
|
||||
<li>I even tried unmapping it from all collections, and adding it to a single new owning collection…</li>
|
||||
@ -244,53 +244,53 @@ yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0
|
||||
</li>
|
||||
<li>The indexes on AReS Explorer are messed up after last week’s harvesting:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 104165 105024 487.7mb 487.7mb
|
||||
yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0 2.2mb 2.2mb
|
||||
|
||||
$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
|
||||
<span style="color:#960050;background-color:#1e0010">
|
||||
</span><span style="color:#960050;background-color:#1e0010"></span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool
|
||||
...
|
||||
"openrxv-items-final": {
|
||||
"aliases": {}
|
||||
"openrxv-items-final": {
|
||||
"aliases": {}
|
||||
},
|
||||
"openrxv-items-temp": {
|
||||
"aliases": {
|
||||
"openrxv-items": {}
|
||||
"openrxv-items-temp": {
|
||||
"aliases": {
|
||||
"openrxv-items": {}
|
||||
}
|
||||
}
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li><code>openrxv-items</code> should be an alias of <code>openrxv-items-final</code>…</li>
|
||||
<li>I made a backup of the temp index and then started indexing on the AReS Explorer admin dashboard:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-temp/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": true}}'</span>
|
||||
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-temp-backup
|
||||
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
|
||||
</code></pre><h2 id="2021-05-10">2021-05-10</h2>
|
||||
$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-temp/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": false}}'</span>
|
||||
</code></pre></div><h2 id="2021-05-10">2021-05-10</h2>
|
||||
<ul>
|
||||
<li>Amazing, the harvesting on AReS finished but it messed up all the indexes and now there are no items in any index!</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
yellow open openrxv-items-temp 8thRX0WVRUeAzmd2hkG6TA 1 1 0 0 283b 283b
|
||||
yellow open openrxv-items-temp-backup _0tyvctBTg2pjOlcoVP1LA 1 1 104165 20134 305.5mb 305.5mb
|
||||
yellow open openrxv-items-final BtvV9kwVQ3yBYCZvJS1QyQ 1 1 0 0 283b 283b
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>I fixed the indexes manually by re-creating them and cloning from the backup:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
|
||||
$ curl -X PUT "localhost:9200/openrxv-items-temp-backup/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final'</span>
|
||||
$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-temp-backup/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": true}}'</span>
|
||||
$ curl -s -X POST http://localhost:9200/openrxv-items-temp-backup/_clone/openrxv-items-final
|
||||
$ curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'
|
||||
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp-backup'
|
||||
</code></pre><ul>
|
||||
$ curl -s -X POST <span style="color:#e6db74">'http://localhost:9200/_aliases'</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'</span>
|
||||
$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp-backup'</span>
|
||||
</code></pre></div><ul>
|
||||
<li>Also I ran all updates on the server and updated all Docker images, then rebooted the server (linode20):</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | xargs -L1 docker pull
|
||||
</code></pre></div><ul>
|
||||
<li>I backed up the AReS Elasticsearch data using elasticdump, then started a new harvest:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
|
||||
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --type<span style="color:#f92672">=</span>mapping
|
||||
$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --type<span style="color:#f92672">=</span>data --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>
|
||||
</code></pre></div><ul>
|
||||
<li>Discuss CGSpace statistics with the CIP team
|
||||
<ul>
|
||||
<li>They were wondering why their numbers for 2020 were so low</li>
|
||||
@ -329,10 +329,10 @@ $ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/o
|
||||
</li>
|
||||
<li>I checked the CLARISA list against ROR’s April, 2021 release (“Version 9”, on figshare, though it is version 8 in the dump):</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/ror-lookup.py -i /tmp/clarisa-institutions.txt -r ror-data-2021-04-06.json -o /tmp/clarisa-ror-matches.csv
|
||||
$ csvgrep -c matched -m 'true' /tmp/clarisa-ror-matches.csv | sed '1d' | wc -l
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/ror-lookup.py -i /tmp/clarisa-institutions.txt -r ror-data-2021-04-06.json -o /tmp/clarisa-ror-matches.csv
|
||||
$ csvgrep -c matched -m <span style="color:#e6db74">'true'</span> /tmp/clarisa-ror-matches.csv | sed <span style="color:#e6db74">'1d'</span> | wc -l
|
||||
1770
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>With 1770 out of 6230 matched, that’s 28.4%…</li>
|
||||
<li>I sent an email to Hector Tobon to point out the issues in CLARISA again and ask him to chat</li>
|
||||
<li>Meeting with GARDIAN developers about CG Core and how GARDIAN works</li>
|
||||
@ -341,11 +341,11 @@ $ csvgrep -c matched -m 'true' /tmp/clarisa-ror-matches.csv | sed '1d' | wc -l
|
||||
<ul>
|
||||
<li>Fix a few thousand IWMI URLs that are using HTTP instead of HTTPS on CGSpace:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://www.iwmi.cgiar.org','https://www.iwmi.cgiar.org', 'g') WHERE text_value LIKE 'http://www.iwmi.cgiar.org%' AND metadata_field_id=219;
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://www.iwmi.cgiar.org','https://www.iwmi.cgiar.org', 'g') WHERE text_value LIKE 'http://www.iwmi.cgiar.org%' AND metadata_field_id=219;
|
||||
UPDATE 1132
|
||||
localhost/dspace63= > UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://publications.iwmi.org','https://publications.iwmi.org', 'g') WHERE text_value LIKE 'http://publications.iwmi.org%' AND metadata_field_id=219;
|
||||
localhost/dspace63= > UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://publications.iwmi.org','https://publications.iwmi.org', 'g') WHERE text_value LIKE 'http://publications.iwmi.org%' AND metadata_field_id=219;
|
||||
UPDATE 1803
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>In the case of the latter, the HTTP links don’t even work! The web server returns HTTP 404 unless the request is HTTPS</li>
|
||||
<li>IWMI also says that their subjects are a subset of AGROVOC so they no longer want to use <code>cg.subject.iwmi</code> for their subjects
|
||||
<ul>
|
||||
@ -367,67 +367,67 @@ UPDATE 1803
|
||||
<ul>
|
||||
<li>I have to fix the Elasticsearch indexes on AReS after last week’s harvesting because, as always, the <code>openrxv-items</code> index should be an alias of <code>openrxv-items-final</code> instead of <code>openrxv-items-temp</code>:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
|
||||
"openrxv-items-final": {
|
||||
"aliases": {}
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool
|
||||
"openrxv-items-final": {
|
||||
"aliases": {}
|
||||
},
|
||||
"openrxv-items-temp": {
|
||||
"aliases": {
|
||||
"openrxv-items": {}
|
||||
"openrxv-items-temp": {
|
||||
"aliases": {
|
||||
"openrxv-items": {}
|
||||
}
|
||||
},
|
||||
...
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>I took a backup of the <code>openrxv-items</code> index with elasticdump so I can re-create them manually before starting a new harvest tomorrow:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
|
||||
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
|
||||
</code></pre><h2 id="2021-05-16">2021-05-16</h2>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --type<span style="color:#f92672">=</span>mapping
|
||||
$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --type<span style="color:#f92672">=</span>data --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>
|
||||
</code></pre></div><h2 id="2021-05-16">2021-05-16</h2>
|
||||
<ul>
|
||||
<li>I deleted and re-created the Elasticsearch indexes on AReS:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
|
||||
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
|
||||
$ curl -XPUT 'http://localhost:9200/openrxv-items-final'
|
||||
$ curl -XPUT 'http://localhost:9200/openrxv-items-temp'
|
||||
$ curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final'</span>
|
||||
$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp'</span>
|
||||
$ curl -XPUT <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final'</span>
|
||||
$ curl -XPUT <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp'</span>
|
||||
$ curl -s -X POST <span style="color:#e6db74">'http://localhost:9200/_aliases'</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'</span>
|
||||
</code></pre></div><ul>
|
||||
<li>Then I re-imported the backup that I created with elasticdump yesterday:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=/home/aorth/openrxv-items_mapping.json --output=http://localhost:9200/openrxv-items-final --type=mapping
|
||||
$ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhost:9200/openrxv-items-final --type=data --limit=1000
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ elasticdump --input<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --output<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items-final --type<span style="color:#f92672">=</span>mapping
|
||||
$ elasticdump --input<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --output<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items-final --type<span style="color:#f92672">=</span>data --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>
|
||||
</code></pre></div><ul>
|
||||
<li>Then I started a new harvest on AReS</li>
|
||||
</ul>
|
||||
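<ul>
<li>The recurring alias fix in these notes can be sketched in Python, as an illustration only: it reads an alias listing shaped like the <code>_alias/</code> output above, finds any index that wrongly carries <code>openrxv-items</code>, and builds one <code>_aliases</code> request body that moves the alias to <code>openrxv-items-final</code>. The helper names <code>misplaced_aliases</code> and <code>alias_swap_actions</code> are hypothetical and not part of OpenRXV or Elasticsearch; the sketch only builds the JSON body and does not contact the server:</li>
</ul>

```python
import json


def misplaced_aliases(alias_map, alias, expected_index):
    """Return the indexes, other than expected_index, that currently carry alias."""
    return sorted(
        index
        for index, info in alias_map.items()
        if alias in info.get("aliases", {}) and index != expected_index
    )


def alias_swap_actions(alias, old_index, new_index):
    """Build an _aliases body that removes alias from old_index and adds it to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Alias listing copied from the broken state shown in the notes
current = {
    "openrxv-items-final": {"aliases": {}},
    "openrxv-items-temp": {"aliases": {"openrxv-items": {}}},
}

wrong = misplaced_aliases(current, "openrxv-items", "openrxv-items-final")
body = alias_swap_actions("openrxv-items", wrong[0], "openrxv-items-final")

# This JSON could be sent with: curl -X POST 'http://localhost:9200/_aliases' -d '...'
print(json.dumps(body))
```

<ul>
<li>Elasticsearch applies the actions in one <code>_aliases</code> request atomically, so a remove and an add in the same body avoids a window where the alias points at nothing. The sketch only emits a remove for an index that the listing shows as carrying the alias.</li>
</ul>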
<h2 id="2021-05-17">2021-05-17</h2>
|
||||
<ul>
|
||||
<li>The AReS harvest finished and the Elasticsearch indexes seem OK so I shouldn’t have to fix them next time…</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
yellow open openrxv-items-temp o3ijJLcyTtGMOPeWpAJiVA 1 1 0 0 283b 283b
|
||||
yellow open openrxv-items-final TrJ1Ict3QZ-vFkj-4VcAzw 1 1 104317 0 259.4mb 259.4mb
|
||||
$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
|
||||
"openrxv-items-temp": {
|
||||
"aliases": {}
|
||||
$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool
|
||||
"openrxv-items-temp": {
|
||||
"aliases": {}
|
||||
},
|
||||
"openrxv-items-final": {
|
||||
"aliases": {
|
||||
"openrxv-items": {}
|
||||
"openrxv-items-final": {
|
||||
"aliases": {
|
||||
"openrxv-items": {}
|
||||
}
|
||||
},
|
||||
...
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Abenet said she and some others can’t log into CGSpace
|
||||
<ul>
|
||||
<li>I tried to check the CGSpace LDAP account and it does not seem to be working:</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b "dc=cgiarad,dc=org" -D "cgspace-ldap@cgiarad.org" -W "(sAMAccountName=aorth)"
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b <span style="color:#e6db74">"dc=cgiarad,dc=org"</span> -D <span style="color:#e6db74">"cgspace-ldap@cgiarad.org"</span> -W <span style="color:#e6db74">"(sAMAccountName=aorth)"</span>
|
||||
Enter LDAP Password:
|
||||
ldap_bind: Invalid credentials (49)
|
||||
additional info: 80090308: LdapErr: DSID-0C090453, comment: AcceptSecurityContext error, data 532, v3839
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>I sent a message to Biruk so he can check the LDAP account</li>
|
||||
<li>IWMI confirmed that they do indeed want to move all their subjects to AGROVOC, so I made the changes in the XMLUI and config (<a href="https://github.com/ilri/DSpace/pull/467">#467</a>)
|
||||
<ul>
|
||||
@ -446,14 +446,14 @@ ldap_bind: Invalid credentials (49)
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ xmllint --xpath '//value-pairs[@value-pairs-name="ccafsprojectpii"]/pair/stored-value/node()' dspace/config/input-forms.xml
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ xmllint --xpath <span style="color:#e6db74">'//value-pairs[@value-pairs-name="ccafsprojectpii"]/pair/stored-value/node()'</span> dspace/config/input-forms.xml
|
||||
</code></pre></div><ul>
|
||||
<li>I formatted the input file with tidy, especially because one of the new project tags has an ampersand character… grrr:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w 0 dspace/config/input-forms.xml
|
||||
line 3658 column 26 - Warning: unescaped & or unknown entity "&WA_EU-IFAD"
|
||||
line 3659 column 23 - Warning: unescaped & or unknown entity "&WA_EU-IFAD"
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w <span style="color:#ae81ff">0</span> dspace/config/input-forms.xml
|
||||
line 3658 column 26 - Warning: unescaped & or unknown entity "&WA_EU-IFAD"
|
||||
line 3659 column 23 - Warning: unescaped & or unknown entity "&WA_EU-IFAD"
|
||||
</code></pre></div><ul>
|
||||
<li>After testing whether this escaped value worked during submission, I created and merged a pull request to <code>6_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/468">#468</a>)</li>
|
||||
</ul>
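<p>The ampersand problem that tidy warned about can be reproduced and checked outside DSpace. This is a minimal sketch, not the method used in these notes: it assumes Python's standard library, uses only the <code>&WA_EU-IFAD</code> fragment from the warning as a stand-in for the real project tag (the full value is not shown here), and guesses the <code>pair</code>/<code>stored-value</code> layout from the xmllint XPath.</p>

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical value: only the "&WA_EU-IFAD" fragment appears in the tidy
# warning; the full project tag is not shown in the notes.
raw_value = "&WA_EU-IFAD"

# escape() replaces &, < and > with their XML entities.
escaped_value = escape(raw_value)

# Build a minimal <pair> element (layout assumed from the xmllint XPath)
# and confirm that the escaped form parses as well-formed XML.
snippet = f"<pair><stored-value>{escaped_value}</stored-value></pair>"
parsed = ET.fromstring(snippet)

# The parser turns the entity back into a literal ampersand.
round_tripped = parsed.find("stored-value").text

print(escaped_value)
print(round_tripped)
```

<p>Parsing the same snippet with the unescaped value raises <code>xml.etree.ElementTree.ParseError</code>, which is the programmatic equivalent of the tidy warning.</p>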
|
||||
<h2 id="2021-05-18">2021-05-18</h2>
|
||||
@ -461,34 +461,34 @@ line 3659 column 23 - Warning: unescaped & or unknown entity "&WA_E
|
||||
<li>Paola from the Alliance emailed me some new ORCID identifiers to add to CGSpace</li>
|
||||
<li>I saved the new ones to a text file, combined them with the others, extracted the ORCID iDs themselves, and updated the names using <code>resolve-orcids.py</code>:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-identifier.xml /tmp/new | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2021-05-18-combined.txt
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-identifier.xml /tmp/new | grep -oE <span style="color:#e6db74">'[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}'</span> | sort | uniq > /tmp/2021-05-18-combined.txt
|
||||
$ ./ilri/resolve-orcids.py -i /tmp/2021-05-18-combined.txt -o /tmp/2021-05-18-combined-names.txt
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>I sorted the names and added the XML formatting in vim, then ran it through tidy:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/cg-creator-identifier.xml
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ tidy -xml -utf8 -m -iq -w <span style="color:#ae81ff">0</span> dspace/config/controlled-vocabularies/cg-creator-identifier.xml
|
||||
</code></pre></div><ul>
|
||||
<li>Tag fifty-five items from the Alliance’s new authors with ORCID iDs using <code>add-orcid-identifiers-csv.py</code>:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat 2021-05-18-add-orcids.csv
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat 2021-05-18-add-orcids.csv
|
||||
dc.contributor.author,cg.creator.identifier
|
||||
"Urioste Daza, Sergio",Sergio Alejandro Urioste Daza: 0000-0002-3208-032X
|
||||
"Urioste, Sergio",Sergio Alejandro Urioste Daza: 0000-0002-3208-032X
|
||||
"Villegas, Daniel",Daniel M. Villegas: 0000-0001-6801-3332
|
||||
"Villegas, Daniel M.",Daniel M. Villegas: 0000-0001-6801-3332
|
||||
"Giles, James",James Giles: 0000-0003-1899-9206
|
||||
"Simbare, Alice",Alice Simbare: 0000-0003-2389-0969
|
||||
"Simbare, Alice",Alice Simbare: 0000-0003-2389-0969
|
||||
"Simbare, A.",Alice Simbare: 0000-0003-2389-0969
|
||||
"Dita Rodriguez, Miguel",Miguel Angel Dita Rodriguez: 0000-0002-0496-4267
|
||||
"Templer, Noel",Noel Templer: 0000-0002-3201-9043
|
||||
"Jalonen, R.",Riina Jalonen: 0000-0003-1669-9138
|
||||
"Jalonen, Riina",Riina Jalonen: 0000-0003-1669-9138
|
||||
"Izquierdo, Paulo",Paulo Izquierdo: 0000-0002-2153-0655
|
||||
"Reyes, Byron",Byron Reyes: 0000-0003-2672-9636
|
||||
"Reyes, Byron A.",Byron Reyes: 0000-0003-2672-9636
|
||||
$ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2021-05-18-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
|
||||
</code></pre><ul>
|
||||
"Urioste Daza, Sergio",Sergio Alejandro Urioste Daza: 0000-0002-3208-032X
|
||||
"Urioste, Sergio",Sergio Alejandro Urioste Daza: 0000-0002-3208-032X
|
||||
"Villegas, Daniel",Daniel M. Villegas: 0000-0001-6801-3332
|
||||
"Villegas, Daniel M.",Daniel M. Villegas: 0000-0001-6801-3332
|
||||
"Giles, James",James Giles: 0000-0003-1899-9206
|
||||
"Simbare, Alice",Alice Simbare: 0000-0003-2389-0969
|
||||
"Simbare, Alice",Alice Simbare: 0000-0003-2389-0969
|
||||
"Simbare, A.",Alice Simbare: 0000-0003-2389-0969
|
||||
"Dita Rodriguez, Miguel",Miguel Angel Dita Rodriguez: 0000-0002-0496-4267
|
||||
"Templer, Noel",Noel Templer: 0000-0002-3201-9043
|
||||
"Jalonen, R.",Riina Jalonen: 0000-0003-1669-9138
|
||||
"Jalonen, Riina",Riina Jalonen: 0000-0003-1669-9138
|
||||
"Izquierdo, Paulo",Paulo Izquierdo: 0000-0002-2153-0655
|
||||
"Reyes, Byron",Byron Reyes: 0000-0003-2672-9636
|
||||
"Reyes, Byron A.",Byron Reyes: 0000-0003-2672-9636
|
||||
$ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2021-05-18-add-orcids.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span> -d
|
||||
</code></pre></div><ul>
|
||||
<li>I deployed the latest <code>6_x-prod</code> branch on CGSpace, ran all system updates, and rebooted the server
|
||||
<ul>
|
||||
<li>This included the IWMI changes, so I also migrated the <code>cg.subject.iwmi</code> metadata to <code>dcterms.subject</code> and deleted the subject term</li>
|
||||
@ -504,9 +504,9 @@ $ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2021-05-18-add-orcids.csv -db dspa
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
|
||||
UPDATE 47405
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>That’s interesting because we lowercased them all a few months ago, so these must all be new… wow
|
||||
<ul>
|
||||
<li>We have 405,000 total AGROVOC terms, with 20,600 of them being unique</li>
|
||||
@ -518,12 +518,12 @@ UPDATE 47405
|
||||
<ul>
|
||||
<li>Export the top 5,000 AGROVOC terms to validate them:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT uuid FROM item) AND metadata_field_id = 187 GROUP BY text_value ORDER BY count DESC LIMIT 5000) to /tmp/2021-05-20-agrovoc.csv WITH CSV HEADER;
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT uuid FROM item) AND metadata_field_id = 187 GROUP BY text_value ORDER BY count DESC LIMIT 5000) to /tmp/2021-05-20-agrovoc.csv WITH CSV HEADER;
|
||||
COPY 5000
|
||||
$ csvcut -c 1 /tmp/2021-05-20-agrovoc.csv| sed 1d > /tmp/2021-05-20-agrovoc.txt
|
||||
$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-05-20-agrovoc.csv| sed 1d > /tmp/2021-05-20-agrovoc.txt
|
||||
$ ./ilri/agrovoc-lookup.py -i /tmp/2021-05-20-agrovoc.txt -o /tmp/2021-05-20-agrovoc-results.csv
|
||||
$ csvgrep -c "number of matches" -r '^0$' /tmp/2021-05-20-agrovoc-results.csv > /tmp/2021-05-20-agrovoc-rejected.csv
|
||||
</code></pre><ul>
|
||||
$ csvgrep -c <span style="color:#e6db74">"number of matches"</span> -r <span style="color:#e6db74">'^0$'</span> /tmp/2021-05-20-agrovoc-results.csv > /tmp/2021-05-20-agrovoc-rejected.csv
|
||||
</code></pre></div><ul>
|
||||
<li>Meeting with Medha and Pythagoras about the FAIR Workflow tool
|
||||
<ul>
|
||||
<li>Discussed the need for such a tool, other tools being developed, etc</li>
|
||||
@ -545,54 +545,54 @@ $ csvgrep -c "number of matches" -r '^0$' /tmp/2021-05-20-agrovoc-resu
|
||||
<ul>
|
||||
<li>Add ORCID identifiers for missing ILRI authors and tag 550 others, based on a few authors I noticed were missing them:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat 2021-05-24-add-orcids.csv
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat 2021-05-24-add-orcids.csv
|
||||
dc.contributor.author,cg.creator.identifier
|
||||
"Patel, Ekta","Ekta Patel: 0000-0001-9400-6988"
|
||||
"Dessie, Tadelle","Tadelle Dessie: 0000-0002-1630-0417"
|
||||
"Tadelle, D.","Tadelle Dessie: 0000-0002-1630-0417"
|
||||
"Dione, Michel M.","Michel Dione: 0000-0001-7812-5776"
|
||||
"Kiara, Henry K.","Henry Kiara: 0000-0001-9578-1636"
|
||||
"Naessens, Jan","Jan Naessens: 0000-0002-7075-9915"
|
||||
"Steinaa, Lucilla","Lucilla Steinaa: 0000-0003-3691-3971"
|
||||
"Wieland, Barbara","Barbara Wieland: 0000-0003-4020-9186"
|
||||
"Grace, Delia","Delia Grace: 0000-0002-0195-9489"
|
||||
"Rao, Idupulapati M.","Idupulapati M. Rao: 0000-0002-8381-9358"
|
||||
"Cardoso Arango, Juan Andrés","Juan Andrés Cardoso Arango: 0000-0002-0252-4655"
|
||||
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-05-24-add-orcids.csv -db dspace -u dspace -p 'fuuu'
|
||||
</code></pre><ul>
|
||||
"Patel, Ekta","Ekta Patel: 0000-0001-9400-6988"
|
||||
"Dessie, Tadelle","Tadelle Dessie: 0000-0002-1630-0417"
|
||||
"Tadelle, D.","Tadelle Dessie: 0000-0002-1630-0417"
|
||||
"Dione, Michel M.","Michel Dione: 0000-0001-7812-5776"
|
||||
"Kiara, Henry K.","Henry Kiara: 0000-0001-9578-1636"
|
||||
"Naessens, Jan","Jan Naessens: 0000-0002-7075-9915"
|
||||
"Steinaa, Lucilla","Lucilla Steinaa: 0000-0003-3691-3971"
|
||||
"Wieland, Barbara","Barbara Wieland: 0000-0003-4020-9186"
|
||||
"Grace, Delia","Delia Grace: 0000-0002-0195-9489"
|
||||
"Rao, Idupulapati M.","Idupulapati M. Rao: 0000-0002-8381-9358"
|
||||
"Cardoso Arango, Juan Andrés","Juan Andrés Cardoso Arango: 0000-0002-0252-4655"
|
||||
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-05-24-add-orcids.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span>
|
||||
</code></pre></div><ul>
|
||||
<li>A few days ago I took a backup of the Elasticsearch indexes on AReS using elasticdump:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
|
||||
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --type<span style="color:#f92672">=</span>data --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span>
|
||||
$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --type<span style="color:#f92672">=</span>mapping
|
||||
</code></pre></div><ul>
|
||||
<li>The indexes look OK, so I started a harvest on AReS</li>
|
||||
</ul>
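<p>The CSV files passed to <code>add-orcid-identifiers-csv.py</code> pair an author name variant with a "Name: ORCID iD" string, so a typo in an iD would be tagged onto every matching item. A minimal sketch of a pre-flight check, assuming Python's standard library: the pattern is the one from the <code>grep -oE</code> command, and the check digit is the ISO 7064 MOD 11-2 scheme that ORCID documents. The function names are hypothetical, and nothing here is claimed to be part of the ILRI scripts.</p>

```python
import csv
import io
import re

# Same pattern as the grep -oE command used to extract ORCID iDs.
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_check_digit(digits[:15]) == digits[15]


def find_invalid_rows(csv_text: str) -> list:
    """List the author names whose identifier column lacks a valid ORCID iD."""
    invalid = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = ORCID_PATTERN.search(row["cg.creator.identifier"])
        if match is None or not is_valid_orcid(match.group(0)):
            invalid.append(row["dc.contributor.author"])
    return invalid


# Two rows copied from 2021-05-24-add-orcids.csv.
sample = '''dc.contributor.author,cg.creator.identifier
"Patel, Ekta","Ekta Patel: 0000-0001-9400-6988"
"Grace, Delia","Delia Grace: 0000-0002-0195-9489"
'''

print(find_invalid_rows(sample))
```

<p>A checksum only catches mistyped iDs. It does not confirm that an iD belongs to the named person, which is what a lookup such as <code>resolve-orcids.py</code> is for.</p>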
|
||||
<h2 id="2021-05-25">2021-05-25</h2>
|
||||
<ul>
|
||||
<li>The AReS harvest got messed up somehow, as I see the number of items in the indexes is the same as before the harvest:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
|
||||
yellow open openrxv-items-temp o3ijJLcyTtGMOPeWpAJiVA 1 1 104373 106455 491.5mb 491.5mb
|
||||
yellow open openrxv-items-final soEzAnp3TDClIGZbmVyEIw 1 1 953 0 2.3mb 2.3mb
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Update all docker images on the AReS server (linode20):</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | xargs -L1 docker pull
|
||||
$ docker-compose -f docker/docker-compose.yml down
|
||||
$ docker-compose -f docker/docker-compose.yml build
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Then run all system updates on the server and reboot it</li>
|
||||
<li>Oh crap, I deleted everything on AReS and restored the backup and the total items are now 104317… so it was actually correct before!</li>
|
||||
<li>For reference, this is how I re-created everything:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">curl -XDELETE 'http://localhost:9200/openrxv-items-final'
|
||||
curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
|
||||
curl -XPUT 'http://localhost:9200/openrxv-items-final'
|
||||
curl -XPUT 'http://localhost:9200/openrxv-items-temp'
|
||||
curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">curl -XDELETE 'http://localhost:9200/openrxv-items-final'
|
||||
curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
|
||||
curl -XPUT 'http://localhost:9200/openrxv-items-final'
|
||||
curl -XPUT 'http://localhost:9200/openrxv-items-temp'
|
||||
curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'
|
||||
elasticdump --input=/home/aorth/openrxv-items_mapping.json --output=http://localhost:9200/openrxv-items-final --type=mapping
|
||||
elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhost:9200/openrxv-items-final --type=data --limit=1000
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>I will just start a new harvest… sigh</li>
|
||||
</ul>
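<p>Recording the document counts before deleting anything would have made the before and after comparison explicit. A minimal sketch, assuming Python and the default column order of Elasticsearch's <code>_cat/indices</code> output (health, status, index, uuid, pri, rep, docs.count, ...) for open indexes. The function name is hypothetical, and the sample text is the output of the <code>curl</code> command recorded on 2021-05-25.</p>

```python
def parse_cat_indices(text: str) -> dict:
    """Map index name to docs.count from default _cat/indices output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or truncated lines that lack the docs.count column.
        if len(fields) < 7:
            continue
        index_name, docs_count = fields[2], fields[6]
        counts[index_name] = int(docs_count)
    return counts


# Output of: curl -s http://localhost:9200/_cat/indices | grep openrxv-items
sample_output = """\
yellow open openrxv-items-temp  o3ijJLcyTtGMOPeWpAJiVA 1 1 104373 106455 491.5mb 491.5mb
yellow open openrxv-items-final soEzAnp3TDClIGZbmVyEIw 1 1    953      0   2.3mb   2.3mb
"""

counts = parse_cat_indices(sample_output)
for name, count in sorted(counts.items()):
    print(f"{name}: {count}")
```

<p>Saving this dictionary before a harvest and comparing it with a second run afterwards shows at a glance whether the totals moved.</p>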
|
||||
<h2 id="2021-05-26">2021-05-26</h2>
|
||||
@ -638,18 +638,18 @@ May 26, 02:57 UTC
|
||||
</code></pre><ul>
|
||||
<li>And indeed the email seems to be broken:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ dspace test-email
|
||||
|
||||
About to send test email:
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ dspace test-email
|
||||
<span style="color:#960050;background-color:#1e0010">
|
||||
</span><span style="color:#960050;background-color:#1e0010"></span>About to send test email:
|
||||
- To: fuuuuuu
|
||||
- Subject: DSpace test email
|
||||
- Server: smtp.office365.com
|
||||
|
||||
Error sending email:
|
||||
<span style="color:#960050;background-color:#1e0010">
|
||||
</span><span style="color:#960050;background-color:#1e0010"></span>Error sending email:
|
||||
- Error: javax.mail.SendFailedException: Send failure (javax.mail.MessagingException: Could not convert socket to TLS (javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)))
|
||||
|
||||
Please see the DSpace documentation for assistance.
|
||||
</code></pre><ul>
|
||||
<span style="color:#960050;background-color:#1e0010">
|
||||
</span><span style="color:#960050;background-color:#1e0010"></span>Please see the DSpace documentation for assistance.
|
||||
</code></pre></div><ul>
|
||||
<li>I saw a recent thread on the dspace-tech mailing list about this that makes me wonder if Microsoft changed something on Office 365
|
||||
<ul>
|
||||
<li>I added <code>mail.smtp.ssl.protocols=TLSv1.2</code> to the <code>mail.extraproperties</code> in dspace.cfg and the test email sent successfully</li>
|
||||
|