Add notes for 2022-03-04

2022-03-04 15:30:06 +03:00
parent 7453499827
commit 27acbac859
115 changed files with 6550 additions and 6444 deletions

@ -36,7 +36,7 @@ I simply started it and AReS was running again:
"/>
<meta name="generator" content="Hugo 0.93.1" />
@ -132,8 +132,8 @@ I simply started it and AReS was running again:
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ docker-compose -f docker/docker-compose.yml start angular_nginx
</span></span></code></pre></div><ul>
<li>Margarita from CCAFS emailed me to say that workflow alerts haven&rsquo;t been working lately
<ul>
<li>I guess this is related to the SMTP issues last week</li>
@ -162,14 +162,14 @@ I simply started it and AReS was running again:
<ul>
<li>The Elasticsearch indexes were messed up, so I dumped and re-created them correctly:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>curl -XDELETE &#39;http://localhost:9200/openrxv-items-final&#39;
</span></span><span style="display:flex;"><span>curl -XDELETE &#39;http://localhost:9200/openrxv-items-temp&#39;
</span></span><span style="display:flex;"><span>curl -XPUT &#39;http://localhost:9200/openrxv-items-final&#39;
</span></span><span style="display:flex;"><span>curl -XPUT &#39;http://localhost:9200/openrxv-items-temp&#39;
</span></span><span style="display:flex;"><span>curl -s -X POST &#39;http://localhost:9200/_aliases&#39; -H &#39;Content-Type: application/json&#39; -d&#39;{&#34;actions&#34; : [{&#34;add&#34; : { &#34;index&#34; : &#34;openrxv-items-final&#34;, &#34;alias&#34; : &#34;openrxv-items&#34;}}]}&#39;
</span></span><span style="display:flex;"><span>elasticdump --input=/home/aorth/openrxv-items_mapping.json --output=http://localhost:9200/openrxv-items-final --type=mapping
</span></span><span style="display:flex;"><span>elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhost:9200/openrxv-items-final --type=data --limit=1000
</span></span></code></pre></div><ul>
<li>Then I started a harvest on AReS</li>
</ul>
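<ul>
<li>The alias step above writes the JSON for the Elasticsearch <code>_aliases</code> API by hand. The sketch below shows a small shell function that can build that body instead. The <code>alias_payload</code> name and its arguments are hypothetical and are not part of OpenRXV or elasticdump. When an old index is also given, the function puts a <code>remove</code> action and an <code>add</code> action in one request. Elasticsearch applies the actions in a single <code>_aliases</code> call atomically, so an alias could be moved between indexes without a moment where it points nowhere:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># Hypothetical helper: print a request body for the Elasticsearch _aliases API.
# $1: index the alias should point to, $2: alias name,
# $3 (optional): index to remove the alias from in the same request.
alias_payload() {
  local new_index=$1 alias_name=$2 old_index=${3:-}
  if [ -n "$old_index" ]; then
    printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
      "$old_index" "$alias_name" "$new_index" "$alias_name"
  else
    printf '{"actions":[{"add":{"index":"%s","alias":"%s"}}]}\n' "$new_index" "$alias_name"
  fi
}

# Example, against the same local Elasticsearch as above:
# curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' \
#   -d "$(alias_payload openrxv-items-final openrxv-items)"
</code></pre></div>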
<h2 id="2021-06-07">2021-06-07</h2>
@ -208,8 +208,8 @@ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhos
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ podman unshare chown 1000:1000 /home/aorth/.local/share/containers/storage/volumes/docker_esData_7/_data
</span></span></code></pre></div><ul>
<li>The new OpenRXV harvesting method by Moayad uses pages of 10 items instead of 100 and it&rsquo;s much faster
<ul>
<li>I harvested 90,000+ items from DSpace Test in ~3 hours</li>
@ -231,23 +231,23 @@ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhos
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[[:digit:]]+&#34;&#39;</span> openrxv-items_data.json | awk -F: <span style="color:#e6db74">&#39;{print $2}&#39;</span> | wc -l
</span></span><span style="display:flex;"><span>90459
</span></span><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[[:digit:]]+&#34;&#39;</span> openrxv-items_data.json | awk -F: <span style="color:#e6db74">&#39;{print $2}&#39;</span> | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>90380
</span></span><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[[:digit:]]+&#34;&#39;</span> openrxv-items_data.json | awk -F: <span style="color:#e6db74">&#39;{print $2}&#39;</span> | sort | uniq -c | sort -h
</span></span><span style="display:flex;"><span>...
</span></span><span style="display:flex;"><span> 2 &#34;10568/99409&#34;
</span></span><span style="display:flex;"><span> 2 &#34;10568/99410&#34;
</span></span><span style="display:flex;"><span> 2 &#34;10568/99411&#34;
</span></span><span style="display:flex;"><span> 2 &#34;10568/99516&#34;
</span></span><span style="display:flex;"><span> 3 &#34;10568/102093&#34;
</span></span><span style="display:flex;"><span> 3 &#34;10568/103524&#34;
</span></span><span style="display:flex;"><span> 3 &#34;10568/106664&#34;
</span></span><span style="display:flex;"><span> 3 &#34;10568/106940&#34;
</span></span><span style="display:flex;"><span> 3 &#34;10568/107195&#34;
</span></span><span style="display:flex;"><span> 3 &#34;10568/96546&#34;
</span></span></code></pre></div><h2 id="2021-06-20">2021-06-20</h2>
<ul>
<li>Udana asked me to update their IWMI subjects from <code>farmer managed irrigation systems</code> to <code>farmer-led irrigation</code>
<ul>
@ -255,12 +255,12 @@ $ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ dspace metadata-export -i 10568/16814 -f /tmp/2021-06-20-IWMI.csv
</span></span></code></pre></div><ul>
<li>Then I used <code>csvcut</code> to extract just the columns I needed and <code>sed</code> to do the replacement, writing the result to a new CSV:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c <span style="color:#e6db74">&#39;id,dcterms.subject[],dcterms.subject[en_US]&#39;</span> /tmp/2021-06-20-IWMI.csv | sed <span style="color:#e6db74">&#39;s/farmer managed irrigation systems/farmer-led irrigation/&#39;</span> &gt; /tmp/2021-06-20-IWMI-new-subjects.csv
</span></span></code></pre></div><ul>
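<li>A caveat on the <code>sed</code> expression above: without the <code>g</code> flag, <code>sed</code> replaces only the first match on each line. DSpace metadata CSVs also hold multiple values in one field, separated by <code>||</code>. A row containing the old subject twice (for example, in both subject columns) would therefore keep the second copy. The sketch below adds the global flag; <code>fix_subjects</code> is a hypothetical name and the example row is made up:
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># Replace every occurrence on each line (note the trailing g), not only the first.
fix_subjects() {
  sed 's/farmer managed irrigation systems/farmer-led irrigation/g'
}

# Example with a made-up row where the subject appears in both columns:
# printf '%s\n' '123,farmer managed irrigation systems||water management,farmer managed irrigation systems' | fix_subjects
</code></pre></div>It is not known whether any of the CGSpace rows actually contained the term twice; the <code>g</code> flag just makes the replacement independent of that.</li>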
<li>Then I uploaded the resulting CSV to CGSpace, updating 161 items</li>
<li>Start a harvest on AReS</li>
<li>I found <a href="https://jira.lyrasis.org/browse/DS-1977">a bug report</a> and <a href="https://github.com/DSpace/DSpace/pull/2584">a patch</a> for the issue of private items showing up in the DSpace sitemap
@ -278,19 +278,19 @@ $ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -E <span style="color:#e6db74">&#39;&#34;repo&#34;:&#34;CGSpace&#34;&#39;</span> openrxv-items_data.json | grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[[:alnum:]]+&#34;&#39;</span> | wc -l
</span></span><span style="display:flex;"><span>90937
</span></span><span style="display:flex;"><span>$ grep -E <span style="color:#e6db74">&#39;&#34;repo&#34;:&#34;CGSpace&#34;&#39;</span> openrxv-items_data.json | grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[[:alnum:]]+&#34;&#39;</span> | sort -u | wc -l
</span></span><span style="display:flex;"><span>85709
</span></span></code></pre></div><ul>
<li>So those could be duplicates from the way we harvest pages, but they could also be from mappings&hellip;
<ul>
<li>Manually inspecting the duplicates where handles appear more than once:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -E <span style="color:#e6db74">&#39;&#34;repo&#34;:&#34;CGSpace&#34;&#39;</span> openrxv-items_data.json | grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[[:alnum:]]+&#34;&#39;</span> | sort | uniq -c | sort -h
</span></span></code></pre></div><ul>
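<li>To find the repeats in the <code>uniq -c | sort -h</code> output, you have to scroll past all the handles that appear once. <code>uniq -d</code> prints only lines that occur more than once. The sketch below is a minimal helper that returns just the duplicated handles to inspect. It uses the same regex as above; <code>dup_handles</code> is a hypothetical name:
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># List only the handles that occur more than once in an OpenRXV dump
# (reads the file given as $1, or standard input when no file is given).
dup_handles() {
  grep -oE '"handle":"[[:digit:]]+/[[:alnum:]]+"' "${1:--}" | sort | uniq -d
}

# Example: how many distinct CGSpace handles are duplicated
# grep -E '"repo":"CGSpace"' openrxv-items_data.json | dup_handles | wc -l
</code></pre></div></li>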
<li>Unfortunately I found no pattern:
<ul>
<li>Some appear twice in the Elasticsearch index, but appear in only one collection</li>
@ -312,23 +312,23 @@ $ grep -E <span style="color:#e6db74">&#39;&#34;repo&#34;:&#34;CGSpace&#34;&#39;
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s -H <span style="color:#e6db74">&#34;Accept: application/json&#34;</span> <span style="color:#e6db74">&#34;https://demo.dspace.org/rest/items?offset=0&amp;limit=5&#34;</span> | jq length
</span></span><span style="display:flex;"><span>5
</span></span><span style="display:flex;"><span>$ curl -s -H <span style="color:#e6db74">&#34;Accept: application/json&#34;</span> <span style="color:#e6db74">&#34;https://demo.dspace.org/rest/items?offset=0&amp;limit=5&#34;</span> | jq <span style="color:#e6db74">&#39;.[].handle&#39;</span>
</span></span><span style="display:flex;"><span>&#34;10673/4&#34;
</span></span><span style="display:flex;"><span>&#34;10673/3&#34;
</span></span><span style="display:flex;"><span>&#34;10673/6&#34;
</span></span><span style="display:flex;"><span>&#34;10673/5&#34;
</span></span><span style="display:flex;"><span>&#34;10673/7&#34;
</span></span><span style="display:flex;"><span># log into DSpace Demo XMLUI as admin and make one item private <span style="color:#f92672">(</span><span style="color:#66d9ef">for</span> example 10673/6<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>$ curl -s -H <span style="color:#e6db74">&#34;Accept: application/json&#34;</span> <span style="color:#e6db74">&#34;https://demo.dspace.org/rest/items?offset=0&amp;limit=5&#34;</span> | jq length
</span></span><span style="display:flex;"><span>4
</span></span><span style="display:flex;"><span>$ curl -s -H <span style="color:#e6db74">&#34;Accept: application/json&#34;</span> <span style="color:#e6db74">&#34;https://demo.dspace.org/rest/items?offset=0&amp;limit=5&#34;</span> | jq <span style="color:#e6db74">&#39;.[].handle&#39;</span>
</span></span><span style="display:flex;"><span>&#34;10673/4&#34;
</span></span><span style="display:flex;"><span>&#34;10673/3&#34;
</span></span><span style="display:flex;"><span>&#34;10673/5&#34;
</span></span><span style="display:flex;"><span>&#34;10673/7&#34;
</span></span></code></pre></div><ul>
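<li>Given this behavior, a page can come back with fewer than <code>limit</code> items, as in the example above. A harvester that treats a short page as the last one could therefore stop early. Below is a rough sketch of a loop that always advances <code>offset</code> by <code>limit</code> and stops only on an empty page. It assumes <code>curl</code> and <code>jq</code>; <code>fetch_page</code>, <code>harvest_handles</code> and <code>REST_URL</code> are hypothetical names, not OpenRXV code:
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># Hypothetical helper: print the handles on one page of the DSpace 6 REST API.
# curl -G turns the -d values into query parameters on a GET request.
REST_URL=${REST_URL:-https://demo.dspace.org/rest}

fetch_page() {
  curl -s -G -H 'Accept: application/json' \
    -d "offset=$1" -d "limit=$2" "$REST_URL/items" | jq -r '.[].handle'
}

# Page through all items: always advance the offset by the page size and
# stop only on an empty page, so a short page (private items filtered out)
# does not end the harvest early. A page on which every item is private
# would presumably still come back empty and stop the loop.
harvest_handles() {
  local limit=${1:-100}
  local offset=0
  local page
  while page=$(fetch_page "$offset" "$limit"); [ -n "$page" ]; do
    printf '%s\n' "$page"
    offset=$((offset + limit))
  done
}

# Example: harvest_handles 5 | sort -u | wc -l
</code></pre></div></li>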
<li>I tested the pull request on DSpace Test and it works, so I left a note on GitHub and Jira</li>
<li>Last week I noticed that the Gender Platform website is using &ldquo;cgspace.cgiar.org&rdquo; links for CGSpace, instead of handles
<ul>
@ -355,11 +355,11 @@ $ curl -s -H <span style="color:#e6db74">&#34;Accept: application/json&#34;</spa
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[[:digit:]]+&#34;&#39;</span> openrxv-items_data-local-ds-4065.json | wc -l
</span></span><span style="display:flex;"><span>90327
</span></span><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[[:digit:]]+&#34;&#39;</span> openrxv-items_data-local-ds-4065.json | sort -u | wc -l
</span></span><span style="display:flex;"><span>90317
</span></span></code></pre></div><h2 id="2021-06-22">2021-06-22</h2>
<ul>
<li>Make a <a href="https://github.com/atmire/COUNTER-Robots/pull/43">pull request</a> to the COUNTER-Robots project to add two new user agents: crusty and newspaper
<ul>
@ -368,13 +368,13 @@ $ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;[[:digit:]]+/[
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
</span></span><span style="display:flex;"><span>Purging 1339 hits from RI\/1\.0 in statistics
</span></span><span style="display:flex;"><span>Purging 447 hits from crusty in statistics
</span></span><span style="display:flex;"><span>Purging 3736 hits from newspaper in statistics
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 5522
</span></span></code></pre></div><ul>
<li>Surprised to see RI/1.0 in there because it&rsquo;s been in the override file for a while</li>
<li>Looking at the 2021 statistics in Solr I see a few more suspicious user agents:
<ul>
@ -397,11 +397,11 @@ Purging 3736 hits from newspaper in statistics
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># journalctl --since<span style="color:#f92672">=</span>today -u tomcat7 | grep -c <span style="color:#e6db74">&#39;Connection has been abandoned&#39;</span>
</span></span><span style="display:flex;"><span>978
</span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">&#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39;</span> | wc -l
</span></span><span style="display:flex;"><span>10100
</span></span></code></pre></div><ul>
<li>I sent a message to Atmire, hoping that the database logging stuff they put in place last time this happened will be of help now</li>
<li>In the meantime, I decided to upgrade Tomcat from 7.0.107 to 7.0.109, and the PostgreSQL JDBC driver from 42.2.20 to 42.2.22 (first on DSpace Test)</li>
<li>I also applied the following patches from the 6.4 milestone to our <code>6_x-prod</code> branch:
@ -412,17 +412,17 @@ $ psql -c <span style="color:#e6db74">&#39;SELECT * FROM pg_locks pl LEFT JOIN p
</li>
<li>After upgrading and restarting Tomcat the database connections and locks were back down to normal levels:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">&#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39;</span> | wc -l
</span></span><span style="display:flex;"><span>63
</span></span></code></pre></div><ul>
<li>Looking in the DSpace log, the first &ldquo;pool empty&rdquo; message I saw this morning was at 4AM:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>2021-06-23 04:01:14,596 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ [http-bio-127.0.0.1-8443-exec-4323] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:250; idle:0; lastwait:5000].
</span></span></code></pre></div><ul>
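<li>Bucketing the <code>Pool empty</code> errors by hour shows whether the pool exhaustion started around 4AM and how long it lasted. The sketch below is a minimal way to do that. It assumes every log line starts with a timestamp like <code>2021-06-23 04:01:14,596</code>, as in the line above; the function name and log file name are hypothetical:
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># Count "Timeout: Pool empty" errors per hour in a DSpace log
# (file given as $1, or standard input). The first 13 characters of each
# line are the date and hour, e.g. "2021-06-23 04".
pool_empty_per_hour() {
  grep 'Timeout: Pool empty' "${1:--}" | cut -c1-13 | sort | uniq -c \
    | awk '{ print $2, $3, $1 }'
}

# Example (file name is an assumption): pool_empty_per_hour dspace.log.2021-06-23
</code></pre></div></li>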
<li>Oh, and I notice 8,000 hits from a Flipboard bot using this user-agent:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0 (FlipboardProxy/1.2; +http://flipboard.com/browserproxy)
</span></span></code></pre></div><ul>
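<li>Before adding an agent like this to the local pattern file, it helps to check which existing patterns already match it. The sketch below does that check. It assumes the pattern file holds one regular expression per line, like the <code>RI\/1\.0</code> pattern in the script output above, and that matching is case-insensitive. I have not verified that DSpace applies these patterns exactly as <code>grep -E</code> does, and <code>matching_patterns</code> is a hypothetical name:
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># Print each pattern from a spider agents file that matches a user agent.
# $1: user agent string, $2: pattern file with one regular expression per
# line (empty lines are skipped). Case-insensitive matching is an assumption.
matching_patterns() {
  local ua=$1
  grep -v '^$' "$2" | while IFS= read -r pattern; do
    if printf '%s\n' "$ua" | grep -qiE -- "$pattern"; then
      printf '%s\n' "$pattern"
    fi
  done
}

# Example: matching_patterns "$user_agent" dspace/config/spiders/agents/ilri
</code></pre></div></li>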
<li>We can purge them, as this is not user traffic: <a href="https://about.flipboard.com/browserproxy/">https://about.flipboard.com/browserproxy/</a>
<ul>
<li>I will add it to our local user agent pattern file and eventually submit a pull request to COUNTER-Robots</li>
@ -448,17 +448,17 @@ $ psql -c <span style="color:#e6db74">&#39;SELECT * FROM pg_locks pl LEFT JOIN p
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;([[:digit:]]|\.)+/[[:digit:]]+&#34;&#39;</span> cgspace-openrxv-items-temp-backup.json | wc -l
</span></span><span style="display:flex;"><span>104797
</span></span><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;([[:digit:]]|\.)+/[[:digit:]]+&#34;&#39;</span> cgspace-openrxv-items-temp-backup.json | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>99186
</span></span></code></pre></div><ul>
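<li>The two pipelines above read the dump twice. A single <code>awk</code> pass over the same <code>grep -oE</code> output can report both numbers at once. The sketch below uses a hypothetical <code>handle_counts</code> function that prints the total followed by the number of distinct handles:
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># Print the total number of handle matches and the number of distinct
# handles in a dump (file given as $1, or standard input).
handle_counts() {
  grep -oE '"handle":"([[:digit:]]|\.)+/[[:digit:]]+"' "${1:--}" \
    | awk '{ total++; if (!seen[$0]++) distinct++ } END { print total + 0, distinct + 0 }'
}

# Example: handle_counts cgspace-openrxv-items-temp-backup.json
</code></pre></div></li>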
<li>This number is probably unique for that particular harvest, but I don&rsquo;t think it represents the true number of items&hellip;</li>
<li>The harvest of DSpace Test I did on my local test instance yesterday has about 91,000 items:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -E <span style="color:#e6db74">&#39;&#34;repo&#34;:&#34;DSpace Test&#34;&#39;</span> 2021-06-23-openrxv-items-final-local.json | grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;([[:digit:]]|\.)+/[[:digit:]]+&#34;&#39;</span> | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>90990
</span></span></code></pre></div><ul>
<li>So the harvest on the live site is missing items, but then why didn&rsquo;t the add missing items plugin find them?!
<ul>
<li>I notice that we are missing the <code>type</code> in the metadata structure config for each repository on the production site, and we are using <code>type</code> for item type in the actual schema&hellip; so maybe there is a conflict there</li>
@ -469,8 +469,8 @@ $ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;([[:digit:]]|\
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>172.104.229.92 - - [24/Jun/2021:07:52:58 +0200] &#34;GET /sitemap HTTP/1.1&#34; 503 190 &#34;-&#34; &#34;OpenRXV harvesting bot; https://github.com/ilri/OpenRXV&#34;
</span></span></code></pre></div><ul>
<li>I fixed nginx so it always allows people to get the sitemap and then re-ran the plugins&hellip; now it&rsquo;s checking 180,000+ handles to see if they are collections or items&hellip;
<ul>
<li>I see it fetched the sitemap three times; we need to make sure it&rsquo;s only doing it once for each repository</li>
@ -478,9 +478,9 @@ $ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;([[:digit:]]|\
</li>
<li>According to the API logs we will be adding 5,697 items:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ docker logs api 2&gt;/dev/null | grep dspace_add_missing_items | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>5697
</span></span></code></pre></div><ul>
<li>Spent a few hours with Moayad troubleshooting and improving OpenRXV
<ul>
<li>We found a bug in the harvesting code that can occur when you are harvesting DSpace 5 and DSpace 6 instances, as DSpace 5 uses numeric (long) IDs, and DSpace 6 uses UUIDs</li>
@ -496,35 +496,35 @@ $ grep -oE <span style="color:#e6db74">&#39;&#34;handle&#34;:&#34;([[:digit:]]|\
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ redis-cli
</span></span><span style="display:flex;"><span>127.0.0.1:6379&gt; SCAN 0 COUNT 5
</span></span><span style="display:flex;"><span>1) &#34;49152&#34;
</span></span><span style="display:flex;"><span>2) 1) &#34;bull:plugins:476595&#34;
</span></span><span style="display:flex;"><span> 2) &#34;bull:plugins:367382&#34;
</span></span><span style="display:flex;"><span> 3) &#34;bull:plugins:369228&#34;
</span></span><span style="display:flex;"><span> 4) &#34;bull:plugins:438986&#34;
</span></span><span style="display:flex;"><span> 5) &#34;bull:plugins:366215&#34;
</span></span></code></pre></div><ul>
<li>We can apparently get the name of the job in each hash using <code>HGET</code>:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">127.0.0.1:6379&gt; TYPE bull:plugins:401827
hash
127.0.0.1:6379&gt; HGET bull:plugins:401827 name
&#34;dspace_add_missing_items&#34;
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>127.0.0.1:6379&gt; TYPE bull:plugins:401827
</span></span><span style="display:flex;"><span>hash
</span></span><span style="display:flex;"><span>127.0.0.1:6379&gt; HGET bull:plugins:401827 name
</span></span><span style="display:flex;"><span>&#34;dspace_add_missing_items&#34;
</span></span></code></pre></div><ul>
<li>I whipped up a one-liner to get the keys for all plugin jobs, convert them to Redis <code>HGET</code> commands that extract the value of the <code>name</code> field, and then sort the job names by their counts:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ redis-cli KEYS <span style="color:#e6db74">&#34;bull:plugins:*&#34;</span> <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> | sed -e &#39;s/^bull/HGET bull/&#39; -e &#39;s/\([[:digit:]]\)$/\1 name/&#39; \
| ncat -w 3 localhost 6379 \
| grep -v -E &#39;^\$&#39; | sort | uniq -c | sort -h
3 dspace_health_check
4 -ERR wrong number of arguments for &#39;hget&#39; command
12 mel_downloads_and_views
129 dspace_altmetrics
932 dspace_downloads_and_views
186428 dspace_add_missing_items
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ redis-cli KEYS <span style="color:#e6db74">&#34;bull:plugins:*&#34;</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | sed -e &#39;s/^bull/HGET bull/&#39; -e &#39;s/\([[:digit:]]\)$/\1 name/&#39; \
</span></span><span style="display:flex;"><span> | ncat -w 3 localhost 6379 \
</span></span><span style="display:flex;"><span> | grep -v -E &#39;^\$&#39; | sort | uniq -c | sort -h
</span></span><span style="display:flex;"><span> 3 dspace_health_check
</span></span><span style="display:flex;"><span> 4 -ERR wrong number of arguments for &#39;hget&#39; command
</span></span><span style="display:flex;"><span> 12 mel_downloads_and_views
</span></span><span style="display:flex;"><span> 129 dspace_altmetrics
</span></span><span style="display:flex;"><span> 932 dspace_downloads_and_views
</span></span><span style="display:flex;"><span> 186428 dspace_add_missing_items
</span></span></code></pre></div><ul>
<li>Note that this uses <code>ncat</code> to send the commands directly to Redis all at once instead of one at a time (<code>netcat</code> didn&rsquo;t work here, as it doesn&rsquo;t know when our input is finished and never quits)
<ul>
<li>I thought of using <code>redis-cli --pipe</code>, but then you have to construct the commands in the Redis protocol format, with the number of arguments and the length of each argument</li>
@ -544,49 +544,49 @@ hash
<ul>
<li>Looking at the DSpace logs, I see there was definitely a higher number of sessions on 2021-06-23, roughly twice the usual (46,371 unique sessions, compared to about 20,000 on a busy day):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ <span style="color:#66d9ef">for</span> file in dspace.log.2021-06-<span style="color:#f92672">[</span>12<span style="color:#f92672">]</span>*; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">&#34;</span>$file<span style="color:#e6db74">&#34;</span>; grep -oE <span style="color:#e6db74">&#39;session_id=[A-Z0-9]{32}&#39;</span> <span style="color:#e6db74">&#34;</span>$file<span style="color:#e6db74">&#34;</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span>
dspace.log.2021-06-10
19072
dspace.log.2021-06-11
19224
dspace.log.2021-06-12
19215
dspace.log.2021-06-13
16721
dspace.log.2021-06-14
17880
dspace.log.2021-06-15
12103
dspace.log.2021-06-16
4651
dspace.log.2021-06-17
22785
dspace.log.2021-06-18
21406
dspace.log.2021-06-19
25967
dspace.log.2021-06-20
20850
dspace.log.2021-06-21
6388
dspace.log.2021-06-22
5945
dspace.log.2021-06-23
46371
dspace.log.2021-06-24
9024
dspace.log.2021-06-25
12521
dspace.log.2021-06-26
16163
dspace.log.2021-06-27
5886
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ <span style="color:#66d9ef">for</span> file in dspace.log.2021-06-<span style="color:#f92672">[</span>12<span style="color:#f92672">]</span>*; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">&#34;</span>$file<span style="color:#e6db74">&#34;</span>; grep -oE <span style="color:#e6db74">&#39;session_id=[A-Z0-9]{32}&#39;</span> <span style="color:#e6db74">&#34;</span>$file<span style="color:#e6db74">&#34;</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>dspace.log.2021-06-10
</span></span><span style="display:flex;"><span>19072
</span></span><span style="display:flex;"><span>dspace.log.2021-06-11
</span></span><span style="display:flex;"><span>19224
</span></span><span style="display:flex;"><span>dspace.log.2021-06-12
</span></span><span style="display:flex;"><span>19215
</span></span><span style="display:flex;"><span>dspace.log.2021-06-13
</span></span><span style="display:flex;"><span>16721
</span></span><span style="display:flex;"><span>dspace.log.2021-06-14
</span></span><span style="display:flex;"><span>17880
</span></span><span style="display:flex;"><span>dspace.log.2021-06-15
</span></span><span style="display:flex;"><span>12103
</span></span><span style="display:flex;"><span>dspace.log.2021-06-16
</span></span><span style="display:flex;"><span>4651
</span></span><span style="display:flex;"><span>dspace.log.2021-06-17
</span></span><span style="display:flex;"><span>22785
</span></span><span style="display:flex;"><span>dspace.log.2021-06-18
</span></span><span style="display:flex;"><span>21406
</span></span><span style="display:flex;"><span>dspace.log.2021-06-19
</span></span><span style="display:flex;"><span>25967
</span></span><span style="display:flex;"><span>dspace.log.2021-06-20
</span></span><span style="display:flex;"><span>20850
</span></span><span style="display:flex;"><span>dspace.log.2021-06-21
</span></span><span style="display:flex;"><span>6388
</span></span><span style="display:flex;"><span>dspace.log.2021-06-22
</span></span><span style="display:flex;"><span>5945
</span></span><span style="display:flex;"><span>dspace.log.2021-06-23
</span></span><span style="display:flex;"><span>46371
</span></span><span style="display:flex;"><span>dspace.log.2021-06-24
</span></span><span style="display:flex;"><span>9024
</span></span><span style="display:flex;"><span>dspace.log.2021-06-25
</span></span><span style="display:flex;"><span>12521
</span></span><span style="display:flex;"><span>dspace.log.2021-06-26
</span></span><span style="display:flex;"><span>16163
</span></span><span style="display:flex;"><span>dspace.log.2021-06-27
</span></span><span style="display:flex;"><span>5886
</span></span></code></pre></div><ul>
<li>I see almost 16,000 unique IPs in the XMLUI logs alone on that day:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># zcat /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.4.gz | grep <span style="color:#e6db74">&#39;23/Jun/2021&#39;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq | wc -l
15835
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># zcat /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.4.gz | grep <span style="color:#e6db74">&#39;23/Jun/2021&#39;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>15835
</span></span></code></pre></div><ul>
<li>Annoyingly, I found 37,000 more hits from Bing using <code>dns:*msnbot* AND dns:*.msn.com.</code> as a Solr filter
<ul>
<li>WTF, they are using a normal user agent: <code>Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko</code></li>
@ -628,8 +628,8 @@ dspace.log.2021-06-27
</li>
<li>The DSpace log shows:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">2021-06-30 08:19:15,874 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ Cannot get a connection, pool error Timeout waiting for idle object
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>2021-06-30 08:19:15,874 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ Cannot get a connection, pool error Timeout waiting for idle object
</span></span></code></pre></div><ul>
<li>The first of these errors is from last night (2021-06-29 at 10:47 PM)</li>
<li>I restarted Tomcat 7 and CGSpace came back up&hellip;</li>
<li>I didn&rsquo;t see that Atmire had responded last week (on 2021-06-23) about the issues we had
@ -641,14 +641,14 @@ dspace.log.2021-06-27
</li>
<li>Export a list of all CGSpace&rsquo;s AGROVOC keywords with counts for Enrico and Elizabeth Arnaud to discuss with AGROVOC:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value AS &#34;dcterms.subject&#34;, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id = 187 GROUP BY &#34;dcterms.subject&#34; ORDER BY count DESC) to /tmp/2021-06-30-agrovoc.csv WITH CSV HEADER;
COPY 20780
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value AS &#34;dcterms.subject&#34;, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id = 187 GROUP BY &#34;dcterms.subject&#34; ORDER BY count DESC) to /tmp/2021-06-30-agrovoc.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>COPY 20780
</span></span></code></pre></div><ul>
<li>Actually, Enrico wanted non-AGROVOC terms, so I extracted all the center and CRP subjects (ignoring the system office and themes):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242) GROUP BY subject ORDER BY count DESC) to /tmp/2021-06-30-non-agrovoc.csv WITH CSV HEADER;
COPY 1710
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= &gt; \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242) GROUP BY subject ORDER BY count DESC) to /tmp/2021-06-30-non-agrovoc.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>COPY 1710
</span></span></code></pre></div><ul>
<li>Fix an issue in the Ansible infrastructure playbooks for the DSpace role
<ul>
<li>It was causing the template module to fail when setting up the npm environment</li>
@ -657,13 +657,13 @@ COPY 1710
</li>
<li>I saw a strange message in the Tomcat 7 journal on DSpace Test (linode26):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">Jun 30 16:00:09 linode26 tomcat7[30294]: WARNING: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [111,733] milliseconds.
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>Jun 30 16:00:09 linode26 tomcat7[30294]: WARNING: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [111,733] milliseconds.
</span></span></code></pre></div><ul>
<li>What&rsquo;s even crazier is that it takes twice as long on CGSpace (linode18)!</li>
<li>Apparently OpenJDK defaults to using <code>/dev/random</code>, so I changed <code>securerandom.source</code> in <code>/etc/java-8-openjdk/security/java.security</code> to use <code>/dev/urandom</code>:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">securerandom.source=file:/dev/urandom
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>securerandom.source=file:/dev/urandom
</span></span></code></pre></div><ul>
<li><code>/dev/random</code> can block for a long time while waiting for entropy, whereas <code>/dev/urandom</code> on modern Linux is a cryptographically secure pseudorandom number generator
<ul>
<li>Now Tomcat starts much faster and no warning is printed so I&rsquo;m going to add this to our Ansible infrastructure playbooks</li>