mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-03-04
@@ -36,7 +36,7 @@ I simply started it and AReS was running again:
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@@ -132,8 +132,8 @@ I simply started it and AReS was running again:
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker-compose -f docker/docker-compose.yml start angular_nginx
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ docker-compose -f docker/docker-compose.yml start angular_nginx
</span></span></code></pre></div><ul>
<li>Margarita from CCAFS emailed me to say that workflow alerts haven’t been working lately
<ul>
<li>I guess this is related to the SMTP issues last week</li>
@@ -162,14 +162,14 @@ I simply started it and AReS was running again:
<ul>
<li>The Elasticsearch indexes are messed up so I dumped and re-created them correctly:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">curl -XDELETE 'http://localhost:9200/openrxv-items-final'
curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
curl -XPUT 'http://localhost:9200/openrxv-items-final'
curl -XPUT 'http://localhost:9200/openrxv-items-temp'
curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'
elasticdump --input=/home/aorth/openrxv-items_mapping.json --output=http://localhost:9200/openrxv-items-final --type=mapping
elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhost:9200/openrxv-items-final --type=data --limit=1000
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>curl -XDELETE 'http://localhost:9200/openrxv-items-final'
</span></span><span style="display:flex;"><span>curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
</span></span><span style="display:flex;"><span>curl -XPUT 'http://localhost:9200/openrxv-items-final'
</span></span><span style="display:flex;"><span>curl -XPUT 'http://localhost:9200/openrxv-items-temp'
</span></span><span style="display:flex;"><span>curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'
</span></span><span style="display:flex;"><span>elasticdump --input=/home/aorth/openrxv-items_mapping.json --output=http://localhost:9200/openrxv-items-final --type=mapping
</span></span><span style="display:flex;"><span>elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhost:9200/openrxv-items-final --type=data --limit=1000
</span></span></code></pre></div><ul>
<li>Then I started a harvesting on AReS</li>
</ul>
<h2 id="2021-06-07">2021-06-07</h2>
@@ -208,8 +208,8 @@ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhos
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ podman unshare chown 1000:1000 /home/aorth/.local/share/containers/storage/volumes/docker_esData_7/_data
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ podman unshare chown 1000:1000 /home/aorth/.local/share/containers/storage/volumes/docker_esData_7/_data
</span></span></code></pre></div><ul>
<li>The new OpenRXV harvesting method by Moayad uses pages of 10 items instead of 100 and it’s much faster
<ul>
<li>I harvested 90,000+ items from DSpace Test in ~3 hours</li>
@@ -231,23 +231,23 @@ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhos
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:digit:]]+"'</span> openrxv-items_data.json | awk -F: <span style="color:#e6db74">'{print $2}'</span> | wc -l
90459
$ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:digit:]]+"'</span> openrxv-items_data.json | awk -F: <span style="color:#e6db74">'{print $2}'</span> | sort | uniq | wc -l
90380
$ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:digit:]]+"'</span> openrxv-items_data.json | awk -F: <span style="color:#e6db74">'{print $2}'</span> | sort | uniq -c | sort -h
...
2 "10568/99409"
2 "10568/99410"
2 "10568/99411"
2 "10568/99516"
3 "10568/102093"
3 "10568/103524"
3 "10568/106664"
3 "10568/106940"
3 "10568/107195"
3 "10568/96546"
</code></pre></div><h2 id="2021-06-20">2021-06-20</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:digit:]]+"'</span> openrxv-items_data.json | awk -F: <span style="color:#e6db74">'{print $2}'</span> | wc -l
</span></span><span style="display:flex;"><span>90459
</span></span><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:digit:]]+"'</span> openrxv-items_data.json | awk -F: <span style="color:#e6db74">'{print $2}'</span> | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>90380
</span></span><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:digit:]]+"'</span> openrxv-items_data.json | awk -F: <span style="color:#e6db74">'{print $2}'</span> | sort | uniq -c | sort -h
</span></span><span style="display:flex;"><span>...
</span></span><span style="display:flex;"><span> 2 "10568/99409"
</span></span><span style="display:flex;"><span> 2 "10568/99410"
</span></span><span style="display:flex;"><span> 2 "10568/99411"
</span></span><span style="display:flex;"><span> 2 "10568/99516"
</span></span><span style="display:flex;"><span> 3 "10568/102093"
</span></span><span style="display:flex;"><span> 3 "10568/103524"
</span></span><span style="display:flex;"><span> 3 "10568/106664"
</span></span><span style="display:flex;"><span> 3 "10568/106940"
</span></span><span style="display:flex;"><span> 3 "10568/107195"
</span></span><span style="display:flex;"><span> 3 "10568/96546"
</span></span></code></pre></div><h2 id="2021-06-20">2021-06-20</h2>
<ul>
<li>Udana asked me to update their IWMI subjects from <code>farmer managed irrigation systems</code> to <code>farmer-led irrigation</code>
<ul>
@@ -255,12 +255,12 @@ $ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ dspace metadata-export -i 10568/16814 -f /tmp/2021-06-20-IWMI.csv
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ dspace metadata-export -i 10568/16814 -f /tmp/2021-06-20-IWMI.csv
</span></span></code></pre></div><ul>
<li>Then I used <code>csvcut</code> to extract just the columns I needed and do the replacement into a new CSV:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c <span style="color:#e6db74">'id,dcterms.subject[],dcterms.subject[en_US]'</span> /tmp/2021-06-20-IWMI.csv | sed <span style="color:#e6db74">'s/farmer managed irrigation systems/farmer-led irrigation/'</span> > /tmp/2021-06-20-IWMI-new-subjects.csv
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c <span style="color:#e6db74">'id,dcterms.subject[],dcterms.subject[en_US]'</span> /tmp/2021-06-20-IWMI.csv | sed <span style="color:#e6db74">'s/farmer managed irrigation systems/farmer-led irrigation/'</span> > /tmp/2021-06-20-IWMI-new-subjects.csv
</span></span></code></pre></div><ul>
<li>Then I uploaded the resulting CSV to CGSpace, updating 161 items</li>
<li>Start a harvest on AReS</li>
<li>I found <a href="https://jira.lyrasis.org/browse/DS-1977">a bug</a> and <a href="https://github.com/DSpace/DSpace/pull/2584">a patch</a> for the private items showing up in the DSpace sitemap bug
@@ -278,19 +278,19 @@ $ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -E <span style="color:#e6db74">'"repo":"CGSpace"'</span> openrxv-items_data.json | grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:alnum:]]+"'</span> | wc -l
90937
$ grep -E <span style="color:#e6db74">'"repo":"CGSpace"'</span> openrxv-items_data.json | grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:alnum:]]+"'</span> | sort -u | wc -l
85709
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -E <span style="color:#e6db74">'"repo":"CGSpace"'</span> openrxv-items_data.json | grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:alnum:]]+"'</span> | wc -l
</span></span><span style="display:flex;"><span>90937
</span></span><span style="display:flex;"><span>$ grep -E <span style="color:#e6db74">'"repo":"CGSpace"'</span> openrxv-items_data.json | grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:alnum:]]+"'</span> | sort -u | wc -l
</span></span><span style="display:flex;"><span>85709
</span></span></code></pre></div><ul>
<li>So those could be duplicates from the way we harvest pages, but they could also be from mappings…
<ul>
<li>Manually inspecting the duplicates where handles appear more than once:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -E <span style="color:#e6db74">'"repo":"CGSpace"'</span> openrxv-items_data.json | grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:alnum:]]+"'</span> | sort | uniq -c | sort -h
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -E <span style="color:#e6db74">'"repo":"CGSpace"'</span> openrxv-items_data.json | grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:alnum:]]+"'</span> | sort | uniq -c | sort -h
</span></span></code></pre></div><ul>
<li>Unfortunately I found no pattern:
<ul>
<li>Some appear twice in the Elasticsearch index, but appear in only one collection</li>
@@ -312,23 +312,23 @@ $ grep -E <span style="color:#e6db74">'"repo":"CGSpace"'
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s -H <span style="color:#e6db74">"Accept: application/json"</span> <span style="color:#e6db74">"https://demo.dspace.org/rest/items?offset=0&limit=5"</span> | jq length
5
$ curl -s -H <span style="color:#e6db74">"Accept: application/json"</span> <span style="color:#e6db74">"https://demo.dspace.org/rest/items?offset=0&limit=5"</span> | jq <span style="color:#e6db74">'.[].handle'</span>
"10673/4"
"10673/3"
"10673/6"
"10673/5"
"10673/7"
# log into DSpace Demo XMLUI as admin and make one item private <span style="color:#f92672">(</span><span style="color:#66d9ef">for</span> example 10673/6<span style="color:#f92672">)</span>
$ curl -s -H <span style="color:#e6db74">"Accept: application/json"</span> <span style="color:#e6db74">"https://demo.dspace.org/rest/items?offset=0&limit=5"</span> | jq length
4
$ curl -s -H <span style="color:#e6db74">"Accept: application/json"</span> <span style="color:#e6db74">"https://demo.dspace.org/rest/items?offset=0&limit=5"</span> | jq <span style="color:#e6db74">'.[].handle'</span>
"10673/4"
"10673/3"
"10673/5"
"10673/7"
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s -H <span style="color:#e6db74">"Accept: application/json"</span> <span style="color:#e6db74">"https://demo.dspace.org/rest/items?offset=0&limit=5"</span> | jq length
</span></span><span style="display:flex;"><span>5
</span></span><span style="display:flex;"><span>$ curl -s -H <span style="color:#e6db74">"Accept: application/json"</span> <span style="color:#e6db74">"https://demo.dspace.org/rest/items?offset=0&limit=5"</span> | jq <span style="color:#e6db74">'.[].handle'</span>
</span></span><span style="display:flex;"><span>"10673/4"
</span></span><span style="display:flex;"><span>"10673/3"
</span></span><span style="display:flex;"><span>"10673/6"
</span></span><span style="display:flex;"><span>"10673/5"
</span></span><span style="display:flex;"><span>"10673/7"
</span></span><span style="display:flex;"><span># log into DSpace Demo XMLUI as admin and make one item private <span style="color:#f92672">(</span><span style="color:#66d9ef">for</span> example 10673/6<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>$ curl -s -H <span style="color:#e6db74">"Accept: application/json"</span> <span style="color:#e6db74">"https://demo.dspace.org/rest/items?offset=0&limit=5"</span> | jq length
</span></span><span style="display:flex;"><span>4
</span></span><span style="display:flex;"><span>$ curl -s -H <span style="color:#e6db74">"Accept: application/json"</span> <span style="color:#e6db74">"https://demo.dspace.org/rest/items?offset=0&limit=5"</span> | jq <span style="color:#e6db74">'.[].handle'</span>
</span></span><span style="display:flex;"><span>"10673/4"
</span></span><span style="display:flex;"><span>"10673/3"
</span></span><span style="display:flex;"><span>"10673/5"
</span></span><span style="display:flex;"><span>"10673/7"
</span></span></code></pre></div><ul>
<li>I tested the pull request on DSpace Test and it works, so I left a note on GitHub and Jira</li>
<li>Last week I noticed that the Gender Platform website is using “cgspace.cgiar.org” links for CGSpace, instead of handles
<ul>
@@ -355,11 +355,11 @@ $ curl -s -H <span style="color:#e6db74">"Accept: application/json"</spa
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:digit:]]+"'</span> openrxv-items_data-local-ds-4065.json | wc -l
90327
$ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:digit:]]+"'</span> openrxv-items_data-local-ds-4065.json | sort -u | wc -l
90317
</code></pre></div><h2 id="2021-06-22">2021-06-22</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:digit:]]+"'</span> openrxv-items_data-local-ds-4065.json | wc -l
</span></span><span style="display:flex;"><span>90327
</span></span><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[[:digit:]]+"'</span> openrxv-items_data-local-ds-4065.json | sort -u | wc -l
</span></span><span style="display:flex;"><span>90317
</span></span></code></pre></div><h2 id="2021-06-22">2021-06-22</h2>
<ul>
<li>Make a <a href="https://github.com/atmire/COUNTER-Robots/pull/43">pull request</a> to the COUNTER-Robots project to add two new user agents: crusty and newspaper
<ul>
@@ -368,13 +368,13 @@ $ grep -oE <span style="color:#e6db74">'"handle":"[[:digit:]]+/[
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
Purging 1339 hits from RI\/1\.0 in statistics
Purging 447 hits from crusty in statistics
Purging 3736 hits from newspaper in statistics
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 5522
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
</span></span><span style="display:flex;"><span>Purging 1339 hits from RI\/1\.0 in statistics
</span></span><span style="display:flex;"><span>Purging 447 hits from crusty in statistics
</span></span><span style="display:flex;"><span>Purging 3736 hits from newspaper in statistics
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 5522
</span></span></code></pre></div><ul>
<li>Surprised to see RI/1.0 in there because it’s been in the override file for a while</li>
<li>Looking at the 2021 statistics in Solr I see a few more suspicious user agents:
<ul>
@@ -397,11 +397,11 @@ Purging 3736 hits from newspaper in statistics
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># journalctl --since<span style="color:#f92672">=</span>today -u tomcat7 | grep -c <span style="color:#e6db74">'Connection has been abandoned'</span>
978
$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l
10100
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># journalctl --since<span style="color:#f92672">=</span>today -u tomcat7 | grep -c <span style="color:#e6db74">'Connection has been abandoned'</span>
</span></span><span style="display:flex;"><span>978
</span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l
</span></span><span style="display:flex;"><span>10100
</span></span></code></pre></div><ul>
<li>I sent a message to Atmire, hoping that the database logging stuff they put in place last time this happened will be of help now</li>
<li>In the mean time, I decided to upgrade Tomcat from 7.0.107 to 7.0.109, and the PostgreSQL JDBC driver from 42.2.20 to 42.2.22 (first on DSpace Test)</li>
<li>I also applied the following patches from the 6.4 milestone to our <code>6_x-prod</code> branch:
@@ -412,17 +412,17 @@ $ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN p
</li>
<li>After upgrading and restarting Tomcat the database connections and locks were back down to normal levels:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l
63
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l
</span></span><span style="display:flex;"><span>63
</span></span></code></pre></div><ul>
<li>Looking in the DSpace log, the first “pool empty” message I saw this morning was at 4AM:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">2021-06-23 04:01:14,596 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ [http-bio-127.0.0.1-8443-exec-4323] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:250; idle:0; lastwait:5000].
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>2021-06-23 04:01:14,596 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ [http-bio-127.0.0.1-8443-exec-4323] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:250; idle:0; lastwait:5000].
</span></span></code></pre></div><ul>
<li>Oh, and I notice 8,000 hits from a Flipboard bot using this user-agent:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0 (FlipboardProxy/1.2; +http://flipboard.com/browserproxy)
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0 (FlipboardProxy/1.2; +http://flipboard.com/browserproxy)
</span></span></code></pre></div><ul>
<li>We can purge them, as this is not user traffic: <a href="https://about.flipboard.com/browserproxy/">https://about.flipboard.com/browserproxy/</a>
<ul>
<li>I will add it to our local user agent pattern file and eventually submit a pull request to COUNTER-Robots</li>
@@ -448,17 +448,17 @@ $ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN p
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -oE <span style="color:#e6db74">'"handle":"([[:digit:]]|\.)+/[[:digit:]]+"'</span> cgspace-openrxv-items-temp-backup.json | wc -l
104797
$ grep -oE <span style="color:#e6db74">'"handle":"([[:digit:]]|\.)+/[[:digit:]]+"'</span> cgspace-openrxv-items-temp-backup.json | sort | uniq | wc -l
99186
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">'"handle":"([[:digit:]]|\.)+/[[:digit:]]+"'</span> cgspace-openrxv-items-temp-backup.json | wc -l
</span></span><span style="display:flex;"><span>104797
</span></span><span style="display:flex;"><span>$ grep -oE <span style="color:#e6db74">'"handle":"([[:digit:]]|\.)+/[[:digit:]]+"'</span> cgspace-openrxv-items-temp-backup.json | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>99186
</span></span></code></pre></div><ul>
<li>This number is probably unique for that particular harvest, but I don’t think it represents the true number of items…</li>
<li>The harvest of DSpace Test I did on my local test instance yesterday has about 91,000 items:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -E <span style="color:#e6db74">'"repo":"DSpace Test"'</span> 2021-06-23-openrxv-items-final-local.json | grep -oE <span style="color:#e6db74">'"handle":"([[:digit:]]|\.)+/[[:digit:]]+"'</span> | sort | uniq | wc -l
90990
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -E <span style="color:#e6db74">'"repo":"DSpace Test"'</span> 2021-06-23-openrxv-items-final-local.json | grep -oE <span style="color:#e6db74">'"handle":"([[:digit:]]|\.)+/[[:digit:]]+"'</span> | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>90990
</span></span></code></pre></div><ul>
<li>So the harvest on the live site is missing items, then why didn’t the add missing items plugin find them?!
<ul>
<li>I notice that we are missing the <code>type</code> in the metadata structure config for each repository on the production site, and we are using <code>type</code> for item type in the actual schema… so maybe there is a conflict there</li>
@@ -469,8 +469,8 @@ $ grep -oE <span style="color:#e6db74">'"handle":"([[:digit:]]|\
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">172.104.229.92 - - [24/Jun/2021:07:52:58 +0200] "GET /sitemap HTTP/1.1" 503 190 "-" "OpenRXV harvesting bot; https://github.com/ilri/OpenRXV"
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>172.104.229.92 - - [24/Jun/2021:07:52:58 +0200] "GET /sitemap HTTP/1.1" 503 190 "-" "OpenRXV harvesting bot; https://github.com/ilri/OpenRXV"
</span></span></code></pre></div><ul>
<li>I fixed nginx so it always allows people to get the sitemap and then re-ran the plugins… now it’s checking 180,000+ handles to see if they are collections or items…
<ul>
<li>I see it fetched the sitemap three times, we need to make sure it’s only doing it once for each repository</li>
@@ -478,9 +478,9 @@ $ grep -oE <span style="color:#e6db74">'"handle":"([[:digit:]]|\
</li>
<li>According to the api logs we will be adding 5,697 items:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker logs api 2>/dev/null | grep dspace_add_missing_items | sort | uniq | wc -l
5697
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ docker logs api 2>/dev/null | grep dspace_add_missing_items | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>5697
</span></span></code></pre></div><ul>
<li>Spent a few hours with Moayad troubleshooting and improving OpenRXV
<ul>
<li>We found a bug in the harvesting code that can occur when you are harvesting DSpace 5 and DSpace 6 instances, as DSpace 5 uses numeric (long) IDs, and DSpace 6 uses UUIDs</li>
@@ -496,35 +496,35 @@ $ grep -oE <span style="color:#e6db74">'"handle":"([[:digit:]]|\
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ redis-cli
127.0.0.1:6379> SCAN 0 COUNT 5
1) "49152"
2) 1) "bull:plugins:476595"
2) "bull:plugins:367382"
3) "bull:plugins:369228"
4) "bull:plugins:438986"
5) "bull:plugins:366215"
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ redis-cli
</span></span><span style="display:flex;"><span>127.0.0.1:6379> SCAN 0 COUNT 5
|
||||
</span></span><span style="display:flex;"><span>1) "49152"
|
||||
</span></span><span style="display:flex;"><span>2) 1) "bull:plugins:476595"
|
||||
</span></span><span style="display:flex;"><span> 2) "bull:plugins:367382"
|
||||
</span></span><span style="display:flex;"><span> 3) "bull:plugins:369228"
|
||||
</span></span><span style="display:flex;"><span> 4) "bull:plugins:438986"
|
||||
</span></span><span style="display:flex;"><span> 5) "bull:plugins:366215"
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>We can apparently get the names of the jobs in each hash using <code>hget</code>:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">127.0.0.1:6379> TYPE bull:plugins:401827
|
||||
hash
|
||||
127.0.0.1:6379> HGET bull:plugins:401827 name
|
||||
"dspace_add_missing_items"
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>127.0.0.1:6379> TYPE bull:plugins:401827
|
||||
</span></span><span style="display:flex;"><span>hash
|
||||
</span></span><span style="display:flex;"><span>127.0.0.1:6379> HGET bull:plugins:401827 name
|
||||
</span></span><span style="display:flex;"><span>"dspace_add_missing_items"
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>I whipped up a one-liner to get the keys for all plugin jobs, convert them to Redis <code>HGET</code> commands that extract the value of the name field, and then sort the names by their counts:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ redis-cli KEYS <span style="color:#e6db74">"bull:plugins:*"</span> <span style="color:#ae81ff">\
|
||||
</span><span style="color:#ae81ff"></span> | sed -e 's/^bull/HGET bull/' -e 's/\([[:digit:]]\)$/\1 name/' \
|
||||
| ncat -w 3 localhost 6379 \
|
||||
| grep -v -E '^\$' | sort | uniq -c | sort -h
|
||||
3 dspace_health_check
|
||||
4 -ERR wrong number of arguments for 'hget' command
|
||||
12 mel_downloads_and_views
|
||||
129 dspace_altmetrics
|
||||
932 dspace_downloads_and_views
|
||||
186428 dspace_add_missing_items
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ redis-cli KEYS <span style="color:#e6db74">"bull:plugins:*"</span> <span style="color:#ae81ff">\
|
||||
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | sed -e 's/^bull/HGET bull/' -e 's/\([[:digit:]]\)$/\1 name/' \
|
||||
</span></span><span style="display:flex;"><span> | ncat -w 3 localhost 6379 \
|
||||
</span></span><span style="display:flex;"><span> | grep -v -E '^\$' | sort | uniq -c | sort -h
|
||||
</span></span><span style="display:flex;"><span> 3 dspace_health_check
|
||||
</span></span><span style="display:flex;"><span> 4 -ERR wrong number of arguments for 'hget' command
|
||||
</span></span><span style="display:flex;"><span> 12 mel_downloads_and_views
|
||||
</span></span><span style="display:flex;"><span> 129 dspace_altmetrics
|
||||
</span></span><span style="display:flex;"><span> 932 dspace_downloads_and_views
|
||||
</span></span><span style="display:flex;"><span> 186428 dspace_add_missing_items
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>Note that this uses <code>ncat</code> to send commands directly to redis all at once instead of one at a time (<code>netcat</code> didn’t work here, as it doesn’t know when our input is finished and never quits)
|
||||
<ul>
|
||||
<li>I thought of using <code>redis-cli --pipe</code> but then you have to construct the commands in the redis protocol format with the number of args and length of each command</li>
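For reference, the protocol format mentioned above is small enough to generate from the shell. This is a minimal sketch, not what was run in these notes: `resp_encode` is a hypothetical helper name, and it assumes ASCII keys so that the character count equals the byte length that the protocol expects.

```shell
# Encode one Redis command in the Redis protocol (RESP) format:
# "*<number of args>\r\n", then "$<length>\r\n<arg>\r\n" for each argument.
# resp_encode is a hypothetical helper, not part of redis-cli.
resp_encode() {
    printf '*%d\r\n' "$#"
    for arg in "$@"; do
        # ${#arg} counts characters; this matches the byte length for ASCII keys
        printf '$%d\r\n%s\r\n' "${#arg}" "$arg"
    done
}

# Possible usage (assumption: only the numeric job keys are hashes with a name field):
#   redis-cli KEYS 'bull:plugins:*' | grep -E ':[0-9]+$' \
#     | while read -r key; do resp_encode HGET "$key" name; done
```

As far as I know, `redis-cli --pipe` reports only a summary of replies and errors rather than the returned values, so the encoded stream would still have to go to the server another way (for example through `ncat` as above) to read the job names.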
|
||||
@ -544,49 +544,49 @@ hash
|
||||
<ul>
|
||||
<li>Looking at the DSpace log I see there was definitely a higher number of sessions that day, perhaps twice the normal:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ <span style="color:#66d9ef">for</span> file in dspace.log.2021-06-<span style="color:#f92672">[</span>12<span style="color:#f92672">]</span>*; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">"</span>$file<span style="color:#e6db74">"</span>; grep -oE <span style="color:#e6db74">'session_id=[A-Z0-9]{32}'</span> <span style="color:#e6db74">"</span>$file<span style="color:#e6db74">"</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span>
|
||||
dspace.log.2021-06-10
|
||||
19072
|
||||
dspace.log.2021-06-11
|
||||
19224
|
||||
dspace.log.2021-06-12
|
||||
19215
|
||||
dspace.log.2021-06-13
|
||||
16721
|
||||
dspace.log.2021-06-14
|
||||
17880
|
||||
dspace.log.2021-06-15
|
||||
12103
|
||||
dspace.log.2021-06-16
|
||||
4651
|
||||
dspace.log.2021-06-17
|
||||
22785
|
||||
dspace.log.2021-06-18
|
||||
21406
|
||||
dspace.log.2021-06-19
|
||||
25967
|
||||
dspace.log.2021-06-20
|
||||
20850
|
||||
dspace.log.2021-06-21
|
||||
6388
|
||||
dspace.log.2021-06-22
|
||||
5945
|
||||
dspace.log.2021-06-23
|
||||
46371
|
||||
dspace.log.2021-06-24
|
||||
9024
|
||||
dspace.log.2021-06-25
|
||||
12521
|
||||
dspace.log.2021-06-26
|
||||
16163
|
||||
dspace.log.2021-06-27
|
||||
5886
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ <span style="color:#66d9ef">for</span> file in dspace.log.2021-06-<span style="color:#f92672">[</span>12<span style="color:#f92672">]</span>*; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">"</span>$file<span style="color:#e6db74">"</span>; grep -oE <span style="color:#e6db74">'session_id=[A-Z0-9]{32}'</span> <span style="color:#e6db74">"</span>$file<span style="color:#e6db74">"</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span>
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-10
|
||||
</span></span><span style="display:flex;"><span>19072
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-11
|
||||
</span></span><span style="display:flex;"><span>19224
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-12
|
||||
</span></span><span style="display:flex;"><span>19215
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-13
|
||||
</span></span><span style="display:flex;"><span>16721
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-14
|
||||
</span></span><span style="display:flex;"><span>17880
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-15
|
||||
</span></span><span style="display:flex;"><span>12103
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-16
|
||||
</span></span><span style="display:flex;"><span>4651
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-17
|
||||
</span></span><span style="display:flex;"><span>22785
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-18
|
||||
</span></span><span style="display:flex;"><span>21406
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-19
|
||||
</span></span><span style="display:flex;"><span>25967
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-20
|
||||
</span></span><span style="display:flex;"><span>20850
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-21
|
||||
</span></span><span style="display:flex;"><span>6388
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-22
|
||||
</span></span><span style="display:flex;"><span>5945
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-23
|
||||
</span></span><span style="display:flex;"><span>46371
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-24
|
||||
</span></span><span style="display:flex;"><span>9024
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-25
|
||||
</span></span><span style="display:flex;"><span>12521
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-26
|
||||
</span></span><span style="display:flex;"><span>16163
|
||||
</span></span><span style="display:flex;"><span>dspace.log.2021-06-27
|
||||
</span></span><span style="display:flex;"><span>5886
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>I see 15,000 unique IPs in the XMLUI logs alone on that day:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># zcat /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.4.gz | grep <span style="color:#e6db74">'23/Jun/2021'</span> | awk <span style="color:#e6db74">'{print $1}'</span> | sort | uniq | wc -l
|
||||
15835
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># zcat /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.4.gz | grep <span style="color:#e6db74">'23/Jun/2021'</span> | awk <span style="color:#e6db74">'{print $1}'</span> | sort | uniq | wc -l
|
||||
</span></span><span style="display:flex;"><span>15835
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>Annoyingly I found 37,000 more hits from Bing using <code>dns:*msnbot* AND dns:*.msn.com.</code> as a Solr filter
|
||||
<ul>
|
||||
<li>WTF, they are using a normal user agent: <code>Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko</code></li>
|
||||
@ -628,8 +628,8 @@ dspace.log.2021-06-27
|
||||
</li>
|
||||
<li>The DSpace log shows:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">2021-06-30 08:19:15,874 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ Cannot get a connection, pool error Timeout waiting for idle object
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>2021-06-30 08:19:15,874 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ Cannot get a connection, pool error Timeout waiting for idle object
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>The first of these that I see is from last night, 2021-06-29 at 10:47 PM</li>
|
||||
<li>I restarted Tomcat 7 and CGSpace came back up…</li>
|
||||
<li>I didn’t see that Atmire had responded last week (on 2021-06-23) about the issues we had
|
||||
@ -641,14 +641,14 @@ dspace.log.2021-06-27
|
||||
</li>
|
||||
<li>Export a list of all CGSpace’s AGROVOC keywords with counts for Enrico and Elizabeth Arnaud to discuss with AGROVOC:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value AS "dcterms.subject", count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id = 187 GROUP BY "dcterms.subject" ORDER BY count DESC) to /tmp/2021-06-30-agrovoc.csv WITH CSV HEADER;
|
||||
COPY 20780
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT text_value AS "dcterms.subject", count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id = 187 GROUP BY "dcterms.subject" ORDER BY count DESC) to /tmp/2021-06-30-agrovoc.csv WITH CSV HEADER;
|
||||
</span></span><span style="display:flex;"><span>COPY 20780
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>Actually Enrico wanted NON AGROVOC, so I extracted all the center and CRP subjects (ignoring system office and themes):</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242) GROUP BY subject ORDER BY count DESC) to /tmp/2021-06-30-non-agrovoc.csv WITH CSV HEADER;
|
||||
COPY 1710
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242) GROUP BY subject ORDER BY count DESC) to /tmp/2021-06-30-non-agrovoc.csv WITH CSV HEADER;
|
||||
</span></span><span style="display:flex;"><span>COPY 1710
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>Fix an issue in the Ansible infrastructure playbooks for the DSpace role
|
||||
<ul>
|
||||
<li>It was causing the template module to fail when setting up the npm environment</li>
|
||||
@ -657,13 +657,13 @@ COPY 1710
|
||||
</li>
|
||||
<li>I saw a strange message in the Tomcat 7 journal on DSpace Test (linode26):</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">Jun 30 16:00:09 linode26 tomcat7[30294]: WARNING: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [111,733] milliseconds.
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>Jun 30 16:00:09 linode26 tomcat7[30294]: WARNING: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [111,733] milliseconds.
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>What’s even crazier is that it is twice that on CGSpace (linode18)!</li>
|
||||
<li>Apparently OpenJDK defaults to using <code>/dev/random</code> (see <code>/etc/java-8-openjdk/security/java.security</code>):</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">securerandom.source=file:/dev/urandom
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>securerandom.source=file:/dev/urandom
|
||||
</span></span></code></pre></div><ul>
|
||||
<li><code>/dev/random</code> blocks and can take a long time to get entropy, and urandom on modern Linux is a cryptographically secure pseudorandom number generator
|
||||
<ul>
|
||||
<li>Now Tomcat starts much faster and no warning is printed so I’m going to add this to our Ansible infrastructure playbooks</li>
|
||||
|
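Before rolling the entropy source change out with Ansible, it may help to confirm which value a host is actually using. The following is a minimal sketch under stated assumptions: `securerandom_source` is a hypothetical helper name, and it assumes the last uncommented `securerandom.source=` line in the file is the one that takes effect.

```shell
# Print the configured securerandom.source value from a java.security file.
# securerandom_source is a hypothetical helper for illustration only.
securerandom_source() {
    # Ignore commented-out lines; if the key appears more than once, keep the last one
    grep -E '^securerandom\.source=' "$1" | tail -n 1 | cut -d= -f2-
}

# Possible usage, with the path taken from the notes above:
#   securerandom_source /etc/java-8-openjdk/security/java.security
```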