Add notes for 2021-09-13

2021-09-13 16:21:16 +03:00
parent 8b487a4a77
commit c05c7213c2
109 changed files with 2627 additions and 2530 deletions

@ -26,7 +26,7 @@ Catalina logs at least show some memory errors yesterday:
I tried to test something on DSpace Test but noticed that it’s down since god knows when
Catalina logs at least show some memory errors yesterday:
"/>
<meta name="generator" content="Hugo 0.87.0" />
<meta name="generator" content="Hugo 0.88.1" />
@ -117,7 +117,7 @@ Catalina logs at least show some memory errors yesterday:
<li>I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
</ul>
<pre><code>Mar 31, 2018 10:26:42 PM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor run
<pre tabindex="0"><code>Mar 31, 2018 10:26:42 PM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor run
SEVERE: Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]]
java.lang.OutOfMemoryError: Java heap space
@ -134,12 +134,12 @@ Exception in thread &quot;ContainerBackgroundProcessor[StandardEngine[Catalina]]
<li>Peter noticed that there were still some old CRP names on CGSpace, because I hadn&rsquo;t forced the Discovery index to be updated after I fixed the others last week</li>
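<li>As a reminder, the CSV for <code>fix-metadata-values.py</code> just pairs each existing value with its replacement, in columns named after the <code>-f</code> and <code>-t</code> options, roughly like this (illustrative rows, not the real file):</li>
</ul>
<pre tabindex="0"><code class="language-csv" data-lang="csv">cg.contributor.crp,correct
AGRICULTURE FOR NUTRITION AND HEALTH,Agriculture for Nutrition and Health
</code></pre><ul>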
<li>For completeness I re-ran the CRP corrections on CGSpace:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu'
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu'
Fixed 1 occurences of: AGRICULTURE FOR NUTRITION AND HEALTH
</code></pre><ul>
<li>Then started a full Discovery index:</li>
</ul>
<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx1024m'
<pre tabindex="0"><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx1024m'
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 76m13.841s
@ -149,18 +149,18 @@ sys 2m2.498s
<li>Elizabeth from CIAT emailed to ask if I could help her by adding ORCID identifiers to all of Joseph Tohme&rsquo;s items</li>
<li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
</ul>
<pre><code>$ ./add-orcid-identifiers-csv.py -i /tmp/jtohme-2018-04-04.csv -db dspace -u dspace -p 'fuuu'
<pre tabindex="0"><code>$ ./add-orcid-identifiers-csv.py -i /tmp/jtohme-2018-04-04.csv -db dspace -u dspace -p 'fuuu'
</code></pre><ul>
<li>The CSV format of <code>jtohme-2018-04-04.csv</code> was:</li>
</ul>
<pre><code class="language-csv" data-lang="csv">dc.contributor.author,cg.creator.id
<pre tabindex="0"><code class="language-csv" data-lang="csv">dc.contributor.author,cg.creator.id
&quot;Tohme, Joseph M.&quot;,Joe Tohme: 0000-0003-2765-7101
</code></pre><ul>
<li>There was a quoting error in my CRP CSV and the replacements for <code>Forests, Trees and Agroforestry</code> got messed up</li>
<li>So I fixed them and had to re-index again!</li>
<li>I started preparing the git branch for the DSpace 5.5→5.8 upgrade:</li>
</ul>
<pre><code>$ git checkout -b 5_x-dspace-5.8 5_x-prod
<pre tabindex="0"><code>$ git checkout -b 5_x-dspace-5.8 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.8
</code></pre><ul>
@ -181,7 +181,7 @@ $ git rebase -i dspace-5.8
<li>Fix Sisay&rsquo;s sudo access on the new DSpace Test server (linode19)</li>
<li>The reindexing process on DSpace Test took <em>forever</em> yesterday:</li>
</ul>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
<pre tabindex="0"><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 599m32.961s
user 9m3.947s
@ -193,7 +193,7 @@ sys 2m52.585s
<li>Help Peter with the GDPR compliance / reporting form for CGSpace</li>
<li>DSpace Test crashed due to memory issues again:</li>
</ul>
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
<pre tabindex="0"><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
16
</code></pre><ul>
<li>I ran all system updates on DSpace Test and rebooted it</li>
@ -205,7 +205,7 @@ sys 2m52.585s
<li>I got a notice that CGSpace CPU usage was very high this morning</li>
<li>Looking at the nginx logs, here are the top users today so far:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;10/Apr/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;10/Apr/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
282 207.46.13.112
286 54.175.208.220
287 207.46.13.113
@ -220,24 +220,24 @@ sys 2m52.585s
<li>45.5.186.2 is of course CIAT</li>
<li>95.108.181.88 appears to be Yandex:</li>
</ul>
<pre><code>95.108.181.88 - - [09/Apr/2018:06:34:16 +0000] &quot;GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1&quot; 200 2638 &quot;-&quot; &quot;Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)&quot;
<pre tabindex="0"><code>95.108.181.88 - - [09/Apr/2018:06:34:16 +0000] &quot;GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1&quot; 200 2638 &quot;-&quot; &quot;Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)&quot;
</code></pre><ul>
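<li>A quick reverse DNS lookup is an easy way to confirm it really is Yandex and not someone spoofing the user agent:</li>
</ul>
<pre tabindex="0"><code>$ host 95.108.181.88
</code></pre><ul>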
<li>And for some reason Yandex created a lot of Tomcat sessions today:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-04-10
<pre tabindex="0"><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-04-10
4363
</code></pre><ul>
<li>70.32.83.92 appears to be some harvester we&rsquo;ve seen before, but on a new IP</li>
<li>They are not creating new Tomcat sessions so there is no problem there</li>
<li>178.154.200.38 also appears to be Yandex, and is also creating many Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=178.154.200.38' dspace.log.2018-04-10
<pre tabindex="0"><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=178.154.200.38' dspace.log.2018-04-10
3982
</code></pre><ul>
<li>I&rsquo;m not sure why Yandex creates so many Tomcat sessions, as its user agent should match the Crawler Session Manager valve</li>
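<li>For reference, the valve is configured in Tomcat&rsquo;s <code>server.xml</code> roughly like this (a sketch; the exact <code>crawlerUserAgents</code> regex and interval we use may differ):</li>
</ul>
<pre tabindex="0"><code>&lt;!-- share one Tomcat session per crawler instead of creating a new one per request --&gt;
&lt;Valve className=&quot;org.apache.catalina.valves.CrawlerSessionManagerValve&quot;
       crawlerUserAgents=&quot;.*[bB]ot.*|.*Yandex.*|.*spider.*&quot;
       sessionInactiveInterval=&quot;60&quot; /&gt;
</code></pre><ul>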
<li>Let&rsquo;s try a manual request with and without their user agent:</li>
</ul>
<pre><code>$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg 'User-Agent:Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)'
<pre tabindex="0"><code>$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg 'User-Agent:Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)'
GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
@ -294,7 +294,7 @@ X-XSS-Protection: 1; mode=block
<ul>
<li>In other news, it looks like the number of total requests processed by nginx in March went down from the previous months:</li>
</ul>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Mar/2018&quot;
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Mar/2018&quot;
2266594
real 0m13.658s
@ -303,25 +303,25 @@ sys 0m1.087s
</code></pre><ul>
<li>In other other news, the database cleanup script has an issue again:</li>
</ul>
<pre><code>$ dspace cleanup -v
<pre tabindex="0"><code>$ dspace cleanup -v
...
Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(151626) is still referenced from table &quot;bundle&quot;.
</code></pre><ul>
<li>The solution is, as always:</li>
</ul>
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (151626);'
<pre tabindex="0"><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (151626);'
UPDATE 1
</code></pre><ul>
<li>Looking at abandoned connections in Tomcat:</li>
</ul>
<pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon'
<pre tabindex="0"><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon'
2115
</code></pre><ul>
<li>Apparently from these stacktraces we should be able to see which code is not closing connections properly</li>
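<li>For reference, those stacktraces come from the pool&rsquo;s abandoned-connection tracking, which is controlled by attributes on the Tomcat JDBC pool resource roughly like these (a sketch; our real values may differ):</li>
</ul>
<pre tabindex="0"><code>removeAbandoned=&quot;true&quot;
removeAbandonedTimeout=&quot;60&quot;
logAbandoned=&quot;true&quot;
</code></pre><ul>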
<li>Here&rsquo;s a pretty good overview of days where we had database issues recently:</li>
</ul>
<pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' | awk '{print $1,$2, $3}' | sort | uniq -c | sort -n
<pre tabindex="0"><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' | awk '{print $1,$2, $3}' | sort | uniq -c | sort -n
1 Feb 18, 2018
1 Feb 19, 2018
1 Feb 20, 2018
@ -356,7 +356,7 @@ UPDATE 1
<ul>
<li>DSpace Test (linode19) crashed again some time since yesterday:</li>
</ul>
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
<pre tabindex="0"><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
168
</code></pre><ul>
<li>I ran all system updates and rebooted the server</li>
@ -374,12 +374,12 @@ UPDATE 1
<ul>
<li>While testing an XMLUI patch for <a href="https://jira.duraspace.org/browse/DS-3883">DS-3883</a> I noticed that there is still some remaining Authority / Solr configuration left that we need to remove:</li>
</ul>
<pre><code>2018-04-14 18:55:25,841 ERROR org.dspace.authority.AuthoritySolrServiceImpl @ Authority solr is not correctly configured, check &quot;solr.authority.server&quot; property in the dspace.cfg
<pre tabindex="0"><code>2018-04-14 18:55:25,841 ERROR org.dspace.authority.AuthoritySolrServiceImpl @ Authority solr is not correctly configured, check &quot;solr.authority.server&quot; property in the dspace.cfg
java.lang.NullPointerException
</code></pre><ul>
<li>I assume we need to remove <code>authority</code> from the consumers in <code>dspace/config/dspace.cfg</code>:</li>
</ul>
<pre><code>event.dispatcher.default.consumers = authority, versioning, discovery, eperson, harvester, statistics,batchedit, versioningmqm
<pre tabindex="0"><code>event.dispatcher.default.consumers = authority, versioning, discovery, eperson, harvester, statistics,batchedit, versioningmqm
</code></pre><ul>
<li>I see the same error on DSpace Test so this is definitely a problem</li>
<li>After disabling the authority consumer I no longer see the error</li>
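<li>For clarity, the consumer list with <code>authority</code> dropped looks like this:</li>
</ul>
<pre tabindex="0"><code>event.dispatcher.default.consumers = versioning, discovery, eperson, harvester, statistics,batchedit, versioningmqm
</code></pre><ul>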
@ -387,7 +387,7 @@ java.lang.NullPointerException
<li>File a ticket on DSpace&rsquo;s Jira for the <code>target=&quot;_blank&quot;</code> security and performance issue (<a href="https://jira.duraspace.org/browse/DS-3891">DS-3891</a>)</li>
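<li>The usual mitigation for that issue is to add <code>rel</code> attributes to such links, for example:</li>
</ul>
<pre tabindex="0"><code>&lt;a href=&quot;https://example.org/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;...&lt;/a&gt;
</code></pre><ul>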
<li>I re-deployed DSpace Test (linode19) and was surprised by how long it took the ant update to complete:</li>
</ul>
<pre><code>BUILD SUCCESSFUL
<pre tabindex="0"><code>BUILD SUCCESSFUL
Total time: 4 minutes 12 seconds
</code></pre><ul>
<li>The Linode block storage is much slower than the instance storage</li>
@ -404,7 +404,7 @@ Total time: 4 minutes 12 seconds
<li>They will need to use OpenSearch, but I can&rsquo;t remember all the parameters</li>
<li>Apparently search sort options for OpenSearch are in <code>dspace.cfg</code>:</li>
</ul>
<pre><code>webui.itemlist.sort-option.1 = title:dc.title:title
<pre tabindex="0"><code>webui.itemlist.sort-option.1 = title:dc.title:title
webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
webui.itemlist.sort-option.4 = type:dc.type:text
@ -422,27 +422,27 @@ webui.itemlist.sort-option.4 = type:dc.type:text
<li>They are missing the <code>order</code> parameter (ASC vs DESC)</li>
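<li>For example, a complete OpenSearch request with sorting would look something like this (parameter names as I understand the DSpace OpenSearch interface; the query and scope handle are just placeholders):</li>
</ul>
<pre tabindex="0"><code>https://cgspace.cgiar.org/open-search/discover?query=water&amp;scope=10568/1&amp;sort_by=2&amp;order=DESC&amp;rpp=100
</code></pre><ul>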
<li>I notice that DSpace Test has crashed again, due to memory:</li>
</ul>
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
<pre tabindex="0"><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
178
</code></pre><ul>
<li>I will increase the JVM heap size from 5120M to 6144M, though we don&rsquo;t have much room left to grow as DSpace Test (linode19) is using a smaller instance size than CGSpace</li>
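<li>That is just a matter of bumping <code>-Xmx</code> (and usually <code>-Xms</code>) in Tomcat&rsquo;s <code>JAVA_OPTS</code>, something like this in <code>/etc/default/tomcat7</code> (illustrative; the real flags are managed in our configuration elsewhere):</li>
</ul>
<pre tabindex="0"><code>JAVA_OPTS=&quot;-Djava.awt.headless=true -Xms6144m -Xmx6144m -Dfile.encoding=UTF-8&quot;
</code></pre><ul>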
<li>Gabriela from CIP asked if I could send her a list of all CIP authors so she can do some replacements on the name formats</li>
<li>I got a list of all the CIP collections manually and use the same query that I used in <a href="/cgspace-notes/2017-08">August, 2017</a>:</li>
</ul>
<pre><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/89347', '10568/88229', '10568/53086', '10568/53085', '10568/69069', '10568/53087', '10568/53088', '10568/53089', '10568/53090', '10568/53091', '10568/53092', '10568/70150', '10568/53093', '10568/64874', '10568/53094'))) group by text_value order by count desc) to /tmp/cip-authors.csv with csv;
<pre tabindex="0"><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/89347', '10568/88229', '10568/53086', '10568/53085', '10568/69069', '10568/53087', '10568/53088', '10568/53089', '10568/53090', '10568/53091', '10568/53092', '10568/70150', '10568/53093', '10568/64874', '10568/53094'))) group by text_value order by count desc) to /tmp/cip-authors.csv with csv;
</code></pre><h2 id="2018-04-19">2018-04-19</h2>
<ul>
<li>Run updates on DSpace Test (linode19) and reboot the server</li>
<li>Also try deploying updated GeoLite database during ant update while re-deploying code:</li>
</ul>
<pre><code>$ ant update update_geolite clean_backups
<pre tabindex="0"><code>$ ant update update_geolite clean_backups
</code></pre><ul>
<li>I also re-deployed CGSpace (linode18) to make the ORCID search, authority cleanup, and CCAFS project tag <code>PII-LAM_CSAGender</code> changes live</li>
<li>When re-deploying I also updated the GeoLite databases so I hope the country stats become more accurate&hellip;</li>
<li>After re-deployment I ran all system updates on the server and rebooted it</li>
<li>After the reboot I forced a reïndexing of the Discovery to populate the new ORCID index:</li>
</ul>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
<pre tabindex="0"><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 73m42.635s
user 8m15.885s
@ -456,21 +456,21 @@ sys 2m2.687s
<li>I confirm that it&rsquo;s just giving a white page around 4:16</li>
<li>The DSpace logs show that there are no database connections:</li>
</ul>
<pre><code>org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-715] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000].
<pre tabindex="0"><code>org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-715] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000].
</code></pre><ul>
<li>And there have been shit tons of errors in the logs (starting only 20 minutes ago, luckily):</li>
</ul>
<pre><code># grep -c 'org.apache.tomcat.jdbc.pool.PoolExhaustedException' /home/cgspace.cgiar.org/log/dspace.log.2018-04-20
<pre tabindex="0"><code># grep -c 'org.apache.tomcat.jdbc.pool.PoolExhaustedException' /home/cgspace.cgiar.org/log/dspace.log.2018-04-20
32147
</code></pre><ul>
<li>I can&rsquo;t even log into PostgreSQL as the <code>postgres</code> user, WTF?</li>
</ul>
<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
<pre tabindex="0"><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
^C
</code></pre><ul>
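<li>When even <code>psql</code> hangs like that, counting established connections to port 5432 at the socket level is a rough way to see how saturated PostgreSQL is:</li>
</ul>
<pre tabindex="0"><code># ss -tn state established '( sport = :5432 )' | wc -l
</code></pre><ul>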
<li>Here are the most active IPs today:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;20/Apr/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;20/Apr/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
917 207.46.13.182
935 213.55.99.121
970 40.77.167.134
@ -484,7 +484,7 @@ sys 2m2.687s
</code></pre><ul>
<li>It doesn&rsquo;t even seem like there is a lot of traffic compared to the previous days:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;20/Apr/2018&quot; | wc -l
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;20/Apr/2018&quot; | wc -l
74931
# zcat --force /var/log/nginx/*.log.1 /var/log/nginx/*.log.2.gz| grep -E &quot;19/Apr/2018&quot; | wc -l
91073
@ -499,7 +499,7 @@ sys 2m2.687s
<li>Everything is back but I have no idea what caused this—I suspect something with the hosting provider</li>
<li>Also super weird, the last entry in the DSpace log file is from <code>2018-04-20 16:35:09</code>, and then immediately it goes to <code>2018-04-20 19:15:04</code> (three hours later!):</li>
</ul>
<pre><code>2018-04-20 16:35:09,144 ERROR org.dspace.app.util.AbstractDSpaceWebapp @ Failed to record shutdown in Webapp table.
<pre tabindex="0"><code>2018-04-20 16:35:09,144 ERROR org.dspace.app.util.AbstractDSpaceWebapp @ Failed to record shutdown in Webapp table.
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle
:0; lastwait:5000].
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:685)
@ -543,12 +543,12 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Time
<li>One other new thing I notice is that PostgreSQL 9.6 no longer uses <code>createuser</code> and <code>nocreateuser</code>, as those have actually meant <code>superuser</code> and <code>nosuperuser</code> and have been deprecated for <em>ten years</em></li>
<li>So for my notes, when I&rsquo;m importing a CGSpace database dump I need to amend my notes to give super user permission to a user, rather than create user:</li>
</ul>
<pre><code>$ psql dspacetest -c 'alter user dspacetest superuser;'
<pre tabindex="0"><code>$ psql dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-18.backup
</code></pre><ul>
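<li>And presumably once the restore is done I can revoke that again:</li>
</ul>
<pre tabindex="0"><code>$ psql dspacetest -c 'alter user dspacetest nosuperuser;'
</code></pre><ul>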
<li>There&rsquo;s another issue with Tomcat in Ubuntu 18.04:</li>
</ul>
<pre><code>25-Apr-2018 13:26:21.493 SEVERE [http-nio-127.0.0.1-8443-exec-1] org.apache.coyote.AbstractProtocol$ConnectionHandler.process Error reading request, ignored
<pre tabindex="0"><code>25-Apr-2018 13:26:21.493 SEVERE [http-nio-127.0.0.1-8443-exec-1] org.apache.coyote.AbstractProtocol$ConnectionHandler.process Error reading request, ignored
java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
at org.apache.coyote.http11.Http11InputBuffer.init(Http11InputBuffer.java:688)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:672)