Add notes for 2021-09-13
@@ -30,7 +30,7 @@ We don’t need to distinguish between internal and external works, so that
Yesterday I figured out how to monitor DSpace sessions using JMX
I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
"/>
-<meta name="generator" content="Hugo 0.87.0" />
+<meta name="generator" content="Hugo 0.88.1" />
@@ -128,7 +128,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-pl
<li>Run all system updates and reboot DSpace Test</li>
<li>Wow, I packaged up the <code>jmx_dspace_sessions</code> stuff in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> and deployed it on CGSpace and it totally works:</li>
</ul>
-<pre><code># munin-run jmx_dspace_sessions
+<pre tabindex="0"><code># munin-run jmx_dspace_sessions
v_.value 223
v_jspui.value 1
v_oai.value 0
@@ -139,12 +139,12 @@ v_oai.value 0
<li>I finally took a look at the second round of cleanups Peter had sent me for author affiliations in mid January</li>
<li>After trimming whitespace and quickly scanning for encoding errors I applied them on CGSpace:</li>
</ul>
-<pre><code>$ ./delete-metadata-values.py -i /tmp/2018-02-03-Affiliations-12-deletions.csv -f cg.contributor.affiliation -m 211 -d dspace -u dspace -p 'fuuu'
+<pre tabindex="0"><code>$ ./delete-metadata-values.py -i /tmp/2018-02-03-Affiliations-12-deletions.csv -f cg.contributor.affiliation -m 211 -d dspace -u dspace -p 'fuuu'
$ ./fix-metadata-values.py -i /tmp/2018-02-03-Affiliations-1116-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p 'fuuu'
</code></pre><ul>
<li>Then I started a full Discovery reindex:</li>
</ul>
-<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
+<pre tabindex="0"><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b

real 96m39.823s
user 14m10.975s
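For context, the delete/fix scripts referenced in this hunk live on the ilri/DSpace wiki; the core of the "fix" operation is essentially one UPDATE per CSV row. A rough, hypothetical sketch (not the actual script; column names follow the DSpace 5 metadatavalue schema, and the CSV columns match the -f and -t flags shown above):

<pre tabindex="0"><code>import csv
import psycopg2

# -f cg.contributor.affiliation corresponds to -m 211, the metadata_field_id
conn = psycopg2.connect('dbname=dspace user=dspace password=fuuu host=localhost')
with open('/tmp/2018-02-03-Affiliations-1116-corrections.csv') as f, conn:
    for row in csv.DictReader(f):
        with conn.cursor() as cursor:
            # replace the old affiliation text with the corrected value
            cursor.execute('UPDATE metadatavalue SET text_value=%s '
                           'WHERE metadata_field_id=%s AND text_value=%s',
                           (row['correct'], 211, row['cg.contributor.affiliation']))
conn.close()
</code></pre>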
@@ -152,12 +152,12 @@ sys 2m29.088s
</code></pre><ul>
<li>Generate a new list of affiliations for Peter to sort through:</li>
</ul>
-<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
+<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
COPY 3723
</code></pre><ul>
<li>Oh, and it looks like we processed over 3.1 million requests in January, up from 2.9 million in <a href="/cgspace-notes/2017-12/">December</a>:</li>
</ul>
-<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2018"
+<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2018"
3126109

real 0m23.839s
@@ -167,14 +167,14 @@ sys 0m1.905s
<ul>
<li>Toying with correcting authors with trailing spaces via PostgreSQL:</li>
</ul>
-<pre><code>dspace=# update metadatavalue set text_value=REGEXP_REPLACE(text_value, '\s+$' , '') where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*?\s+$';
+<pre tabindex="0"><code>dspace=# update metadatavalue set text_value=REGEXP_REPLACE(text_value, '\s+$' , '') where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*?\s+$';
UPDATE 20
</code></pre><ul>
<li>I tried the <code>TRIM(TRAILING from text_value)</code> function and it said it changed 20 items but the spaces didn’t go away</li>
<li>This is on a fresh import of the CGSpace database, but when I tried to apply it on CGSpace there were no changes detected. Weird.</li>
<li>Anyways, Peter wants a new list of authors to clean up, so I exported another CSV:</li>
</ul>
-<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors-2018-02-05.csv with csv;
+<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors-2018-02-05.csv with csv;
COPY 55630
</code></pre><h2 id="2018-02-06">2018-02-06</h2>
<ul>
@@ -182,7 +182,7 @@ COPY 55630
<li>I see 308 PostgreSQL connections in <code>pg_stat_activity</code></li>
<li>The usage otherwise seemed low for REST/OAI as well as XMLUI in the last hour:</li>
</ul>
-<pre><code># date
+<pre tabindex="0"><code># date
Tue Feb 6 09:30:32 UTC 2018
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "6/Feb/2018:(08|09)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
2 223.185.41.40
@@ -232,7 +232,7 @@ Tue Feb 6 09:30:32 UTC 2018
<li>CGSpace crashed again, this time around <code>Wed Feb 7 11:20:28 UTC 2018</code></li>
<li>I took a few snapshots of the PostgreSQL activity at the time; the connections were very high at first but reduced on their own as the minutes went on:</li>
</ul>
-<pre><code>$ psql -c 'select * from pg_stat_activity' > /tmp/pg_stat_activity.txt
+<pre tabindex="0"><code>$ psql -c 'select * from pg_stat_activity' > /tmp/pg_stat_activity.txt
$ grep -c 'PostgreSQL JDBC' /tmp/pg_stat_activity*
/tmp/pg_stat_activity1.txt:300
/tmp/pg_stat_activity2.txt:272
@@ -242,7 +242,7 @@ $ grep -c 'PostgreSQL JDBC' /tmp/pg_stat_activity*
</code></pre><ul>
<li>Interestingly, all of those 751 connections were idle!</li>
</ul>
-<pre><code>$ grep "PostgreSQL JDBC" /tmp/pg_stat_activity* | grep -c idle
+<pre tabindex="0"><code>$ grep "PostgreSQL JDBC" /tmp/pg_stat_activity* | grep -c idle
751
</code></pre><ul>
<li>Since I was restarting Tomcat anyways, I decided to deploy the changes to create two different pools for web and API apps</li>
@@ -252,17 +252,17 @@ $ grep -c 'PostgreSQL JDBC' /tmp/pg_stat_activity*
<ul>
<li>Indeed it seems like there were over 1800 sessions today around the hours of 10 and 11 AM:</li>
</ul>
-<pre><code>$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+<pre tabindex="0"><code>$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
1828
</code></pre><ul>
<li>CGSpace went down again a few hours later, and now the connections to the dspaceWeb pool are maxed at 250 (the new limit I imposed with the new separate pool scheme)</li>
<li>What’s interesting is that the DSpace log says the connections are all busy:</li>
</ul>
-<pre><code>org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-328] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:250; idle:0; lastwait:5000].
+<pre tabindex="0"><code>org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-328] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:250; idle:0; lastwait:5000].
</code></pre><ul>
<li>… but in PostgreSQL I see them <code>idle</code> or <code>idle in transaction</code>:</li>
</ul>
-<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -c dspaceWeb
+<pre tabindex="0"><code>$ psql -c 'select * from pg_stat_activity' | grep -c dspaceWeb
250
$ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c idle
250
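Rather than piping psql output through grep, a query like the following summarizes connections per pool and state directly; this is a generic sketch against pg_stat_activity, not a command from the original notes:

<pre tabindex="0"><code>dspace=# SELECT application_name, state, count(*)
         FROM pg_stat_activity
         GROUP BY application_name, state
         ORDER BY count(*) DESC;
</code></pre>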
@@ -274,13 +274,13 @@ $ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c "idle
<li>I will try <code>testOnReturn='true'</code> too, just to add more validation, because I’m fucking grasping at straws</li>
<li>Also, WTF, there was a heap space error randomly in catalina.out:</li>
</ul>
-<pre><code>Wed Feb 07 15:01:54 UTC 2018 | Query:containerItem:91917 AND type:2
+<pre tabindex="0"><code>Wed Feb 07 15:01:54 UTC 2018 | Query:containerItem:91917 AND type:2
Exception in thread "http-bio-127.0.0.1-8081-exec-58" java.lang.OutOfMemoryError: Java heap space
</code></pre><ul>
<li>I’m trying to find a way to determine what was using all those Tomcat sessions, but parsing the DSpace log is hard because some IPs are IPv6, which contain colons!</li>
<li>Looking at the first crash this morning around 11, I see these IPv4 addresses making requests around 10 and 11AM:</li>
</ul>
-<pre><code>$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'ip_addr=[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort -n | uniq -c | sort -n | tail -n 20
+<pre tabindex="0"><code>$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'ip_addr=[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort -n | uniq -c | sort -n | tail -n 20
34 ip_addr=46.229.168.67
34 ip_addr=46.229.168.73
37 ip_addr=46.229.168.76
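One way around the IPv6 colon problem mentioned in this hunk is to anchor on the ip_addr= prefix and accept hex digits, dots, and colons instead of matching only dotted quads; a sketch, not the command actually used in the notes:

<pre tabindex="0"><code>$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'ip_addr=[0-9a-fA-F.:]+' | sort | uniq -c | sort -n | tail -n 20
</code></pre>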
@@ -304,7 +304,7 @@ Exception in thread &quot;http-bio-127.0.0.1-8081-exec-58&quot; java.lang.OutOfM
</code></pre><ul>
<li>These IPs made thousands of sessions today:</li>
</ul>
-<pre><code>$ grep 104.196.152.243 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+<pre tabindex="0"><code>$ grep 104.196.152.243 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
530
$ grep 207.46.13.71 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
859
@@ -342,11 +342,11 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
<li>What in the actual fuck, why is our load doing this? It’s gotta be something fucked up with the database pool being “busy” but everything is fucking idle</li>
<li>One that I should probably add in nginx is 54.83.138.123, which is apparently the following user agent:</li>
</ul>
-<pre><code>BUbiNG (+http://law.di.unimi.it/BUbiNG.html)
+<pre tabindex="0"><code>BUbiNG (+http://law.di.unimi.it/BUbiNG.html)
</code></pre><ul>
<li>This one makes two thousand requests per day or so recently:</li>
</ul>
-<pre><code># grep -c BUbiNG /var/log/nginx/access.log /var/log/nginx/access.log.1
+<pre tabindex="0"><code># grep -c BUbiNG /var/log/nginx/access.log /var/log/nginx/access.log.1
/var/log/nginx/access.log:1925
/var/log/nginx/access.log.1:2029
</code></pre><ul>
@@ -355,13 +355,13 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
<li>Helix84 recommends restarting PostgreSQL instead of Tomcat because it restarts quicker</li>
<li>This is how the connections looked when it crashed this afternoon:</li>
</ul>
-<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
+<pre tabindex="0"><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
5 dspaceApi
290 dspaceWeb
</code></pre><ul>
<li>This is how it is right now:</li>
</ul>
-<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
+<pre tabindex="0"><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
5 dspaceApi
5 dspaceWeb
</code></pre><ul>
@@ -378,11 +378,11 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
<li>Switch authority.controlled off and change authorLookup to lookup, and the ORCID badge doesn’t show up on the item</li>
<li>Leave all settings but change choices.presentation to lookup and ORCID badge is there and item submission uses LC Name Authority and it breaks with this error:</li>
</ul>
-<pre><code>Field dc_contributor_author has choice presentation of type "select", it may NOT be authority-controlled.
+<pre tabindex="0"><code>Field dc_contributor_author has choice presentation of type "select", it may NOT be authority-controlled.
</code></pre><ul>
<li>If I change choices.presentation to suggest it gives this error:</li>
</ul>
-<pre><code>xmlui.mirage2.forms.instancedCompositeFields.noSuggestionError
+<pre tabindex="0"><code>xmlui.mirage2.forms.instancedCompositeFields.noSuggestionError
</code></pre><ul>
<li>So I don’t think we can disable the ORCID lookup function and keep the ORCID badges</li>
</ul>
@@ -394,12 +394,12 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
<ul>
<li>I downloaded the PDF and manually generated a thumbnail with ImageMagick and it looked better:</li>
</ul>
-<pre><code>$ convert CCAFS_WP_223.pdf\[0\] -profile /usr/local/share/ghostscript/9.22/iccprofiles/default_cmyk.icc -thumbnail 600x600 -flatten -profile /usr/local/share/ghostscript/9.22/iccprofiles/default_rgb.icc CCAFS_WP_223.jpg
+<pre tabindex="0"><code>$ convert CCAFS_WP_223.pdf\[0\] -profile /usr/local/share/ghostscript/9.22/iccprofiles/default_cmyk.icc -thumbnail 600x600 -flatten -profile /usr/local/share/ghostscript/9.22/iccprofiles/default_rgb.icc CCAFS_WP_223.jpg
</code></pre><p><img src="/cgspace-notes/2018/02/CCAFS_WP_223.jpg" alt="Manual thumbnail"></p>
<ul>
<li>Peter sent me corrected author names last week but the file encoding is messed up:</li>
</ul>
-<pre><code>$ isutf8 authors-2018-02-05.csv
+<pre tabindex="0"><code>$ isutf8 authors-2018-02-05.csv
authors-2018-02-05.csv: line 100, char 18, byte 4179: After a first byte between E1 and EC, expecting the 2nd byte between 80 and BF.
</code></pre><ul>
<li>The <code>isutf8</code> program comes from <code>moreutils</code></li>
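If the file turns out to be in a legacy encoding (Windows-1252 is only a guess here, not something confirmed in the notes), something like iconv can re-encode it before applying the corrections:

<pre tabindex="0"><code>$ iconv -f WINDOWS-1252 -t UTF-8 authors-2018-02-05.csv > authors-2018-02-05-utf8.csv
$ isutf8 authors-2018-02-05-utf8.csv
</code></pre>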
@@ -409,18 +409,18 @@ authors-2018-02-05.csv: line 100, char 18, byte 4179: After a first byte between
<li>I updated my <code>fix-metadata-values.py</code> and <code>delete-metadata-values.py</code> scripts on the scripts page: <a href="https://github.com/ilri/DSpace/wiki/Scripts">https://github.com/ilri/DSpace/wiki/Scripts</a></li>
<li>I ran the 342 author corrections (after trimming whitespace and excluding those with <code>||</code> and other syntax errors) on CGSpace:</li>
</ul>
-<pre><code>$ ./fix-metadata-values.py -i Correct-342-Authors-2018-02-11.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
+<pre tabindex="0"><code>$ ./fix-metadata-values.py -i Correct-342-Authors-2018-02-11.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
</code></pre><ul>
<li>Then I ran a full Discovery re-indexing:</li>
</ul>
-<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
+<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
</code></pre><ul>
<li>That reminds me that Bizu had asked me to fix some of Alan Duncan’s names in December</li>
<li>I see he actually has some variations with “Duncan, Alan J.”: <a href="https://cgspace.cgiar.org/discover?filtertype_1=author&filter_relational_operator_1=contains&filter_1=Duncan%2C+Alan&submit_apply_filter=&query=">https://cgspace.cgiar.org/discover?filtertype_1=author&filter_relational_operator_1=contains&filter_1=Duncan%2C+Alan&submit_apply_filter=&query=</a></li>
<li>I will just update those for her too and then restart the indexing:</li>
</ul>
-<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
+<pre tabindex="0"><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
text_value | authority | confidence
-----------------+--------------------------------------+------------
Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 | 600
@@ -464,7 +464,7 @@ dspace=# commit;
<li>I see that in <a href="/cgspace-notes/2017-04/">April, 2017</a> I just used a SQL query to get a user’s submissions by checking the <code>dc.description.provenance</code> field</li>
<li>So for Abenet, I can check her submissions in December, 2017 with:</li>
</ul>
-<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^Submitted.*yabowork.*2017-12.*';
+<pre tabindex="0"><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^Submitted.*yabowork.*2017-12.*';
</code></pre><ul>
<li>I emailed Peter to ask whether we can move DSpace Test to a new Linode server and attach 300 GB of disk space to it</li>
<li>This would be using <a href="https://www.linode.com/blockstorage">Linode’s new block storage volumes</a></li>
@@ -477,14 +477,14 @@ dspace=# commit;
<li>Peter said he was getting a “socket closed” error on CGSpace</li>
<li>I looked in the dspace.log.2018-02-13 and saw one recent one:</li>
</ul>
-<pre><code>2018-02-13 12:50:13,656 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
+<pre tabindex="0"><code>2018-02-13 12:50:13,656 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
...
Caused by: java.net.SocketException: Socket closed
</code></pre><ul>
<li>Could be because of the <code>removeAbandoned="true"</code> that I enabled in the JDBC connection pool last week?</li>
</ul>
-<pre><code>$ grep -c "java.net.SocketException: Socket closed" dspace.log.2018-02-*
+<pre tabindex="0"><code>$ grep -c "java.net.SocketException: Socket closed" dspace.log.2018-02-*
dspace.log.2018-02-01:0
dspace.log.2018-02-02:0
dspace.log.2018-02-03:0
@@ -503,7 +503,7 @@ dspace.log.2018-02-13:4
<li>I will increase the removeAbandonedTimeout from its default of 60 to 90 and enable logAbandoned</li>
<li>Peter hit this issue one more time, and this is apparently what Tomcat’s catalina.out log says when an abandoned connection is removed:</li>
</ul>
-<pre><code>Feb 13, 2018 2:05:42 PM org.apache.tomcat.jdbc.pool.ConnectionPool abandon
+<pre tabindex="0"><code>Feb 13, 2018 2:05:42 PM org.apache.tomcat.jdbc.pool.ConnectionPool abandon
WARNING: Connection has been abandoned PooledConnection[org.postgresql.jdbc.PgConnection@22e107be]:java.lang.Exception
</code></pre><h2 id="2018-02-14">2018-02-14</h2>
<ul>
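For reference, a minimal sketch of how the removeAbandoned settings discussed in this hunk might look on a Tomcat JDBC pool Resource in server.xml; the resource name, credentials, and pool sizes here are placeholders, not the actual CGSpace configuration:

<pre tabindex="0"><code><Resource name="jdbc/dspaceWeb" auth="Container" type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://localhost:5432/dspace"
          username="dspace" password="fuuu"
          maxActive="250" maxIdle="20"
          validationQuery="SELECT 1" testOnBorrow="true"
          removeAbandoned="true" removeAbandonedTimeout="90"
          logAbandoned="true" />
</code></pre>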
@@ -521,21 +521,21 @@ WARNING: Connection has been abandoned PooledConnection[org.postgresql.jdbc.PgCo
<li>Atmire responded on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 compatibility ticket</a> and said they will let me know if they want me to give them a clean 5.8 branch</li>
<li>I formatted my list of ORCID IDs as a controlled vocabulary, sorted alphabetically, then ran through XML tidy:</li>
</ul>
-<pre><code>$ sort cgspace-orcids.txt > dspace/config/controlled-vocabularies/cg-creator-id.xml
+<pre tabindex="0"><code>$ sort cgspace-orcids.txt > dspace/config/controlled-vocabularies/cg-creator-id.xml
$ add XML formatting...
$ tidy -xml -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
</code></pre><ul>
<li>It seems the tidy fucks up accents, for example it turns <code>Adriana Tofiño (0000-0001-7115-7169)</code> into <code>Adriana TofiÃ±o (0000-0001-7115-7169)</code></li>
<li>We need to force UTF-8:</li>
</ul>
-<pre><code>$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
+<pre tabindex="0"><code>$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
</code></pre><ul>
<li>This preserves special accent characters</li>
<li>I tested the display and store of these in the XMLUI and PostgreSQL and it looks good</li>
<li>Sisay exported all ILRI, CIAT, etc authors from ORCID and sent a list of 600+</li>
<li>Peter combined it with mine and we have 1204 unique ORCIDs!</li>
</ul>
-<pre><code>$ grep -coE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' CGcenter_ORCID_ID_combined.csv
+<pre tabindex="0"><code>$ grep -coE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' CGcenter_ORCID_ID_combined.csv
1204
$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' CGcenter_ORCID_ID_combined.csv | sort | uniq | wc -l
1204
@@ -543,19 +543,19 @@ $ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' CGcenter_ORCID_ID_c
<li>Also, save that regex for the future because it will be very useful!</li>
<li>CIAT sent a list of their authors' ORCIDs and combined with ours there are now 1227:</li>
</ul>
-<pre><code>$ cat CGcenter_ORCID_ID_combined.csv ciat-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
+<pre tabindex="0"><code>$ cat CGcenter_ORCID_ID_combined.csv ciat-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
1227
</code></pre><ul>
<li>There are some formatting issues with names in Peter’s list, so I should remember to re-generate the list of names from ORCID’s API once we’re done</li>
<li>The <code>dspace cleanup -v</code> currently fails on CGSpace with the following:</li>
</ul>
-<pre><code> - Deleting bitstream record from database (ID: 149473)
+<pre tabindex="0"><code> - Deleting bitstream record from database (ID: 149473)
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(149473) is still referenced from table "bundle".
</code></pre><ul>
<li>The solution is to update the bitstream table, as I’ve discovered several other times in 2016 and 2017:</li>
</ul>
-<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (149473);'
+<pre tabindex="0"><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (149473);'
UPDATE 1
</code></pre><ul>
<li>Then the cleanup process will continue for a while and hit another foreign key conflict, and eventually it will complete after you manually resolve them all</li>
@@ -575,25 +575,25 @@ UPDATE 1
<li>I only looked quickly in the logs but saw a bunch of database errors</li>
<li>PostgreSQL connections are currently:</li>
</ul>
-<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | uniq -c
+<pre tabindex="0"><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | uniq -c
2 dspaceApi
1 dspaceWeb
3 dspaceApi
</code></pre><ul>
<li>I see shitloads of memory errors in Tomcat’s logs:</li>
</ul>
-<pre><code># grep -c "Java heap space" /var/log/tomcat7/catalina.out
+<pre tabindex="0"><code># grep -c "Java heap space" /var/log/tomcat7/catalina.out
56
</code></pre><ul>
<li>And shit tons of database connections abandoned:</li>
</ul>
-<pre><code># grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' /var/log/tomcat7/catalina.out
+<pre tabindex="0"><code># grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' /var/log/tomcat7/catalina.out
612
</code></pre><ul>
<li>I have no fucking idea why it crashed</li>
<li>The XMLUI activity looks like:</li>
</ul>
-<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E "15/Feb/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E "15/Feb/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
715 63.143.42.244
746 213.55.99.121
886 68.180.228.157
@@ -610,7 +610,7 @@ UPDATE 1
<li>I made a pull request to fix it (<a href="https://github.com/ilri/DSpace/pull/354">#354</a>)</li>
<li>I should remember to update existing values in PostgreSQL too:</li>
</ul>
-<pre><code>dspace=# update metadatavalue set text_value='United States Agency for International Development' where resource_type_id=2 and metadata_field_id=29 and text_value like '%U.S. Agency for International Development%';
+<pre tabindex="0"><code>dspace=# update metadatavalue set text_value='United States Agency for International Development' where resource_type_id=2 and metadata_field_id=29 and text_value like '%U.S. Agency for International Development%';
UPDATE 2
</code></pre><h2 id="2018-02-18">2018-02-18</h2>
<ul>
@@ -624,7 +624,7 @@ UPDATE 2
<li>Run system updates on DSpace Test (linode02) and reboot the server</li>
<li>Looking back at the system errors on 2018-02-15, I wonder what the fuck caused this:</li>
</ul>
-<pre><code>$ wc -l dspace.log.2018-02-1{0..8}
+<pre tabindex="0"><code>$ wc -l dspace.log.2018-02-1{0..8}
383483 dspace.log.2018-02-10
275022 dspace.log.2018-02-11
249557 dspace.log.2018-02-12
@@ -638,13 +638,13 @@ UPDATE 2
<li>From an average of a few hundred thousand to over four million lines in DSpace log?</li>
<li>Using grep’s <code>-B1</code> I can see the line before the heap space error, which has the time, ie:</li>
</ul>
-<pre><code>2018-02-15 16:02:12,748 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
+<pre tabindex="0"><code>2018-02-15 16:02:12,748 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
</code></pre><ul>
<li>So these errors happened at hours 16, 18, 19, and 20</li>
<li>Let’s see what was going on in nginx then:</li>
</ul>
-<pre><code># zcat --force /var/log/nginx/*.log.{3,4}.gz | wc -l
+<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log.{3,4}.gz | wc -l
168571
# zcat --force /var/log/nginx/*.log.{3,4}.gz | grep -E "15/Feb/2018:(16|18|19|20)" | wc -l
8188
@@ -652,7 +652,7 @@ org.springframework.web.util.NestedServletException: Handler processing failed;
<li>Only 8,000 requests during those four hours, out of 170,000 the whole day!</li>
<li>And the usage of XMLUI, REST, and OAI looks SUPER boring:</li>
</ul>
-<pre><code># zcat --force /var/log/nginx/*.log.{3,4}.gz | grep -E "15/Feb/2018:(16|18|19|20)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log.{3,4}.gz | grep -E "15/Feb/2018:(16|18|19|20)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
111 95.108.181.88
158 45.5.184.221
201 104.196.152.243
@@ -677,20 +677,20 @@ org.springframework.web.util.NestedServletException: Handler processing failed;
<ul>
<li>Combined list of CGIAR author ORCID iDs is up to 1,500:</li>
</ul>
-<pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml ORCID_ID_CIAT_IITA_IWMI-csv.csv CGcenter_ORCID_ID_combined.csv | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
+<pre tabindex="0"><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml ORCID_ID_CIAT_IITA_IWMI-csv.csv CGcenter_ORCID_ID_combined.csv | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
1571
</code></pre><ul>
<li>I updated my <code>resolve-orcids-from-solr.py</code> script to be able to resolve ORCID identifiers from a text file so I renamed it to <code>resolve-orcids.py</code></li>
<li>Also, I updated it so it uses several new options:</li>
</ul>
-<pre><code>$ ./resolve-orcids.py -i input.txt -o output.txt
+<pre tabindex="0"><code>$ ./resolve-orcids.py -i input.txt -o output.txt
$ cat output.txt
Ali Ramadhan: 0000-0001-5019-1368
Ahmad Maryudi: 0000-0001-5051-7217
</code></pre><ul>
<li>I was running this on the new list of 1571 and found an error:</li>
</ul>
-<pre><code>Looking up the name associated with ORCID iD: 0000-0001-9634-1958
+<pre tabindex="0"><code>Looking up the name associated with ORCID iD: 0000-0001-9634-1958
Traceback (most recent call last):
File "./resolve-orcids.py", line 111, in <module>
read_identifiers_from_file()
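The kind of lookup resolve-orcids.py does against the ORCID public API can be sketched roughly like this (this is not the actual script; the TypeError above is exactly what happens when the name fields are null and get subscripted anyway):

<pre tabindex="0"><code>import requests

orcid = '0000-0001-5019-1368'
url = 'https://pub.orcid.org/v2.1/{0}/person'.format(orcid)
response = requests.get(url, headers={'Accept': 'application/json'})
name = response.json()['name']

# 'credit-name', 'given-names', and 'family-name' can all be null
if name and name.get('credit-name'):
    print(name['credit-name']['value'])
elif name:
    given = name['given-names']['value'] if name.get('given-names') else ''
    family = name['family-name']['value'] if name.get('family-name') else ''
    print('{0} {1}'.format(given, family).strip())
</code></pre>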
@@ -704,7 +704,7 @@ TypeError: 'NoneType' object is not subscriptable
<li>I fixed the script so that it checks if the family name is null</li>
<li>Now another:</li>
</ul>
-<pre><code>Looking up the name associated with ORCID iD: 0000-0002-1300-3636
+<pre tabindex="0"><code>Looking up the name associated with ORCID iD: 0000-0002-1300-3636
Traceback (most recent call last):
File "./resolve-orcids.py", line 117, in <module>
read_identifiers_from_file()
@@ -722,13 +722,13 @@ TypeError: 'NoneType' object is not subscriptable
<li>Discuss some of the issues with null values and poor-quality names in some ORCID identifiers with Abenet and I think we’ll now only use ORCID iDs that have been sent to us by partners, not those extracted via keyword searches on orcid.org</li>
<li>This should be the version we use (the existing controlled vocabulary generated from CGSpace’s Solr authority core plus the IDs sent to us so far by partners):</li>
</ul>
-<pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml ORCID_ID_CIAT_IITA_IWMI.csv | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > 2018-02-20-combined.txt
+<pre tabindex="0"><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml ORCID_ID_CIAT_IITA_IWMI.csv | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > 2018-02-20-combined.txt
</code></pre><ul>
<li>I updated the <code>resolve-orcids.py</code> to use the “credit-name” if it exists in a profile, falling back to “given-names” + “family-name”</li>
<li>Also, I added color-coded output to the debug messages and added a “quiet” mode that suppresses the normal behavior of printing results to the screen</li>
<li>I’m using this as the test input for <code>resolve-orcids.py</code>:</li>
</ul>
-<pre><code>$ cat orcid-test-values.txt
+<pre tabindex="0"><code>$ cat orcid-test-values.txt
# valid identifier with 'given-names' and 'family-name'
0000-0001-5019-1368
@@ -770,7 +770,7 @@ TypeError: 'NoneType' object is not subscriptable
<li>It looks like Sisay restarted Tomcat because I was offline</li>
<li>There was absolutely nothing interesting going on at 13:00 on the server, WTF?</li>
</ul>
-<pre><code># cat /var/log/nginx/*.log | grep -E "22/Feb/2018:13" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+<pre tabindex="0"><code># cat /var/log/nginx/*.log | grep -E "22/Feb/2018:13" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
55 192.99.39.235
60 207.46.13.26
62 40.77.167.38
@@ -784,7 +784,7 @@ TypeError: 'NoneType' object is not subscriptable
</code></pre><ul>
<li>Otherwise there was pretty normal traffic the rest of the day:</li>
</ul>
-<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "22/Feb/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "22/Feb/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
839 216.244.66.245
1074 68.180.228.117
1114 157.55.39.100
@@ -798,7 +798,7 @@ TypeError: 'NoneType' object is not subscriptable
</code></pre><ul>
<li>So I don’t see any definite cause for this crash, I see a shit ton of abandoned PostgreSQL connections today around 1PM!</li>
</ul>
-<pre><code># grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' /var/log/tomcat7/catalina.out
+<pre tabindex="0"><code># grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' /var/log/tomcat7/catalina.out
729
# grep 'Feb 22, 2018 1' /var/log/tomcat7/catalina.out | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon'
519
@@ -807,7 +807,7 @@ TypeError: 'NoneType' object is not subscriptable
<li>Abandoned connections is not a cause but a symptom, though perhaps something more like a few minutes is better?</li>
<li>Also, while looking at the logs I see some new bot:</li>
</ul>
-<pre><code>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.4.2661.102 Safari/537.36; 360Spider
+<pre tabindex="0"><code>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.4.2661.102 Safari/537.36; 360Spider
</code></pre><ul>
<li>It seems to re-use its user agent but makes tons of useless requests and I wonder if I should add “<code>.*spider.*</code>” to the Tomcat Crawler Session Manager valve?</li>
</ul>
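The Crawler Session Manager valve mentioned in this hunk is configured in Tomcat's server.xml; a sketch of what adding a broader crawlerUserAgents pattern might look like (the regex shown is illustrative, not the value actually deployed):

<pre tabindex="0"><code><Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*|.*[Ss]pider.*" />
</code></pre>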
@@ -820,19 +820,19 @@ TypeError: 'NoneType' object is not subscriptable
<li>A few days ago Abenet sent me the list of ORCID iDs from CCAFS</li>
<li>We currently have 988 unique identifiers:</li>
</ul>
-<pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
+<pre tabindex="0"><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
988
</code></pre><ul>
<li>After adding the ones from CCAFS we now have 1004:</li>
</ul>
-<pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/ccafs | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
+<pre tabindex="0"><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/ccafs | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
1004
</code></pre><ul>
<li>I will add them to DSpace Test but Abenet says she’s still waiting to send us ILRI’s list</li>
<li>I will tell her that we should proceed on sharing our work on DSpace Test with the partners this week anyways and we can update the list later</li>
<li>While regenerating the names for these ORCID identifiers I saw <a href="https://pub.orcid.org/v2.1/0000-0002-2614-426X/person">one that has a weird value for its names</a>:</li>
</ul>
-<pre><code>Looking up the names associated with ORCID iD: 0000-0002-2614-426X
+<pre tabindex="0"><code>Looking up the names associated with ORCID iD: 0000-0002-2614-426X
Given Names Deactivated Family Name Deactivated: 0000-0002-2614-426X
</code></pre><ul>
<li>I don’t know if the user accidentally entered this as their name or if that’s how ORCID behaves when the name is private?</li>
@@ -843,7 +843,7 @@ Given Names Deactivated Family Name Deactivated: 0000-0002-2614-426X
<li>Thinking about how to preserve ORCID identifiers attached to existing items in CGSpace</li>
<li>We have over 60,000 unique author + authority combinations on CGSpace:</li>
</ul>
-<pre><code>dspace=# select count(distinct (text_value, authority)) from metadatavalue where resource_type_id=2 and metadata_field_id=3;
+<pre tabindex="0"><code>dspace=# select count(distinct (text_value, authority)) from metadatavalue where resource_type_id=2 and metadata_field_id=3;
count
-------
62464
@@ -853,7 +853,7 @@ Given Names Deactivated Family Name Deactivated: 0000-0002-2614-426X
<li>The query in Solr would simply be <code>orcid_id:*</code></li>
<li>Assuming I know that authority record with <code>id:d7ef744b-bbd4-4171-b449-00e37e1b776f</code>, then I could query PostgreSQL for all metadata records using that authority:</li>
</ul>
-<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and authority='d7ef744b-bbd4-4171-b449-00e37e1b776f';
+<pre tabindex="0"><code>dspace=# select * from metadatavalue where resource_type_id=2 and authority='d7ef744b-bbd4-4171-b449-00e37e1b776f';
metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
-------------------+-------------+-------------------+---------------------------+-----------+-------+--------------------------------------+------------+------------------
2726830 | 77710 | 3 | Rodríguez Chalarca, Jairo | | 2 | d7ef744b-bbd4-4171-b449-00e37e1b776f | 600 | 2
@@ -862,13 +862,13 @@ Given Names Deactivated Family Name Deactivated: 0000-0002-2614-426X
<li>Then I suppose I can use the <code>resource_id</code> to identify the item?</li>
<li>Actually, <code>resource_id</code> is the same id we use in CSV, so I could simply build something like this for a metadata import!</li>
</ul>
-<pre><code>id,cg.creator.id
+<pre tabindex="0"><code>id,cg.creator.id
93848,Alan S. Orth: 0000-0002-1735-7458||Peter G. Ballantyne: 0000-0001-9346-2893
</code></pre><ul>
<li>I just discovered that <a href="https://requests-cache.readthedocs.io">requests-cache</a> can transparently cache HTTP requests</li>
<li>Running <code>resolve-orcids.py</code> with my test input takes 10.5 seconds the first time, and then 3.0 seconds the second time!</li>
</ul>
-<pre><code>$ time ./resolve-orcids.py -i orcid-test-values.txt -o /tmp/orcid-names
+<pre tabindex="0"><code>$ time ./resolve-orcids.py -i orcid-test-values.txt -o /tmp/orcid-names
Ali Ramadhan: 0000-0001-5019-1368
Alan S. Orth: 0000-0002-1735-7458
Ibrahim Mohammed: 0000-0001-5199-5528
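requests-cache works by monkey-patching requests, so the speedup described in this hunk needs only a couple of lines; a minimal sketch of how it could be wired into a script like resolve-orcids.py (the cache name is arbitrary):

<pre tabindex="0"><code>import requests
import requests_cache

# Subsequent requests.get() calls are cached in requests-cache.sqlite
requests_cache.install_cache('requests-cache')

response = requests.get('https://pub.orcid.org/v2.1/0000-0002-1735-7458/person',
                        headers={'Accept': 'application/json'})
print(response.from_cache)  # False on the first call, True afterwards
</code></pre>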
@@ -896,7 +896,7 @@ Nor Azwadi: 0000-0001-9634-1958
<li>I need to see which SQL queries are run during that time</li>
<li>And only a few hours after I disabled the <code>removeAbandoned</code> thing CGSpace went down and lo and behold, there were 264 connections, most of which were idle:</li>
</ul>
-<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
+<pre tabindex="0"><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
5 dspaceApi
279 dspaceWeb
$ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c "idle in transaction"
@@ -905,7 +905,7 @@ $ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c "idle
<li>So I’m re-enabling the <code>removeAbandoned</code> setting</li>
<li>I grabbed a snapshot of the active connections in <code>pg_stat_activity</code> for all queries running longer than 2 minutes:</li>
</ul>
-<pre><code>dspace=# \copy (SELECT now() - query_start as "runtime", application_name, usename, datname, waiting, state, query
+<pre tabindex="0"><code>dspace=# \copy (SELECT now() - query_start as "runtime", application_name, usename, datname, waiting, state, query
FROM pg_stat_activity
WHERE now() - query_start > '2 minutes'::interval
ORDER BY runtime DESC) to /tmp/2018-02-27-postgresql.txt
@@ -913,11 +913,11 @@ COPY 263
</code></pre><ul>
<li>100 of these idle in transaction connections are the following query:</li>
</ul>
-<pre><code>SELECT * FROM resourcepolicy WHERE resource_type_id= $1 AND resource_id= $2 AND action_id= $3
+<pre tabindex="0"><code>SELECT * FROM resourcepolicy WHERE resource_type_id= $1 AND resource_id= $2 AND action_id= $3
</code></pre><ul>
<li>… but according to the <a href="https://www.postgresql.org/docs/9.5/static/view-pg-locks.html">pg_locks documentation</a> I should have done this to correlate the locks with the activity:</li>
</ul>
-<pre><code>SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;
+<pre tabindex="0"><code>SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;
</code></pre><ul>
<li>Tom Desair from Atmire shared some extra JDBC pool parameters that might be useful on my thread on the dspace-tech mailing list:
<ul>
@@ -936,7 +936,7 @@ COPY 263
<li>CGSpace crashed today, the first HTTP 499 in nginx’s access.log was around 09:12</li>
<li>There’s nothing interesting going on in nginx’s logs around that time:</li>
</ul>
-<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "28/Feb/2018:09:" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "28/Feb/2018:09:" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
65 197.210.168.174
74 213.55.99.121
74 66.249.66.90
@@ -950,12 +950,12 @@ COPY 263
</code></pre><ul>
<li>Looking in dspace.log-2018-02-28 I see this, though:</li>
</ul>
-<pre><code>2018-02-28 09:19:29,692 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
+<pre tabindex="0"><code>2018-02-28 09:19:29,692 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
</code></pre><ul>
<li>Memory issues seem to be common this month:</li>
</ul>
-<pre><code>$ grep -c 'nested exception is java.lang.OutOfMemoryError: Java heap space' dspace.log.2018-02-*
+<pre tabindex="0"><code>$ grep -c 'nested exception is java.lang.OutOfMemoryError: Java heap space' dspace.log.2018-02-*
dspace.log.2018-02-01:0
dspace.log.2018-02-02:0
dspace.log.2018-02-03:0
@@ -987,7 +987,7 @@ dspace.log.2018-02-28:1
</code></pre><ul>
<li>Top ten users by session during the first twenty minutes of 9AM:</li>
</ul>
-<pre><code>$ grep -E '2018-02-28 09:(0|1)' dspace.log.2018-02-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq -c | sort -n | tail -n 10
+<pre tabindex="0"><code>$ grep -E '2018-02-28 09:(0|1)' dspace.log.2018-02-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq -c | sort -n | tail -n 10
18 session_id=F2DFF64D3D707CD66AE3A873CEC80C49
19 session_id=92E61C64A79F0812BE62A3882DA8F4BA
21 session_id=57417F5CB2F9E3871E609CEEBF4E001F
@@ -1006,13 +1006,13 @@ dspace.log.2018-02-28:1
<li>I think I’ll increase the JVM heap size on CGSpace from 6144m to 8192m because I’m sick of this random crashing shit and the server has memory and I’d rather eliminate this so I can get back to solving PostgreSQL issues and doing other real work</li>
<li>Run the few corrections from earlier this month for sponsor on CGSpace:</li>
</ul>
-<pre><code>cgspace=# update metadatavalue set text_value='United States Agency for International Development' where resource_type_id=2 and metadata_field_id=29 and text_value like '%U.S. Agency for International Development%';
+<pre tabindex="0"><code>cgspace=# update metadatavalue set text_value='United States Agency for International Development' where resource_type_id=2 and metadata_field_id=29 and text_value like '%U.S. Agency for International Development%';
UPDATE 3
</code></pre><ul>
<li>I finally got a CGIAR account so I logged into CGSpace with it and tried to delete my old unfinished submissions (22 of them)</li>
<li>Eventually it succeeded, but it took about five minutes and I noticed LOTS of locks happening with this query:</li>
</ul>
-<pre><code>dspace=# \copy (SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid) to /tmp/locks-aorth.txt;
+<pre tabindex="0"><code>dspace=# \copy (SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid) to /tmp/locks-aorth.txt;
</code></pre><ul>
<li>I took a few snapshots during the process and noticed 500, 800, and even 2000 locks at certain times during the process</li>
<li>Afterwards I looked a few times and saw only 150 or 200 locks</li>
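For reference, a heap bump like the 6144m to 8192m change described in this hunk would typically be made in Tomcat's JAVA_OPTS; the exact file and the other flags here are assumptions (on an Ubuntu Tomcat 7 setup it could be /etc/default/tomcat7), not the actual CGSpace configuration:

<pre tabindex="0"><code>JAVA_OPTS="-Djava.awt.headless=true -Xms8192m -Xmx8192m -Dfile.encoding=UTF-8"
</code></pre>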