Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2025-01-27 05:49:12 +01:00)

Commit: Add notes for 2021-09-13
@@ -150,7 +150,7 @@ dspace.log.2018-01-02:34
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains
"/>
<meta name="generator" content="Hugo 0.87.0" />
<meta name="generator" content="Hugo 0.88.1" />
@@ -244,19 +244,19 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
<li>And just before that I see this:</li>
</ul>
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
<pre tabindex="0"><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
<li>I notice this error quite a few times in dspace.log:</li>
</ul>
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
<pre tabindex="0"><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
</code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
</ul>
<pre><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
<pre tabindex="0"><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
dspace.log.2017-11-21:4
dspace.log.2017-11-22:1
dspace.log.2017-11-23:4
@@ -308,7 +308,7 @@ dspace.log.2018-01-02:34
<li>I woke up to more up and down of CGSpace, this time UptimeRobot noticed a few rounds of up and down of a few minutes each and Linode also notified of high CPU load from 12 to 2 PM</li>
<li>Looks like I need to increase the database pool size again:</li>
</ul>
<pre><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
<pre tabindex="0"><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
dspace.log.2018-01-01:0
dspace.log.2018-01-02:1972
dspace.log.2018-01-03:1909
@@ -319,7 +319,7 @@ dspace.log.2018-01-03:1909
<ul>
<li>The active IPs in XMLUI are:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
607 40.77.167.141
611 2a00:23c3:8c94:7800:392c:a491:e796:9c50
663 188.226.169.37
@@ -336,12 +336,12 @@ dspace.log.2018-01-03:1909
<li>This appears to be the <a href="https://github.com/internetarchive/heritrix3">Internet Archive’s open source bot</a></li>
<li>They seem to be re-using their Tomcat session so I don’t need to do anything to them just yet:</li>
</ul>
<pre><code>$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
<pre tabindex="0"><code>$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
2
</code></pre><ul>
<li>The API logs show the normal users:</li>
</ul>
<pre><code># cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
32 207.46.13.182
38 40.77.167.132
38 68.180.229.254
@@ -356,12 +356,12 @@ dspace.log.2018-01-03:1909
<li>In other related news I see a sizeable number of requests coming from python-requests</li>
<li>For example, just in the last day there were 1700!</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -c python-requests
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -c python-requests
1773
</code></pre><ul>
<li>But they come from hundreds of IPs, many of which are 54.x.x.x:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep python-requests | awk '{print $1}' | sort -n | uniq -c | sort -h | tail -n 30
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep python-requests | awk '{print $1}' | sort -n | uniq -c | sort -h | tail -n 30
9 54.144.87.92
9 54.146.222.143
9 54.146.249.249
@@ -402,7 +402,7 @@ dspace.log.2018-01-03:1909
<li>CGSpace went down and up a bunch of times last night and ILRI staff were complaining a lot</li>
<li>The XMLUI logs show this activity:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "4/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "4/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
968 197.211.63.81
981 213.55.99.121
1039 66.249.64.93
@@ -416,12 +416,12 @@ dspace.log.2018-01-03:1909
</code></pre><ul>
<li>Again we ran out of PostgreSQL database connections, even after bumping the pool max active limit from 50 to 75 to 125 yesterday!</li>
</ul>
<pre><code>2018-01-04 07:36:08,089 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
<pre tabindex="0"><code>2018-01-04 07:36:08,089 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-256] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:125; busy:125; idle:0; lastwait:5000].
</code></pre><ul>
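<li>For reference, the limit we keep bumping is the pool’s max active connections; assuming this is still the default DSpace pool configured in dspace.cfg (rather than a JNDI resource), the relevant settings can be checked with something like:</li>
</ul>
<pre><code>$ grep -E '^db\.(maxconnections|maxwait|maxidle)' [dspace]/config/dspace.cfg
</code></pre><ul>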
<li>So for this week that is the number one problem!</li>
</ul>
<pre><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
<pre tabindex="0"><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
dspace.log.2018-01-01:0
dspace.log.2018-01-02:1972
dspace.log.2018-01-03:1909
@@ -436,7 +436,7 @@ dspace.log.2018-01-04:1559
<li>Peter said that CGSpace was down last night and Tsega restarted Tomcat</li>
<li>I don’t see any alerts from Linode or UptimeRobot, and there are no PostgreSQL connection errors in the dspace logs for today:</li>
</ul>
<pre><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
<pre tabindex="0"><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
dspace.log.2018-01-01:0
dspace.log.2018-01-02:1972
dspace.log.2018-01-03:1909
@@ -446,13 +446,13 @@ dspace.log.2018-01-05:0
<li>Daniel asked for help with their DAGRIS server (linode2328112) that has no disk space</li>
<li>I had a look and there is one Apache 2 log file that is 73GB, with lots of this:</li>
</ul>
<pre><code>[Fri Jan 05 09:31:22.965398 2018] [:error] [pid 9340] [client 213.55.99.121:64476] WARNING: Unable to find a match for "9-16-1-RV.doc" in "/home/files/journals/6//articles/9/". Skipping this file., referer: http://dagris.info/reviewtool/index.php/index/install/upgrade
<pre tabindex="0"><code>[Fri Jan 05 09:31:22.965398 2018] [:error] [pid 9340] [client 213.55.99.121:64476] WARNING: Unable to find a match for "9-16-1-RV.doc" in "/home/files/journals/6//articles/9/". Skipping this file., referer: http://dagris.info/reviewtool/index.php/index/install/upgrade
</code></pre><ul>
<li>I will delete the log file for now and tell Danny</li>
<li>Also, I’m still seeing a hundred or so of the “ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer” errors in dspace logs, I need to search the dspace-tech mailing list to see what the cause is</li>
<li>I will run a full Discovery reindex in the meantime to see if it’s something wrong with the Discovery Solr core</li>
</ul>
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b

real 110m43.985s
@@ -465,7 +465,7 @@ sys 3m14.890s
<ul>
<li>I’m still seeing Solr errors in the DSpace logs even after the full reindex yesterday:</li>
</ul>
<pre><code>org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1983+TO+1989]': Encountered " "]" "] "" at line 1, column 32.
<pre tabindex="0"><code>org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1983+TO+1989]': Encountered " "]" "] "" at line 1, column 32.
</code></pre><ul>
<li>I posted a message to the dspace-tech mailing list to see if anyone can help</li>
</ul>
@@ -474,13 +474,13 @@ sys 3m14.890s
<li>Advise Sisay about blank lines in some IITA records</li>
<li>Generate a list of author affiliations for Peter to clean up:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
COPY 4515
</code></pre><h2 id="2018-01-10">2018-01-10</h2>
<ul>
<li>I looked to see what happened to this year’s Solr statistics sharding task that should have run on 2018-01-01 and of course it failed:</li>
</ul>
<pre><code>Moving: 81742 into core statistics-2010
<pre tabindex="0"><code>Moving: 81742 into core statistics-2010
Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2010
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2010
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
@@ -526,7 +526,7 @@ Caused by: java.net.SocketException: Connection reset
</code></pre><ul>
<li>DSpace Test has the same error but with creating the 2017 core:</li>
</ul>
<pre><code>Moving: 2243021 into core statistics-2017
<pre tabindex="0"><code>Moving: 2243021 into core statistics-2017
Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2017
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2017
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
@@ -553,7 +553,7 @@ Caused by: org.apache.http.client.ClientProtocolException
<li>I can apparently search for records in the Solr stats core that have an empty <code>owningColl</code> field using this in the Solr admin query: <code>-owningColl:*</code></li>
<li>On CGSpace I see 48,000,000 records that have an <code>owningColl</code> field and 34,000,000 that don’t:</li>
</ul>
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?q=owningColl%3A*&wt=json&indent=true' | grep numFound
<pre tabindex="0"><code>$ http 'http://localhost:3000/solr/statistics/select?q=owningColl%3A*&wt=json&indent=true' | grep numFound
"response":{"numFound":48476327,"start":0,"docs":[
$ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&wt=json&indent=true' | grep numFound
"response":{"numFound":34879872,"start":0,"docs":[
@@ -561,19 +561,19 @@ $ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&wt=js
<li>I tested the <code>dspace stats-util -s</code> process on my local machine and it failed the same way</li>
<li>It doesn’t seem to be helpful, but the dspace log shows this:</li>
</ul>
<pre><code>2018-01-10 10:51:19,301 INFO org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
<pre tabindex="0"><code>2018-01-10 10:51:19,301 INFO org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
2018-01-10 10:51:19,301 INFO org.dspace.statistics.SolrLogger @ Moving: 3821 records into core statistics-2016
</code></pre><ul>
<li>Terry Brady has written some notes on the DSpace Wiki about Solr sharding issues: <a href="https://wiki.lyrasis.org/display/%7Eterrywbrady/Statistics+Import+Export+Issues">https://wiki.lyrasis.org/display/%7Eterrywbrady/Statistics+Import+Export+Issues</a></li>
<li>Uptime Robot said that CGSpace went down at around 9:43 AM</li>
<li>I looked at PostgreSQL’s <code>pg_stat_activity</code> table and saw 161 active connections, but no pool errors in the DSpace logs:</li>
</ul>
<pre><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-10
<pre tabindex="0"><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-10
0
</code></pre><ul>
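<li>A quick way to see what those 161 connections are actually doing (assuming the database is reachable locally as <code>dspace</code>) is to group <code>pg_stat_activity</code> by state, something like:</li>
</ul>
<pre><code>$ psql -U postgres dspace -c 'SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;'
</code></pre><ul>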
<li>The XMLUI logs show quite a bit of activity today:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
951 207.46.13.159
954 157.55.39.123
1217 95.108.181.88
@@ -587,17 +587,17 @@ $ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&wt=js
</code></pre><ul>
<li>The user agent for the top six or so IPs is the same:</li>
</ul>
<pre><code>"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36"
<pre tabindex="0"><code>"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36"
</code></pre><ul>
<li><code>whois</code> says they come from <a href="http://www.perfectip.net/">Perfect IP</a></li>
<li>I’ve never seen those top IPs before, but they have created 50,000 Tomcat sessions today:</li>
</ul>
<pre><code>$ grep -E '(2607:fa98:40:9:26b6:fdff:feff:1888|2607:fa98:40:9:26b6:fdff:feff:195d|2607:fa98:40:9:26b6:fdff:feff:1c96|70.36.107.49|70.36.107.190|70.36.107.50)' /home/cgspace.cgiar.org/log/dspace.log.2018-01-10 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
<pre tabindex="0"><code>$ grep -E '(2607:fa98:40:9:26b6:fdff:feff:1888|2607:fa98:40:9:26b6:fdff:feff:195d|2607:fa98:40:9:26b6:fdff:feff:1c96|70.36.107.49|70.36.107.190|70.36.107.50)' /home/cgspace.cgiar.org/log/dspace.log.2018-01-10 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
49096
</code></pre><ul>
<li>Rather than blocking their IPs, I think I might just add their user agent to the “badbots” zone with Baidu, because they seem to be the only ones using that user agent:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari
/537.36" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
6796 70.36.107.50
11870 70.36.107.190
@@ -608,13 +608,13 @@ $ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&wt=js
</code></pre><ul>
<li>I added the user agent to nginx’s badbots limit req zone but upon testing the config I got an error:</li>
</ul>
<pre><code># nginx -t
<pre tabindex="0"><code># nginx -t
nginx: [emerg] could not build map_hash, you should increase map_hash_bucket_size: 64
nginx: configuration file /etc/nginx/nginx.conf test failed
</code></pre><ul>
<li>According to nginx docs the <a href="https://nginx.org/en/docs/hash.html">bucket size should be a multiple of the CPU’s cache alignment</a>, which is 64 for us:</li>
</ul>
<pre><code># cat /proc/cpuinfo | grep cache_alignment | head -n1
<pre tabindex="0"><code># cat /proc/cpuinfo | grep cache_alignment | head -n1
cache_alignment : 64
</code></pre><ul>
<li>On our servers that is 64, so I increased this parameter to 128 and deployed the changes to nginx</li>
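<li>A quick sanity check after a change like that (assuming nginx is managed by systemd on this host) is to re-test and reload:</li>
</ul>
<pre><code># nginx -t && systemctl reload nginx
</code></pre>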
@@ -637,7 +637,7 @@ cache_alignment : 64
<li>Linode rebooted DSpace Test and CGSpace for their host hypervisor kernel updates</li>
<li>Following up with the Solr sharding issue on the dspace-tech mailing list, I noticed this interesting snippet in the Tomcat <code>localhost_access_log</code> at the time of my sharding attempt on my test machine:</li>
</ul>
<pre><code>127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=type%3A2+AND+id%3A1&wt=javabin&version=2 HTTP/1.1" 200 107
<pre tabindex="0"><code>127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=type%3A2+AND+id%3A1&wt=javabin&version=2 HTTP/1.1" 200 107
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=*%3A*&rows=0&facet=true&facet.range=time&facet.range.start=NOW%2FYEAR-18YEARS&facet.range.end=NOW%2FYEAR%2B0YEARS&facet.range.gap=%2B1YEAR&facet.mincount=1&wt=javabin&version=2 HTTP/1.1" 200 447
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/admin/cores?action=STATUS&core=statistics-2016&indexInfo=true&wt=javabin&version=2 HTTP/1.1" 200 76
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/admin/cores?action=CREATE&name=statistics-2016&instanceDir=statistics&dataDir=%2FUsers%2Faorth%2Fdspace%2Fsolr%2Fstatistics-2016%2Fdata&wt=javabin&version=2 HTTP/1.1" 200 63
@@ -649,7 +649,7 @@ cache_alignment : 64
<li>This is apparently a common Solr error code that means “version conflict”: <a href="http://yonik.com/solr/optimistic-concurrency/">http://yonik.com/solr/optimistic-concurrency/</a></li>
<li>Looks like that bot from the PerfectIP.net host ended up making about 450,000 requests to XMLUI alone yesterday:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
21572 70.36.107.50
30722 70.36.107.190
34566 70.36.107.49
@@ -659,7 +659,7 @@ cache_alignment : 64
</code></pre><ul>
<li>Wow, I just figured out how to set the application name of each database pool in the JNDI config of Tomcat’s <code>server.xml</code>:</li>
</ul>
<pre><code><Resource name="jdbc/dspaceWeb" auth="Container" type="javax.sql.DataSource"
<pre tabindex="0"><code><Resource name="jdbc/dspaceWeb" auth="Container" type="javax.sql.DataSource"
driverClassName="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceWeb"
username="dspace"
@@ -677,7 +677,7 @@ cache_alignment : 64
<li>Also, I realized that the <code>db.jndi</code> parameter in dspace.cfg needs to match the <code>name</code> value in your application’s context—not the <code>global</code> one</li>
<li>Ah hah! Also, I can name the default DSpace connection pool in dspace.cfg as well, like:</li>
</ul>
<pre><code>db.url = jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceDefault
<pre tabindex="0"><code>db.url = jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceDefault
</code></pre><ul>
<li>With that it is super easy to see where PostgreSQL connections are coming from in <code>pg_stat_activity</code></li>
</ul>
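<ul>
<li>For example (assuming the pools are named <code>dspaceWeb</code> and <code>dspaceDefault</code> as above), something like this shows how many connections each one is holding:</li>
</ul>
<pre><code>$ psql -U postgres dspace -c 'SELECT application_name, count(*) FROM pg_stat_activity GROUP BY application_name ORDER BY count(*) DESC;'
</code></pre>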
|
||||
@ -685,7 +685,7 @@ cache_alignment : 64
|
||||
<ul>
|
||||
<li>I’m looking at the <a href="https://wiki.lyrasis.org/display/DSDOC6x/Installing+DSpace#InstallingDSpace-ServletEngine(ApacheTomcat7orlater,Jetty,CauchoResinorequivalent)">DSpace 6.0 Install docs</a> and notice they tweak the number of threads in their Tomcat connector:</li>
|
||||
</ul>
|
||||
<pre><code><!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
|
||||
<pre tabindex="0"><code><!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
|
||||
<Connector port="8080"
|
||||
maxThreads="150"
|
||||
minSpareThreads="25"
|
||||
@ -702,7 +702,7 @@ cache_alignment : 64
|
||||
<li>Looks like in Tomcat 8.5 the default URIEncoding for Connectors is UTF-8, so we don’t need to specify that manually anymore: <a href="https://tomcat.apache.org/tomcat-8.5-doc/config/http.html">https://tomcat.apache.org/tomcat-8.5-doc/config/http.html</a></li>
|
||||
<li>Ooh, I just saw the <code>acceptorThreadCount</code> setting (in Tomcat 7 and 8.5):</li>
|
||||
</ul>
|
||||
<pre><code>The number of threads to be used to accept connections. Increase this value on a multi CPU machine, although you would never really need more than 2. Also, with a lot of non keep alive connections, you might want to increase this value as well. Default value is 1.
|
||||
<pre tabindex="0"><code>The number of threads to be used to accept connections. Increase this value on a multi CPU machine, although you would never really need more than 2. Also, with a lot of non keep alive connections, you might want to increase this value as well. Default value is 1.
|
||||
</code></pre><ul>
|
||||
<li>That could be very interesting</li>
|
||||
</ul>
|
||||
@ -711,7 +711,7 @@ cache_alignment : 64
|
||||
<li>Still testing DSpace 6.2 on Tomcat 8.5.24</li>
|
||||
<li>Catalina errors at Tomcat 8.5 startup:</li>
|
||||
</ul>
|
||||
<pre><code>13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxActive is not used in DBCP2, use maxTotal instead. maxTotal default value is 8. You have set value of "35" for "maxActive" property, which is being ignored.
|
||||
<pre tabindex="0"><code>13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxActive is not used in DBCP2, use maxTotal instead. maxTotal default value is 8. You have set value of "35" for "maxActive" property, which is being ignored.
|
||||
13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxWait is not used in DBCP2 , use maxWaitMillis instead. maxWaitMillis default value is -1. You have set value of "5000" for "maxWait" property, which is being ignored.
|
||||
</code></pre><ul>
|
||||
<li>I looked in my Tomcat 7.0.82 logs and I don’t see anything about DBCP2 errors, so I guess this is a Tomcat 8.0.x or 8.5.x thing</li>
|
||||
@ -719,7 +719,7 @@ cache_alignment : 64
|
||||
<li>I have updated our <a href="https://github.com/ilri/rmg-ansible-public/commit/246f9d7b06d53794f189f0cc57ad5ddd80f0b014">Ansible infrastructure scripts</a> so that it will be ready whenever we switch to Tomcat 8 (probably with Ubuntu 18.04 later this year)</li>
|
||||
<li>When I enable the ResourceLink in the ROOT.xml context I get the following error in the Tomcat localhost log:</li>
|
||||
</ul>
|
||||
<pre><code>13-Jan-2018 14:14:36.017 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.app.util.DSpaceWebappListener]
|
||||
<pre tabindex="0"><code>13-Jan-2018 14:14:36.017 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.app.util.DSpaceWebappListener]
|
||||
java.lang.ExceptionInInitializerError
|
||||
at org.dspace.app.util.AbstractDSpaceWebapp.register(AbstractDSpaceWebapp.java:74)
|
||||
at org.dspace.app.util.DSpaceWebappListener.contextInitialized(DSpaceWebappListener.java:31)
|
||||
@ -761,7 +761,7 @@ Caused by: java.lang.NullPointerException
|
||||
<li>Help Udana from IWMI export a CSV from DSpace Test so he can start trying a batch upload</li>
|
||||
<li>I’m going to apply these ~130 corrections on CGSpace:</li>
|
||||
</ul>
|
||||
<pre><code>update metadatavalue set text_value='Formally Published' where resource_type_id=2 and metadata_field_id=214 and text_value like 'Formally published';
|
||||
<pre tabindex="0"><code>update metadatavalue set text_value='Formally Published' where resource_type_id=2 and metadata_field_id=214 and text_value like 'Formally published';
|
||||
delete from metadatavalue where resource_type_id=2 and metadata_field_id=214 and text_value like 'NO';
|
||||
update metadatavalue set text_value='en' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(En|English)';
|
||||
update metadatavalue set text_value='fr' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(fre|frn|French)';
|
||||
@ -777,11 +777,11 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id=38 and
|
||||
<ul>
|
||||
<li>Apply corrections using <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a>:</li>
|
||||
</ul>
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-01-14-Authors-1300-Corrections.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-01-14-Authors-1300-Corrections.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
|
||||
</code></pre><ul>
|
||||
<li>In looking at some of the values to delete or check I found some metadata values that I could not resolve their handle via SQL:</li>
|
||||
</ul>
|
||||
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='Tarawali';
|
||||
<pre tabindex="0"><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='Tarawali';
|
||||
metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
|
||||
-------------------+-------------+-------------------+------------+-----------+-------+-----------+------------+------------------
|
||||
2757936 | 4369 | 3 | Tarawali | | 9 | | 600 | 2
|
||||
@ -796,7 +796,7 @@ dspace=# select handle from item, handle where handle.resource_id = item.item_id
|
||||
<li>Otherwise, the <a href="https://wiki.lyrasis.org/display/DSPACE/Helper+SQL+functions+for+DSpace+5">DSpace 5 SQL Helper Functions</a> provide <code>ds5_item2itemhandle()</code>, which is much easier than my long query above that I always have to go search for</li>
|
||||
<li>For example, to find the Handle for an item that has the author “Erni”:</li>
|
||||
</ul>
|
||||
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='Erni';
|
||||
<pre tabindex="0"><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='Erni';
|
||||
metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
|
||||
-------------------+-------------+-------------------+------------+-----------+-------+--------------------------------------+------------+------------------
|
||||
2612150 | 70308 | 3 | Erni | | 9 | 3fe10c68-6773-49a7-89cc-63eb508723f2 | -1 | 2
|
||||
@ -809,16 +809,16 @@ dspace=# select ds5_item2itemhandle(70308);
|
||||
</code></pre><ul>
|
||||
<li>Next I apply the author deletions:</li>
|
||||
</ul>
|
||||
<pre><code>$ ./delete-metadata-values.py -i /tmp/2018-01-14-Authors-5-Deletions.csv -f dc.contributor.author -m 3 -d dspace -u dspace -p 'fuuu'
|
||||
<pre tabindex="0"><code>$ ./delete-metadata-values.py -i /tmp/2018-01-14-Authors-5-Deletions.csv -f dc.contributor.author -m 3 -d dspace -u dspace -p 'fuuu'
|
||||
</code></pre><ul>
|
||||
<li>Now working on the affiliation corrections from Peter:</li>
|
||||
</ul>
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-01-15-Affiliations-888-Corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p 'fuuu'
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-01-15-Affiliations-888-Corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p 'fuuu'
|
||||
$ ./delete-metadata-values.py -i /tmp/2018-01-15-Affiliations-11-Deletions.csv -f cg.contributor.affiliation -m 211 -d dspace -u dspace -p 'fuuu'
|
||||
</code></pre><ul>
|
||||
<li>Now I made a new list of affiliations for Peter to look through:</li>
|
||||
</ul>
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where metadata_schema_id = 2 and element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
|
||||
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where metadata_schema_id = 2 and element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
|
||||
COPY 4552
|
||||
</code></pre><ul>
|
||||
<li>Looking over the affiliations again I see dozens of CIAT ones with their affiliation formatted like: International Center for Tropical Agriculture (CIAT)</li>
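<li>A quick way to pull a sample of those for checking (re-using the same <code>cg.contributor.affiliation</code> field id 211 as in the commands above, so this is just a sketch) is:</li>
</ul>
<pre><code>$ psql -U dspace dspace -c "select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id=211 and text_value like '%(CIAT)%' limit 10;"
</code></pre>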
|
||||
@ -828,11 +828,11 @@ COPY 4552
|
||||
<li>Help Sisay with some thumbnails for book chapters in Open Refine and SAFBuilder</li>
|
||||
<li>CGSpace users were having problems logging in, I think something’s wrong with LDAP because I see this in the logs:</li>
|
||||
</ul>
|
||||
<pre><code>2018-01-15 12:53:15,810 WARN org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=2386749547D03E0AA4EC7E44181A7552:ip_addr=x.x.x.x:ldap_authentication:type=failed_auth javax.naming.AuthenticationException\colon; [LDAP\colon; error code 49 - 80090308\colon; LdapErr\colon; DSID-0C090400, comment\colon; AcceptSecurityContext error, data 775, v1db1^@]
|
||||
<pre tabindex="0"><code>2018-01-15 12:53:15,810 WARN org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=2386749547D03E0AA4EC7E44181A7552:ip_addr=x.x.x.x:ldap_authentication:type=failed_auth javax.naming.AuthenticationException\colon; [LDAP\colon; error code 49 - 80090308\colon; LdapErr\colon; DSID-0C090400, comment\colon; AcceptSecurityContext error, data 775, v1db1^@]
|
||||
</code></pre><ul>
|
||||
<li>Looks like we processed 2.9 million requests on CGSpace in 2017-12:</li>
|
||||
</ul>
|
||||
<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Dec/2017"
|
||||
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Dec/2017"
|
||||
2890041
|
||||
|
||||
real 0m25.756s
|
||||
@ -864,14 +864,14 @@ sys 0m2.210s
|
||||
<li>Also, there are whitespace issues in many columns, and the items are mapped to the LIVES and ILRI articles collections, not Theses</li>
|
||||
<li>In any case, importing them like this:</li>
|
||||
</ul>
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
|
||||
<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
|
||||
$ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFormat -m lives.map &> lives.log
|
||||
</code></pre><ul>
|
||||
<li>And fantastic, before I started the import there were 10 PostgreSQL connections, and then CGSpace crashed during the upload</li>
|
||||
<li>When I looked there were 210 PostgreSQL connections!</li>
|
||||
<li>I don’t see any high load in XMLUI or REST/OAI:</li>
|
||||
</ul>
|
||||
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "17/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
||||
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "17/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
||||
381 40.77.167.124
|
||||
403 213.55.99.121
|
||||
431 207.46.13.60
|
||||
@ -896,13 +896,13 @@ $ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFor
|
||||
</code></pre><ul>
|
||||
<li>But I do see this strange message in the dspace log:</li>
|
||||
</ul>
|
||||
<pre><code>2018-01-17 07:59:25,856 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ I/O exception (org.apache.http.NoHttpResponseException) caught when processing request to {}->http://localhost:8081: The target server failed to respond
|
||||
<pre tabindex="0"><code>2018-01-17 07:59:25,856 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ I/O exception (org.apache.http.NoHttpResponseException) caught when processing request to {}->http://localhost:8081: The target server failed to respond
|
||||
2018-01-17 07:59:25,856 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ Retrying request to {}->http://localhost:8081
|
||||
</code></pre><ul>
|
||||
<li>I have NEVER seen this error before, and there is no error before or after that in DSpace’s solr.log</li>
|
||||
<li>Tomcat’s catalina.out does show something interesting, though, right at that time:</li>
|
||||
</ul>
|
||||
<pre><code>[====================> ]40% time remaining: 7 hour(s) 14 minute(s) 45 seconds. timestamp: 2018-01-17 07:57:02
|
||||
<pre tabindex="0"><code>[====================> ]40% time remaining: 7 hour(s) 14 minute(s) 45 seconds. timestamp: 2018-01-17 07:57:02
|
||||
[====================> ]40% time remaining: 7 hour(s) 14 minute(s) 45 seconds. timestamp: 2018-01-17 07:57:11
|
||||
[====================> ]40% time remaining: 7 hour(s) 14 minute(s) 44 seconds. timestamp: 2018-01-17 07:57:37
|
||||
[====================> ]40% time remaining: 7 hour(s) 16 minute(s) 5 seconds. timestamp: 2018-01-17 07:57:49
|
||||
@ -943,7 +943,7 @@ Exception in thread "http-bio-127.0.0.1-8081-exec-627" java.lang.OutOf
|
||||
<li>You can see the timestamp above, which is some Atmire nightly task I think, but I can’t figure out which one</li>
|
||||
<li>So I restarted Tomcat and tried the import again, which finished very quickly and without errors!</li>
|
||||
</ul>
|
||||
<pre><code>$ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFormat -m lives2.map &> lives2.log
|
||||
<pre tabindex="0"><code>$ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFormat -m lives2.map &> lives2.log
|
||||
</code></pre><ul>
|
||||
<li>Looking at the JVM graphs from Munin it does look like the heap ran out of memory (see the blue dip just before the green spike when I restarted Tomcat):</li>
|
||||
</ul>
|
||||
@ -951,7 +951,7 @@ Exception in thread "http-bio-127.0.0.1-8081-exec-627" java.lang.OutOf
|
||||
<ul>
|
||||
<li>I’m playing with maven repository caching using Artifactory in a Docker instance: <a href="https://www.jfrog.com/confluence/display/RTF/Installing+with+Docker">https://www.jfrog.com/confluence/display/RTF/Installing+with+Docker</a></li>
|
||||
</ul>
|
||||
<pre><code>$ docker pull docker.bintray.io/jfrog/artifactory-oss:latest
|
||||
<pre tabindex="0"><code>$ docker pull docker.bintray.io/jfrog/artifactory-oss:latest
|
||||
$ docker volume create --name artifactory5_data
|
||||
$ docker network create dspace-build
|
||||
$ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/var/opt/jfrog/artifactory -p 8081:8081 docker.bintray.io/jfrog/artifactory-oss:latest
|
||||
@ -961,11 +961,11 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
|
||||
<li>Wow, I even managed to add the Atmire repository as a remote and map it into the <code>libs-release</code> virtual repository, then tell maven to use it for <code>atmire.com-releases</code> in settings.xml!</li>
|
||||
<li>Hmm, some maven dependencies for the SWORDv2 web application in DSpace 5.5 are broken:</li>
|
||||
</ul>
|
||||
<pre><code>[ERROR] Failed to execute goal on project dspace-swordv2: Could not resolve dependencies for project org.dspace:dspace-swordv2:war:5.5: Failed to collect dependencies at org.swordapp:sword2-server:jar:classes:1.0 -> org.apache.abdera:abdera-client:jar:1.1.1 -> org.apache.abdera:abdera-core:jar:1.1.1 -> org.apache.abdera:abdera-i18n:jar:1.1.1 -> org.apache.geronimo.specs:geronimo-activation_1.0.2_spec:jar:1.1: Failed to read artifact descriptor for org.apache.geronimo.specs:geronimo-activation_1.0.2_spec:jar:1.1: Could not find artifact org.apache.geronimo.specs:specs:pom:1.1 in central (http://localhost:8081/artifactory/libs-release) -> [Help 1]
|
||||
<pre tabindex="0"><code>[ERROR] Failed to execute goal on project dspace-swordv2: Could not resolve dependencies for project org.dspace:dspace-swordv2:war:5.5: Failed to collect dependencies at org.swordapp:sword2-server:jar:classes:1.0 -> org.apache.abdera:abdera-client:jar:1.1.1 -> org.apache.abdera:abdera-core:jar:1.1.1 -> org.apache.abdera:abdera-i18n:jar:1.1.1 -> org.apache.geronimo.specs:geronimo-activation_1.0.2_spec:jar:1.1: Failed to read artifact descriptor for org.apache.geronimo.specs:geronimo-activation_1.0.2_spec:jar:1.1: Could not find artifact org.apache.geronimo.specs:specs:pom:1.1 in central (http://localhost:8081/artifactory/libs-release) -> [Help 1]
|
||||
</code></pre><ul>
|
||||
<li>I never noticed because I build with that web application disabled:</li>
|
||||
</ul>
|
||||
<pre><code>$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=localhost -P \!dspace-sword,\!dspace-swordv2 clean package
|
||||
<pre tabindex="0"><code>$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=localhost -P \!dspace-sword,\!dspace-swordv2 clean package
|
||||
</code></pre><ul>
|
||||
<li>UptimeRobot said CGSpace went down for a few minutes</li>
|
||||
<li>I didn’t do anything but it came back up on its own</li>
|
||||
@ -973,7 +973,7 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
|
||||
<li>Now Linode alert says the CPU load is high, <em>sigh</em></li>
|
||||
<li>Regarding the heap space error earlier today, it looks like it does happen a few times a week or month (I’m not sure how far these logs go back, as they are not strictly daily):</li>
|
||||
</ul>
|
||||
<pre><code># zgrep -c java.lang.OutOfMemoryError /var/log/tomcat7/catalina.out* | grep -v :0
|
||||
<pre tabindex="0"><code># zgrep -c java.lang.OutOfMemoryError /var/log/tomcat7/catalina.out* | grep -v :0
|
||||
/var/log/tomcat7/catalina.out:2
|
||||
/var/log/tomcat7/catalina.out.10.gz:7
|
||||
/var/log/tomcat7/catalina.out.11.gz:1
|
||||
@ -1004,7 +1004,7 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
|
||||
<li>I don’t see any errors in the nginx or catalina logs, so I guess UptimeRobot just got impatient and closed the request, which caused nginx to send an HTTP 499</li>
|
||||
<li>I realize I never did a full re-index after the SQL author and affiliation updates last week, so I should force one now:</li>
|
||||
</ul>
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
|
||||
<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-discovery -b
|
||||
</code></pre><ul>
|
||||
<li>Maria from Bioversity asked if I could remove the abstracts from all of their Limited Access items in the <a href="https://cgspace.cgiar.org/handle/10568/35501">Bioversity Journal Articles</a> collection</li>
|
||||
@ -1012,7 +1012,7 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspa
|
||||
<li>Use this GREL in OpenRefine after isolating all the Limited Access items: <code>value.startsWith("10568/35501")</code></li>
|
||||
<li>UptimeRobot said CGSpace went down AGAIN and both Sisay and Danny immediately logged in and restarted Tomcat without talking to me <em>or</em> each other!</li>
|
||||
</ul>
|
||||
<pre><code>Jan 18 07:01:22 linode18 sudo[10805]: dhmichael : TTY=pts/5 ; PWD=/home/dhmichael ; USER=root ; COMMAND=/bin/systemctl restart tomcat7
|
||||
<pre tabindex="0"><code>Jan 18 07:01:22 linode18 sudo[10805]: dhmichael : TTY=pts/5 ; PWD=/home/dhmichael ; USER=root ; COMMAND=/bin/systemctl restart tomcat7
|
||||
Jan 18 07:01:22 linode18 sudo[10805]: pam_unix(sudo:session): session opened for user root by dhmichael(uid=0)
|
||||
Jan 18 07:01:22 linode18 systemd[1]: Stopping LSB: Start Tomcat....
|
||||
Jan 18 07:01:22 linode18 sudo[10812]: swebshet : TTY=pts/3 ; PWD=/home/swebshet ; USER=root ; COMMAND=/bin/systemctl restart tomcat7
|
||||
@ -1026,14 +1026,14 @@ Jan 18 07:01:22 linode18 sudo[10812]: pam_unix(sudo:session): session opened for
|
||||
<li>Linode alerted and said that the CPU load was 264.1% on CGSpace</li>
|
||||
<li>Start the Discovery indexing again:</li>
|
||||
</ul>
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
|
||||
<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-discovery -b
|
||||
</code></pre><ul>
|
||||
<li>Linode alerted again and said that CGSpace was using 301% CPU</li>
|
||||
<li>Peter emailed to ask why <a href="https://cgspace.cgiar.org/handle/10568/88090">this item</a> doesn’t have an Altmetric badge on CGSpace but does have one on the <a href="https://www.altmetric.com/details/26709041">Altmetric dashboard</a></li>
|
||||
<li>Looks like our badge code calls the <code>handle</code> endpoint which doesn’t exist:</li>
|
||||
</ul>
|
||||
<pre><code>https://api.altmetric.com/v1/handle/10568/88090
|
||||
<pre tabindex="0"><code>https://api.altmetric.com/v1/handle/10568/88090
|
||||
</code></pre><ul>
|
||||
<li>I told Peter we should keep an eye out and try again next week</li>
|
||||
</ul>
|
||||
@ -1041,7 +1041,7 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspa
|
||||
<ul>
|
||||
<li>Run the authority indexing script on CGSpace and of course it died:</li>
|
||||
</ul>
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-authority
|
||||
<pre tabindex="0"><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-authority
|
||||
Retrieving all data
|
||||
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
|
||||
Exception: null
|
||||
@ -1071,7 +1071,7 @@ sys 0m12.317s
|
||||
<li>In the end there were 324 items in the collection that were Limited Access, but only 199 had abstracts</li>
|
||||
<li>I want to document the workflow of adding a production PostgreSQL database to a development instance of <a href="https://github.com/alanorth/docker-dspace">DSpace in Docker</a>:</li>
|
||||
</ul>
|
||||
<pre><code>$ docker exec dspace_db dropdb -U postgres dspace
|
||||
<pre tabindex="0"><code>$ docker exec dspace_db dropdb -U postgres dspace
|
||||
$ docker exec dspace_db createdb -U postgres -O dspace --encoding=UNICODE dspace
|
||||
$ docker exec dspace_db psql -U postgres dspace -c 'alter user dspace createuser;'
|
||||
$ docker cp test.dump dspace_db:/tmp/test.dump
|
||||
@ -1099,7 +1099,7 @@ $ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
|
||||
<li>The source code is here: <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a></li>
|
||||
<li>Peter had said that he found a bunch of ILRI collections that were called “untitled”, but I don’t see any:</li>
|
||||
</ul>
|
||||
<pre><code>$ ./rest-find-collections.py 10568/1 | wc -l
|
||||
<pre tabindex="0"><code>$ ./rest-find-collections.py 10568/1 | wc -l
|
||||
308
|
||||
$ ./rest-find-collections.py 10568/1 | grep -i untitled
|
||||
</code></pre><ul>
|
||||
@ -1119,12 +1119,12 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
|
||||
<li>Thinking about generating a jmeter test plan for DSpace, along the lines of <a href="https://github.com/Georgetown-University-Libraries/dspace-performance-test">Georgetown’s dspace-performance-test</a></li>
|
||||
<li>I got a list of all the GET requests on CGSpace for January 21st (the last time Linode complained the load was high), excluding admin calls:</li>
|
||||
</ul>
|
||||
<pre><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -c -v "/admin"
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -c -v "/admin"
|
||||
56405
|
||||
</code></pre><ul>
|
||||
<li>Apparently about 28% of these requests were for bitstreams, 30% for the REST API, and 30% for handles:</li>
|
||||
</ul>
|
||||
<pre><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -Eo "^/(handle|bitstream|rest|oai)/" | sort | uniq -c | sort -n
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -Eo "^/(handle|bitstream|rest|oai)/" | sort | uniq -c | sort -n
|
||||
38 /oai/
|
||||
14406 /bitstream/
|
||||
15179 /rest/
|
||||
@ -1132,14 +1132,14 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
|
||||
</code></pre><ul>
|
||||
<li>And 3% were to the homepage or search:</li>
|
||||
</ul>
|
||||
<pre><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -Eo '^/($|open-search|discover)' | sort | uniq -c
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -Eo '^/($|open-search|discover)' | sort | uniq -c
|
||||
1050 /
|
||||
413 /discover
|
||||
170 /open-search
|
||||
</code></pre><ul>
|
||||
<li>The last 10% or so seem to be for static assets that would be served by nginx anyways:</li>
|
||||
</ul>
|
||||
<pre><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -v bitstream | grep -Eo '\.(js|css|png|jpg|jpeg|php|svg|gif|txt|map)$' | sort | uniq -c | sort -n
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -v bitstream | grep -Eo '\.(js|css|png|jpg|jpeg|php|svg|gif|txt|map)$' | sort | uniq -c | sort -n
|
||||
2 .gif
|
||||
7 .css
|
||||
84 .js
|
||||
@ -1153,7 +1153,7 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
|
||||
<ul>
|
||||
<li>Looking at the REST requests, most of them are to expand all or metadata, but 5% are for retrieving bitstreams:</li>
|
||||
</ul>
|
||||
<pre><code># zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/library-access.log.4.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/rest.log.4.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/oai.log.4.gz /var/log/nginx/error.log.3.gz /var/log/nginx/error.log.4.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -E "^/rest" | grep -Eo "(retrieve|expand=[a-z].*)" | sort | uniq -c | sort -n
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/library-access.log.4.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/rest.log.4.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/oai.log.4.gz /var/log/nginx/error.log.3.gz /var/log/nginx/error.log.4.gz | grep "21/Jan/2018" | grep "GET " | grep -v "/admin" | awk '{print $7}' | grep -E "^/rest" | grep -Eo "(retrieve|expand=[a-z].*)" | sort | uniq -c | sort -n
|
||||
1 expand=collections
|
||||
16 expand=all&limit=1
|
||||
45 expand=items
|
||||
@ -1163,12 +1163,12 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
|
||||
</code></pre><ul>
|
||||
<li>I finished creating the test plan for DSpace Test and ran it from my Linode with:</li>
|
||||
</ul>
|
||||
<pre><code>$ jmeter -n -t DSpacePerfTest-dspacetest.cgiar.org.jmx -l 2018-01-24-1.jtl
|
||||
<pre tabindex="0"><code>$ jmeter -n -t DSpacePerfTest-dspacetest.cgiar.org.jmx -l 2018-01-24-1.jtl
|
||||
</code></pre><ul>
|
||||
<li>Atmire responded to <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">my issue from two weeks ago</a> and said they will start looking into DSpace 5.8 compatibility for CGSpace</li>
|
||||
<li>I set up a new Arch Linux Linode instance with 8192 MB of RAM and ran the test plan a few times to get a baseline:</li>
|
||||
</ul>
|
||||
<pre><code># lscpu
|
||||
<pre tabindex="0"><code># lscpu
|
||||
# lscpu
|
||||
Architecture: x86_64
|
||||
CPU op-mode(s): 32-bit, 64-bit
|
||||
@ -1212,19 +1212,19 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
|
||||
</code></pre><ul>
|
||||
<li>Then I generated reports for these runs like this:</li>
|
||||
</ul>
|
||||
<pre><code>$ jmeter -g 2018-01-24-linode5451120-baseline.jtl -o 2018-01-24-linode5451120-baseline
|
||||
<pre tabindex="0"><code>$ jmeter -g 2018-01-24-linode5451120-baseline.jtl -o 2018-01-24-linode5451120-baseline
|
||||
</code></pre><h2 id="2018-01-25">2018-01-25</h2>
|
||||
<ul>
|
||||
<li>Run another round of tests on DSpace Test with jmeter after changing Tomcat’s <code>minSpareThreads</code> to 20 (default is 10) and <code>acceptorThreadCount</code> to 2 (default is 1):</li>
|
||||
</ul>
|
||||
<pre><code>$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads.log
|
||||
<pre tabindex="0"><code>$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads.log
|
||||
$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads2.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads2.log
|
||||
$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads3.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads3.log
</code></pre><ul>
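<li>For reference, those two settings live on the Tomcat connector in <code>server.xml</code>; a minimal sketch, where the port, address, and remaining attributes are assumptions based on the connector name used elsewhere in these notes:</li>
</ul>
<pre tabindex="0"><code><!-- sketch only: minSpareThreads and acceptorThreadCount are the values under test -->
<Connector port=&quot;8443&quot; protocol=&quot;HTTP/1.1&quot;
           address=&quot;127.0.0.1&quot;
           maxThreads=&quot;400&quot;
           minSpareThreads=&quot;20&quot;
           acceptorThreadCount=&quot;2&quot;
           connectionTimeout=&quot;20000&quot; />
</code></pre><ul>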
<li>I changed the parameters back to the baseline ones and switched the Tomcat JVM garbage collector to G1GC and re-ran the tests</li>
<li>JVM options for Tomcat changed from <code>-Xms3072m -Xmx3072m -XX:+UseConcMarkSweepGC</code> to <code>-Xms3072m -Xmx3072m -XX:+UseG1GC -XX:+PerfDisableSharedMem</code> (a sketch of where these options are set follows the test commands below)</li>
</ul>
<pre><code>$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc.log
<pre tabindex="0"><code>$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc.log
$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc2.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc2.log
$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc3.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc3.log
</code></pre><ul>
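<li>Those JVM options are passed to Tomcat via <code>JAVA_OPTS</code>; a minimal sketch, assuming the stock Ubuntu <code>tomcat7</code> package, which reads <em>/etc/default/tomcat7</em>:</li>
</ul>
<pre tabindex="0"><code># /etc/default/tomcat7 (sketch; the file location and any surrounding options are assumptions)
JAVA_OPTS=&quot;-Djava.awt.headless=true -Xms3072m -Xmx3072m -XX:+UseG1GC -XX:+PerfDisableSharedMem&quot;
</code></pre><ul>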
@ -1242,7 +1242,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
<li>The problem is that Peter wanted to use two questions, one for CG centers and one for others, but both using the same metadata value, which isn’t possible (?)</li>
<li>So I used some creativity and made several fields display values but not store any, i.e.:</li>
</ul>
<pre><code><pair>
<pre tabindex="0"><code><pair>
<displayed-value>For products published by another party:</displayed-value>
<stored-value></stored-value>
</pair>
@ -1267,7 +1267,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
<li>CGSpace went down this morning for a few minutes, according to UptimeRobot</li>
<li>Looking at the DSpace logs I see this error happened just before UptimeRobot noticed it going down:</li>
</ul>
<pre><code>2018-01-29 05:30:22,226 INFO org.dspace.usage.LoggerUsageEventListener @ anonymous:session_id=3775D4125D28EF0C691B08345D905141:ip_addr=68.180.229.254:view_item:handle=10568/71890
<pre tabindex="0"><code>2018-01-29 05:30:22,226 INFO org.dspace.usage.LoggerUsageEventListener @ anonymous:session_id=3775D4125D28EF0C691B08345D905141:ip_addr=68.180.229.254:view_item:handle=10568/71890
2018-01-29 05:30:22,322 ERROR org.dspace.app.xmlui.aspect.discovery.AbstractSearch @ org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1994+TO+1999]': Encountered " "]" "] "" at line 1, column 32.
Was expecting one of:
"TO" ...
@ -1284,12 +1284,12 @@ Was expecting one of:
<li>I see a few dozen HTTP 499 errors in the nginx access log for a few minutes before this happened, but HTTP 499 is just when nginx says that the client closed the request early</li>
<li>Perhaps this from the nginx error log is relevant?</li>
</ul>
<pre><code>2018/01/29 05:26:34 [warn] 26895#26895: *944759 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/6/16/0000026166 while reading upstream, client: 180.76.15.34, server: cgspace.cgiar.org, request: "GET /bitstream/handle/10947/4658/FISH%20Leaflet.pdf?sequence=12 HTTP/1.1", upstream: "http://127.0.0.1:8443/bitstream/handle/10947/4658/FISH%20Leaflet.pdf?sequence=12", host: "cgspace.cgiar.org"
<pre tabindex="0"><code>2018/01/29 05:26:34 [warn] 26895#26895: *944759 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/6/16/0000026166 while reading upstream, client: 180.76.15.34, server: cgspace.cgiar.org, request: "GET /bitstream/handle/10947/4658/FISH%20Leaflet.pdf?sequence=12 HTTP/1.1", upstream: "http://127.0.0.1:8443/bitstream/handle/10947/4658/FISH%20Leaflet.pdf?sequence=12", host: "cgspace.cgiar.org"
</code></pre><ul>
<li>I think that must be unrelated, probably the client closed the request to nginx because DSpace (Tomcat) was taking too long</li>
<li>An interesting <a href="https://gist.github.com/magnetikonline/11312172">snippet to get the maximum and average nginx responses</a>:</li>
</ul>
<pre><code># awk '($9 ~ /200/) { i++;sum+=$10;max=$10>max?$10:max; } END { printf("Maximum: %d\nAverage: %d\n",max,i?sum/i:0); }' /var/log/nginx/access.log
<pre tabindex="0"><code># awk '($9 ~ /200/) { i++;sum+=$10;max=$10>max?$10:max; } END { printf("Maximum: %d\nAverage: %d\n",max,i?sum/i:0); }' /var/log/nginx/access.log
Maximum: 2771268
Average: 210483
</code></pre><ul>
@ -1297,7 +1297,7 @@ Average: 210483
<li>My best guess is that the Solr search error is related somehow but I can’t figure it out</li>
<li>We definitely have enough database connections, as I haven’t seen a pool error in weeks:</li>
</ul>
<pre><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-2*
<pre tabindex="0"><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-2*
dspace.log.2018-01-20:0
dspace.log.2018-01-21:0
dspace.log.2018-01-22:0
@ -1326,7 +1326,7 @@ dspace.log.2018-01-29:0
<li>Wow, so apparently you need to specify which connector to check if you want any of the Munin Tomcat plugins besides “tomcat_jvm” to work (the connector name can be seen in the Catalina logs, as in the example further below)</li>
<li>I modified <em>/etc/munin/plugin-conf.d/tomcat</em> to add the connector (with surrounding quotes!) and now the other plugins work (obviously the credentials are incorrect):</li>
</ul>
<pre><code>[tomcat_*]
<pre tabindex="0"><code>[tomcat_*]
env.host 127.0.0.1
env.port 8081
env.connector "http-bio-127.0.0.1-8443"
@ -1335,7 +1335,7 @@ dspace.log.2018-01-29:0
</code></pre><ul>
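<li>As an aside, the connector name used above comes from Tomcat’s startup messages in catalina.out; with the bio connector they look roughly like this (an illustration, not copied from the actual logs):</li>
</ul>
<pre tabindex="0"><code>INFO: Initializing ProtocolHandler [&quot;http-bio-127.0.0.1-8443&quot;]
INFO: Starting ProtocolHandler [&quot;http-bio-127.0.0.1-8443&quot;]
</code></pre><ul>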
<li>For example, I can see the threads:</li>
</ul>
<pre><code># munin-run tomcat_threads
<pre tabindex="0"><code># munin-run tomcat_threads
busy.value 0
idle.value 20
max.value 400
@ -1345,18 +1345,18 @@ max.value 400
<li>Although following the logic of <em>/usr/share/munin/plugins/jmx_tomcat_dbpools</em> could be useful for getting the active Tomcat sessions</li>
<li>From debugging the <code>jmx_tomcat_db_pools</code> script from the <code>munin-plugins-java</code> package, I see that this is how you call arbitrary mbeans:</li>
</ul>
<pre><code># port=5400 ip="127.0.0.1" /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=DataSource,class=javax.sql.DataSource,name=* maxActive
<pre tabindex="0"><code># port=5400 ip="127.0.0.1" /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=DataSource,class=javax.sql.DataSource,name=* maxActive
Catalina:type=DataSource,class=javax.sql.DataSource,name="jdbc/dspace" maxActive 300
</code></pre><ul>
<li>More notes here: <a href="https://github.com/munin-monitoring/contrib/tree/master/plugins/jmx">https://github.com/munin-monitoring/contrib/tree/master/plugins/jmx</a></li>
<li>Looking at the Munin graphs, I see that the load is 200% every morning from 03:00 to almost 08:00</li>
<li>Tomcat’s catalina.out log file is full of spam from this thing too, with lines like this:</li>
</ul>
<pre><code>[===================> ]38% time remaining: 5 hour(s) 21 minute(s) 47 seconds. timestamp: 2018-01-29 06:25:16
<pre tabindex="0"><code>[===================> ]38% time remaining: 5 hour(s) 21 minute(s) 47 seconds. timestamp: 2018-01-29 06:25:16
</code></pre><ul>
<li>There are millions of these status lines, for example in just this one log file:</li>
</ul>
<pre><code># zgrep -c "time remaining" /var/log/tomcat7/catalina.out.1.gz
<pre tabindex="0"><code># zgrep -c "time remaining" /var/log/tomcat7/catalina.out.1.gz
1084741
</code></pre><ul>
<li>I filed a ticket with Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566</a></li>
@ -1370,26 +1370,26 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name="jdbc/dspace"
<li>Now PostgreSQL activity shows 308 connections! (a quick way to cross-check this is sketched below)</li>
<li>Well this is interesting, there are 400 Tomcat threads busy:</li>
</ul>
<pre><code># munin-run tomcat_threads
<pre tabindex="0"><code># munin-run tomcat_threads
busy.value 400
idle.value 0
max.value 400
</code></pre><ul>
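<li>The PostgreSQL connection count above can be cross-checked directly with <code>psql</code>; a minimal sketch (the <code>dspace</code> database name here is an assumption):</li>
</ul>
<pre tabindex="0"><code>$ psql -c 'SELECT count(*) FROM pg_stat_activity;' dspace
</code></pre><ul>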
<li>And wow, we finally exhausted the database connections, as seen in dspace.log:</li>
</ul>
<pre><code>2018-01-31 08:05:28,964 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
<pre tabindex="0"><code>2018-01-31 08:05:28,964 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-451] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:300; busy:300; idle:0; lastwait:5000].
</code></pre><ul>
<li>Now even the nightly Atmire background thing is getting an HTTP 500 error:</li>
</ul>
<pre><code>Jan 31, 2018 8:16:05 AM com.sun.jersey.spi.container.ContainerResponse logException
<pre tabindex="0"><code>Jan 31, 2018 8:16:05 AM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
</code></pre><ul>
<li>For now I will restart Tomcat to clear this shit and bring the site back up</li>
<li>The top IPs from this morning, between 7 and 8 AM, in XMLUI and REST/OAI:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E "31/Jan/2018:(07|08)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E "31/Jan/2018:(07|08)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
67 66.249.66.70
70 207.46.13.12
71 197.210.168.174
@ -1426,7 +1426,7 @@ javax.ws.rs.WebApplicationException
<li>I should make separate database pools for the web applications and the API applications like REST and OAI (see the rough sketch after the JMX example below)</li>
<li>Ok, so this is interesting: I figured out how to get the MBean path to query Tomcat’s activeSessions from JMX (using <code>munin-plugins-java</code>):</li>
</ul>
<pre><code># port=5400 ip="127.0.0.1" /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=Manager,context=/,host=localhost activeSessions
<pre tabindex="0"><code># port=5400 ip="127.0.0.1" /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=Manager,context=/,host=localhost activeSessions
Catalina:type=Manager,context=/,host=localhost activeSessions 8
</code></pre><ul>
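<li>On the separate-pools idea above, one possible approach (a sketch, not what is currently deployed) is to define the <code>jdbc/dspace</code> Resource per webapp context instead of globally, so the REST and OAI applications get their own smaller pools; all values here are assumptions:</li>
</ul>
<pre tabindex="0"><code><!-- e.g. a hypothetical conf/Catalina/localhost/rest.xml -->
<Context>
  <Resource name=&quot;jdbc/dspace&quot; auth=&quot;Container&quot; type=&quot;javax.sql.DataSource&quot;
            driverClassName=&quot;org.postgresql.Driver&quot;
            url=&quot;jdbc:postgresql://localhost:5432/dspace&quot;
            username=&quot;dspace&quot; password=&quot;dspace&quot;
            maxActive=&quot;50&quot; maxIdle=&quot;10&quot; />
</Context>
</code></pre><ul>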
<li>If you connect to Tomcat in <code>jvisualvm</code> it’s pretty obvious when you hover over the elements</li>