Add notes for 2022-03-04

2022-03-04 15:30:06 +03:00
parent 7453499827
commit 27acbac859
115 changed files with 6550 additions and 6444 deletions


@ -23,11 +23,11 @@ After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&
I notice this error quite a few times in dspace.log:
2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
And there are many of these errors every day for the past month:
$ grep -c "Error while searching for sidebar facets" dspace.log.*
$ grep -c "Error while searching for sidebar facets" dspace.log.*
dspace.log.2017-11-21:4
dspace.log.2017-11-22:1
dspace.log.2017-11-23:4
@ -99,11 +99,11 @@ After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&
I notice this error quite a few times in dspace.log:
2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
And there are many of these errors every day for the past month:
$ grep -c "Error while searching for sidebar facets" dspace.log.*
$ grep -c "Error while searching for sidebar facets" dspace.log.*
dspace.log.2017-11-21:4
dspace.log.2017-11-22:1
dspace.log.2017-11-23:4
@ -150,7 +150,7 @@ dspace.log.2018-01-02:34
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -252,11 +252,11 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
<li>I notice this error quite a few times in dspace.log:</li>
</ul>
<pre tabindex="0"><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse &#39;dateIssued_keyword:[1976+TO+1979]&#39;: Encountered &#34; &#34;]&#34; &#34;] &#34;&#34; at line 1, column 32.
</code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
</ul>
<pre tabindex="0"><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.*
<pre tabindex="0"><code>$ grep -c &#34;Error while searching for sidebar facets&#34; dspace.log.*
dspace.log.2017-11-21:4
dspace.log.2017-11-22:1
dspace.log.2017-11-23:4
@ -308,7 +308,7 @@ dspace.log.2018-01-02:34
<li>I woke up to more up and down of CGSpace, this time UptimeRobot noticed a few rounds of up and down of a few minutes each and Linode also notified of high CPU load from 12 to 2 PM</li>
<li>Looks like I need to increase the database pool size again:</li>
</ul>
<pre tabindex="0"><code>$ grep -c &quot;Timeout: Pool empty.&quot; dspace.log.2018-01-*
<pre tabindex="0"><code>$ grep -c &#34;Timeout: Pool empty.&#34; dspace.log.2018-01-*
dspace.log.2018-01-01:0
dspace.log.2018-01-02:1972
dspace.log.2018-01-03:1909
@ -319,7 +319,7 @@ dspace.log.2018-01-03:1909
<ul>
<li>The active IPs in XMLUI are:</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E &quot;3/Jan/2018&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E &#34;3/Jan/2018&#34; | awk &#39;{print $1}&#39; | sort -n | uniq -c | sort -h | tail
607 40.77.167.141
611 2a00:23c3:8c94:7800:392c:a491:e796:9c50
663 188.226.169.37
@ -336,12 +336,12 @@ dspace.log.2018-01-03:1909
<li>This appears to be the <a href="https://github.com/internetarchive/heritrix3">Internet Archive&rsquo;s open source bot</a></li>
<li>They seem to be re-using their Tomcat session so I don&rsquo;t need to do anything to them just yet:</li>
</ul>
<pre tabindex="0"><code>$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
<pre tabindex="0"><code>$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; | sort -n | uniq | wc -l
2
</code></pre><ul>
<li>The API logs show the normal users:</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;3/Jan/2018&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &#34;3/Jan/2018&#34; | awk &#39;{print $1}&#39; | sort -n | uniq -c | sort -h | tail
32 207.46.13.182
38 40.77.167.132
38 68.180.229.254
@ -361,7 +361,7 @@ dspace.log.2018-01-03:1909
</code></pre><ul>
<li>But they come from hundreds of IPs, many of which are 54.x.x.x:</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep python-requests | awk '{print $1}' | sort -n | uniq -c | sort -h | tail -n 30
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep python-requests | awk &#39;{print $1}&#39; | sort -n | uniq -c | sort -h | tail -n 30
9 54.144.87.92
9 54.146.222.143
9 54.146.249.249
@ -402,7 +402,7 @@ dspace.log.2018-01-03:1909
<li>CGSpace went down and up a bunch of times last night and ILRI staff were complaining a lot last night</li>
<li>The XMLUI logs show this activity:</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E &quot;4/Jan/2018&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E &#34;4/Jan/2018&#34; | awk &#39;{print $1}&#39; | sort -n | uniq -c | sort -h | tail
968 197.211.63.81
981 213.55.99.121
1039 66.249.64.93
@ -421,7 +421,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exe
</code></pre><ul>
<li>So for this week that is the number one problem!</li>
</ul>
<pre tabindex="0"><code>$ grep -c &quot;Timeout: Pool empty.&quot; dspace.log.2018-01-*
<pre tabindex="0"><code>$ grep -c &#34;Timeout: Pool empty.&#34; dspace.log.2018-01-*
dspace.log.2018-01-01:0
dspace.log.2018-01-02:1972
dspace.log.2018-01-03:1909
@ -436,7 +436,7 @@ dspace.log.2018-01-04:1559
<li>Peter said that CGSpace was down last night and Tsega restarted Tomcat</li>
<li>I don&rsquo;t see any alerts from Linode or UptimeRobot, and there are no PostgreSQL connection errors in the dspace logs for today:</li>
</ul>
<pre tabindex="0"><code>$ grep -c &quot;Timeout: Pool empty.&quot; dspace.log.2018-01-*
<pre tabindex="0"><code>$ grep -c &#34;Timeout: Pool empty.&#34; dspace.log.2018-01-*
dspace.log.2018-01-01:0
dspace.log.2018-01-02:1972
dspace.log.2018-01-03:1909
@ -446,13 +446,13 @@ dspace.log.2018-01-05:0
<li>Daniel asked for help with their DAGRIS server (linode2328112) that has no disk space</li>
<li>I had a look and there is one Apache 2 log file that is 73GB, with lots of this:</li>
</ul>
<pre tabindex="0"><code>[Fri Jan 05 09:31:22.965398 2018] [:error] [pid 9340] [client 213.55.99.121:64476] WARNING: Unable to find a match for &quot;9-16-1-RV.doc&quot; in &quot;/home/files/journals/6//articles/9/&quot;. Skipping this file., referer: http://dagris.info/reviewtool/index.php/index/install/upgrade
<pre tabindex="0"><code>[Fri Jan 05 09:31:22.965398 2018] [:error] [pid 9340] [client 213.55.99.121:64476] WARNING: Unable to find a match for &#34;9-16-1-RV.doc&#34; in &#34;/home/files/journals/6//articles/9/&#34;. Skipping this file., referer: http://dagris.info/reviewtool/index.php/index/install/upgrade
</code></pre><ul>
<li>I will delete the log file for now and tell Danny</li>
<li>Also, I&rsquo;m still seeing a hundred or so of the &ldquo;ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer&rdquo; errors in dspace logs, I need to search the dspace-tech mailing list to see what the cause is</li>
<li>I will run a full Discovery reindex in the mean time to see if it&rsquo;s something wrong with the Discovery Solr core</li>
</ul>
<pre tabindex="0"><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
<pre tabindex="0"><code>$ export JAVA_OPTS=&#34;-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&#34;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 110m43.985s
@ -465,7 +465,7 @@ sys 3m14.890s
<ul>
<li>I&rsquo;m still seeing Solr errors in the DSpace logs even after the full reindex yesterday:</li>
</ul>
<pre tabindex="0"><code>org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1983+TO+1989]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
<pre tabindex="0"><code>org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse &#39;dateIssued_keyword:[1983+TO+1989]&#39;: Encountered &#34; &#34;]&#34; &#34;] &#34;&#34; at line 1, column 32.
</code></pre><ul>
<li>I posted a message to the dspace-tech mailing list to see if anyone can help</li>
</ul>
@ -474,7 +474,7 @@ sys 3m14.890s
<li>Advise Sisay about blank lines in some IITA records</li>
<li>Generate a list of author affiliations for Peter to clean up:</li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = &#39;contributor&#39; and qualifier = &#39;affiliation&#39;) AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
COPY 4515
</code></pre><h2 id="2018-01-10">2018-01-10</h2>
<ul>
@ -553,10 +553,10 @@ Caused by: org.apache.http.client.ClientProtocolException
<li>I can apparently search for records in the Solr stats core that have an empty <code>owningColl</code> field using this in the Solr admin query: <code>-owningColl:*</code></li>
<li>On CGSpace I see 48,000,000 records that have an <code>owningColl</code> field and 34,000,000 that don&rsquo;t:</li>
</ul>
<pre tabindex="0"><code>$ http 'http://localhost:3000/solr/statistics/select?q=owningColl%3A*&amp;wt=json&amp;indent=true' | grep numFound
&quot;response&quot;:{&quot;numFound&quot;:48476327,&quot;start&quot;:0,&quot;docs&quot;:[
$ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&amp;wt=json&amp;indent=true' | grep numFound
&quot;response&quot;:{&quot;numFound&quot;:34879872,&quot;start&quot;:0,&quot;docs&quot;:[
<pre tabindex="0"><code>$ http &#39;http://localhost:3000/solr/statistics/select?q=owningColl%3A*&amp;wt=json&amp;indent=true&#39; | grep numFound
&#34;response&#34;:{&#34;numFound&#34;:48476327,&#34;start&#34;:0,&#34;docs&#34;:[
$ http &#39;http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&amp;wt=json&amp;indent=true&#39; | grep numFound
&#34;response&#34;:{&#34;numFound&#34;:34879872,&#34;start&#34;:0,&#34;docs&#34;:[
</code></pre><ul>
<li>I tested the <code>dspace stats-util -s</code> process on my local machine and it failed the same way</li>
<li>It doesn&rsquo;t seem to be helpful, but the dspace log shows this:</li>
@ -568,12 +568,12 @@ $ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&amp;wt=js
<li>Uptime Robot said that CGSpace went down at around 9:43 AM</li>
<li>I looked at PostgreSQL&rsquo;s <code>pg_stat_activity</code> table and saw 161 active connections, but no pool errors in the DSpace logs:</li>
</ul>
<pre tabindex="0"><code>$ grep -c &quot;Timeout: Pool empty.&quot; dspace.log.2018-01-10
<pre tabindex="0"><code>$ grep -c &#34;Timeout: Pool empty.&#34; dspace.log.2018-01-10
0
</code></pre><ul>
<li>The XMLUI logs show quite a bit of activity today:</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep &quot;10/Jan/2018&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep &#34;10/Jan/2018&#34; | awk &#39;{print $1}&#39; | sort -n | uniq -c | sort -h | tail
951 207.46.13.159
954 157.55.39.123
1217 95.108.181.88
@ -587,18 +587,18 @@ $ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&amp;wt=js
</code></pre><ul>
<li>The user agent for the top six or so IPs are all the same:</li>
</ul>
<pre tabindex="0"><code>&quot;Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36&quot;
<pre tabindex="0"><code>&#34;Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36&#34;
</code></pre><ul>
<li><code>whois</code> says they come from <a href="http://www.perfectip.net/">Perfect IP</a></li>
<li>I&rsquo;ve never seen those top IPs before, but they have created 50,000 Tomcat sessions today:</li>
</ul>
<pre tabindex="0"><code>$ grep -E '(2607:fa98:40:9:26b6:fdff:feff:1888|2607:fa98:40:9:26b6:fdff:feff:195d|2607:fa98:40:9:26b6:fdff:feff:1c96|70.36.107.49|70.36.107.190|70.36.107.50)' /home/cgspace.cgiar.org/log/dspace.log.2018-01-10 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
<pre tabindex="0"><code>$ grep -E &#39;(2607:fa98:40:9:26b6:fdff:feff:1888|2607:fa98:40:9:26b6:fdff:feff:195d|2607:fa98:40:9:26b6:fdff:feff:1c96|70.36.107.49|70.36.107.190|70.36.107.50)&#39; /home/cgspace.cgiar.org/log/dspace.log.2018-01-10 | grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; | sort -n | uniq | wc -l
49096
</code></pre><ul>
<li>Rather than blocking their IPs, I think I might just add their user agent to the &ldquo;badbots&rdquo; zone with Baidu, because they seem to be the only ones using that user agent:</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep &quot;Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari
/537.36&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep &#34;Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari
/537.36&#34; | awk &#39;{print $1}&#39; | sort -n | uniq -c | sort -h | tail
6796 70.36.107.50
11870 70.36.107.190
17323 70.36.107.49
@ -637,19 +637,19 @@ cache_alignment : 64
<li>Linode rebooted DSpace Test and CGSpace for their host hypervisor kernel updates</li>
<li>Following up with the Solr sharding issue on the dspace-tech mailing list, I noticed this interesting snippet in the Tomcat <code>localhost_access_log</code> at the time of my sharding attempt on my test machine:</li>
</ul>
<pre tabindex="0"><code>127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &quot;GET /solr/statistics/select?q=type%3A2+AND+id%3A1&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 107
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &quot;GET /solr/statistics/select?q=*%3A*&amp;rows=0&amp;facet=true&amp;facet.range=time&amp;facet.range.start=NOW%2FYEAR-18YEARS&amp;facet.range.end=NOW%2FYEAR%2B0YEARS&amp;facet.range.gap=%2B1YEAR&amp;facet.mincount=1&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 447
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &quot;GET /solr/admin/cores?action=STATUS&amp;core=statistics-2016&amp;indexInfo=true&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 76
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &quot;GET /solr/admin/cores?action=CREATE&amp;name=statistics-2016&amp;instanceDir=statistics&amp;dataDir=%2FUsers%2Faorth%2Fdspace%2Fsolr%2Fstatistics-2016%2Fdata&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 63
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &quot;GET /solr/statistics/select?csv.mv.separator=%7C&amp;q=*%3A*&amp;fq=time%3A%28%5B2016%5C-01%5C-01T00%5C%3A00%5C%3A00Z+TO+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%5D+NOT+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%29&amp;rows=10000&amp;wt=csv HTTP/1.1&quot; 200 2137630
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &quot;GET /solr/statistics/admin/luke?show=schema&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 200 16253
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &quot;POST /solr//statistics-2016/update/csv?commit=true&amp;softCommit=false&amp;waitSearcher=true&amp;f.previousWorkflowStep.split=true&amp;f.previousWorkflowStep.separator=%7C&amp;f.previousWorkflowStep.encapsulator=%22&amp;f.actingGroupId.split=true&amp;f.actingGroupId.separator=%7C&amp;f.actingGroupId.encapsulator=%22&amp;f.containerCommunity.split=true&amp;f.containerCommunity.separator=%7C&amp;f.containerCommunity.encapsulator=%22&amp;f.range.split=true&amp;f.range.separator=%7C&amp;f.range.encapsulator=%22&amp;f.containerItem.split=true&amp;f.containerItem.separator=%7C&amp;f.containerItem.encapsulator=%22&amp;f.p_communities_map.split=true&amp;f.p_communities_map.separator=%7C&amp;f.p_communities_map.encapsulator=%22&amp;f.ngram_query_search.split=true&amp;f.ngram_query_search.separator=%7C&amp;f.ngram_query_search.encapsulator=%22&amp;f.containerBitstream.split=true&amp;f.containerBitstream.separator=%7C&amp;f.containerBitstream.encapsulator=%22&amp;f.owningItem.split=true&amp;f.owningItem.separator=%7C&amp;f.owningItem.encapsulator=%22&amp;f.actingGroupParentId.split=true&amp;f.actingGroupParentId.separator=%7C&amp;f.actingGroupParentId.encapsulator=%22&amp;f.text.split=true&amp;f.text.separator=%7C&amp;f.text.encapsulator=%22&amp;f.simple_query_search.split=true&amp;f.simple_query_search.separator=%7C&amp;f.simple_query_search.encapsulator=%22&amp;f.owningComm.split=true&amp;f.owningComm.separator=%7C&amp;f.owningComm.encapsulator=%22&amp;f.owner.split=true&amp;f.owner.separator=%7C&amp;f.owner.encapsulator=%22&amp;f.filterquery.split=true&amp;f.filterquery.separator=%7C&amp;f.filterquery.encapsulator=%22&amp;f.p_group_map.split=true&amp;f.p_group_map.separator=%7C&amp;f.p_group_map.encapsulator=%22&amp;f.actorMemberGroupId.split=true&amp;f.actorMemberGroupId.separator=%7C&amp;f.actorMemberGroupId.encapsulator=%22&amp;f.bitstreamId.split=true&amp;f.bitstreamId.separator=%7C&amp;f.bitstreamId.encapsulator=%22&amp
;f.group_name.split=true&amp;f.group_name.separator=%7C&amp;f.group_name.encapsulator=%22&amp;f.p_communities_name.split=true&amp;f.p_communities_name.separator=%7C&amp;f.p_communities_name.encapsulator=%22&amp;f.query.split=true&amp;f.query.separator=%7C&amp;f.query.encapsulator=%22&amp;f.workflowStep.split=true&amp;f.workflowStep.separator=%7C&amp;f.workflowStep.encapsulator=%22&amp;f.containerCollection.split=true&amp;f.containerCollection.separator=%7C&amp;f.containerCollection.encapsulator=%22&amp;f.complete_query_search.split=true&amp;f.complete_query_search.separator=%7C&amp;f.complete_query_search.encapsulator=%22&amp;f.p_communities_id.split=true&amp;f.p_communities_id.separator=%7C&amp;f.p_communities_id.encapsulator=%22&amp;f.rangeDescription.split=true&amp;f.rangeDescription.separator=%7C&amp;f.rangeDescription.encapsulator=%22&amp;f.group_id.split=true&amp;f.group_id.separator=%7C&amp;f.group_id.encapsulator=%22&amp;f.bundleName.split=true&amp;f.bundleName.separator=%7C&amp;f.bundleName.encapsulator=%22&amp;f.ngram_simplequery_search.split=true&amp;f.ngram_simplequery_search.separator=%7C&amp;f.ngram_simplequery_search.encapsulator=%22&amp;f.group_map.split=true&amp;f.group_map.separator=%7C&amp;f.group_map.encapsulator=%22&amp;f.owningColl.split=true&amp;f.owningColl.separator=%7C&amp;f.owningColl.encapsulator=%22&amp;f.p_group_id.split=true&amp;f.p_group_id.separator=%7C&amp;f.p_group_id.encapsulator=%22&amp;f.p_group_name.split=true&amp;f.p_group_name.separator=%7C&amp;f.p_group_name.encapsulator=%22&amp;wt=javabin&amp;version=2 HTTP/1.1&quot; 409 156
<pre tabindex="0"><code>127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &#34;GET /solr/statistics/select?q=type%3A2+AND+id%3A1&amp;wt=javabin&amp;version=2 HTTP/1.1&#34; 200 107
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &#34;GET /solr/statistics/select?q=*%3A*&amp;rows=0&amp;facet=true&amp;facet.range=time&amp;facet.range.start=NOW%2FYEAR-18YEARS&amp;facet.range.end=NOW%2FYEAR%2B0YEARS&amp;facet.range.gap=%2B1YEAR&amp;facet.mincount=1&amp;wt=javabin&amp;version=2 HTTP/1.1&#34; 200 447
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &#34;GET /solr/admin/cores?action=STATUS&amp;core=statistics-2016&amp;indexInfo=true&amp;wt=javabin&amp;version=2 HTTP/1.1&#34; 200 76
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &#34;GET /solr/admin/cores?action=CREATE&amp;name=statistics-2016&amp;instanceDir=statistics&amp;dataDir=%2FUsers%2Faorth%2Fdspace%2Fsolr%2Fstatistics-2016%2Fdata&amp;wt=javabin&amp;version=2 HTTP/1.1&#34; 200 63
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &#34;GET /solr/statistics/select?csv.mv.separator=%7C&amp;q=*%3A*&amp;fq=time%3A%28%5B2016%5C-01%5C-01T00%5C%3A00%5C%3A00Z+TO+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%5D+NOT+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%29&amp;rows=10000&amp;wt=csv HTTP/1.1&#34; 200 2137630
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &#34;GET /solr/statistics/admin/luke?show=schema&amp;wt=javabin&amp;version=2 HTTP/1.1&#34; 200 16253
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] &#34;POST /solr//statistics-2016/update/csv?commit=true&amp;softCommit=false&amp;waitSearcher=true&amp;f.previousWorkflowStep.split=true&amp;f.previousWorkflowStep.separator=%7C&amp;f.previousWorkflowStep.encapsulator=%22&amp;f.actingGroupId.split=true&amp;f.actingGroupId.separator=%7C&amp;f.actingGroupId.encapsulator=%22&amp;f.containerCommunity.split=true&amp;f.containerCommunity.separator=%7C&amp;f.containerCommunity.encapsulator=%22&amp;f.range.split=true&amp;f.range.separator=%7C&amp;f.range.encapsulator=%22&amp;f.containerItem.split=true&amp;f.containerItem.separator=%7C&amp;f.containerItem.encapsulator=%22&amp;f.p_communities_map.split=true&amp;f.p_communities_map.separator=%7C&amp;f.p_communities_map.encapsulator=%22&amp;f.ngram_query_search.split=true&amp;f.ngram_query_search.separator=%7C&amp;f.ngram_query_search.encapsulator=%22&amp;f.containerBitstream.split=true&amp;f.containerBitstream.separator=%7C&amp;f.containerBitstream.encapsulator=%22&amp;f.owningItem.split=true&amp;f.owningItem.separator=%7C&amp;f.owningItem.encapsulator=%22&amp;f.actingGroupParentId.split=true&amp;f.actingGroupParentId.separator=%7C&amp;f.actingGroupParentId.encapsulator=%22&amp;f.text.split=true&amp;f.text.separator=%7C&amp;f.text.encapsulator=%22&amp;f.simple_query_search.split=true&amp;f.simple_query_search.separator=%7C&amp;f.simple_query_search.encapsulator=%22&amp;f.owningComm.split=true&amp;f.owningComm.separator=%7C&amp;f.owningComm.encapsulator=%22&amp;f.owner.split=true&amp;f.owner.separator=%7C&amp;f.owner.encapsulator=%22&amp;f.filterquery.split=true&amp;f.filterquery.separator=%7C&amp;f.filterquery.encapsulator=%22&amp;f.p_group_map.split=true&amp;f.p_group_map.separator=%7C&amp;f.p_group_map.encapsulator=%22&amp;f.actorMemberGroupId.split=true&amp;f.actorMemberGroupId.separator=%7C&amp;f.actorMemberGroupId.encapsulator=%22&amp;f.bitstreamId.split=true&amp;f.bitstreamId.separator=%7C&amp;f.bitstreamId.encapsulator=%22&amp;
f.group_name.split=true&amp;f.group_name.separator=%7C&amp;f.group_name.encapsulator=%22&amp;f.p_communities_name.split=true&amp;f.p_communities_name.separator=%7C&amp;f.p_communities_name.encapsulator=%22&amp;f.query.split=true&amp;f.query.separator=%7C&amp;f.query.encapsulator=%22&amp;f.workflowStep.split=true&amp;f.workflowStep.separator=%7C&amp;f.workflowStep.encapsulator=%22&amp;f.containerCollection.split=true&amp;f.containerCollection.separator=%7C&amp;f.containerCollection.encapsulator=%22&amp;f.complete_query_search.split=true&amp;f.complete_query_search.separator=%7C&amp;f.complete_query_search.encapsulator=%22&amp;f.p_communities_id.split=true&amp;f.p_communities_id.separator=%7C&amp;f.p_communities_id.encapsulator=%22&amp;f.rangeDescription.split=true&amp;f.rangeDescription.separator=%7C&amp;f.rangeDescription.encapsulator=%22&amp;f.group_id.split=true&amp;f.group_id.separator=%7C&amp;f.group_id.encapsulator=%22&amp;f.bundleName.split=true&amp;f.bundleName.separator=%7C&amp;f.bundleName.encapsulator=%22&amp;f.ngram_simplequery_search.split=true&amp;f.ngram_simplequery_search.separator=%7C&amp;f.ngram_simplequery_search.encapsulator=%22&amp;f.group_map.split=true&amp;f.group_map.separator=%7C&amp;f.group_map.encapsulator=%22&amp;f.owningColl.split=true&amp;f.owningColl.separator=%7C&amp;f.owningColl.encapsulator=%22&amp;f.p_group_id.split=true&amp;f.p_group_id.separator=%7C&amp;f.p_group_id.encapsulator=%22&amp;f.p_group_name.split=true&amp;f.p_group_name.separator=%7C&amp;f.p_group_name.encapsulator=%22&amp;wt=javabin&amp;version=2 HTTP/1.1&#34; 409 156
</code></pre><ul>
<li>The new core is created but when DSpace attempts to POST to it there is an HTTP 409 error</li>
<li>This is apparently a common Solr error code that means &ldquo;version conflict&rdquo;: <a href="http://yonik.com/solr/optimistic-concurrency/">http://yonik.com/solr/optimistic-concurrency/</a></li>
<li>Looks like that bot from the PerfectIP.net host ended up making about 450,000 requests to XMLUI alone yesterday:</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep &quot;Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36&quot; | grep &quot;10/Jan/2018&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep &#34;Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36&#34; | grep &#34;10/Jan/2018&#34; | awk &#39;{print $1}&#39; | sort -n | uniq -c | sort -h | tail
21572 70.36.107.50
30722 70.36.107.190
34566 70.36.107.49
@ -659,18 +659,18 @@ cache_alignment : 64
</code></pre><ul>
<li>Wow, I just figured out how to set the application name of each database pool in the JNDI config of Tomcat&rsquo;s <code>server.xml</code>:</li>
</ul>
<pre tabindex="0"><code>&lt;Resource name=&quot;jdbc/dspaceWeb&quot; auth=&quot;Container&quot; type=&quot;javax.sql.DataSource&quot;
driverClassName=&quot;org.postgresql.Driver&quot;
url=&quot;jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceWeb&quot;
username=&quot;dspace&quot;
password=&quot;dspace&quot;
initialSize='5'
maxActive='75'
maxIdle='15'
minIdle='5'
maxWait='5000'
validationQuery='SELECT 1'
testOnBorrow='true' /&gt;
<pre tabindex="0"><code>&lt;Resource name=&#34;jdbc/dspaceWeb&#34; auth=&#34;Container&#34; type=&#34;javax.sql.DataSource&#34;
driverClassName=&#34;org.postgresql.Driver&#34;
url=&#34;jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceWeb&#34;
username=&#34;dspace&#34;
password=&#34;dspace&#34;
initialSize=&#39;5&#39;
maxActive=&#39;75&#39;
maxIdle=&#39;15&#39;
minIdle=&#39;5&#39;
maxWait=&#39;5000&#39;
validationQuery=&#39;SELECT 1&#39;
testOnBorrow=&#39;true&#39; /&gt;
</code></pre><ul>
<li>So theoretically I could name each connection &ldquo;xmlui&rdquo; or &ldquo;dspaceWeb&rdquo; or something meaningful and it would show up in PostgreSQL&rsquo;s <code>pg_stat_activity</code> table!</li>
<li>This would be super helpful for figuring out where load was coming from (now I wonder if I could figure out how to graph this)</li>
@ -686,16 +686,16 @@ cache_alignment : 64
<li>I&rsquo;m looking at the <a href="https://wiki.lyrasis.org/display/DSDOC6x/Installing+DSpace#InstallingDSpace-ServletEngine(ApacheTomcat7orlater,Jetty,CauchoResinorequivalent)">DSpace 6.0 Install docs</a> and notice they tweak the number of threads in their Tomcat connector:</li>
</ul>
<pre tabindex="0"><code>&lt;!-- Define a non-SSL HTTP/1.1 Connector on port 8080 --&gt;
&lt;Connector port=&quot;8080&quot;
maxThreads=&quot;150&quot;
minSpareThreads=&quot;25&quot;
maxSpareThreads=&quot;75&quot;
enableLookups=&quot;false&quot;
redirectPort=&quot;8443&quot;
acceptCount=&quot;100&quot;
connectionTimeout=&quot;20000&quot;
disableUploadTimeout=&quot;true&quot;
URIEncoding=&quot;UTF-8&quot;/&gt;
&lt;Connector port=&#34;8080&#34;
maxThreads=&#34;150&#34;
minSpareThreads=&#34;25&#34;
maxSpareThreads=&#34;75&#34;
enableLookups=&#34;false&#34;
redirectPort=&#34;8443&#34;
acceptCount=&#34;100&#34;
connectionTimeout=&#34;20000&#34;
disableUploadTimeout=&#34;true&#34;
URIEncoding=&#34;UTF-8&#34;/&gt;
</code></pre><ul>
<li>In Tomcat 8.5 the <code>maxThreads</code> defaults to 200 which is probably fine, but tweaking <code>minSpareThreads</code> could be good</li>
<li>I don&rsquo;t see a setting for <code>maxSpareThreads</code> in the docs so that might be an error</li>
@ -711,8 +711,8 @@ cache_alignment : 64
<li>Still testing DSpace 6.2 on Tomcat 8.5.24</li>
<li>Catalina errors at Tomcat 8.5 startup:</li>
</ul>
<pre tabindex="0"><code>13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxActive is not used in DBCP2, use maxTotal instead. maxTotal default value is 8. You have set value of &quot;35&quot; for &quot;maxActive&quot; property, which is being ignored.
13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxWait is not used in DBCP2 , use maxWaitMillis instead. maxWaitMillis default value is -1. You have set value of &quot;5000&quot; for &quot;maxWait&quot; property, which is being ignored.
<pre tabindex="0"><code>13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxActive is not used in DBCP2, use maxTotal instead. maxTotal default value is 8. You have set value of &#34;35&#34; for &#34;maxActive&#34; property, which is being ignored.
13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxWait is not used in DBCP2 , use maxWaitMillis instead. maxWaitMillis default value is -1. You have set value of &#34;5000&#34; for &#34;maxWait&#34; property, which is being ignored.
</code></pre><ul>
<li>I looked in my Tomcat 7.0.82 logs and I don&rsquo;t see anything about DBCP2 errors, so I guess this is a Tomcat 8.0.x or 8.5.x thing</li>
<li>DBCP2 appears to be used in Tomcat 8.0.x and up according to the <a href="https://tomcat.apache.org/migration-8.html">Tomcat 8.0 migration guide</a></li>
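<li>The two renames named in those warnings can be sketched as a small mapping. This is only an illustration, assuming the pool settings are available as a plain dictionary; the helper name <code>migrate_pool_attributes</code> and the sample resource name are hypothetical, and only the property names and values come from the Catalina log:</li>
</ul>

```python
# Renames taken from the two Catalina warnings; nothing else is assumed.
DBCP2_RENAMES = {"maxActive": "maxTotal", "maxWait": "maxWaitMillis"}


def migrate_pool_attributes(attributes):
    """Return a copy of the attributes with DBCP 1 names replaced by DBCP 2 names."""
    return {DBCP2_RENAMES.get(name, name): value for name, value in attributes.items()}


# Hypothetical resource definition using the values from the warnings.
old = {"name": "jdbc/example", "maxActive": "35", "maxWait": "5000"}
print(migrate_pool_attributes(old))
```

<ul>
<li>Attributes that the warnings do not mention are passed through untouched, so any other renames between DBCP 1 and DBCP 2 would still need to be checked against the migration guide</li>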
@ -761,15 +761,15 @@ Caused by: java.lang.NullPointerException
<li>Help Udana from IWMI export a CSV from DSpace Test so he can start trying a batch upload</li>
<li>I&rsquo;m going to apply these ~130 corrections on CGSpace:</li>
</ul>
<pre tabindex="0"><code>update metadatavalue set text_value='Formally Published' where resource_type_id=2 and metadata_field_id=214 and text_value like 'Formally published';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=214 and text_value like 'NO';
update metadatavalue set text_value='en' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(En|English)';
update metadatavalue set text_value='fr' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(fre|frn|French)';
update metadatavalue set text_value='es' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(Spanish|spa)';
update metadatavalue set text_value='vi' where resource_type_id=2 and metadata_field_id=38 and text_value='Vietnamese';
update metadatavalue set text_value='ru' where resource_type_id=2 and metadata_field_id=38 and text_value='Ru';
update metadatavalue set text_value='in' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(IN|In)';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(dc.language.iso|CGIAR Challenge Program on Water and Food)';
<pre tabindex="0"><code>update metadatavalue set text_value=&#39;Formally Published&#39; where resource_type_id=2 and metadata_field_id=214 and text_value like &#39;Formally published&#39;;
delete from metadatavalue where resource_type_id=2 and metadata_field_id=214 and text_value like &#39;NO&#39;;
update metadatavalue set text_value=&#39;en&#39; where resource_type_id=2 and metadata_field_id=38 and text_value ~ &#39;(En|English)&#39;;
update metadatavalue set text_value=&#39;fr&#39; where resource_type_id=2 and metadata_field_id=38 and text_value ~ &#39;(fre|frn|French)&#39;;
update metadatavalue set text_value=&#39;es&#39; where resource_type_id=2 and metadata_field_id=38 and text_value ~ &#39;(Spanish|spa)&#39;;
update metadatavalue set text_value=&#39;vi&#39; where resource_type_id=2 and metadata_field_id=38 and text_value=&#39;Vietnamese&#39;;
update metadatavalue set text_value=&#39;ru&#39; where resource_type_id=2 and metadata_field_id=38 and text_value=&#39;Ru&#39;;
update metadatavalue set text_value=&#39;in&#39; where resource_type_id=2 and metadata_field_id=38 and text_value ~ &#39;(IN|In)&#39;;
delete from metadatavalue where resource_type_id=2 and metadata_field_id=38 and text_value ~ &#39;(dc.language.iso|CGIAR Challenge Program on Water and Food)&#39;;
</code></pre><ul>
<li>Continue proofing Peter&rsquo;s author corrections that I started yesterday, faceting on non blank, non flagged, and briefly scrolling through the values of the corrections to find encoding errors for French and Spanish names</li>
</ul>
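<ul>
<li>One thing to keep in mind with the language corrections: PostgreSQL&rsquo;s <code>~</code> operator is a case-sensitive regular expression match that succeeds when the pattern matches anywhere in the value, so a pattern like <code>(En|English)</code> also matches longer strings. A minimal sketch of the difference between an unanchored and an anchored match, using Python&rsquo;s <code>re</code> module as a stand-in for PostgreSQL&rsquo;s regex engine (an assumption, the flavors differ in details) and made-up sample values:</li>
</ul>

```python
import re

# Pattern copied from the language corrections.
PATTERN = "(En|English)"


def matches_anywhere(pattern, value):
    """Mimic an unanchored match such as PostgreSQL's `~` operator."""
    return re.search(pattern, value) is not None


def matches_whole_value(pattern, value):
    """Mimic an anchored match, i.e. `~ '^(En|English)$'`."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical sample values, not taken from CGSpace.
for value in ["En", "English", "Entomology"]:
    print(value, matches_anywhere(PATTERN, value), matches_whole_value(PATTERN, value))
```

<ul>
<li>If only exact values should be rewritten, anchoring the pattern with <code>^</code> and <code>$</code> (or using plain <code>=</code>) would be the safer choice</li>
</ul>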
@ -777,17 +777,17 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id=38 and
<ul>
<li>Apply corrections using <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a>:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-01-14-Authors-1300-Corrections.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-01-14-Authors-1300-Corrections.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p &#39;fuuu&#39;
</code></pre><ul>
<li>While looking at some of the values to delete or check, I found some metadata values whose handles I could not resolve via SQL:</li>
</ul>
<pre tabindex="0"><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='Tarawali';
<pre tabindex="0"><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value=&#39;Tarawali&#39;;
metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
-------------------+-------------+-------------------+------------+-----------+-------+-----------+------------+------------------
2757936 | 4369 | 3 | Tarawali | | 9 | | 600 | 2
(1 row)
dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '4369';
dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = &#39;4369&#39;;
handle
--------
(0 rows)
@ -796,7 +796,7 @@ dspace=# select handle from item, handle where handle.resource_id = item.item_id
<li>Otherwise, the <a href="https://wiki.lyrasis.org/display/DSPACE/Helper+SQL+functions+for+DSpace+5">DSpace 5 SQL Helper Functions</a> provide <code>ds5_item2itemhandle()</code>, which is much easier than my long query above that I always have to go search for</li>
<li>For example, to find the Handle for an item that has the author &ldquo;Erni&rdquo;:</li>
</ul>
<pre tabindex="0"><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='Erni';
<pre tabindex="0"><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value=&#39;Erni&#39;;
metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
-------------------+-------------+-------------------+------------+-----------+-------+--------------------------------------+------------+------------------
2612150 | 70308 | 3 | Erni | | 9 | 3fe10c68-6773-49a7-89cc-63eb508723f2 | -1 | 2
@ -809,16 +809,16 @@ dspace=# select ds5_item2itemhandle(70308);
</code></pre><ul>
<li>Next I apply the author deletions:</li>
</ul>
<pre tabindex="0"><code>$ ./delete-metadata-values.py -i /tmp/2018-01-14-Authors-5-Deletions.csv -f dc.contributor.author -m 3 -d dspace -u dspace -p 'fuuu'
<pre tabindex="0"><code>$ ./delete-metadata-values.py -i /tmp/2018-01-14-Authors-5-Deletions.csv -f dc.contributor.author -m 3 -d dspace -u dspace -p &#39;fuuu&#39;
</code></pre><ul>
<li>Now working on the affiliation corrections from Peter:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-01-15-Affiliations-888-Corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p 'fuuu'
$ ./delete-metadata-values.py -i /tmp/2018-01-15-Affiliations-11-Deletions.csv -f cg.contributor.affiliation -m 211 -d dspace -u dspace -p 'fuuu'
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-01-15-Affiliations-888-Corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p &#39;fuuu&#39;
$ ./delete-metadata-values.py -i /tmp/2018-01-15-Affiliations-11-Deletions.csv -f cg.contributor.affiliation -m 211 -d dspace -u dspace -p &#39;fuuu&#39;
</code></pre><ul>
<li>Now I made a new list of affiliations for Peter to look through:</li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where metadata_schema_id = 2 and element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where metadata_schema_id = 2 and element = &#39;contributor&#39; and qualifier = &#39;affiliation&#39;) AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
COPY 4552
</code></pre><ul>
<li>Looking over the affiliations again I see dozens of CIAT ones with their affiliation formatted like: International Center for Tropical Agriculture (CIAT)</li>
@ -832,7 +832,7 @@ COPY 4552
</code></pre><ul>
<li>Looks like we processed 2.9 million requests on CGSpace in 2017-12:</li>
</ul>
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Dec/2017&quot;
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE &#34;[0-9]{1,2}/Dec/2017&#34;
2890041
real 0m25.756s
@ -845,7 +845,7 @@ sys 0m2.210s
<li>Discuss standardized names for CRPs and centers with ICARDA (don&rsquo;t wait for CG Core)</li>
<li>Re-send DC rights implementation and forward to everyone so we can move forward with it (without the URI field for now)</li>
<li>Start looking at where I was with the AGROVOC API</li>
<li>Have a controlled vocabulary for CGIAR authors' names and ORCIDs? Perhaps values like: Orth, Alan S. (0000-0002-1735-7458)</li>
<li>Have a controlled vocabulary for CGIAR authors&rsquo; names and ORCIDs? Perhaps values like: Orth, Alan S. (0000-0002-1735-7458)</li>
<li>Need to find the metadata field name that ICARDA is using for their ORCIDs</li>
<li>Update text for DSpace version plan on wiki</li>
<li>Come up with an SLA, something like: <em>In return for your contribution we will, to the best of our ability, ensure 99.5% (&ldquo;two and a half nines&rdquo;) uptime of CGSpace, ensure data is stored in open formats and safely backed up, follow CG Core metadata standards, &hellip;</em></li>
@ -864,14 +864,14 @@ sys 0m2.210s
<li>Also, there are whitespace issues in many columns, and the items are mapped to the LIVES and ILRI articles collections, not Theses</li>
<li>In any case, importing them like this:</li>
</ul>
<pre tabindex="0"><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
<pre tabindex="0"><code>$ export JAVA_OPTS=&#34;-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&#34;
$ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFormat -m lives.map &amp;&gt; lives.log
</code></pre><ul>
<li>And fantastic, before I started the import there were 10 PostgreSQL connections, and then CGSpace crashed during the upload</li>
<li>When I looked there were 210 PostgreSQL connections!</li>
<li>I don&rsquo;t see any high load in XMLUI or REST/OAI:</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E &quot;17/Jan/2018&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E &#34;17/Jan/2018&#34; | awk &#39;{print $1}&#39; | sort -n | uniq -c | sort -h | tail
381 40.77.167.124
403 213.55.99.121
431 207.46.13.60
@ -882,7 +882,7 @@ $ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFor
593 54.91.48.104
757 104.196.152.243
776 66.249.66.90
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;17/Jan/2018&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &#34;17/Jan/2018&#34; | awk &#39;{print $1}&#39; | sort -n | uniq -c | sort -h | tail
11 205.201.132.14
11 40.77.167.124
15 35.226.23.240
@ -906,7 +906,7 @@ $ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFor
[====================&gt; ]40% time remaining: 7 hour(s) 14 minute(s) 45 seconds. timestamp: 2018-01-17 07:57:11
[====================&gt; ]40% time remaining: 7 hour(s) 14 minute(s) 44 seconds. timestamp: 2018-01-17 07:57:37
[====================&gt; ]40% time remaining: 7 hour(s) 16 minute(s) 5 seconds. timestamp: 2018-01-17 07:57:49
Exception in thread &quot;http-bio-127.0.0.1-8081-exec-627&quot; java.lang.OutOfMemoryError: Java heap space
Exception in thread &#34;http-bio-127.0.0.1-8081-exec-627&#34; java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.FixedBitSet.clone(FixedBitSet.java:576)
at org.apache.solr.search.BitDocSet.andNot(BitDocSet.java:222)
at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1067)
@ -1004,7 +1004,7 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
<li>I don&rsquo;t see any errors in the nginx or catalina logs, so I guess UptimeRobot just got impatient and closed the request, which caused nginx to send an HTTP 499</li>
<li>I realize I never did a full re-index after the SQL author and affiliation updates last week, so I should force one now:</li>
</ul>
<pre tabindex="0"><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
<pre tabindex="0"><code>$ export JAVA_OPTS=&#34;-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&#34;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-discovery -b
</code></pre><ul>
<li>Maria from Bioversity asked if I could remove the abstracts from all of their Limited Access items in the <a href="https://cgspace.cgiar.org/handle/10568/35501">Bioversity Journal Articles</a> collection</li>
@ -1026,7 +1026,7 @@ Jan 18 07:01:22 linode18 sudo[10812]: pam_unix(sudo:session): session opened for
<li>Linode alerted and said that the CPU load was 264.1% on CGSpace</li>
<li>Start the Discovery indexing again:</li>
</ul>
<pre tabindex="0"><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
<pre tabindex="0"><code>$ export JAVA_OPTS=&#34;-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&#34;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-discovery -b
</code></pre><ul>
<li>Linode alerted again and said that CGSpace was using 301% CPU</li>
@ -1073,10 +1073,10 @@ sys 0m12.317s
</ul>
<pre tabindex="0"><code>$ docker exec dspace_db dropdb -U postgres dspace
$ docker exec dspace_db createdb -U postgres -O dspace --encoding=UNICODE dspace
$ docker exec dspace_db psql -U postgres dspace -c 'alter user dspace createuser;'
$ docker exec dspace_db psql -U postgres dspace -c &#39;alter user dspace createuser;&#39;
$ docker cp test.dump dspace_db:/tmp/test.dump
$ docker exec dspace_db pg_restore -U postgres -d dspace /tmp/test.dump
$ docker exec dspace_db psql -U postgres dspace -c 'alter user dspace nocreateuser;'
$ docker exec dspace_db psql -U postgres dspace -c &#39;alter user dspace nocreateuser;&#39;
$ docker exec dspace_db vacuumdb -U postgres dspace
$ docker cp ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspace_db:/tmp
$ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
@ -1119,12 +1119,12 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
<li>Thinking about generating a jmeter test plan for DSpace, along the lines of <a href="https://github.com/Georgetown-University-Libraries/dspace-performance-test">Georgetown&rsquo;s dspace-performance-test</a></li>
<li>I got a list of all the GET requests on CGSpace for January 21st (the last time Linode complained the load was high), excluding admin calls:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep &quot;21/Jan/2018&quot; | grep &quot;GET &quot; | grep -c -v &quot;/admin&quot;
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep &#34;21/Jan/2018&#34; | grep &#34;GET &#34; | grep -c -v &#34;/admin&#34;
56405
</code></pre><ul>
<li>Apparently about 28% of these requests were for bitstreams, 30% for the REST API, and 30% for handles:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep &quot;21/Jan/2018&quot; | grep &quot;GET &quot; | grep -v &quot;/admin&quot; | awk '{print $7}' | grep -Eo &quot;^/(handle|bitstream|rest|oai)/&quot; | sort | uniq -c | sort -n
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep &#34;21/Jan/2018&#34; | grep &#34;GET &#34; | grep -v &#34;/admin&#34; | awk &#39;{print $7}&#39; | grep -Eo &#34;^/(handle|bitstream|rest|oai)/&#34; | sort | uniq -c | sort -n
38 /oai/
14406 /bitstream/
15179 /rest/
@ -1132,14 +1132,14 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
</code></pre><ul>
<li>And 3% were to the homepage or search:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep &quot;21/Jan/2018&quot; | grep &quot;GET &quot; | grep -v &quot;/admin&quot; | awk '{print $7}' | grep -Eo '^/($|open-search|discover)' | sort | uniq -c
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep &#34;21/Jan/2018&#34; | grep &#34;GET &#34; | grep -v &#34;/admin&#34; | awk &#39;{print $7}&#39; | grep -Eo &#39;^/($|open-search|discover)&#39; | sort | uniq -c
1050 /
413 /discover
170 /open-search
</code></pre><ul>
<li>The last 10% or so seem to be for static assets that would be served by nginx anyways:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep &quot;21/Jan/2018&quot; | grep &quot;GET &quot; | grep -v &quot;/admin&quot; | awk '{print $7}' | grep -v bitstream | grep -Eo '\.(js|css|png|jpg|jpeg|php|svg|gif|txt|map)$' | sort | uniq -c | sort -n
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep &#34;21/Jan/2018&#34; | grep &#34;GET &#34; | grep -v &#34;/admin&#34; | awk &#39;{print $7}&#39; | grep -v bitstream | grep -Eo &#39;\.(js|css|png|jpg|jpeg|php|svg|gif|txt|map)$&#39; | sort | uniq -c | sort -n
2 .gif
7 .css
84 .js
@ -1153,7 +1153,7 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
<ul>
<li>Looking at the REST requests, most of them are to expand all or metadata, but 5% are for retrieving bitstreams:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/library-access.log.4.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/rest.log.4.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/oai.log.4.gz /var/log/nginx/error.log.3.gz /var/log/nginx/error.log.4.gz | grep &quot;21/Jan/2018&quot; | grep &quot;GET &quot; | grep -v &quot;/admin&quot; | awk '{print $7}' | grep -E &quot;^/rest&quot; | grep -Eo &quot;(retrieve|expand=[a-z].*)&quot; | sort | uniq -c | sort -n
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/library-access.log.4.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/rest.log.4.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/oai.log.4.gz /var/log/nginx/error.log.3.gz /var/log/nginx/error.log.4.gz | grep &#34;21/Jan/2018&#34; | grep &#34;GET &#34; | grep -v &#34;/admin&#34; | awk &#39;{print $7}&#39; | grep -E &#34;^/rest&#34; | grep -Eo &#34;(retrieve|expand=[a-z].*)&#34; | sort | uniq -c | sort -n
1 expand=collections
16 expand=all&amp;limit=1
45 expand=items
@ -1268,15 +1268,15 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
<li>Looking at the DSpace logs I see this error happened just before UptimeRobot noticed it going down:</li>
</ul>
<pre tabindex="0"><code>2018-01-29 05:30:22,226 INFO org.dspace.usage.LoggerUsageEventListener @ anonymous:session_id=3775D4125D28EF0C691B08345D905141:ip_addr=68.180.229.254:view_item:handle=10568/71890
2018-01-29 05:30:22,322 ERROR org.dspace.app.xmlui.aspect.discovery.AbstractSearch @ org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1994+TO+1999]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
2018-01-29 05:30:22,322 ERROR org.dspace.app.xmlui.aspect.discovery.AbstractSearch @ org.apache.solr.search.SyntaxError: Cannot parse &#39;dateIssued_keyword:[1994+TO+1999]&#39;: Encountered &#34; &#34;]&#34; &#34;] &#34;&#34; at line 1, column 32.
Was expecting one of:
&quot;TO&quot; ...
&#34;TO&#34; ...
&lt;RANGE_QUOTED&gt; ...
&lt;RANGE_GOOP&gt; ...
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1994+TO+1999]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse &#39;dateIssued_keyword:[1994+TO+1999]&#39;: Encountered &#34; &#34;]&#34; &#34;] &#34;&#34; at line 1, column 32.
Was expecting one of:
&quot;TO&quot; ...
&#34;TO&#34; ...
&lt;RANGE_QUOTED&gt; ...
&lt;RANGE_GOOP&gt; ...
</code></pre><ul>
@ -1284,12 +1284,12 @@ Was expecting one of:
<li>I see a few dozen HTTP 499 errors in the nginx access log for a few minutes before this happened, but HTTP 499 is just when nginx says that the client closed the request early</li>
<li>Perhaps this from the nginx error log is relevant?</li>
</ul>
<pre tabindex="0"><code>2018/01/29 05:26:34 [warn] 26895#26895: *944759 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/6/16/0000026166 while reading upstream, client: 180.76.15.34, server: cgspace.cgiar.org, request: &quot;GET /bitstream/handle/10947/4658/FISH%20Leaflet.pdf?sequence=12 HTTP/1.1&quot;, upstream: &quot;http://127.0.0.1:8443/bitstream/handle/10947/4658/FISH%20Leaflet.pdf?sequence=12&quot;, host: &quot;cgspace.cgiar.org&quot;
<pre tabindex="0"><code>2018/01/29 05:26:34 [warn] 26895#26895: *944759 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/6/16/0000026166 while reading upstream, client: 180.76.15.34, server: cgspace.cgiar.org, request: &#34;GET /bitstream/handle/10947/4658/FISH%20Leaflet.pdf?sequence=12 HTTP/1.1&#34;, upstream: &#34;http://127.0.0.1:8443/bitstream/handle/10947/4658/FISH%20Leaflet.pdf?sequence=12&#34;, host: &#34;cgspace.cgiar.org&#34;
</code></pre><ul>
<li>I think that must be unrelated, probably the client closed the request to nginx because DSpace (Tomcat) was taking too long</li>
<li>An interesting <a href="https://gist.github.com/magnetikonline/11312172">snippet to get the maximum and average nginx responses</a>:</li>
</ul>
<pre tabindex="0"><code># awk '($9 ~ /200/) { i++;sum+=$10;max=$10&gt;max?$10:max; } END { printf(&quot;Maximum: %d\nAverage: %d\n&quot;,max,i?sum/i:0); }' /var/log/nginx/access.log
<pre tabindex="0"><code># awk &#39;($9 ~ /200/) { i++;sum+=$10;max=$10&gt;max?$10:max; } END { printf(&#34;Maximum: %d\nAverage: %d\n&#34;,max,i?sum/i:0); }&#39; /var/log/nginx/access.log
Maximum: 2771268
Average: 210483
</code></pre><ul>
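<li>For reference, a rough Python equivalent of that awk one-liner. This is a sketch, not the gist author&rsquo;s code: the function name <code>response_stats</code> and the sample lines are made up, and it assumes nginx&rsquo;s default <code>combined</code> log format, where field 9 is the status and field 10 is the number of body bytes sent (a custom <code>log_format</code> would put something else there):</li>
</ul>

```python
def response_stats(lines):
    """Return (maximum, average) of field 10 for lines whose field 9 is 200."""
    values = []
    for line in lines:
        fields = line.split()
        # awk's $9 and $10 are fields[8] and fields[9] here (zero-based).
        if len(fields) > 9 and fields[8] == "200" and fields[9].isdigit():
            values.append(int(fields[9]))
    if not values:
        return 0, 0
    return max(values), sum(values) // len(values)


# Hypothetical log lines in nginx's default "combined" format.
sample = [
    '192.0.2.1 - - [29/Jan/2018:05:26:34 +0000] "GET / HTTP/1.1" 200 1000 "-" "curl"',
    '192.0.2.2 - - [29/Jan/2018:05:26:35 +0000] "GET /discover HTTP/1.1" 200 3000 "-" "curl"',
    '192.0.2.3 - - [29/Jan/2018:05:26:36 +0000] "GET /missing HTTP/1.1" 404 50 "-" "curl"',
]
print("Maximum: %d\nAverage: %d" % response_stats(sample))
```

<ul>
<li>Unlike the awk version, which matches any status containing <code>200</code>, this sketch compares the status exactly and skips lines where field 10 is not a number</li>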
@ -1297,7 +1297,7 @@ Average: 210483
<li>My best guess is that the Solr search error is related somehow but I can&rsquo;t figure it out</li>
<li>We definitely have enough database connections, as I haven&rsquo;t seen a pool error in weeks:</li>
</ul>
<pre tabindex="0"><code>$ grep -c &quot;Timeout: Pool empty.&quot; dspace.log.2018-01-2*
<pre tabindex="0"><code>$ grep -c &#34;Timeout: Pool empty.&#34; dspace.log.2018-01-2*
dspace.log.2018-01-20:0
dspace.log.2018-01-21:0
dspace.log.2018-01-22:0
@ -1329,7 +1329,7 @@ dspace.log.2018-01-29:0
<pre tabindex="0"><code>[tomcat_*]
env.host 127.0.0.1
env.port 8081
env.connector &quot;http-bio-127.0.0.1-8443&quot;
env.connector &#34;http-bio-127.0.0.1-8443&#34;
env.user munin
env.password munin
</code></pre><ul>
@ -1345,8 +1345,8 @@ max.value 400
<li>Although following the logic of <em>/usr/share/munin/plugins/jmx_tomcat_dbpools</em> could be useful for getting the active Tomcat sessions</li>
<li>From debugging the <code>jmx_tomcat_dbpools</code> script from the <code>munin-plugins-java</code> package, I see that this is how you call arbitrary mbeans:</li>
</ul>
<pre tabindex="0"><code># port=5400 ip=&quot;127.0.0.1&quot; /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=DataSource,class=javax.sql.DataSource,name=* maxActive
Catalina:type=DataSource,class=javax.sql.DataSource,name=&quot;jdbc/dspace&quot; maxActive 300
<pre tabindex="0"><code># port=5400 ip=&#34;127.0.0.1&#34; /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=DataSource,class=javax.sql.DataSource,name=* maxActive
Catalina:type=DataSource,class=javax.sql.DataSource,name=&#34;jdbc/dspace&#34; maxActive 300
</code></pre><ul>
<li>More notes here: <a href="https://github.com/munin-monitoring/contrib/tree/master/plugins/jmx">https://github.com/munin-monitoring/contrib/tree/master/plugins/jmx</a></li>
<li>Looking at the Munin graphs, I see that the load is 200% every morning from 03:00 to almost 08:00</li>
@ -1356,7 +1356,7 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name=&quot;jdbc/dspace&quot;
</code></pre><ul>
<li>There are millions of these status lines, for example in just this one log file:</li>
</ul>
<pre tabindex="0"><code># zgrep -c &quot;time remaining&quot; /var/log/tomcat7/catalina.out.1.gz
<pre tabindex="0"><code># zgrep -c &#34;time remaining&#34; /var/log/tomcat7/catalina.out.1.gz
1084741
</code></pre><ul>
<li>I filed a ticket with Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566</a></li>
@ -1389,7 +1389,7 @@ javax.ws.rs.WebApplicationException
<li>For now I will restart Tomcat to clear this shit and bring the site back up</li>
<li>The top IPs from this morning, during the 7 and 8 AM hours, in XMLUI and REST/OAI:</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E &quot;31/Jan/2018:(07|08)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E &#34;31/Jan/2018:(07|08)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
67 66.249.66.70
70 207.46.13.12
71 197.210.168.174
@ -1400,7 +1400,7 @@ javax.ws.rs.WebApplicationException
198 66.249.66.90
219 41.204.190.40
255 2405:204:a208:1e12:132:2a8e:ad28:46c0
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;31/Jan/2018:(07|08)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &#34;31/Jan/2018:(07|08)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
2 65.55.210.187
2 66.249.66.90
3 157.55.39.79
@ -1426,7 +1426,7 @@ javax.ws.rs.WebApplicationException
<li>I should make separate database pools for the web applications and the API applications like REST and OAI</li>
<li>Ok, so this is interesting: I figured out how to get the MBean path to query Tomcat&rsquo;s activeSessions from JMX (using <code>munin-plugins-java</code>):</li>
</ul>
<pre tabindex="0"><code># port=5400 ip=&quot;127.0.0.1&quot; /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=Manager,context=/,host=localhost activeSessions
<pre tabindex="0"><code># port=5400 ip=&#34;127.0.0.1&#34; /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=Manager,context=/,host=localhost activeSessions
Catalina:type=Manager,context=/,host=localhost activeSessions 8
</code></pre><ul>
<li>If you connect to Tomcat in <code>jvisualvm</code> it&rsquo;s pretty obvious when you hover over the elements</li>