Add notes for 2020-01-27

2020-01-27 16:20:44 +02:00
parent 207ace0883
commit 8feb93be39
112 changed files with 11466 additions and 5158 deletions


@ -9,7 +9,7 @@
<meta property="og:description" content="2018-01-02
Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time
I didn&#39;t get any load alerts from Linode and the REST and XMLUI logs don&#39;t show anything out of the ordinary
I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary
The nginx logs show HTTP 200s until 02/Jan/2018:11:27:17 &#43;0000 when Uptime Robot got an HTTP 500
In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;
And just before that I see this:
@ -17,8 +17,8 @@ And just before that I see this:
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
Ah hah! So the pool was actually empty!
I need to increase that, let&#39;s try to bump it up from 50 to 75
After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&#39;t know what the hell Uptime Robot saw
I need to increase that, let&rsquo;s try to bump it up from 50 to 75
After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw
I notice this error quite a few times in dspace.log:
2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
@ -71,7 +71,7 @@ dspace.log.2017-12-31:53
dspace.log.2018-01-01:45
dspace.log.2018-01-02:34
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&#39;s Encrypt if it&#39;s just a handful of domains
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-01/" />
@ -83,7 +83,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
<meta name="twitter:description" content="2018-01-02
Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time
I didn&#39;t get any load alerts from Linode and the REST and XMLUI logs don&#39;t show anything out of the ordinary
I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary
The nginx logs show HTTP 200s until 02/Jan/2018:11:27:17 &#43;0000 when Uptime Robot got an HTTP 500
In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;
And just before that I see this:
@ -91,8 +91,8 @@ And just before that I see this:
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
Ah hah! So the pool was actually empty!
I need to increase that, let&#39;s try to bump it up from 50 to 75
After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&#39;t know what the hell Uptime Robot saw
I need to increase that, let&rsquo;s try to bump it up from 50 to 75
After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw
I notice this error quite a few times in dspace.log:
2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
@ -145,9 +145,9 @@ dspace.log.2017-12-31:53
dspace.log.2018-01-01:45
dspace.log.2018-01-02:34
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&#39;s Encrypt if it&#39;s just a handful of domains
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains
"/>
<meta name="generator" content="Hugo 0.62.2" />
<meta name="generator" content="Hugo 0.63.1" />
@ -177,7 +177,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.a20c1a4367639632cdb341d23c27ca44fedcc75b0f8b3cbea6203010da153d3c.css" rel="stylesheet" integrity="sha256-ogwaQ2djljLNs0HSPCfKRP7cx1sPizy&#43;piAwENoVPTw=" crossorigin="anonymous">
<link href="https://alanorth.github.io/cgspace-notes/css/style.23e2c3298bcc8c1136c19aba330c211ec94c36f7c4454ea15cf4d3548370042a.css" rel="stylesheet" integrity="sha256-I&#43;LDKYvMjBE2wZq6MwwhHslMNvfERU6hXPTTVINwBCo=" crossorigin="anonymous">
<!-- RSS 2.0 feed -->
@ -224,7 +224,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-01/">January, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-01-02T08:35:54-08:00">Tue Jan 02, 2018</time> by Alan Orth in
<i class="fa fa-folder" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
</p>
@ -232,7 +232,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
<li>And just before that I see this:</li>
@ -240,8 +240,8 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li>I need to increase that, let's try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don't know what the hell Uptime Robot saw</li>
<li>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</li>
<li>I notice this error quite a few times in dspace.log:</li>
</ul>
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
@ -294,7 +294,7 @@ dspace.log.2017-12-31:53
dspace.log.2018-01-01:45
dspace.log.2018-01-02:34
</code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains</li>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</li>
</ul>
<h2 id="2018-01-03">2018-01-03</h2>
<ul>
@ -326,8 +326,8 @@ dspace.log.2018-01-03:1909
</code></pre><ul>
<li>134.155.96.78 appears to be at the University of Mannheim in Germany</li>
<li>They identify as: Mozilla/5.0 (compatible; heritrix/3.2.0 +http://ifm.uni-mannheim.de)</li>
<li>This appears to be the <a href="https://github.com/internetarchive/heritrix3">Internet Archive's open source bot</a></li>
<li>They seem to be re-using their Tomcat session so I don't need to do anything to them just yet:</li>
<li>This appears to be the <a href="https://github.com/internetarchive/heritrix3">Internet Archive&rsquo;s open source bot</a></li>
<li>They seem to be re-using their Tomcat session so I don&rsquo;t need to do anything to them just yet:</li>
</ul>
<pre><code>$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
2
@ -387,8 +387,8 @@ dspace.log.2018-01-03:1909
139 164.39.7.62
</code></pre><ul>
<li>I have no idea what these are but they seem to be coming from Amazon&hellip;</li>
<li>I guess for now I just have to increase the database connection pool's max active</li>
<li>It's currently 75 and normally I'd just bump it by 25 but let me be a bit daring and push it by 50 to 125, because I used to see at least 121 connections in pg_stat_activity before when we were using the shitty default pooling</li>
<li>I guess for now I just have to increase the database connection pool&rsquo;s max active</li>
<li>It&rsquo;s currently 75 and normally I&rsquo;d just bump it by 25 but let me be a bit daring and push it by 50 to 125, because I used to see at least 121 connections in pg_stat_activity before when we were using the shitty default pooling</li>
</ul>
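<p>The <code>grep | grep -o | sort | uniq | wc -l</code> pipeline used in this entry to count Tomcat sessions per client can also be sketched in Python. This is only an illustration: the function name <code>count_sessions</code> and the sample lines are hypothetical, and only the <code>session_id=[A-Z0-9]{32}</code> pattern comes from the notes.</p>

```python
import re

# Same pattern as the grep -o -E expression in the notes.
SESSION_RE = re.compile(r"session_id=[A-Z0-9]{32}")


def count_sessions(lines, client_ip):
    """Count distinct session IDs on log lines that mention client_ip.

    Uses a plain substring match, which is slightly stricter than
    `grep 134.155.96.78` (there the dots match any character).
    """
    sessions = set()
    for line in lines:
        if client_ip in line:
            sessions.update(SESSION_RE.findall(line))
    return len(sessions)


# Hypothetical sample lines, loosely modelled on dspace.log entries.
sample = [
    "2018-01-03 ... session_id=" + "A" * 32 + ":ip_addr=134.155.96.78:view_item",
    "2018-01-03 ... session_id=" + "A" * 32 + ":ip_addr=134.155.96.78:view_bitstream",
    "2018-01-03 ... session_id=" + "B" * 32 + ":ip_addr=134.155.96.78:view_item",
    "2018-01-03 ... session_id=" + "C" * 32 + ":ip_addr=203.0.113.9:view_item",
]

print(count_sessions(sample, "134.155.96.78"))
```

<p>For a real log, an open file object can be passed in place of the list, for example <code>count_sessions(open("dspace.log.2018-01-03"), "134.155.96.78")</code>. A low count next to a high request count suggests the client is re-using its session.</p>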
<h2 id="2018-01-04">2018-01-04</h2>
<ul>
@ -420,14 +420,14 @@ dspace.log.2018-01-02:1972
dspace.log.2018-01-03:1909
dspace.log.2018-01-04:1559
</code></pre><ul>
<li>I will just bump the connection limit to 300 because I'm fucking fed up with this shit</li>
<li>I will just bump the connection limit to 300 because I&rsquo;m fucking fed up with this shit</li>
<li>Once I get back to Amman I will have to try to create different database pools for different web applications, like recently discussed on the dspace-tech mailing list</li>
<li>Create accounts on CGSpace for two CTA staff <a href="mailto:km4ard@cta.int">km4ard@cta.int</a> and <a href="mailto:bheenick@cta.int">bheenick@cta.int</a></li>
</ul>
<h2 id="2018-01-05">2018-01-05</h2>
<ul>
<li>Peter said that CGSpace was down last night and Tsega restarted Tomcat</li>
<li>I don't see any alerts from Linode or UptimeRobot, and there are no PostgreSQL connection errors in the dspace logs for today:</li>
<li>I don&rsquo;t see any alerts from Linode or UptimeRobot, and there are no PostgreSQL connection errors in the dspace logs for today:</li>
</ul>
<pre><code>$ grep -c &quot;Timeout: Pool empty.&quot; dspace.log.2018-01-*
dspace.log.2018-01-01:0
@ -442,8 +442,8 @@ dspace.log.2018-01-05:0
<pre><code>[Fri Jan 05 09:31:22.965398 2018] [:error] [pid 9340] [client 213.55.99.121:64476] WARNING: Unable to find a match for &quot;9-16-1-RV.doc&quot; in &quot;/home/files/journals/6//articles/9/&quot;. Skipping this file., referer: http://dagris.info/reviewtool/index.php/index/install/upgrade
</code></pre><ul>
<li>I will delete the log file for now and tell Danny</li>
<li>Also, I'm still seeing a hundred or so of the &ldquo;ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer&rdquo; errors in the DSpace logs; I need to search the dspace-tech mailing list to see what the cause is</li>
<li>I will run a full Discovery reindex in the meantime to see if it's something wrong with the Discovery Solr core</li>
<li>Also, I&rsquo;m still seeing a hundred or so of the &ldquo;ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer&rdquo; errors in the DSpace logs; I need to search the dspace-tech mailing list to see what the cause is</li>
<li>I will run a full Discovery reindex in the meantime to see if it&rsquo;s something wrong with the Discovery Solr core</li>
</ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
@ -456,7 +456,7 @@ sys 3m14.890s
</ul>
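<p>The <code>grep -c &quot;Timeout: Pool empty.&quot;</code> check used in these notes can be sketched in Python for cases where the counts need further processing. This is a minimal sketch: the names <code>count_matching</code> and <code>counts_per_file</code> and the sample lines are hypothetical; only the search string is taken from the notes.</p>

```python
def count_matching(lines, needle="Timeout: Pool empty."):
    """Count lines containing needle, like `grep -c` with a fixed string."""
    return sum(1 for line in lines if needle in line)


def counts_per_file(paths):
    """Return {path: count} for each log file, like `grep -c ... dspace.log.*`."""
    counts = {}
    for path in paths:
        # errors="replace" avoids failing on stray non-UTF-8 bytes in a log
        with open(path, errors="replace") as handle:
            counts[path] = count_matching(handle)
    return counts


# Hypothetical sample lines for illustration only.
sample_lines = [
    "PoolExhaustedException: Timeout: Pool empty. Unable to fetch a connection",
    "INFO some unrelated message",
    "PoolExhaustedException: Timeout: Pool empty. Unable to fetch a connection",
]

print(count_matching(sample_lines))
```

<p>Against real logs, something like <code>counts_per_file(sorted(glob.glob("dspace.log.2018-01-*")))</code> (after <code>import glob</code>) would give one count per day.</p>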
<h2 id="2018-01-06">2018-01-06</h2>
<ul>
<li>I'm still seeing Solr errors in the DSpace logs even after the full reindex yesterday:</li>
<li>I&rsquo;m still seeing Solr errors in the DSpace logs even after the full reindex yesterday:</li>
</ul>
<pre><code>org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1983+TO+1989]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
</code></pre><ul>
@ -471,7 +471,7 @@ sys 3m14.890s
COPY 4515
</code></pre><h2 id="2018-01-10">2018-01-10</h2>
<ul>
<li>I looked to see what happened to this year's Solr statistics sharding task that should have run on 2018-01-01 and of course it failed:</li>
<li>I looked to see what happened to this year&rsquo;s Solr statistics sharding task that should have run on 2018-01-01 and of course it failed:</li>
</ul>
<pre><code>Moving: 81742 into core statistics-2010
Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2010
@ -542,9 +542,9 @@ Caused by: org.apache.http.client.ClientProtocolException
... 10 more
</code></pre><ul>
<li>There is interesting documentation about this on the DSpace Wiki: <a href="https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance#SOLRStatisticsMaintenance-SolrShardingByYear">https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance#SOLRStatisticsMaintenance-SolrShardingByYear</a></li>
<li>I'm looking to see maybe if we're hitting the issues mentioned in <a href="https://jira.duraspace.org/browse/DS-2212">DS-2212</a> that were apparently fixed in DSpace 5.2</li>
<li>I&rsquo;m looking to see maybe if we&rsquo;re hitting the issues mentioned in <a href="https://jira.duraspace.org/browse/DS-2212">DS-2212</a> that were apparently fixed in DSpace 5.2</li>
<li>I can apparently search for records in the Solr stats core that have an empty <code>owningColl</code> field using this in the Solr admin query: <code>-owningColl:*</code></li>
<li>On CGSpace I see 48,000,000 records that have an <code>owningColl</code> field and 34,000,000 that don't:</li>
<li>On CGSpace I see 48,000,000 records that have an <code>owningColl</code> field and 34,000,000 that don&rsquo;t:</li>
</ul>
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?q=owningColl%3A*&amp;wt=json&amp;indent=true' | grep numFound
&quot;response&quot;:{&quot;numFound&quot;:48476327,&quot;start&quot;:0,&quot;docs&quot;:[
@ -552,14 +552,14 @@ $ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&amp;wt=js
&quot;response&quot;:{&quot;numFound&quot;:34879872,&quot;start&quot;:0,&quot;docs&quot;:[
</code></pre><ul>
<li>I tested the <code>dspace stats-util -s</code> process on my local machine and it failed the same way</li>
<li>It doesn't seem to be helpful, but the dspace log shows this:</li>
<li>It doesn&rsquo;t seem to be helpful, but the dspace log shows this:</li>
</ul>
<pre><code>2018-01-10 10:51:19,301 INFO org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
2018-01-10 10:51:19,301 INFO org.dspace.statistics.SolrLogger @ Moving: 3821 records into core statistics-2016
</code></pre><ul>
<li>Terry Brady has written some notes on the DSpace Wiki about Solr sharding issues: <a href="https://wiki.duraspace.org/display/%7Eterrywbrady/Statistics+Import+Export+Issues">https://wiki.duraspace.org/display/%7Eterrywbrady/Statistics+Import+Export+Issues</a></li>
<li>Uptime Robot said that CGSpace went down at around 9:43 AM</li>
<li>I looked at PostgreSQL's <code>pg_stat_activity</code> table and saw 161 active connections, but no pool errors in the DSpace logs:</li>
<li>I looked at PostgreSQL&rsquo;s <code>pg_stat_activity</code> table and saw 161 active connections, but no pool errors in the DSpace logs:</li>
</ul>
<pre><code>$ grep -c &quot;Timeout: Pool empty.&quot; dspace.log.2018-01-10
0
@ -583,7 +583,7 @@ $ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&amp;wt=js
<pre><code>&quot;Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36&quot;
</code></pre><ul>
<li><code>whois</code> says they come from <a href="http://www.perfectip.net/">Perfect IP</a></li>
<li>I've never seen those top IPs before, but they have created 50,000 Tomcat sessions today:</li>
<li>I&rsquo;ve never seen those top IPs before, but they have created 50,000 Tomcat sessions today:</li>
</ul>
<pre><code>$ grep -E '(2607:fa98:40:9:26b6:fdff:feff:1888|2607:fa98:40:9:26b6:fdff:feff:195d|2607:fa98:40:9:26b6:fdff:feff:1c96|70.36.107.49|70.36.107.190|70.36.107.50)' /home/cgspace.cgiar.org/log/dspace.log.2018-01-10 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
49096
@ -599,20 +599,20 @@ $ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&amp;wt=js
23401 2607:fa98:40:9:26b6:fdff:feff:195d
47875 2607:fa98:40:9:26b6:fdff:feff:1888
</code></pre><ul>
<li>I added the user agent to nginx's badbots limit req zone but upon testing the config I got an error:</li>
<li>I added the user agent to nginx&rsquo;s badbots limit req zone but upon testing the config I got an error:</li>
</ul>
<pre><code># nginx -t
nginx: [emerg] could not build map_hash, you should increase map_hash_bucket_size: 64
nginx: configuration file /etc/nginx/nginx.conf test failed
</code></pre><ul>
<li>According to nginx docs the <a href="https://nginx.org/en/docs/hash.html">bucket size should be a multiple of the CPU's cache alignment</a>, which is 64 for us:</li>
<li>According to nginx docs the <a href="https://nginx.org/en/docs/hash.html">bucket size should be a multiple of the CPU&rsquo;s cache alignment</a>, which is 64 for us:</li>
</ul>
<pre><code># cat /proc/cpuinfo | grep cache_alignment | head -n1
cache_alignment : 64
</code></pre><ul>
<li>On our servers that is 64, so I increased this parameter to 128 and deployed the changes to nginx</li>
<li>Almost immediately the PostgreSQL connections dropped back down to 40 or so, and UptimeRobot said the site was back up</li>
<li>So that's interesting that we're not out of PostgreSQL connections (current pool maxActive is 300!) but the system is &ldquo;down&rdquo; to UptimeRobot and very slow to use</li>
<li>So that&rsquo;s interesting that we&rsquo;re not out of PostgreSQL connections (current pool maxActive is 300!) but the system is &ldquo;down&rdquo; to UptimeRobot and very slow to use</li>
<li>Linode continues to test mitigations for Meltdown and Spectre: <a href="https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/">https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/</a></li>
<li>I rebooted DSpace Test to see if the kernel will be updated (currently Linux 4.14.12-x86_64-linode92)&hellip; nope.</li>
<li>It looks like Linode will reboot the KVM hosts later this week, though</li>
@ -650,7 +650,7 @@ cache_alignment : 64
111535 2607:fa98:40:9:26b6:fdff:feff:1c96
161797 2607:fa98:40:9:26b6:fdff:feff:1888
</code></pre><ul>
<li>Wow, I just figured out how to set the application name of each database pool in the JNDI config of Tomcat's <code>server.xml</code>:</li>
<li>Wow, I just figured out how to set the application name of each database pool in the JNDI config of Tomcat&rsquo;s <code>server.xml</code>:</li>
</ul>
<pre><code>&lt;Resource name=&quot;jdbc/dspaceWeb&quot; auth=&quot;Container&quot; type=&quot;javax.sql.DataSource&quot;
driverClassName=&quot;org.postgresql.Driver&quot;
@ -665,9 +665,9 @@ cache_alignment : 64
validationQuery='SELECT 1'
testOnBorrow='true' /&gt;
</code></pre><ul>
<li>So theoretically I could name each connection &ldquo;xmlui&rdquo; or &ldquo;dspaceWeb&rdquo; or something meaningful and it would show up in PostgreSQL's <code>pg_stat_activity</code> table!</li>
<li>So theoretically I could name each connection &ldquo;xmlui&rdquo; or &ldquo;dspaceWeb&rdquo; or something meaningful and it would show up in PostgreSQL&rsquo;s <code>pg_stat_activity</code> table!</li>
<li>This would be super helpful for figuring out where load was coming from (now I wonder if I could figure out how to graph this)</li>
<li>Also, I realized that the <code>db.jndi</code> parameter in dspace.cfg needs to match the <code>name</code> value in your application's context, not the <code>global</code> one</li>
<li>Also, I realized that the <code>db.jndi</code> parameter in dspace.cfg needs to match the <code>name</code> value in your application&rsquo;s context, not the <code>global</code> one</li>
<li>Ah hah! Also, I can name the default DSpace connection pool in dspace.cfg as well, like:</li>
</ul>
<pre><code>db.url = jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceDefault
@ -676,7 +676,7 @@ cache_alignment : 64
</ul>
<h2 id="2018-01-12">2018-01-12</h2>
<ul>
<li>I'm looking at the <a href="https://wiki.duraspace.org/display/DSDOC6x/Installing+DSpace#InstallingDSpace-ServletEngine(ApacheTomcat7orlater,Jetty,CauchoResinorequivalent)">DSpace 6.0 Install docs</a> and notice they tweak the number of threads in their Tomcat connector:</li>
<li>I&rsquo;m looking at the <a href="https://wiki.duraspace.org/display/DSDOC6x/Installing+DSpace#InstallingDSpace-ServletEngine(ApacheTomcat7orlater,Jetty,CauchoResinorequivalent)">DSpace 6.0 Install docs</a> and notice they tweak the number of threads in their Tomcat connector:</li>
</ul>
<pre><code>&lt;!-- Define a non-SSL HTTP/1.1 Connector on port 8080 --&gt;
&lt;Connector port=&quot;8080&quot;
@ -691,8 +691,8 @@ cache_alignment : 64
URIEncoding=&quot;UTF-8&quot;/&gt;
</code></pre><ul>
<li>In Tomcat 8.5 the <code>maxThreads</code> defaults to 200 which is probably fine, but tweaking <code>minSpareThreads</code> could be good</li>
<li>I don't see a setting for <code>maxSpareThreads</code> in the docs so that might be an error</li>
<li>Looks like in Tomcat 8.5 the default URIEncoding for Connectors is UTF-8, so we don't need to specify that manually anymore: <a href="https://tomcat.apache.org/tomcat-8.5-doc/config/http.html">https://tomcat.apache.org/tomcat-8.5-doc/config/http.html</a></li>
<li>I don&rsquo;t see a setting for <code>maxSpareThreads</code> in the docs so that might be an error</li>
<li>Looks like in Tomcat 8.5 the default URIEncoding for Connectors is UTF-8, so we don&rsquo;t need to specify that manually anymore: <a href="https://tomcat.apache.org/tomcat-8.5-doc/config/http.html">https://tomcat.apache.org/tomcat-8.5-doc/config/http.html</a></li>
<li>Ooh, I just saw the <code>acceptorThreadCount</code> setting (in Tomcat 7 and 8.5):</li>
</ul>
<pre><code>The number of threads to be used to accept connections. Increase this value on a multi CPU machine, although you would never really need more than 2. Also, with a lot of non keep alive connections, you might want to increase this value as well. Default value is 1.
@ -707,7 +707,7 @@ cache_alignment : 64
<pre><code>13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxActive is not used in DBCP2, use maxTotal instead. maxTotal default value is 8. You have set value of &quot;35&quot; for &quot;maxActive&quot; property, which is being ignored.
13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxWait is not used in DBCP2 , use maxWaitMillis instead. maxWaitMillis default value is -1. You have set value of &quot;5000&quot; for &quot;maxWait&quot; property, which is being ignored.
</code></pre><ul>
<li>I looked in my Tomcat 7.0.82 logs and I don't see anything about DBCP2 errors, so I guess this is a Tomcat 8.0.x or 8.5.x thing</li>
<li>I looked in my Tomcat 7.0.82 logs and I don&rsquo;t see anything about DBCP2 errors, so I guess this is a Tomcat 8.0.x or 8.5.x thing</li>
<li>DBCP2 appears to be Tomcat 8.0.x and up according to the <a href="https://tomcat.apache.org/migration-8.html">Tomcat 8.0 migration guide</a></li>
<li>I have updated our <a href="https://github.com/ilri/rmg-ansible-public/commit/246f9d7b06d53794f189f0cc57ad5ddd80f0b014">Ansible infrastructure scripts</a> so that it will be ready whenever we switch to Tomcat 8 (probably with Ubuntu 18.04 later this year)</li>
<li>When I enable the ResourceLink in the ROOT.xml context I get the following error in the Tomcat localhost log:</li>
@ -735,24 +735,24 @@ Caused by: java.lang.NullPointerException
... 15 more
</code></pre><ul>
<li>Interesting blog post benchmarking Tomcat JDBC vs Apache Commons DBCP2, with configuration snippets: <a href="http://www.tugay.biz/2016/07/tomcat-connection-pool-vs-apache.html">http://www.tugay.biz/2016/07/tomcat-connection-pool-vs-apache.html</a></li>
<li>The Tomcat vs Apache pool thing is confusing, but apparently we're using Apache Commons DBCP2 because we don't specify <code>factory=&quot;org.apache.tomcat.jdbc.pool.DataSourceFactory&quot;</code> in our global resource</li>
<li>So at least I know that I'm not looking for documentation or troubleshooting on the Tomcat JDBC pool!</li>
<li>I looked at <code>pg_stat_activity</code> during Tomcat's startup and I see that the pool created in server.xml is indeed connecting, just that nothing uses it</li>
<li>The Tomcat vs Apache pool thing is confusing, but apparently we&rsquo;re using Apache Commons DBCP2 because we don&rsquo;t specify <code>factory=&quot;org.apache.tomcat.jdbc.pool.DataSourceFactory&quot;</code> in our global resource</li>
<li>So at least I know that I&rsquo;m not looking for documentation or troubleshooting on the Tomcat JDBC pool!</li>
<li>I looked at <code>pg_stat_activity</code> during Tomcat&rsquo;s startup and I see that the pool created in server.xml is indeed connecting, just that nothing uses it</li>
<li>Also, the fallback connection parameters specified in local.cfg (not dspace.cfg) are used</li>
<li>Shit, this might actually be a DSpace error: <a href="https://jira.duraspace.org/browse/DS-3434">https://jira.duraspace.org/browse/DS-3434</a></li>
<li>I'll comment on that issue</li>
<li>I&rsquo;ll comment on that issue</li>
</ul>
<h2 id="2018-01-14">2018-01-14</h2>
<ul>
<li>Looking at the authors Peter had corrected</li>
<li>Some had multiple and he's corrected them by adding <code>||</code> in the correction column, but I can't process those this way so I will just have to flag them and do those manually later</li>
<li>Some had multiple and he&rsquo;s corrected them by adding <code>||</code> in the correction column, but I can&rsquo;t process those this way so I will just have to flag them and do those manually later</li>
<li>Also, I can flag the values that have &ldquo;DELETE&rdquo;</li>
<li>Then I need to facet the correction column on isBlank(value) and not flagged</li>
</ul>
<h2 id="2018-01-15">2018-01-15</h2>
<ul>
<li>Help Udana from IWMI export a CSV from DSpace Test so he can start trying a batch upload</li>
<li>I'm going to apply these ~130 corrections on CGSpace:</li>
<li>I&rsquo;m going to apply these ~130 corrections on CGSpace:</li>
</ul>
<pre><code>update metadatavalue set text_value='Formally Published' where resource_type_id=2 and metadata_field_id=214 and text_value like 'Formally published';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=214 and text_value like 'NO';
@ -764,7 +764,7 @@ update metadatavalue set text_value='ru' where resource_type_id=2 and metadata_f
update metadatavalue set text_value='in' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(IN|In)';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(dc.language.iso|CGIAR Challenge Program on Water and Food)';
</code></pre><ul>
<li>Continue proofing Peter's author corrections that I started yesterday, faceting on non blank, non flagged, and briefly scrolling through the values of the corrections to find encoding errors for French and Spanish names</li>
<li>Continue proofing Peter&rsquo;s author corrections that I started yesterday, faceting on non blank, non flagged, and briefly scrolling through the values of the corrections to find encoding errors for French and Spanish names</li>
</ul>
<p><img src="/cgspace-notes/2018/01/openrefine-authors.png" alt="OpenRefine Authors"></p>
<ul>
@ -817,9 +817,9 @@ COPY 4552
<li>Looking over the affiliations again I see dozens of CIAT ones with their affiliation formatted like: International Center for Tropical Agriculture (CIAT)</li>
<li>For example, this one is from just last month: <a href="https://cgspace.cgiar.org/handle/10568/89930">https://cgspace.cgiar.org/handle/10568/89930</a></li>
<li>Our controlled vocabulary has this in the format without the abbreviation: International Center for Tropical Agriculture</li>
<li>So some submitters don't know to use the controlled vocabulary lookup</li>
<li>So some submitters don&rsquo;t know to use the controlled vocabulary lookup</li>
<li>Help Sisay with some thumbnails for book chapters in Open Refine and SAFBuilder</li>
<li>CGSpace users were having problems logging in, I think something's wrong with LDAP because I see this in the logs:</li>
<li>CGSpace users were having problems logging in, I think something&rsquo;s wrong with LDAP because I see this in the logs:</li>
</ul>
<pre><code>2018-01-15 12:53:15,810 WARN org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=2386749547D03E0AA4EC7E44181A7552:ip_addr=x.x.x.x:ldap_authentication:type=failed_auth javax.naming.AuthenticationException\colon; [LDAP\colon; error code 49 - 80090308\colon; LdapErr\colon; DSID-0C090400, comment\colon; AcceptSecurityContext error, data 775, v1db1^@]
</code></pre><ul>
@ -835,7 +835,7 @@ sys 0m2.210s
<ul>
<li>Meeting with CGSpace team, a few action items:
<ul>
<li>Discuss standardized names for CRPs and centers with ICARDA (don't wait for CG Core)</li>
<li>Discuss standardized names for CRPs and centers with ICARDA (don&rsquo;t wait for CG Core)</li>
<li>Re-send DC rights implementation and forward to everyone so we can move forward with it (without the URI field for now)</li>
<li>Start looking at where I was with the AGROVOC API</li>
<li>Have a controlled vocabulary for CGIAR authors&rsquo; names and ORCIDs? Perhaps values like: Orth, Alan S. (0000-0002-1735-7458)</li>
@ -845,15 +845,15 @@ sys 0m2.210s
<li>Add Sisay and Danny to Uptime Robot and allow them to restart Tomcat on CGSpace ✔</li>
</ul>
</li>
<li>I removed Tsega's SSH access to the web and DSpace servers, and asked Danny to check whether there is anything he needs from Tsega's home directories so we can delete the accounts completely</li>
<li>I removed Tsega's access to Linode dashboard as well</li>
<li>I removed Tsega&rsquo;s SSH access to the web and DSpace servers, and asked Danny to check whether there is anything he needs from Tsega&rsquo;s home directories so we can delete the accounts completely</li>
<li>I removed Tsega&rsquo;s access to Linode dashboard as well</li>
<li>I ended up creating a Jira issue for my <code>db.jndi</code> documentation fix: <a href="https://jira.duraspace.org/browse/DS-3803">DS-3803</a></li>
<li>The DSpace developers said they wanted each pull request to be associated with a Jira issue</li>
</ul>
<h2 id="2018-01-17">2018-01-17</h2>
<ul>
<li>Abenet asked me to proof and upload 54 records for LIVES</li>
<li>A few records were missing countries (even though they're all from Ethiopia)</li>
<li>A few records were missing countries (even though they&rsquo;re all from Ethiopia)</li>
<li>Also, there are whitespace issues in many columns, and the items are mapped to the LIVES and ILRI articles collections, not Theses</li>
<li>In any case, importing them like this:</li>
</ul>
@ -862,7 +862,7 @@ $ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFor
</code></pre><ul>
<li>And fantastic, before I started the import there were 10 PostgreSQL connections, and then CGSpace crashed during the upload</li>
<li>When I looked there were 210 PostgreSQL connections!</li>
<li>I don't see any high load in XMLUI or REST/OAI:</li>
<li>I don&rsquo;t see any high load in XMLUI or REST/OAI:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E &quot;17/Jan/2018&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
381 40.77.167.124
@ -892,8 +892,8 @@ $ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFor
<pre><code>2018-01-17 07:59:25,856 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ I/O exception (org.apache.http.NoHttpResponseException) caught when processing request to {}-&gt;http://localhost:8081: The target server failed to respond
2018-01-17 07:59:25,856 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ Retrying request to {}-&gt;http://localhost:8081
</code></pre><ul>
<li>I have NEVER seen this error before, and there is no error before or after that in DSpace's solr.log</li>
<li>Tomcat's catalina.out does show something interesting, though, right at that time:</li>
<li>I have NEVER seen this error before, and there is no error before or after that in DSpace&rsquo;s solr.log</li>
<li>Tomcat&rsquo;s catalina.out does show something interesting, though, right at that time:</li>
</ul>
<pre><code>[====================&gt; ]40% time remaining: 7 hour(s) 14 minute(s) 45 seconds. timestamp: 2018-01-17 07:57:02
[====================&gt; ]40% time remaining: 7 hour(s) 14 minute(s) 45 seconds. timestamp: 2018-01-17 07:57:11
@ -933,7 +933,7 @@ Exception in thread &quot;http-bio-127.0.0.1-8081-exec-627&quot; java.lang.OutOf
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
</code></pre><ul>
<li>You can see the timestamp above, which is some Atmire nightly task I think, but I can't figure out which one</li>
<li>You can see the timestamp above, which is some Atmire nightly task I think, but I can&rsquo;t figure out which one</li>
<li>So I restarted Tomcat and tried the import again, which finished very quickly and without errors!</li>
</ul>
<pre><code>$ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFormat -m lives2.map &amp;&gt; lives2.log
@ -942,7 +942,7 @@ Exception in thread &quot;http-bio-127.0.0.1-8081-exec-627&quot; java.lang.OutOf
</ul>
<p><img src="/cgspace-notes/2018/01/tomcat-jvm-day.png" alt="Tomcat JVM Heap"></p>
<ul>
<li>I'm playing with maven repository caching using Artifactory in a Docker instance: <a href="https://www.jfrog.com/confluence/display/RTF/Installing+with+Docker">https://www.jfrog.com/confluence/display/RTF/Installing+with+Docker</a></li>
<li>I&rsquo;m playing with maven repository caching using Artifactory in a Docker instance: <a href="https://www.jfrog.com/confluence/display/RTF/Installing+with+Docker">https://www.jfrog.com/confluence/display/RTF/Installing+with+Docker</a></li>
</ul>
<pre><code>$ docker pull docker.bintray.io/jfrog/artifactory-oss:latest
$ docker volume create --name artifactory5_data
@ -961,10 +961,10 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
<pre><code>$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=localhost -P \!dspace-sword,\!dspace-swordv2 clean package
</code></pre><ul>
<li>UptimeRobot said CGSpace went down for a few minutes</li>
<li>I didn't do anything but it came back up on its own</li>
<li>I don't see anything unusual in the XMLUI or REST/OAI logs</li>
<li>I didn&rsquo;t do anything but it came back up on its own</li>
<li>I don&rsquo;t see anything unusual in the XMLUI or REST/OAI logs</li>
<li>Now Linode alert says the CPU load is high, <em>sigh</em></li>
<li>Regarding the heap space error earlier today, it looks like it does happen a few times a week or month (I'm not sure how far these logs go back, as they are not strictly daily):</li>
<li>Regarding the heap space error earlier today, it looks like it does happen a few times a week or month (I&rsquo;m not sure how far these logs go back, as they are not strictly daily):</li>
</ul>
<pre><code># zgrep -c java.lang.OutOfMemoryError /var/log/tomcat7/catalina.out* | grep -v :0
/var/log/tomcat7/catalina.out:2
@ -994,14 +994,14 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
<h2 id="2018-01-18">2018-01-18</h2>
<ul>
<li>UptimeRobot said CGSpace was down for 1 minute last night</li>
<li>I don't see any errors in the nginx or catalina logs, so I guess UptimeRobot just got impatient and closed the request, which caused nginx to send an HTTP 499</li>
<li>I don&rsquo;t see any errors in the nginx or catalina logs, so I guess UptimeRobot just got impatient and closed the request, which caused nginx to send an HTTP 499</li>
<li>I realize I never did a full re-index after the SQL author and affiliation updates last week, so I should force one now:</li>
</ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-discovery -b
</code></pre><ul>
<li>Maria from Bioversity asked if I could remove the abstracts from all of their Limited Access items in the <a href="https://cgspace.cgiar.org/handle/10568/35501">Bioversity Journal Articles</a> collection</li>
<li>It's easy enough to do in OpenRefine, but you have to be careful to only get those items that are uploaded into Bioversity's collection, not the ones that are mapped from others!</li>
<li>It&rsquo;s easy enough to do in OpenRefine, but you have to be careful to only get those items that are uploaded into Bioversity&rsquo;s collection, not the ones that are mapped from others!</li>
<li>Use this GREL in OpenRefine after isolating all the Limited Access items: <code>value.startsWith(&quot;10568/35501&quot;)</code></li>
<li>UptimeRobot said CGSpace went down AGAIN and both Sisay and Danny immediately logged in and restarted Tomcat without talking to me <em>or</em> each other!</li>
</ul>
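<p>The GREL check above can be sketched in Python for a CSV export. This is only an illustration: it assumes, as the GREL expression implies, that the owning collection is listed first in the <code>collection</code> column with mapped collections after it. The sample rows, the <code>||</code> separator, and every handle other than 10568/35501 are made up.</p>

```python
import csv
import io

# Hypothetical CSV export: the rows, the "||" separator and every handle
# except 10568/35501 are assumptions made for illustration only.
SAMPLE_CSV = """id,collection,dc.description.abstract
1,10568/35501||10568/99999,Abstract of an item owned by the collection
2,10568/11111||10568/35501,Abstract of an item mapped in from elsewhere
3,10568/35501,
"""

OWNING_COLLECTION = "10568/35501"


def owned_rows(csv_text, handle=OWNING_COLLECTION):
    """Return rows whose collection field starts with the given handle."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["collection"].startswith(handle)]


def rows_with_abstracts(rows):
    """Return only the rows that actually have an abstract to remove."""
    return [row for row in rows if row["dc.description.abstract"].strip()]


if __name__ == "__main__":
    owned = owned_rows(SAMPLE_CSV)
    print([row["id"] for row in owned])
    print([row["id"] for row in rows_with_abstracts(owned)])
```

<p>Filtering on the start of the field, rather than checking whether it contains the handle, is what keeps mapped items out of the result.</p>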
@ -1011,8 +1011,8 @@ Jan 18 07:01:22 linode18 systemd[1]: Stopping LSB: Start Tomcat....
Jan 18 07:01:22 linode18 sudo[10812]: swebshet : TTY=pts/3 ; PWD=/home/swebshet ; USER=root ; COMMAND=/bin/systemctl restart tomcat7
Jan 18 07:01:22 linode18 sudo[10812]: pam_unix(sudo:session): session opened for user root by swebshet(uid=0)
</code></pre><ul>
<li>I had to cancel the Discovery indexing and I'll have to re-try it another time when the server isn't so busy (it had already taken two hours and wasn't even close to being done)</li>
<li>For now I've increased the Tomcat JVM heap from 5632m to 6144m, to give ~1GB of free memory over the average usage to hopefully account for spikes caused by load or background jobs</li>
<li>I had to cancel the Discovery indexing and I&rsquo;ll have to re-try it another time when the server isn&rsquo;t so busy (it had already taken two hours and wasn&rsquo;t even close to being done)</li>
<li>For now I&rsquo;ve increased the Tomcat JVM heap from 5632m to 6144m, to give ~1GB of free memory over the average usage to hopefully account for spikes caused by load or background jobs</li>
</ul>
<h2 id="2018-01-19">2018-01-19</h2>
<ul>
@ -1023,8 +1023,8 @@ Jan 18 07:01:22 linode18 sudo[10812]: pam_unix(sudo:session): session opened for
$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-discovery -b
</code></pre><ul>
<li>Linode alerted again and said that CGSpace was using 301% CPU</li>
<li>Peter emailed to ask why <a href="https://cgspace.cgiar.org/handle/10568/88090">this item</a> doesn't have an Altmetric badge on CGSpace but does have one on the <a href="https://www.altmetric.com/details/26709041">Altmetric dashboard</a></li>
<li>Looks like our badge code calls the <code>handle</code> endpoint which doesn't exist:</li>
<li>Peter emailed to ask why <a href="https://cgspace.cgiar.org/handle/10568/88090">this item</a> doesn&rsquo;t have an Altmetric badge on CGSpace but does have one on the <a href="https://www.altmetric.com/details/26709041">Altmetric dashboard</a></li>
<li>Looks like our badge code calls the <code>handle</code> endpoint which doesn&rsquo;t exist:</li>
</ul>
<pre><code>https://api.altmetric.com/v1/handle/10568/88090
</code></pre><ul>
@ -1060,7 +1060,7 @@ real 7m2.241s
user 1m33.198s
sys 0m12.317s
</code></pre><ul>
<li>I tested the abstract cleanups on Bioversity's Journal Articles collection again that I had started a few days ago</li>
<li>I tested the abstract cleanups on Bioversity&rsquo;s Journal Articles collection again that I had started a few days ago</li>
<li>In the end there were 324 items in the collection that were Limited Access, but only 199 had abstracts</li>
<li>I want to document the workflow of adding a production PostgreSQL database to a development instance of <a href="https://github.com/alanorth/docker-dspace">DSpace in Docker</a>:</li>
</ul>
@ -1075,7 +1075,7 @@ $ docker cp ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspace_db:
$ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
</code></pre><h2 id="2018-01-22">2018-01-22</h2>
<ul>
<li>Look over Udana's CSV of 25 WLE records from last week</li>
<li>Look over Udana&rsquo;s CSV of 25 WLE records from last week</li>
<li>I sent him some corrections:
<ul>
<li>The file encoding is Windows-1252</li>
@ -1090,7 +1090,7 @@ $ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
</li>
<li>I wrote a quick Python script to use the DSpace REST API to find all collections under a given community</li>
<li>The source code is here: <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a></li>
<li>Peter had said that he found a bunch of ILRI collections that were called &ldquo;untitled&rdquo;, but I don't see any:</li>
<li>Peter had said that he found a bunch of ILRI collections that were called &ldquo;untitled&rdquo;, but I don&rsquo;t see any:</li>
</ul>
<pre><code>$ ./rest-find-collections.py 10568/1 | wc -l
308
@ -1099,17 +1099,17 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
<li>Looking at the <a href="https://tomcat.apache.org/tomcat-7.0-doc/config/http.html">Tomcat connector docs</a> I think we really need to increase <code>maxThreads</code></li>
<li>The default is 200, which can easily be taken up by bots, considering that Google and Bing sometimes browse with fifty (50) connections each!</li>
<li>Before I increase this I want to see if I can measure and graph this, and then benchmark</li>
<li>I'll probably also increase <code>minSpareThreads</code> to 20 (its default is 10)</li>
<li>I&rsquo;ll probably also increase <code>minSpareThreads</code> to 20 (its default is 10)</li>
<li>I still want to bump up <code>acceptorThreadCount</code> from 1 to 2 as well, as the documentation says this should be increased on multi-core systems</li>
<li>I spent quite a bit of time looking at <code>jvisualvm</code> and <code>jconsole</code> today</li>
<li>Run system updates on DSpace Test and reboot it</li>
<li>I see I can monitor the number of Tomcat threads and some detailed JVM memory stuff if I install <code>munin-plugins-java</code></li>
<li>I'd still like to get arbitrary mbeans like activeSessions etc, though</li>
<li>I can't remember if I had to configure the jmx settings in <code>/etc/munin/plugin-conf.d/munin-node</code> or not—I think all I did was re-run the <code>munin-node-configure</code> script and of course enable JMX in Tomcat's JVM options</li>
<li>I&rsquo;d still like to get arbitrary mbeans like activeSessions etc, though</li>
<li>I can&rsquo;t remember if I had to configure the jmx settings in <code>/etc/munin/plugin-conf.d/munin-node</code> or not—I think all I did was re-run the <code>munin-node-configure</code> script and of course enable JMX in Tomcat&rsquo;s JVM options</li>
</ul>
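<p>Before changing any of these settings it helps to know what a connector is effectively running with. A minimal sketch, assuming the connectors live in a standard <code>server.xml</code>: the fragment below is hypothetical (port and values are illustrative), and the defaults are the ones quoted in these notes.</p>

```python
import xml.etree.ElementTree as ET

# Defaults as quoted in the notes above for the Tomcat 7 HTTP connector.
DEFAULTS = {"maxThreads": 200, "minSpareThreads": 10, "acceptorThreadCount": 1}

# Hypothetical server.xml fragment; the port and attribute values are
# illustrative and not copied from any real server.
SAMPLE_SERVER_XML = """<Server>
  <Service name="Catalina">
    <Connector port="8443" address="127.0.0.1" minSpareThreads="20"/>
  </Service>
</Server>"""


def connector_threads(xml_text):
    """Map each connector port to its effective thread settings."""
    root = ET.fromstring(xml_text)
    settings = {}
    for connector in root.iter("Connector"):
        port = connector.get("port", "unknown")
        settings[port] = {
            name: int(connector.get(name, default))
            for name, default in DEFAULTS.items()
        }
    return settings


if __name__ == "__main__":
    for port, values in connector_threads(SAMPLE_SERVER_XML).items():
        print(port, values)
```

<p>Attributes that are missing from the connector fall back to the defaults, so the output shows what Tomcat would use rather than only what is written in the file.</p>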
<h2 id="2018-01-23">2018-01-23</h2>
<ul>
<li>Thinking about generating a jmeter test plan for DSpace, along the lines of <a href="https://github.com/Georgetown-University-Libraries/dspace-performance-test">Georgetown's dspace-performance-test</a></li>
<li>Thinking about generating a jmeter test plan for DSpace, along the lines of <a href="https://github.com/Georgetown-University-Libraries/dspace-performance-test">Georgetown&rsquo;s dspace-performance-test</a></li>
<li>I got a list of all the GET requests on CGSpace for January 21st (the last time Linode complained the load was high), excluding admin calls:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep &quot;21/Jan/2018&quot; | grep &quot;GET &quot; | grep -c -v &quot;/admin&quot;
@ -1208,7 +1208,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
<pre><code>$ jmeter -g 2018-01-24-linode5451120-baseline.jtl -o 2018-01-24-linode5451120-baseline
</code></pre><h2 id="2018-01-25">2018-01-25</h2>
<ul>
<li>Run another round of tests on DSpace Test with jmeter after changing Tomcat's <code>minSpareThreads</code> to 20 (default is 10) and <code>acceptorThreadCount</code> to 2 (default is 1):</li>
<li>Run another round of tests on DSpace Test with jmeter after changing Tomcat&rsquo;s <code>minSpareThreads</code> to 20 (default is 10) and <code>acceptorThreadCount</code> to 2 (default is 1):</li>
</ul>
<pre><code>$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads.log
$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads2.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads2.log
@ -1221,18 +1221,18 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc2.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc2.log
$ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc3.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc3.log
</code></pre><ul>
<li>I haven't had time to look at the results yet</li>
<li>I haven&rsquo;t had time to look at the results yet</li>
</ul>
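<p>For a quick look at results like these without generating the full HTML report, something along the following lines could work. It is a rough sketch that assumes the <code>.jtl</code> files are CSV with a header row containing at least the <code>elapsed</code>, <code>label</code> and <code>success</code> columns; the sample rows are invented and are not real measurements.</p>

```python
import csv
import io
import statistics

# Invented sample in the assumed JTL CSV layout; not real measurements.
SAMPLE_JTL = """timeStamp,elapsed,label,responseCode,success
1516780800000,120,Home,200,true
1516780800100,340,Home,200,true
1516780800200,95,Search,200,true
1516780800300,2100,Search,500,false
"""


def summarize(jtl_text):
    """Return per-label count, mean and max elapsed time (ms), and errors."""
    per_label = {}
    for row in csv.DictReader(io.StringIO(jtl_text)):
        stats = per_label.setdefault(row["label"], {"elapsed": [], "errors": 0})
        stats["elapsed"].append(int(row["elapsed"]))
        if row["success"].lower() != "true":
            stats["errors"] += 1
    return {
        label: {
            "count": len(stats["elapsed"]),
            "mean_ms": statistics.mean(stats["elapsed"]),
            "max_ms": max(stats["elapsed"]),
            "errors": stats["errors"],
        }
        for label, stats in per_label.items()
    }


if __name__ == "__main__":
    for label, stats in summarize(SAMPLE_JTL).items():
        print(label, stats)
```

<p>Running the same summary over each run (baseline, Tomcat threads, G1GC) would give numbers that can be compared side by side.</p>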
<h2 id="2018-01-26">2018-01-26</h2>
<ul>
<li>Peter followed up about some of the points from the Skype meeting last week</li>
<li>Regarding the ORCID field issue, I see <a href="http://repo.mel.cgiar.org/handle/20.500.11766/7668?show=full">ICARDA's MELSpace is using <code>cg.creator.ID</code></a>: 0000-0001-9156-7691</li>
<li>Regarding the ORCID field issue, I see <a href="http://repo.mel.cgiar.org/handle/20.500.11766/7668?show=full">ICARDA&rsquo;s MELSpace is using <code>cg.creator.ID</code></a>: 0000-0001-9156-7691</li>
<li>I had floated the idea of using a controlled vocabulary with values formatted something like: Orth, Alan S. (0000-0002-1735-7458)</li>
<li>Update PostgreSQL JDBC driver version from 42.1.4 to 42.2.1 on DSpace Test, see: <a href="https://jdbc.postgresql.org/">https://jdbc.postgresql.org/</a></li>
<li>Reboot DSpace Test to get new Linode kernel (Linux 4.14.14-x86_64-linode94)</li>
<li>I am testing my old work on the <code>dc.rights</code> field, I had added a branch for it a few months ago</li>
<li>I added a list of Creative Commons and other licenses in <code>input-forms.xml</code></li>
<li>The problem is that Peter wanted to use two questions, one for CG centers and one for other, but using the same metadata value, which isn't possible (?)</li>
<li>The problem is that Peter wanted to use two questions, one for CG centers and one for other, but using the same metadata value, which isn&rsquo;t possible (?)</li>
<li>So I used some creativity and made several fields display values, but not store any, ie:</li>
</ul>
<pre><code>&lt;pair&gt;
@ -1240,7 +1240,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
&lt;stored-value&gt;&lt;/stored-value&gt;
&lt;/pair&gt;
</code></pre><ul>
<li>I was worried that if a user selected this field for some reason, DSpace would store an empty value, but it simply doesn't register that as a valid option:</li>
<li>I was worried that if a user selected this field for some reason, DSpace would store an empty value, but it simply doesn&rsquo;t register that as a valid option:</li>
</ul>
<p><img src="/cgspace-notes/2018/01/dc-rights-submission.png" alt="Rights"></p>
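<p>To double-check which entries in a value-pairs list are display-only headings and which are real options, a small script can separate them. This is a sketch only: the list name and the displayed values below are hypothetical, and the element names are assumed to follow the usual <code>input-forms.xml</code> layout of <code>pair</code>, <code>displayed-value</code> and <code>stored-value</code>.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical value-pairs list; the list name and values are invented for
# illustration and are not taken from the real input-forms.xml.
SAMPLE_VALUE_PAIRS = """<value-pairs value-pairs-name="example_rights" dc-term="rights">
  <pair>
    <displayed-value>For products published by another party:</displayed-value>
    <stored-value></stored-value>
  </pair>
  <pair>
    <displayed-value>CC-BY-4.0</displayed-value>
    <stored-value>CC-BY-4.0</stored-value>
  </pair>
</value-pairs>"""


def split_pairs(xml_text):
    """Return (headings, options): pairs without and with a stored value."""
    root = ET.fromstring(xml_text)
    headings, options = [], []
    for pair in root.iter("pair"):
        displayed = (pair.findtext("displayed-value") or "").strip()
        stored = (pair.findtext("stored-value") or "").strip()
        (options if stored else headings).append(displayed)
    return headings, options


if __name__ == "__main__":
    headings, options = split_pairs(SAMPLE_VALUE_PAIRS)
    print("display-only:", headings)
    print("selectable:", options)
```

<p>Any entry that lands in the display-only list is one that should never end up stored as metadata.</p>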
<ul>
@ -1286,9 +1286,9 @@ Was expecting one of:
Maximum: 2771268
Average: 210483
</code></pre><ul>
<li>I guess responses that don't fit in RAM get saved to disk (a default of 1024M), so this is definitely not the issue here, and that warning is totally unrelated</li>
<li>My best guess is that the Solr search error is related somehow but I can't figure it out</li>
<li>We definitely have enough database connections, as I haven't seen a pool error in weeks:</li>
<li>I guess responses that don&rsquo;t fit in RAM get saved to disk (a default of 1024M), so this is definitely not the issue here, and that warning is totally unrelated</li>
<li>My best guess is that the Solr search error is related somehow but I can&rsquo;t figure it out</li>
<li>We definitely have enough database connections, as I haven&rsquo;t seen a pool error in weeks:</li>
</ul>
<pre><code>$ grep -c &quot;Timeout: Pool empty.&quot; dspace.log.2018-01-2*
dspace.log.2018-01-20:0
@ -1305,7 +1305,7 @@ dspace.log.2018-01-29:0
<li>Adam Hunt from WLE complained that pages take &ldquo;1-2 minutes&rdquo; to load each, from France and Sri Lanka</li>
<li>I asked him which particular pages, as right now pages load in 2 or 3 seconds for me</li>
<li>UptimeRobot said CGSpace went down again, and I looked at PostgreSQL and saw 211 active database connections</li>
<li>If it's not memory and it's not the database, it's gotta be Tomcat threads; seeing as the default <code>maxThreads</code> is 200 anyways, it actually makes sense</li>
<li>If it&rsquo;s not memory and it&rsquo;s not the database, it&rsquo;s gotta be Tomcat threads; seeing as the default <code>maxThreads</code> is 200 anyways, it actually makes sense</li>
<li>I decided to change the Tomcat thread settings on CGSpace:
<ul>
<li><code>maxThreads</code> from 200 (default) to 400</li>
@ -1333,8 +1333,8 @@ busy.value 0
idle.value 20
max.value 400
</code></pre><ul>
<li>Apparently you can't monitor more than one connector, so I guess the most important to monitor would be the one that nginx is sending stuff to</li>
<li>So for now I think I'll just monitor these and skip trying to configure the jmx plugins</li>
<li>Apparently you can&rsquo;t monitor more than one connector, so I guess the most important to monitor would be the one that nginx is sending stuff to</li>
<li>So for now I think I&rsquo;ll just monitor these and skip trying to configure the jmx plugins</li>
<li>Although following the logic of <em>/usr/share/munin/plugins/jmx_tomcat_dbpools</em> could be useful for getting the active Tomcat sessions</li>
<li>From debugging the <code>jmx_tomcat_dbpools</code> script from the <code>munin-plugins-java</code> package, I see that this is how you call arbitrary mbeans:</li>
</ul>
@ -1343,7 +1343,7 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name=&quot;jdbc/dspace&quot;
</code></pre><ul>
<li>More notes here: <a href="https://github.com/munin-monitoring/contrib/tree/master/plugins/jmx">https://github.com/munin-monitoring/contrib/tree/master/plugins/jmx</a></li>
<li>Looking at the Munin graphs, I see that the load is 200% every morning from 03:00 to almost 08:00</li>
<li>Tomcat's catalina.out log file is full of spam from this thing too, with lines like this</li>
<li>Tomcat&rsquo;s catalina.out log file is full of spam from this thing too, with lines like this</li>
</ul>
<pre><code>[===================&gt; ]38% time remaining: 5 hour(s) 21 minute(s) 47 seconds. timestamp: 2018-01-29 06:25:16
</code></pre><ul>
@ -1359,7 +1359,7 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name=&quot;jdbc/dspace&quot;
<li>UptimeRobot says CGSpace went down at 7:57 AM, and indeed I see a lot of HTTP 499 codes in nginx logs</li>
<li>PostgreSQL activity shows 222 database connections</li>
<li>Now PostgreSQL activity shows 265 database connections!</li>
<li>I don't see any errors anywhere&hellip;</li>
<li>I don&rsquo;t see any errors anywhere&hellip;</li>
<li>Now PostgreSQL activity shows 308 connections!</li>
<li>Well this is interesting, there are 400 Tomcat threads busy:</li>
</ul>
@ -1411,18 +1411,18 @@ javax.ws.rs.WebApplicationException
<ul>
<li>We need to start graphing the Tomcat sessions as well, though that requires JMX</li>
<li>Also, I wonder if I could disable the nightly Atmire thing</li>
<li>God, I don't know where this load is coming from</li>
<li>God, I don&rsquo;t know where this load is coming from</li>
<li>Since I bumped up the Tomcat threads from 200 to 400 the load on the server has been sustained at about 200% for almost a whole day:</li>
</ul>
<p><img src="/cgspace-notes/2018/01/cpu-week.png" alt="CPU usage week"></p>
<ul>
<li>I should make separate database pools for the web applications and the API applications like REST and OAI</li>
<li>Ok, so this is interesting: I figured out how to get the MBean path to query Tomcat's activeSessions from JMX (using <code>munin-plugins-java</code>):</li>
<li>Ok, so this is interesting: I figured out how to get the MBean path to query Tomcat&rsquo;s activeSessions from JMX (using <code>munin-plugins-java</code>):</li>
</ul>
<pre><code># port=5400 ip=&quot;127.0.0.1&quot; /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=Manager,context=/,host=localhost activeSessions
Catalina:type=Manager,context=/,host=localhost activeSessions 8
</code></pre><ul>
<li>If you connect to Tomcat in <code>jvisualvm</code> it's pretty obvious when you hover over the elements</li>
<li>If you connect to Tomcat in <code>jvisualvm</code> it&rsquo;s pretty obvious when you hover over the elements</li>
</ul>
<p><img src="/cgspace-notes/2018/01/jvisualvm-mbeans-path.png" alt="MBeans in JVisualVM"></p>
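<p>If the goal is to graph this value, the output of the <code>Beans</code> call could be turned into the <code>name.value N</code> form that the Munin plugin output earlier in these notes uses. A minimal sketch, assuming the output is always a single line of the form <em>mbean path, attribute, value</em> as in the example above; the function names are made up for illustration.</p>

```python
# Sample line copied from the Beans output shown in the notes above.
SAMPLE_LINE = "Catalina:type=Manager,context=/,host=localhost activeSessions 8"


def parse_bean_line(line):
    """Split '<mbean path> <attribute> <value>' from the right-hand side."""
    mbean, attribute, value = line.strip().rsplit(None, 2)
    return mbean, attribute, int(value)


def munin_value(line):
    """Format a parsed line as a Munin-style '<field>.value <number>' line."""
    _, attribute, value = parse_bean_line(line)
    return f"{attribute}.value {value}"


if __name__ == "__main__":
    print(munin_value(SAMPLE_LINE))
```

<p>Splitting from the right means the attribute and value are still found if the MBean path itself ever contains spaces.</p>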