<li>Update <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> for DSpace 6+ UUIDs
<ul>
<li>Tag version 1.2.0 on GitHub</li>
</ul>
</li>
<li>Test migrating legacy Solr statistics to UUIDs with the as-yet unreleased <a href="https://github.com/DSpace/DSpace/commit/184f2b2153479045fba6239342c63e7f8564b8b6#diff-0350ce2e13b28d5d61252b7a8f50a059">SolrUpgradePre6xStatistics.java</a>
<ul>
<li>You need to download this into the DSpace 6.x source and compile it</li>
</ul>
</li>
<li>Skype with Peter and Abenet to discuss the CG Core survey
<ul>
<li>We also discussed some other CGSpace issues</li>
</ul>
</li>
</ul>
<h2 id="2020-03-04">2020-03-04</h2>
<ul>
<li>Abenet asked me to add some new ILRI subjects to CGSpace
<ul>
<li>I <a href="https://github.com/ilri/DSpace/commit/b51a242e773bd8658d3cab4ac883975708b00386">updated the input-forms.xml</a> in our <code>5_x-prod</code> branch on GitHub</li>
<li>Abenet said we are changing <code>HEALTH</code> to <code>HUMAN HEALTH</code> so I need to fix those using my <code>fix-metadata-values.py</code> script</li>
</ul>
</li>
<li>I found a very <a href="https://lucene.apache.org/solr/guide/8_1/solr-system-requirements.html#lucene-solr-prior-to-7-0">interesting comment in the Solr 8.1 guide</a> about Java compatibility:</li>
</ul>
<blockquote>
<p>Lucene/Solr 7.0 was the first version that successfully passed our tests using Java 9 and higher. You should avoid Java 9 or later for Lucene/Solr 6.x or earlier.</p>
</blockquote>
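<p>A quick way to act on that warning is to check the Java major version before starting an older Solr. The following is only a sketch: <code>java_major</code> and <code>java_ok_for_old_solr</code> are hypothetical helper names, and the version string is assumed to be passed in by hand (for example copied from the output of <code>java -version</code>).</p>
<pre><code>#!/usr/bin/env bash
# Map a Java version string to its major version.
# "1.8.0_242" uses the legacy 1.x scheme and means Java 8; "11.0.6" means Java 11.
java_major() {
    local version="$1"
    case "$version" in
        1.*) version="${version#1.}" ;;   # drop the legacy "1." prefix
    esac
    # Keep only the leading run of digits.
    printf '%s\n' "${version%%[!0-9]*}"
}

# Succeed only if this Java is older than 9, which is what the Solr guide
# recommends for Lucene/Solr 6.x or earlier.
java_ok_for_old_solr() {
    [ "$(java_major "$1")" -lt 9 ]
}
</code></pre>
<p>For example, <code>java_ok_for_old_solr 1.8.0_242</code> succeeds, while <code>java_ok_for_old_solr 11.0.6</code> fails.</p>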
<h2 id="2020-03-08">2020-03-08</h2>
<ul>
<li>I want to try to consolidate our yearly Solr statistics cores back into one <code>statistics</code> core using the solr-import-export-json tool</li>
<li>I will try it on DSpace Test, doing one year at a time:</li>
</ul>
<pre><code>$ ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o /tmp/statistics-2010.json -k uid
$ ./run.sh -s http://localhost:8081/solr/statistics -a import -o /tmp/statistics-2010.json -k uid
</code></pre>
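<p>If the 2010 round trip works, the remaining yearly cores could be handled with a loop along these lines. This is only a sketch: the 2010 to 2019 range and the <code>DRY_RUN</code> switch are assumptions, while the <code>run.sh</code> flags are copied from the two commands for 2010. By default it only prints the commands it would run.</p>
<pre><code>#!/usr/bin/env bash
# Export each yearly core and import it into the main statistics core.
# The year range below is an assumption; adjust it to the cores that exist.
solr_base="http://localhost:8081/solr"

consolidate_year() {
    local year="$1"
    local dump="/tmp/statistics-${year}.json"
    local run=""
    # DRY_RUN defaults to 1, so the commands are printed instead of executed.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    fi
    # Stop before the import if the export failed.
    $run ./run.sh -s "${solr_base}/statistics-${year}" -a export -o "$dump" -k uid || return 1
    $run ./run.sh -s "${solr_base}/statistics" -a import -o "$dump" -k uid
}

for year in $(seq 2010 2019); do
    consolidate_year "$year" || break
done
</code></pre>
<p>Setting <code>DRY_RUN=0</code> would execute the commands for real, one year at a time, stopping at the first failure.</p>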
<ul>
<li>Upgrade PostgreSQL from 9.6 to 10 on DSpace Test (linode19)
<ul>
<li>I’ve been running it for one month in my local environment, and others have reported on the dspace-tech mailing list that they are using 10 and 11</li>
</ul>
</li>
<li>Peter noticed that the Solr stats were not showing anything before 2020
<ul>
<li>I had to restart Tomcat three times before all cores loaded properly…</li>
</ul>
</li>
</ul>
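<p>Because the cores did not all load on the first restarts, a small check like the following could confirm that none is missing after Tomcat comes up. This is a sketch under assumptions: <code>missing_cores</code> is a hypothetical helper, and the list of loaded core names is assumed to be supplied by hand or parsed from the response of Solr's CoreAdmin <code>STATUS</code> action.</p>
<pre><code>#!/usr/bin/env bash
# Print every expected core that is absent from the list of loaded cores.
# Usage: missing_cores "loaded names, one per line" expected1 expected2 ...
missing_cores() {
    local loaded="$1"
    shift
    local core
    for core in "$@"; do
        # -x matches the whole line and -F treats the name literally, so
        # "statistics" is not satisfied by "statistics-2010".
        if ! printf '%s\n' "$loaded" | grep -qxF -- "$core"; then
            printf '%s\n' "$core"
        fi
    done
}
</code></pre>
<p>An empty result means every expected core is loaded; anything printed is a core that still needs attention.</p>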
<h2 id="2020-03-10">2020-03-10</h2>
<ul>
<li>Fix some logic issues in the nginx config
<ul>
<li>Use generic blocking of <code>[Bb]ot</code> and <code>[Cc]rawl</code> and <code>[Ss]pider</code> in the “badbots” rate limiting logic instead of trying to list them all one by one (bots should not be trying to index dynamic pages <em>no matter what</em> so we punish hard here)</li>
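<li>We were not properly forwarding the remote IP address to Tomcat in all nginx location blocks, so hits in some locations were logged as coming from 127.0.0.1 (nginx only inherits <code>proxy_set_header</code> directives from the enclosing level when a location block defines none of its own, so we need to explicitly add the global proxy params whenever we set other headers in a location block)</li>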
<li>Unfortunately this affected the REST API and there are a few hundred thousand requests from this user agent:</li>
</ul>
</li>
</ul>
<pre><code>Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)
</code></pre><ul>
<li>It seems to be a problem only in the last week</li>
<li>It is making 10,000 to 40,000 requests to XMLUI per day…</li>
</ul>
<pre><code># zgrep -c 'Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)' /var/log/nginx/access.log.{1..9}
/var/log/nginx/access.log.30.gz:18687
/var/log/nginx/access.log.31.gz:28936
/var/log/nginx/access.log.32.gz:36402
/var/log/nginx/access.log.33.gz:38886
/var/log/nginx/access.log.34.gz:30607
/var/log/nginx/access.log.35.gz:19040
/var/log/nginx/access.log.36.gz:10780
/var/log/nginx/access.log.37.gz:5808
/var/log/nginx/access.log.38.gz:3100
/var/log/nginx/access.log.39.gz:1485
/var/log/nginx/access.log.3.gz:2898
/var/log/nginx/access.log.40.gz:373
/var/log/nginx/access.log.41.gz:3909
/var/log/nginx/access.log.42.gz:4729
/var/log/nginx/access.log.43.gz:3906
</code></pre><ul>
<li>I will purge those hits too!</li>
</ul>
<pre><code>$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '&lt;delete&gt;&lt;query&gt;userAgent:"Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)"&lt;/query&gt;&lt;/delete&gt;'
</code></pre><ul>
<li>Shit, something happened and a few thousand hits from user agents containing “Bot” got through
<ul>
<li>I need to re-run the <code>check-bot-hits.sh</code> script with the standard COUNTER-Robots list again, but add my own versions of a few because the script/Solr doesn’t support case-insensitive regular expressions:</li>