<ul>
<li>A few users noticed that CGSpace wasn’t loading items today; item pages seemed blank
<ul>
<li>I looked at the PostgreSQL locks, but they didn’t seem unusual</li>
<li>I guess this is the same “blank item page” issue that we saw a few times in 2019 and never solved</li>
<li>I restarted Tomcat and PostgreSQL and the issue was gone</li>
</ul>
</li>
<li>Since I was restarting Tomcat anyway, I decided to redeploy the latest changes from the <code>5_x-prod</code> branch, and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request</li>
</ul>
<ul>
<li>Also, Linode alerted that we had a high outbound traffic rate early this morning around midnight AND high CPU load later in the morning</li>
<li>First, looking at the traffic in the morning:</li>
<li>64.39.99.13 belongs to Qualys, but I see they are using a normal desktop user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15
</code></pre><ul>
<li>I will purge hits from that IP from Solr</li>
<li>The 199.47.87.x IPs belong to Turnitin; apparently they are NOT marked as bots, and we have 40,000 hits from them in the 2020 statistics alone:</li>
<li>They used to be “TurnitinBot”… hmm, it seems they use both: <a href="https://turnitin.com/robot/crawlerinfo.html">https://turnitin.com/robot/crawlerinfo.html</a></li>
<li>I will add Turnitin to the DSpace bot user agent list, but I see they are requesting <code>robots.txt</code> and only requesting item pages, so that’s impressive! I don’t need to add them to the “bad bot” rate limit list in nginx</li>
<li>While looking at the logs I noticed eighty-one IPs in the range 185.152.250.x making small numbers of requests with this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:76.0) Gecko/20100101 Firefox/76.0
</code></pre><ul>
<li>It’s only a few hundred requests each, but I am very suspicious, so I will record it here and purge their IPs from Solr</li>
<li>Then I see 185.187.30.14 and 185.187.30.13 making requests also, with several different “normal” user agents
<ul>
<li>They are both apparently in France, belonging to Scalair FR hosting</li>
<li>I will purge their requests from Solr too</li>
</ul>
</li>
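<li>Purging a set of IPs from the statistics core boils down to a Solr delete-by-query. The following is a minimal Python sketch, not the script actually used here: it assumes the statistics core stores the client address in a field named <code>ip</code> and accepts Solr’s JSON update syntax (<code>{"delete": {"query": ...}}</code>), and <code>ip_delete_body</code> is a hypothetical helper name. It only builds the request body; sending it (for example as a POST to the core’s <code>/update</code> handler with <code>commit=true</code>) is left out:

```python
import json

# Hypothetical helper for illustration. It assumes the DSpace statistics core
# stores the client address in a field called "ip" and accepts Solr's JSON
# update syntax; nothing is sent to Solr here.


def ip_delete_body(patterns: list[str]) -> str:
    """Build a JSON delete-by-query body matching any of the given IPs.

    Patterns may be plain IPv4 addresses or wildcard prefixes such as "185.152.250.*".
    """
    if not patterns:
        raise ValueError("need at least one IP or IP pattern")
    # ':' is Lucene query syntax, so escape it (only matters for IPv6 addresses)
    escaped = [p.replace(":", "\\:") for p in patterns]
    query = "ip:(" + " OR ".join(escaped) + ")"
    return json.dumps({"delete": {"query": query}})


if __name__ == "__main__":
    # The 185.152.250.x range and the two Scalair addresses mentioned above
    print(ip_delete_body(["185.152.250.*", "185.187.30.13", "185.187.30.14"]))
```

The wildcard form <code>185.152.250.*</code> covers the whole range in one clause; running the same query as a normal search first is a cheap way to check the hit count before deleting anything.</li>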
<li>Now I see some other new bots I hadn’t noticed before:
<ul>
<li><code>Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) LinkCheck by Siteimprove.com</code></li>
<li><code>Consilio (WebHare Platform 4.28.2-dev); LinkChecker)</code>, which appears to be a <a href="https://www.utwente.nl/en/websites/webhare/">university CMS</a></li>
<li>I will add <code>LinkCheck</code>, <code>Consilio</code>, and <code>WebHare</code> to the list of DSpace bot agents and purge them from Solr stats</li>
<li>The COUNTER-Robots list already has <code>link.?check</code>, but for some reason DSpace didn’t match it, and I see hits for some of these…</li>
<li>Maybe I should add <code>[Ll]ink.?[Cc]heck.?</code> to a custom list for now?</li>
<li>For now I added <code>Turnitin</code> to the <a href="https://github.com/atmire/COUNTER-Robots/pull/34">new bots pull request on COUNTER-Robots</a></li>
</ul>
</li>
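<li>One plausible explanation for the <code>link.?check</code> misses (an assumption, not confirmed against the DSpace source) is case sensitivity: a lowercase pattern cannot match <code>LinkCheck</code> unless the matcher ignores case. Python’s <code>re</code>, like <code>java.util.regex</code>, is case-sensitive by default, so a quick check against the two user agents above shows the difference between the upstream pattern, the character-class variant, and a case-insensitive flag. The helper name <code>matches</code> is hypothetical, and it assumes a substring search rather than a full-string match:

```python
import re

# The two user agents quoted in the notes above
AGENTS = [
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) LinkCheck by Siteimprove.com",
    "Consilio (WebHare Platform 4.28.2-dev); LinkChecker)",
]


def matches(pattern: str, agent: str, ignore_case: bool = False) -> bool:
    """Return True if `pattern` occurs anywhere in `agent` (a substring search)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, agent, flags) is not None


if __name__ == "__main__":
    for agent in AGENTS:
        print(
            matches(r"link.?check", agent),                    # upstream pattern, case-sensitive
            matches(r"[Ll]ink.?[Cc]heck.?", agent),            # explicit character classes
            matches(r"link.?check", agent, ignore_case=True),  # same pattern, case-insensitive
        )
```

Each printed line reads <code>False True True</code>: only the lowercase pattern without a case-insensitive flag misses these agents.</li>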
<li>I purged 20,000 hits from IPs and 45,000 hits from user agents</li>
<li>I will revert the default “example” agents file back to the upstream master branch of COUNTER-Robots, and then add all my custom ones that are pending in pull requests they haven’t merged yet:</li>
<pre><code> [java] java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'DefaultStorageUpdateConfig': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire method: public void com.atmire.statistics.util.StorageReportsUpdater.setStorageReportServices(java.util.List); nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cuaEPersonStorageReportService': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.dspace.cua.dao.storage.CUAEPersonStorageReportDAO com.atmire.dspace.cua.CUAStorageReportServiceImpl$CUAEPersonStorageReportServiceImpl.CUAEPersonStorageReportDAO; nested exception is org.springframework.beans.factory.NoUniqueBeanDefinitionException: No qualifying bean of type [com.atmire.dspace.cua.dao.storage.CUAEPersonStorageReportDAO] is defined: expected single matching bean but found 2: com.atmire.dspace.cua.dao.impl.CUAStorageReportDAOImpl$CUAEPersonStorageReportDAOImpl#0,com.atmire.dspace.cua.dao.impl.CUAStorageReportDAOImpl$CUAEPersonStorageReportDAOImpl#1
</code></pre>
<li>I had told Atmire about this several weeks ago… but I reminded them again in the ticket
<ul>
<li>Atmire says they are able to build fine, so I tried again and noticed that I had been building with <code>-Denv=dspacetest.cgiar.org</code>, which is not necessary for DSpace 6 of course</li>
<li>Once I removed that it builds fine</li>
</ul>
</li>
<li>I quickly re-applied the Font Awesome 5 changes to use SVG+JS instead of web fonts (from 2020-04) and things are looking good!</li>
<li>I need to export some Solr statistics data from CGSpace to test Salem’s modifications to the dspace-statistics-api
<ul>
<li>He modified it to query Solr on the fly instead of indexing it, which will be heavier and slower, but allows us to get more granular stats and countries/cities</li>
<li>Because we have so many records, I want to use solr-import-export-json to get several months at a time with a date range, but first there seem to be issues with curl (I need to disable globbing with <code>-g</code> and URL encode the range)</li>
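<li>Generating the date ranges for “several months at a time” is easy to script. A minimal Python sketch, assuming half-year chunks and the same <code>[start TO end]</code> syntax that works in the Solr UI; the helper names <code>add_months</code> and <code>solr_date_ranges</code> are hypothetical. Because square-bracket range endpoints in Solr are inclusive, each chunk ends one second before the next one starts:

```python
from datetime import datetime, timedelta

# Hypothetical helpers for illustration only.
# Solr DateField format, e.g. 2019-01-01T00:00:00Z
SOLR_FMT = "%Y-%m-%dT%H:%M:%SZ"


def add_months(d: datetime, months: int) -> datetime:
    """Return the date `months` months after `d` (assumes `d` falls on day 1)."""
    total = d.month - 1 + months
    return d.replace(year=d.year + total // 12, month=total % 12 + 1)


def solr_date_ranges(start: datetime, end: datetime, months: int) -> list[str]:
    """Split [start, end) into inclusive Solr range queries of `months` months each."""
    ranges = []
    current = start
    while current < end:
        nxt = min(add_months(current, months), end)
        # Square-bracket ranges are inclusive, so stop one second before the next chunk
        last = nxt - timedelta(seconds=1)
        ranges.append(f"[{current.strftime(SOLR_FMT)} TO {last.strftime(SOLR_FMT)}]")
        current = nxt
    return ranges


if __name__ == "__main__":
    for r in solr_date_ranges(datetime(2019, 1, 1), datetime(2020, 1, 1), 6):
        print(r)
```

Running it prints two ranges for 2019, the first of which, <code>[2019-01-01T00:00:00Z TO 2019-06-30T23:59:59Z]</code>, is the range that works in the Solr UI.</li>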
<li>For reference, the <a href="https://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/schema/DateField.html">Solr 4.10.x DateField docs</a></li>
<li>This range works in Solr UI: <code>[2019-01-01T00:00:00Z TO 2019-06-30T23:59:59Z]</code></li>
<li>But not in solr-import-export-json… hmmm… seems we need to URL encode <em>only</em> the date range itself, but not the brackets:</li>