<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
<p></p>
<ul>
<li>As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process:</li>
<li>Discuss AgriKnowledge including our Handle identifier on their harvested items from CGSpace</li>
<li>They seem to be only interested in Gates-funded outputs, for example: <ahref="https://www.agriknowledge.org/files/tm70mv21t">https://www.agriknowledge.org/files/tm70mv21t</a></li>
<li>Finally finish with the CIFOR Archive records (a total of 2448):
<ul>
<li>I mapped the 50 items that were duplicates from elsewhere in CGSpace into <ahref="https://cgspace.cgiar.org/handle/10568/16702">CIFOR Archive</a></li>
<li>I did one last check of the remaining 2398 items and found eight who have a <code>cg.identifier.doi</code> that links to some URL other than a DOI so I moved those to <code>cg.identifier.url</code> and <code>cg.identifier.googleurl</code> as appropriate</li>
<li>Also, thirteen items had a DOI in their citation, but did not have a <code>cg.identifier.doi</code> field, so I added those</li>
<li>Then I imported those 2398 items in two batches (to deal with memory issues):</li>
<li>I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some missing HTTP entirely:</li>
</ul>
<pre><code>dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
count
-------
785
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
count
-------
4
</code></pre>
<ul>
<li>I think I should fix that as well as some other garbage values like “test” and “dspace.ilri.org” etc:</li>
</ul>
<pre><code>dspace=# begin;
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
UPDATE 785
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
UPDATE 4
dspace=# update metadatavalue set text_value='https://books.google.com/books?id=meF1CLdPSF4C' where resource_type_id=2 and metadata_field_id=222 and text_value='meF1CLdPSF4C';
UPDATE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
<li>Testing DSpace 5.8 with PostgreSQL 9.6 and Tomcat 8.5.32 (instead of my usual 7.0.88) and for some reason I get autowire errors on Catalina startup with 8.5.32:</li>
</ul>
<pre><code>03-Jul-2018 19:51:37.272 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener]
java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
at org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener.contextInitialized(DSpaceKernelServletContextListener.java:92)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4792)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5256)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:730)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:629)
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1839)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
<li>I verified that the autowire error indeed only occurs on Tomcat 8.5, but the application works fine on Tomcat 7</li>
<li>I have raised this in the <ahref="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 compatibility ticket on Atmire’s tracker</a></li>
<li>Abenet wants me to add “United Kingdom government” to the sponsors on CGSpace so I created a ticket to track it (<ahref="https://github.com/ilri/DSpace/issues/381">#381</a>)</li>
<li>Also, Udana wants me to add “Enhancing Sustainability Across Agricultural Systems” to the WLE Phase II research themes so I created a ticket to track that (<ahref="https://github.com/ilri/DSpace/issues/382">#382</a></li>
<li>I need to try to finish this DSpace 5.8 business first because I have too many branches with cherry-picks going on right now!</li>
<li>CCAFS want me to add “PII-FP2_MSCCCAFS” to their Phase II project tags on CGSpace (<ahref="https://github.com/ilri/DSpace/issues/383">#383</a></li>
<li>I’ll do it in a batch with all the other metadata updates next week</li>
</ul>
<h2id="2018-07-08">2018-07-08</h2>
<ul>
<li>I was tempted to do the Linode instance upgrade on CGSpace (linode18), but after looking closely at the system backups I noticed that Solr isn’t being backed up to S3</li>
<li>I apparently noticed this—and fixed it!—in <ahref="/cgspace-notes/2016-07/">2016-07</a>, but it doesn’t look like the backup has been updated since then!</li>
<li>It looks like I added Solr to the <code>backup_to_s3.sh</code> script, but that script is not even being used (<code>s3cmd</code> is run directly from root’s crontab)</li>
<li>For now I have just initiated a manual S3 backup of the Solr data:</li>
<li>Abenet wants to be able to search by journal title (dc.source) in the advanced Discovery search so I opened an issue for it (<ahref="https://github.com/ilri/DSpace/issues/384">#384</a>)</li>
<li>I regenerated the list of names for all our ORCID iDs using my <ahref="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
<li>Uptime Robot said that CGSpace was down for two minutes again later in the day, and this time I saw a memory error in Tomcat’s <code>catalina.out</code>:</li>
</ul>
<pre><code>Exception in thread "http-bio-127.0.0.1-8081-exec-557" java.lang.OutOfMemoryError: Java heap space
</code></pre>
<ul>
<li>I’m not sure if it’s the same error, but I see this in DSpace’s <code>solr.log</code>:</li>
</ul>
<pre><code>2018-07-09 06:25:09,913 ERROR org.apache.solr.servlet.SolrDispatchFilter @ null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
</code></pre>
<ul>
<li>I see a strange error around that time in <code>dspace.log.2018-07-08</code>:</li>
</ul>
<pre><code>2018-07-09 06:23:43,510 ERROR com.atmire.statistics.SolrLogThread @ IOException occured when talking to server at: http://localhost:8081/solr/statistics
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr/statistics
<li>Of those, <em>all</em> except <code>70.32.83.92</code> and <code>50.116.102.77</code> are <em>NOT</em> re-using their Tomcat sessions, for example from the XMLUI logs:</li>
<li><code>95.108.181.88</code> appears to be Yandex, so I dunno why it’s creating so many sessions, as its user agent should match Tomcat’s Crawler Session Manager Valve</li>
<li><code>70.32.83.92</code> is on MediaTemple but I’m not sure who it is. They are mostly hitting REST so I guess that’s fine</li>
<li><code>35.227.26.162</code> doesn’t declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx</li>
<li><code>178.154.200.38</code> is Yandex again</li>
<li><code>207.46.13.47</code> is Bing</li>
<li><code>157.55.39.234</code> is Bing</li>
<li><code>137.108.70.6</code> is our old friend CORE bot</li>
<li><code>50.116.102.77</code> doesn’t declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that’s fine</li>
<li><code>40.77.167.84</code> is Bing again</li>
<li>Interestingly, the first time that I see <code>35.227.26.162</code> was on 2018-06-08</li>
<li>I’ve added <code>35.227.26.162</code> to the bot tagging logic in the nginx vhost</li>
<li>Add “United Kingdom government” to sponsors (<ahref="https://github.com/ilri/DSpace/issues/381">#381</a>)</li>
<li>Add “Enhancing Sustainability Across Agricultural Systems” to WLE Phase II Research Themes (<ahref="https://github.com/ilri/DSpace/issues/382">#382</a>)</li>
<li>Add “PII-FP2_MSCCCAFS” to CCAFS Phase II Project Tags (<ahref="https://github.com/ilri/DSpace/issues/383">#383</a>)</li>
<li>Add journal title (dc.source) to Discovery search filters (<ahref="https://github.com/ilri/DSpace/issues/384">#384</a>)</li>
<li>All were tested and merged to the <code>5_x-prod</code> branch and will be deployed on CGSpace this coming weekend when I do the Linode server upgrade</li>
<li>I need to get them onto the 5.8 testing branch too, either via cherry-picking or by rebasing after we finish testing Atmire’s 5.8 pull request (<ahref="https://github.com/ilri/DSpace/pull/378">#378</a>)</li>
<li>Linode sent an alert about CPU usage on CGSpace again, about 13:00UTC</li>
<li>These are the top ten users in the last two hours:</li>
<li>I think the problem is that, despite the bot requesting <code>robots.txt</code>, it almost exlusively requests dynamic pages from <code>/discover</code>:</li>
<li>So this bot is just like Baiduspider, and I need to add it to the nginx rate limiting</li>
<li>I’ll also add it to Tomcat’s Crawler Session Manager Valve to force the re-use of a common Tomcat sesssion for all crawlers just in case</li>
<li>Generate a list of all affiliations in CGSpace to send to Mohamed Salem to compare with the list on MEL (sorting the list by most occurrences):</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv header