<li><p>As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process:</p>
<li><p>After that I started Tomcat 7 and DSpace seems to be working, now I need to tell our colleagues to try stuff and report issues they have</p></li>
<li>Discuss AgriKnowledge including our Handle identifier on their harvested items from CGSpace</li>
<li>They seem to be only interested in Gates-funded outputs, for example: <ahref="https://www.agriknowledge.org/files/tm70mv21t">https://www.agriknowledge.org/files/tm70mv21t</a></li>
<li>I mapped the 50 items that were duplicates from elsewhere in CGSpace into <ahref="https://cgspace.cgiar.org/handle/10568/16702">CIFOR Archive</a></li>
<li>I did one last check of the remaining 2398 items and found eight who have a <code>cg.identifier.doi</code> that links to some URL other than a DOI so I moved those to <code>cg.identifier.url</code> and <code>cg.identifier.googleurl</code> as appropriate</li>
<li>Also, thirteen items had a DOI in their citation, but did not have a <code>cg.identifier.doi</code> field, so I added those</li>
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
UPDATE 785
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
UPDATE 4
dspace=# update metadatavalue set text_value='https://books.google.com/books?id=meF1CLdPSF4C' where resource_type_id=2 and metadata_field_id=222 and text_value='meF1CLdPSF4C';
UPDATE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
<li><p>Testing DSpace 5.8 with PostgreSQL 9.6 and Tomcat 8.5.32 (instead of my usual 7.0.88) and for some reason I get autowire errors on Catalina startup with 8.5.32:</p>
<pre><code>03-Jul-2018 19:51:37.272 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener]
java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
at org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener.contextInitialized(DSpaceKernelServletContextListener.java:92)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4792)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5256)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:730)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:629)
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1839)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
<li>I verified that the autowire error indeed only occurs on Tomcat 8.5, but the application works fine on Tomcat 7</li>
<li>I have raised this in the <ahref="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 compatibility ticket on Atmire’s tracker</a></li>
<li>Abenet wants me to add “United Kingdom government” to the sponsors on CGSpace so I created a ticket to track it (<ahref="https://github.com/ilri/DSpace/issues/381">#381</a>)</li>
<li>Also, Udana wants me to add “Enhancing Sustainability Across Agricultural Systems” to the WLE Phase II research themes so I created a ticket to track that (<ahref="https://github.com/ilri/DSpace/issues/382">#382</a>)</li>
<li>CCAFS want me to add “PII-FP2_MSCCCAFS” to their Phase II project tags on CGSpace (<ahref="https://github.com/ilri/DSpace/issues/383">#383</a>)</li>
<li>I’ll do it in a batch with all the other metadata updates next week</li>
</ul>
<h2id="2018-07-08">2018-07-08</h2>
<ul>
<li>I was tempted to do the Linode instance upgrade on CGSpace (linode18), but after looking closely at the system backups I noticed that Solr isn’t being backed up to S3</li>
<li>I apparently noticed this—and fixed it!—in <ahref="/cgspace-notes/2016-07/">2016-07</a>, but it doesn’t look like the backup has been updated since then!</li>
<li>It looks like I added Solr to the <code>backup_to_s3.sh</code> script, but that script is not even being used (<code>s3cmd</code> is run directly from root’s crontab)</li>
<li><p>I wonder if I should convert some of the cron jobs to systemd services / timers…</p></li>
<li><p>I sent a note to all our users on Yammer to ask them about possible maintenance on Sunday, July 14th</p></li>
<li><p>Abenet wants to be able to search by journal title (dc.source) in the advanced Discovery search so I opened an issue for it (<ahref="https://github.com/ilri/DSpace/issues/384">#384</a>)</p></li>
<li><p>I regenerated the list of names for all our ORCID iDs using my <ahref="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</p>
<li><p>Uptime Robot said that CGSpace was down for two minutes again later in the day, and this time I saw a memory error in Tomcat’s <code>catalina.out</code>:</p>
<pre><code>2018-07-09 06:23:43,510 ERROR com.atmire.statistics.SolrLogThread @ IOException occured when talking to server at: http://localhost:8081/solr/statistics
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr/statistics
<li><p>Of those, <em>all</em> except <code>70.32.83.92</code> and <code>50.116.102.77</code> are <em>NOT</em> re-using their Tomcat sessions, for example from the XMLUI logs:</p>
<li><p><code>95.108.181.88</code> appears to be Yandex, so I dunno why it’s creating so many sessions, as its user agent should match Tomcat’s Crawler Session Manager Valve</p></li>
<li><p><code>70.32.83.92</code> is on MediaTemple but I’m not sure who it is. They are mostly hitting REST so I guess that’s fine</p></li>
<li><p><code>35.227.26.162</code> doesn’t declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx</p></li>
<li><p><code>178.154.200.38</code> is Yandex again</p></li>
<li><p><code>207.46.13.47</code> is Bing</p></li>
<li><p><code>157.55.39.234</code> is Bing</p></li>
<li><p><code>137.108.70.6</code> is our old friend CORE bot</p></li>
<li><p><code>50.116.102.77</code> doesn’t declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that’s fine</p></li>
<li><p><code>40.77.167.84</code> is Bing again</p></li>
<li><p>Interestingly, the first time that I see <code>35.227.26.162</code> was on 2018-06-08</p></li>
<li><p>I’ve added <code>35.227.26.162</code> to the bot tagging logic in the nginx vhost</p></li>
<li>Add “United Kingdom government” to sponsors (<ahref="https://github.com/ilri/DSpace/issues/381">#381</a>)</li>
<li>Add “Enhancing Sustainability Across Agricultural Systems” to WLE Phase II Research Themes (<ahref="https://github.com/ilri/DSpace/issues/382">#382</a>)</li>
<li>Add “PII-FP2_MSCCCAFS” to CCAFS Phase II Project Tags (<ahref="https://github.com/ilri/DSpace/issues/383">#383</a>)</li>
<li>Add journal title (dc.source) to Discovery search filters (<ahref="https://github.com/ilri/DSpace/issues/384">#384</a>)</li>
<li>All were tested and merged to the <code>5_x-prod</code> branch and will be deployed on CGSpace this coming weekend when I do the Linode server upgrade</li>
<li>I need to get them onto the 5.8 testing branch too, either via cherry-picking or by rebasing after we finish testing Atmire’s 5.8 pull request (<ahref="https://github.com/ilri/DSpace/pull/378">#378</a>)</li>
<li>Linode sent an alert about CPU usage on CGSpace again, about 13:00UTC</li>
<li><p>I think the problem is that, despite the bot requesting <code>robots.txt</code>, it almost exlusively requests dynamic pages from <code>/discover</code>:</p>
<li><p>So this bot is just like Baiduspider, and I need to add it to the nginx rate limiting</p></li>
<li><p>I’ll also add it to Tomcat’s Crawler Session Manager Valve to force the re-use of a common Tomcat sesssion for all crawlers just in case</p></li>
<li><p>Generate a list of all affiliations in CGSpace to send to Mohamed Salem to compare with the list on MEL (sorting the list by most occurrences):</p>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv header
<li><p>Generate a list of affiliations for Peter and Abenet to go over so we can batch correct them before we deploy the new data visualization dashboard:</p>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv header;
<li>Run all system updates on CGSpace, add latest metadata changes from last week, and start the Linode instance upgrade</li>
<li>After the upgrade I see we have more disk space available in the instance’s dashboard, so I shut the instance down and resized it from 392GB to 650GB</li>
<li>The resize was very quick (less than one minute) and after booting the instance back up I now have 631GB for the root filesystem (with 267GB available)!</li>
<li>Peter had asked a question about how mapped items are displayed in the Altmetric dashboard</li>
<li>For example, <ahref="10568/82810"><sup>10568</sup>⁄<sub>82810</sub></a> is mapped to four collections, but only shows up in one “department” in their dashboard</li>
<li>Altmetric help said that <ahref="https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:cgspace.cgiar.org:10568/82810">according to OAI that item is only in one department</a></li>
<li><p>Now I see four colletions in OAI for that item!</p></li>
<li><p>I need to ask the dspace-tech mailing list if the nightly OAI import catches the case of old items that have had metadata or mappings change</p></li>
<li><p>ICARDA sent me a list of the ORCID iDs they have in the MEL system and it looks like almost 150 are new and unique to us!</p>
<li><p>I combined the two lists and regenerated the names for all our the ORCID iDs using my <ahref="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</p>
<li><p>Then I added the XML formatting for controlled vocabularies, sorted the list with GNU sort in vim via <code>% !sort</code> and then checked the formatting with tidy:</p>
<li>Participate in call with IWMI and WLE to discuss Altmetric, CGSpace, and social media</li>
<li>I told them that they should try to be including the Handle link on their social media shares because that’s the only way to get Altmetric to notice them and associate them with their DOIs</li>
<li>I suggested that we should have a wider meeting about this, and that I would post that on Yammer</li>
<li>I tested a submission via SAF bundle to DSpace 5.8 and it worked fine</li>
<li>In addition to testing DSpace 5.8, I specifically wanted to see if the issue with specifying collections in metadata instead of on the command line would work (<ahref="https://jira.duraspace.org/browse/DS-3583">DS-3583</a>)</li>
<li>Post a note on Yammer about Altmetric and Handle best practices</li>
<li>Update PostgreSQL JDBC jar from 42.2.2 to 42.2.4 in the <ahref="https://github.com/ilri/rmg-ansible-public">RMG Ansible playbooks</a></li>
<li>IWMI asked why all the dates in their <ahref="https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&scope=10568/16814&sort_by=2&order=DESC&rpp=100&format=rss">OpenSearch RSS feed</a> show up as January 01, 2018</li>
<li>On closer inspection I notice that many of their items use “2018” as their <code>dc.date.issued</code>, which is a valid ISO 8601 date but it’s not very specific so DSpace assumes it is January 01, 2018 00:00:00…</li>
<li>I told her that they need to start using more accurate dates for their issue dates</li>
<li>In the example item I looked at the DOI has a publish date of 2018-03-16, so they should really try to capture that</li>
<li>I told the IWMI people that they can use <code>sort_by=3</code> in their OpenSearch query to sort the results by <code>dc.date.accessioned</code> instead of <code>dc.date.issued</code></li>
<li>They say that it is a burden for them to capture the issue dates, so I cautioned them that this is in their own benefit for future posterity and that everyone else on CGSpace manages to capture the issue dates!</li>
<li><p>For future reference, as I had previously noted in <ahref="/cgspace-notes/2018-04/">2018-04</a>, sort options are configured in <code>dspace.cfg</code>, for example:</p>
<li><p>Just because I was curious I made sure that these options are working as expected in DSpace 5.8 on DSpace Test (they are)</p></li>
<li><p>I tested the Atmire Listings and Reports (L&R) module one last time on my local test environment with a new snapshot of CGSpace’s database and re-generated Discovery index and it worked fine</p></li>
<li><p>I finally informed Atmire that we’re ready to proceed with deploying this to CGSpace and that they should advise whether we should wait about the SNAPSHOT versions in <code>pom.xml</code></p></li>
<li><p>There is no word on the issue I reported with Tomcat 8.5.32 yet, though…</p></li>
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}$';
<li>Follow up with Atmire again about the SNAPSHOT versions in our <code>pom.xml</code> because I want to finalize the DSpace 5.8 upgrade soon and I haven’t heard from them in a month (<ahref="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket 560</a>)</li>