

<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="July, 2018" />
<meta property="og:description" content="2018-07-01
I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:
$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
During the mvn package stage on the 5.8 branch I kept getting issues with java running out of memory:
There is insufficient memory for the Java Runtime Environment to continue.
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="" />
<meta property="article:published_time" content="2018-07-01T12:56:54&#43;03:00"/>
<meta property="article:modified_time" content="2018-07-10T17:19:06&#43;03:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="July, 2018"/>
<meta name="twitter:description" content="2018-07-01
I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:
$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
During the mvn package stage on the 5.8 branch I kept getting issues with java running out of memory:
There is insufficient memory for the Java Runtime Environment to continue.
" />
<meta name="generator" content="Hugo 0.42.2" />
<script type="application/ld+json">
{
"@context": "",
"@type": "BlogPosting",
"headline": "July, 2018",
"url": "",
"wordCount": "1678",
"datePublished": "2018-07-01T12:56:54&#43;03:00",
"dateModified": "2018-07-10T17:19:06&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="">
<title>July, 2018 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="" rel="stylesheet" integrity="sha384-TbfEhJn4HkgPUIZUhhHaAYsycYKHxSuIloCjZOiyCSpbVunRQxg5T5pxKVFwxilF" crossorigin="anonymous">
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="">Home</a>
<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="">CGSpace</a> repository.</p>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<h2 class="blog-post-title"><a href="">July, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-07-01T12:56:54&#43;03:00">Sun Jul 01, 2018</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
<h2 id="2018-07-01">2018-07-01</h2>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
<li>As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process:</li>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2 clean package
</code></pre>
<li>Then I stopped the Tomcat 7 service, ran the ant update, and manually ran the old and ignored SQL migrations:</li>
<pre><code>$ sudo su - postgres
$ psql dspace
dspace=# begin;
dspace=# \i Atmire-DSpace-5.8-Schema-Migration.sql
dspace=# commit;
dspace=# \q
$ exit
$ dspace database migrate ignored
</code></pre>
<li>After that I started Tomcat 7 and DSpace seems to be working; now I need to ask our colleagues to try things out and report any issues they find</li>
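<p>Because the build fails when the machine runs short of free memory, it can help to check how much is available before starting <code>mvn package</code>. The following is a minimal sketch, assuming a Linux host whose <code>/proc/meminfo</code> reports <code>MemAvailable</code>; the function name <code>mem_available_mb</code> and the 2048 MB threshold in the usage comment are hypothetical and not part of the original notes:</p>

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and
# print the MemAvailable value converted from kB to whole MB.
mem_available_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }'
}

# Usage (the 2048 MB threshold is an arbitrary example):
#   [ "$(mem_available_mb < /proc/meminfo)" -ge 2048 ] || echo "low memory, consider stopping Tomcat first"
```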
<h2 id="2018-07-02">2018-07-02</h2>
<li>Discuss AgriKnowledge including our Handle identifier on their harvested items from CGSpace</li>
<li>They seem to be only interested in Gates-funded outputs, for example: <a href=""></a></li>
<h2 id="2018-07-03">2018-07-03</h2>
<li>Finally finish with the CIFOR Archive records (a total of 2448):
<li>I mapped the 50 items that were duplicates from elsewhere in CGSpace into <a href="">CIFOR Archive</a></li>
<li>I did one last check of the remaining 2398 items and found eight that have a <code>cg.identifier.doi</code> that links to some URL other than a DOI, so I moved those to <code>cg.identifier.url</code> and <code>cg.identifier.googleurl</code> as appropriate</li>
<li>Also, thirteen items had a DOI in their citation, but did not have a <code>cg.identifier.doi</code> field, so I added those</li>
<li>Then I imported those 2398 items in two batches (to deal with memory issues):</li>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ dspace metadata-import -e -f /tmp/2018-06-27-New-CIFOR-Archive.csv
$ dspace metadata-import -e -f /tmp/2018-06-27-New-CIFOR-Archive2.csv
</code></pre>
<li>I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some missing HTTP entirely:</li>
<pre><code>dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like '';
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
</code></pre>
<li>I think I should fix that as well as some other garbage values like &ldquo;test&rdquo; and &ldquo;; etc:</li>
<pre><code>dspace=# begin;
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '', '') where resource_type_id=2 and metadata_field_id=222 and text_value like '';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '', '') where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
dspace=# update metadatavalue set text_value='' where resource_type_id=2 and metadata_field_id=222 and text_value='meF1CLdPSF4C';
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
dspace=# commit;
</code></pre>
<li>Testing DSpace 5.8 with PostgreSQL 9.6 and Tomcat 8.5.32 (instead of my usual 7.0.88) and for some reason I get autowire errors on Catalina startup with 8.5.32:</li>
<pre><code>03-Jul-2018 19:51:37.272 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener]
java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean '$ColumnsConverter#3f6c3e6a' of type [$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name '$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private$FilterConverter$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
at org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener.contextInitialized(
at org.apache.catalina.core.StandardContext.listenerStart(
at org.apache.catalina.core.StandardContext.startInternal(
at org.apache.catalina.util.LifecycleBase.start(
at org.apache.catalina.core.ContainerBase.addChildInternal(
at org.apache.catalina.core.ContainerBase.addChild(
at org.apache.catalina.core.StandardHost.addChild(
at org.apache.catalina.startup.HostConfig.deployDescriptor(
at org.apache.catalina.startup.HostConfig$
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean '$ColumnsConverter#3f6c3e6a' of type [$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name '$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private$FilterConverter$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
</code></pre>
<li>Gotta check that out later&hellip;</li>
<h2 id="2018-07-04">2018-07-04</h2>
<li>I verified that the autowire error indeed only occurs on Tomcat 8.5, but the application works fine on Tomcat 7</li>
<li>I have raised this in the <a href="">DSpace 5.8 compatibility ticket on Atmire&rsquo;s tracker</a></li>
<li>Abenet wants me to add &ldquo;United Kingdom government&rdquo; to the sponsors on CGSpace so I created a ticket to track it (<a href="">#381</a>)</li>
<li>Also, Udana wants me to add &ldquo;Enhancing Sustainability Across Agricultural Systems&rdquo; to the WLE Phase II research themes so I created a ticket to track that (<a href="">#382</a>)</li>
<li>I need to try to finish this DSpace 5.8 business first because I have too many branches with cherry-picks going on right now!</li>
<h2 id="2018-07-06">2018-07-06</h2>
<li>CCAFS want me to add &ldquo;PII-FP2_MSCCCAFS&rdquo; to their Phase II project tags on CGSpace (<a href="">#383</a>)</li>
<li>I&rsquo;ll do it in a batch with all the other metadata updates next week</li>
<h2 id="2018-07-08">2018-07-08</h2>
<li>I was tempted to do the Linode instance upgrade on CGSpace (linode18), but after looking closely at the system backups I noticed that Solr isn&rsquo;t being backed up to S3</li>
<li>I apparently noticed this—and fixed it!—in <a href="/cgspace-notes/2016-07/">2016-07</a>, but it doesn&rsquo;t look like the backup has been updated since then!</li>
<li>It looks like I added Solr to the <code></code> script, but that script is not even being used (<code>s3cmd</code> is run directly from root&rsquo;s crontab)</li>
<li>For now I have just initiated a manual S3 backup of the Solr data:</li>
<pre><code># s3cmd sync --delete-removed /home/backup/solr/ s3://
</code></pre>
<li>But I need to add this to cron!</li>
<li>I wonder if I should convert some of the cron jobs to systemd services / timers&hellip;</li>
<li>I sent a note to all our users on Yammer to ask them about possible maintenance on Sunday, July 14th</li>
<li>Abenet wants to be able to search by journal title (dc.source) in the advanced Discovery search so I opened an issue for it (<a href="">#384</a>)</li>
<li>I regenerated the list of names for all our ORCID iDs using my <a href=""></a> script:</li>
<pre><code>$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq &gt; /tmp/2018-07-08-orcids.txt
$ ./ -i /tmp/2018-07-08-orcids.txt -o /tmp/2018-07-08-names.txt -d
</code></pre>
<li>But after comparing to the existing list of names I didn&rsquo;t see much change, so I just ignored it</li>
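<p>The <code>grep</code> pattern used above accepts any uppercase letter or digit in every position, whereas an ORCID iD consists of four hyphen-separated groups of digits in which only the final character may be <code>X</code>. A slightly stricter extraction could be sketched as follows; the function name <code>extract_orcids</code> is hypothetical and this is not the script referred to in the notes:</p>

```shell
# Hypothetical helper: print the unique ORCID iDs found on stdin.
# Digits only, except the final checksum character, which may be X.
extract_orcids() {
  grep -oE '[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]' | sort -u
}

# Usage, with the vocabulary file from the notes:
#   extract_orcids < ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml
```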
<h2 id="2018-07-09">2018-07-09</h2>
<li>Uptime Robot said that CGSpace was down for two minutes early this morning but I don&rsquo;t see anything in Tomcat logs or dmesg</li>
<li>Uptime Robot said that CGSpace was down for two minutes again later in the day, and this time I saw a memory error in Tomcat&rsquo;s <code>catalina.out</code>:</li>
<pre><code>Exception in thread &quot;http-bio-; java.lang.OutOfMemoryError: Java heap space
</code></pre>
<li>I&rsquo;m not sure if it&rsquo;s the same error, but I see this in DSpace&rsquo;s <code>solr.log</code>:</li>
<pre><code>2018-07-09 06:25:09,913 ERROR org.apache.solr.servlet.SolrDispatchFilter @ null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
</code></pre>
<li>I see a strange error around that time in <code>dspace.log.2018-07-08</code>:</li>
<pre><code>2018-07-09 06:23:43,510 ERROR com.atmire.statistics.SolrLogThread @ IOException occured when talking to server at: http://localhost:8081/solr/statistics
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr/statistics
</code></pre>
<li>But not sure what caused that&hellip;</li>
<li>I got a message from Linode tonight that CPU usage was high on CGSpace for the past few hours around 8PM GMT</li>
<li>Looking in the nginx logs I see the top ten IP addresses active today:</li>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;09/Jul/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
</code></pre>
<li>Of those, <em>all</em> except <code></code> and <code></code> are <em>NOT</em> re-using their Tomcat sessions, for example from the XMLUI logs:</li>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=' dspace.log.2018-07-09
</code></pre>
<li><code></code> appears to be Yandex, so I dunno why it&rsquo;s creating so many sessions, as its user agent should match Tomcat&rsquo;s Crawler Session Manager Valve</li>
<li><code></code> is on MediaTemple but I&rsquo;m not sure who it is. They are mostly hitting REST so I guess that&rsquo;s fine</li>
<li><code></code> doesn&rsquo;t declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx</li>
<li><code></code> is Yandex again</li>
<li><code></code> is Bing</li>
<li><code></code> is Bing</li>
<li><code></code> is our old friend CORE bot</li>
<li><code></code> doesn&rsquo;t declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that&rsquo;s fine</li>
<li><code></code> is Bing again</li>
<li>Interestingly, the first time that I see <code></code> was on 2018-06-08</li>
<li>I&rsquo;ve added <code></code> to the bot tagging logic in the nginx vhost</li>
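<p>To check whether a client is re-using its Tomcat session, one can count the distinct session IDs logged for its address: a count far above one suggests a new session per request. A minimal sketch, assuming the <code>session_id=…:ip_addr=…</code> log format matched by the <code>grep</code> earlier in this entry; the function name <code>sessions_for_ip</code> and the documentation-only address <code>192.0.2.1</code> are hypothetical:</p>

```shell
# Hypothetical helper: count the distinct Tomcat session IDs that the
# DSpace log lines on stdin record for one client IP address ($1).
sessions_for_ip() {
  # Escape the dots so they match literally in the regex
  ip_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  # Require a non-digit (or end of line) after the address so that
  # 192.0.2.1 does not also match 192.0.2.10
  grep -oE "session_id=[A-Z0-9]{32}:ip_addr=${ip_re}([^0-9]|\$)" \
    | cut -d: -f1 | sort -u | wc -l | tr -d ' '
}

# Usage:
#   sessions_for_ip 192.0.2.1 < dspace.log.2018-07-09
```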
<h2 id="2018-07-10">2018-07-10</h2>
<li>Add &ldquo;United Kingdom government&rdquo; to sponsors (<a href="">#381</a>)</li>
<li>Add &ldquo;Enhancing Sustainability Across Agricultural Systems&rdquo; to WLE Phase II Research Themes (<a href="">#382</a>)</li>
<li>Add &ldquo;PII-FP2_MSCCCAFS&rdquo; to CCAFS Phase II Project Tags (<a href="">#383</a>)</li>
<li>Add journal title (dc.source) to Discovery search filters (<a href="">#384</a>)</li>
<li>All were tested and merged to the <code>5_x-prod</code> branch and will be deployed on CGSpace this coming weekend when I do the Linode server upgrade</li>
<li>I need to get them onto the 5.8 testing branch too, either via cherry-picking or by rebasing after we finish testing Atmire&rsquo;s 5.8 pull request (<a href="">#378</a>)</li>
<li>Linode sent an alert about CPU usage on CGSpace again, at about 13:00 UTC</li>
<li>These are the top ten users in the last two hours:</li>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;10/Jul/2018:(11|12|13)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
</code></pre>
<li>Looks like <code></code> is Moayad testing his new CGSpace visualization thing:</li>
<pre><code> - - [10/Jul/2018:13:39:41 +0000] &quot;GET /bitstream/handle/10568/75668/dryad.png HTTP/2.0&quot; 200 53750 &quot;http://localhost:4200/&quot; &quot;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36&quot;
</code></pre>
<li>He said there was a bug that caused his app to request a bunch of invalid URLs</li>
<li>I&rsquo;ll have to keep an eye on this and see how their platform evolves</li>
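<p>The same nginx pipeline was used on both days with only the date pattern changing, so it could be wrapped in a small function. A sketch that keeps the commands from the notes unchanged; the function name <code>top_clients</code> and its argument order are hypothetical:</p>

```shell
# Hypothetical wrapper around the pipeline used in these notes.
# $1: extended regex for the timestamp, e.g. "10/Jul/2018:(11|12|13)"
# $2: how many clients to show (defaults to 10)
top_clients() {
  grep -E "$1" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n "${2:-10}"
}

# Usage:
#   zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | top_clients "10/Jul/2018:(11|12|13)"
```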
<!-- vim: set sw=2 ts=2: -->
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2018-07/">July, 2018</a></li>
<li><a href="/cgspace-notes/2018-06/">June, 2018</a></li>
<li><a href="/cgspace-notes/2018-05/">May, 2018</a></li>
<li><a href="/cgspace-notes/2018-04/">April, 2018</a></li>
<li><a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<section class="sidebar-module">
<ol class="list-unstyled">
<li><a href="">CGSpace</a></li>
<li><a href="">DSpace Test</a></li>
<li><a href="">CGSpace @ GitHub</a></li>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
Blog template created by <a href="">@mdo</a>, ported to Hugo by <a href=''>@mralanorth</a>.
<a href="#">Back to top</a>