Add notes for 2020-01-27

2020-01-27 16:20:44 +02:00
parent 207ace0883
commit 8feb93be39
112 changed files with 11466 additions and 5158 deletions


@@ -15,10 +15,10 @@ DSpace Test had crashed at some point yesterday morning and I see the following
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat's
I'm not sure why Tomcat didn't crash with an OutOfMemoryError…
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat’s
I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…
Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core
The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes
The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it
" />
<meta property="og:type" content="article" />
@@ -37,13 +37,13 @@ DSpace Test had crashed at some point yesterday morning and I see the following
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat&#39;s
I&#39;m not sure why Tomcat didn&#39;t crash with an OutOfMemoryError&hellip;
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat&rsquo;s
I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;
Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core
The server only has 8GB of RAM so we&#39;ll eventually need to upgrade to a larger one because we&#39;ll start starving the OS, PostgreSQL, and command line batch processes
The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it
"/>
<meta name="generator" content="Hugo 0.62.2" />
<meta name="generator" content="Hugo 0.63.1" />
@@ -73,7 +73,7 @@ I ran all system updates on DSpace Test and rebooted it
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.a20c1a4367639632cdb341d23c27ca44fedcc75b0f8b3cbea6203010da153d3c.css" rel="stylesheet" integrity="sha256-ogwaQ2djljLNs0HSPCfKRP7cx1sPizy&#43;piAwENoVPTw=" crossorigin="anonymous">
<link href="https://alanorth.github.io/cgspace-notes/css/style.23e2c3298bcc8c1136c19aba330c211ec94c36f7c4454ea15cf4d3548370042a.css" rel="stylesheet" integrity="sha256-I&#43;LDKYvMjBE2wZq6MwwhHslMNvfERU6hXPTTVINwBCo=" crossorigin="anonymous">
<!-- RSS 2.0 feed -->
@@ -120,7 +120,7 @@ I ran all system updates on DSpace Test and rebooted it
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-08/">August, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-08-01T11:52:54&#43;03:00">Wed Aug 01, 2018</time> by Alan Orth in
<i class="fa fa-folder" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
</p>
@@ -134,10 +134,10 @@ I ran all system updates on DSpace Test and rebooted it
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat's</li>
<li>I'm not sure why Tomcat didn't crash with an OutOfMemoryError&hellip;</li>
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</li>
<li>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</li>
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes</li>
<li>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</li>
<li>I ran all system updates on DSpace Test and rebooted it</li>
</ul>
<ul>
@@ -152,23 +152,23 @@ I ran all system updates on DSpace Test and rebooted it
</ul>
<h2 id="2018-08-02">2018-08-02</h2>
<ul>
<li>DSpace Test crashed again and I don't see any errors other than this one in <code>dmesg</code>:</li>
<li>DSpace Test crashed again and I don&rsquo;t see any errors other than this one in <code>dmesg</code>:</li>
</ul>
<pre><code>[Thu Aug 2 00:00:12 2018] Out of memory: Kill process 1407 (java) score 787 or sacrifice child
[Thu Aug 2 00:00:12 2018] Killed process 1407 (java) total-vm:18876328kB, anon-rss:6323836kB, file-rss:0kB, shmem-rss:0kB
</code></pre><ul>
<li>I am still assuming that this is the Tomcat process that is dying, so maybe actually we need to reduce its memory instead of increasing it?</li>
<li>The risk we run there is that we'll start getting OutOfMemory errors from Tomcat</li>
<li>The risk we run there is that we&rsquo;ll start getting OutOfMemory errors from Tomcat</li>
<li>So basically we need a new test server with more RAM very soon&hellip;</li>
<li>Abenet asked about the workflow statistics in the Atmire CUA module again</li>
<li>Last year Atmire told me that it's disabled by default but you can enable it with <code>workflow.stats.enabled = true</code> in the CUA configuration file</li>
<li>There was a bug with adding users so they sent a patch, but I didn't merge it because it was <a href="https://github.com/ilri/DSpace/pull/319">very dirty</a> and I wasn't sure it actually fixed the problem</li>
<li>I just tried to enable the stats again on DSpace Test now that we're on DSpace 5.8 with updated Atmire modules, but every user I search for shows &ldquo;No data available&rdquo;</li>
<li>Last year Atmire told me that it&rsquo;s disabled by default but you can enable it with <code>workflow.stats.enabled = true</code> in the CUA configuration file</li>
<li>There was a bug with adding users so they sent a patch, but I didn&rsquo;t merge it because it was <a href="https://github.com/ilri/DSpace/pull/319">very dirty</a> and I wasn&rsquo;t sure it actually fixed the problem</li>
<li>I just tried to enable the stats again on DSpace Test now that we&rsquo;re on DSpace 5.8 with updated Atmire modules, but every user I search for shows &ldquo;No data available&rdquo;</li>
<li>As a test I submitted a new item and I was able to see it in the workflow statistics &ldquo;data&rdquo; tab, but not in the graph</li>
</ul>
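A minimal sketch for reading the OOM-killer line above: it pulls the PID, process name, and memory figures out of a <code>dmesg</code> "Killed process" line and converts them to GiB, treating the kernel's kB as 1024 bytes. The regular expression is an assumption fitted to the lines in these notes, and other kernel versions may format the message differently.

```python
import re

# Kernel OOM-killer summary line as it appears in dmesg, e.g.
# "Killed process 1407 (java) total-vm:18876328kB, anon-rss:6323836kB, ..."
# Assumption: this pattern is fitted to the lines in these notes only.
OOM_KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)

# The kernel's "kB" in these messages means 1024 bytes.
KIB_PER_GIB = 1024 * 1024


def parse_oom_kill(line):
    """Return PID, process name and memory sizes in GiB, or None if not a kill line."""
    match = OOM_KILLED.search(line)
    if match is None:
        return None
    return {
        "pid": int(match["pid"]),
        "comm": match["comm"],
        "total_vm_gib": int(match["total_vm"]) / KIB_PER_GIB,
        "anon_rss_gib": int(match["anon_rss"]) / KIB_PER_GIB,
    }


if __name__ == "__main__":
    line = ("[Thu Aug 2 00:00:12 2018] Killed process 1407 (java) "
            "total-vm:18876328kB, anon-rss:6323836kB, file-rss:0kB, shmem-rss:0kB")
    info = parse_oom_kill(line)
    print(f"{info['comm']} (pid {info['pid']}): "
          f"{info['anon_rss_gib']:.2f} GiB anon-rss, "
          f"{info['total_vm_gib']:.2f} GiB total-vm")
```

For the 2018-08-02 kill this works out to roughly 6.03 GiB of anonymous resident memory and 18.00 GiB of virtual address space, on a server with 8GB of RAM.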
<h2 id="2018-08-15">2018-08-15</h2>
<ul>
<li>Run through Peter's list of author affiliations from earlier this month</li>
<li>Run through Peter&rsquo;s list of author affiliations from earlier this month</li>
<li>I did some quick sanity checks and small cleanups in Open Refine, checking for spaces, weird accents, and encoding errors</li>
<li>Finally I did a test run with the <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-value.py</code></a> script:</li>
</ul>
@@ -210,8 +210,8 @@ Verchot, L.V.
Verchot, LV
Verchot, Louis V.
</code></pre><ul>
<li>I'll just tag them all with Louis Verchot's ORCID identifier&hellip;</li>
<li>In the end, I'll run the following CSV with my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script:</li>
<li>I&rsquo;ll just tag them all with Louis Verchot&rsquo;s ORCID identifier&hellip;</li>
<li>In the end, I&rsquo;ll run the following CSV with my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script:</li>
</ul>
<pre><code>dc.contributor.author,cg.creator.id
&quot;Campbell, Bruce&quot;,Bruce M Campbell: 0000-0002-0123-4859
@@ -290,17 +290,17 @@ sys 2m20.248s
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-08-19
1724
</code></pre><ul>
<li>I don't even know how it's possible for the bot to use MORE sessions than total requests&hellip;</li>
<li>I don&rsquo;t even know how it&rsquo;s possible for the bot to use MORE sessions than total requests&hellip;</li>
<li>The user agent is:</li>
</ul>
<pre><code>Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
</code></pre><ul>
<li>So I'm thinking we should add &ldquo;crawl&rdquo; to the Tomcat Crawler Session Manager valve, as we already have &ldquo;bot&rdquo; that catches Googlebot, Bingbot, etc.</li>
<li>So I&rsquo;m thinking we should add &ldquo;crawl&rdquo; to the Tomcat Crawler Session Manager valve, as we already have &ldquo;bot&rdquo; that catches Googlebot, Bingbot, etc.</li>
</ul>
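A small sketch of why adding "crawl" alongside "bot" would catch this agent. Tomcat's Crawler Session Manager valve matches its <code>crawlerUserAgents</code> regular expression against the whole User-Agent header, which Python's <code>re.fullmatch</code> mimics for simple patterns like these. <code>DEFAULT_PATTERN</code> assumes Tomcat's documented default for that attribute rather than our actual <code>server.xml</code> value, <code>EXTENDED_PATTERN</code> is a hypothetical extension, and the Googlebot string is a commonly seen Googlebot User-Agent used only for comparison.

```python
import re

# Assumed: Tomcat's documented default for the CrawlerSessionManagerValve
# crawlerUserAgents attribute (our server.xml value may differ).
DEFAULT_PATTERN = r".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*"

# Hypothetical extension that also catches agents mentioning "crawl"/"Crawl".
EXTENDED_PATTERN = DEFAULT_PATTERN + r"|.*[cC]rawl.*"


def is_crawler(user_agent, pattern):
    """Mimic the valve: the whole User-Agent must match (like Java's Matcher.matches())."""
    return re.fullmatch(pattern, user_agent) is not None


USER_AGENTS = {
    "MegaIndex": "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)",
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

if __name__ == "__main__":
    for name, agent in USER_AGENTS.items():
        print(f"{name}: default={is_crawler(agent, DEFAULT_PATTERN)}, "
              f"extended={is_crawler(agent, EXTENDED_PATTERN)}")
```

Run as a script, it should report that the MegaIndex agent is only treated as a crawler once the "crawl" alternative is added, while Googlebot matches both patterns.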
<h2 id="2018-08-20">2018-08-20</h2>
<ul>
<li>Help Sisay with some UTF-8 encoding issues in a file Peter sent him</li>
<li>Finish up reconciling Atmire's pull request for DSpace 5.8 changes with the latest status of our <code>5_x-prod</code> branch</li>
<li>Finish up reconciling Atmire&rsquo;s pull request for DSpace 5.8 changes with the latest status of our <code>5_x-prod</code> branch</li>
<li>I had to do some <code>git rev-list --reverse --no-merges oldestcommit..newestcommit</code> and <code>git cherry-pick -S</code> hackery to get everything all in order</li>
<li>After building I ran the Atmire schema migrations and forced old migrations, then did the <code>ant update</code></li>
<li>I tried to build it on DSpace Test, but it seems to still need more RAM to complete (like I experienced last month), so I stopped Tomcat and set <code>JAVA_OPTS</code> to 1024m and tried the <code>mvn package</code> again</li>
@@ -308,8 +308,8 @@ sys 2m20.248s
<li>I will try to reduce Tomcat memory from 4608m to 4096m and then retry the <code>mvn package</code> with 1024m of <code>JAVA_OPTS</code> again</li>
<li>After running the <code>mvn package</code> for the third time and waiting an hour, I attached <code>strace</code> to the Java process and saw that it was indeed reading XMLUI theme data&hellip; so I guess I just need to wait more</li>
<li>After waiting two hours the maven process completed and installation was successful</li>
<li>I restarted Tomcat and it seems everything is working well, so I'll merge the pull request and try to schedule the CGSpace upgrade for this coming Sunday, August 26th</li>
<li>I merged <a href="https://github.com/ilri/DSpace/pull/378">Atmire's pull request</a> into our <code>5_x-dspace-5.8</code> temporary branch and then cherry-picked all the changes from <code>5_x-prod</code> since April, 2018 when that temporary branch was created</li>
<li>I restarted Tomcat and it seems everything is working well, so I&rsquo;ll merge the pull request and try to schedule the CGSpace upgrade for this coming Sunday, August 26th</li>
<li>I merged <a href="https://github.com/ilri/DSpace/pull/378">Atmire&rsquo;s pull request</a> into our <code>5_x-dspace-5.8</code> temporary branch and then cherry-picked all the changes from <code>5_x-prod</code> since April, 2018 when that temporary branch was created</li>
<li>As the branch histories are very different I cannot merge the new 5.8 branch into the current <code>5_x-prod</code> branch</li>
<li>Instead, I will archive the current <code>5_x-prod</code> DSpace 5.5 branch as <code>5_x-prod-dspace-5.5</code> and then hard reset <code>5_x-prod</code> based on <code>5_x-dspace-5.8</code></li>
<li>Unfortunately this will mess up the references in pull requests and issues on GitHub</li>
@@ -320,8 +320,8 @@ sys 2m20.248s
</ul>
<pre><code>[INFO] Processing overlay [ id org.dspace.modules:xmlui-mirage2]
</code></pre><ul>
<li>It's the same on DSpace Test, my local laptop, and CGSpace&hellip;</li>
<li>It wasn't this way before when I was constantly building the previous 5.8 branch with Atmire patches&hellip;</li>
<li>It&rsquo;s the same on DSpace Test, my local laptop, and CGSpace&hellip;</li>
<li>It wasn&rsquo;t this way before when I was constantly building the previous 5.8 branch with Atmire patches&hellip;</li>
<li>I will restore the previous <code>5_x-dspace-5.8</code> and <code>atmire-module-upgrades-5.8</code> branches to see if the build time is different there</li>
<li>&hellip; it seems that the <code>atmire-module-upgrades-5.8</code> branch still takes 1 hour and 23 minutes on my local machine&hellip;</li>
<li>Let me try to build the old <code>5_x-prod-dspace-5.5</code> branch on my local machine and see how long it takes</li>
@@ -330,7 +330,7 @@ sys 2m20.248s
</ul>
<pre><code>[INFO] --- maven-war-plugin:2.4:war (default-war) @ xmlui ---
</code></pre><ul>
<li>And I notice that Atmire changed something in the XMLUI module's <code>pom.xml</code> as part of the DSpace 5.8 changes, specifically to remove the exclude for <code>node_modules</code> in the <code>maven-war-plugin</code> step</li>
<li>And I notice that Atmire changed something in the XMLUI module&rsquo;s <code>pom.xml</code> as part of the DSpace 5.8 changes, specifically to remove the exclude for <code>node_modules</code> in the <code>maven-war-plugin</code> step</li>
<li>This exclude is <em>present</em> in vanilla DSpace, and if I add it back the build time goes from 1 hour 23 minutes to 12 minutes!</li>
<li>It makes sense that it would take longer to complete this step because the <code>node_modules</code> folder has tens of thousands of files, and we have 27 themes!</li>
<li>I need to test to see if this has any side effects when deployed&hellip;</li>
@@ -342,14 +342,14 @@ sys 2m20.248s
<li>They say they want to start working on the ContentDM harvester middleware again</li>
<li>I sent a list of the top 1500 author affiliations on CGSpace to CodeObia so we can compare ours with the ones on MELSpace</li>
<li>Discuss CTA items with Sisay, he was trying to figure out how to do the collection mapping in combination with SAFBuilder</li>
<li>It appears that the web UI's upload interface <em>requires</em> you to specify the collection, whereas the CLI interface allows you to omit the collection command line flag and defer to the <code>collections</code> file inside each item in the bundle</li>
<li>It appears that the web UI&rsquo;s upload interface <em>requires</em> you to specify the collection, whereas the CLI interface allows you to omit the collection command line flag and defer to the <code>collections</code> file inside each item in the bundle</li>
<li>I imported the CTA items on CGSpace for Sisay:</li>
</ul>
<pre><code>$ dspace import -a -e s.webshet@cgiar.org -s /home/swebshet/ictupdates_uploads_August_21 -m /tmp/2018-08-23-cta-ictupdates.map
</code></pre><h2 id="2018-08-26">2018-08-26</h2>
<ul>
<li>Doing the DSpace 5.8 upgrade on CGSpace (linode18)</li>
<li>I already finished the Maven build, now I'll take a backup of the PostgreSQL database and do a database cleanup just in case:</li>
<li>I already finished the Maven build, now I&rsquo;ll take a backup of the PostgreSQL database and do a database cleanup just in case:</li>
</ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-08-26-before-dspace-58.backup dspace
$ dspace cleanup -v
@@ -371,7 +371,7 @@ dspace=&gt; \q
$ dspace database migrate ignored
</code></pre><ul>
<li>Then I'll run all system updates and reboot the server:</li>
<li>Then I&rsquo;ll run all system updates and reboot the server:</li>
</ul>
<pre><code>$ sudo su -
# apt update &amp;&amp; apt full-upgrade
@@ -380,9 +380,9 @@ $ dspace database migrate ignored
</code></pre><ul>
<li>After reboot I logged in and cleared all the XMLUI caches and everything looked to be working fine</li>
<li>Adam from WLE had asked a few weeks ago about getting the metadata for a bunch of items related to gender from 2013 until now</li>
<li>They want a CSV with <em>all</em> metadata, which the Atmire Listings and Reports module can't do</li>
<li>They want a CSV with <em>all</em> metadata, which the Atmire Listings and Reports module can&rsquo;t do</li>
<li>I exported a list of items from Listings and Reports with the following criteria: from year 2013 until now, have WLE subject <code>GENDER</code> or <code>GENDER POVERTY AND INSTITUTIONS</code>, and CRP <code>Water, Land and Ecosystems</code></li>
<li>Then I extracted the Handle links from the report so I could export each item's metadata as CSV</li>
<li>Then I extracted the Handle links from the report so I could export each item&rsquo;s metadata as CSV</li>
</ul>
<pre><code>$ grep -o -E &quot;[0-9]{5}/[0-9]{0,5}&quot; listings-export.txt &gt; /tmp/iwmi-gender-items.txt
</code></pre><ul>
@@ -391,21 +391,21 @@ $ dspace database migrate ignored
<pre><code>$ while read -r line; do dspace metadata-export -f &quot;/tmp/${line/\//-}.csv&quot; -i $line; sleep 2; done &lt; /tmp/iwmi-gender-items.txt
</code></pre><ul>
<li>But from here I realized that each of the fifty-nine items will have different columns in their CSVs, making it difficult to combine them</li>
<li>I'm not sure how to proceed without writing some script to parse and join the CSVs, and I don't think it's worth my time</li>
<li>I tested DSpace 5.8 in Tomcat 8.5.32 and it seems to work now, so I'm not sure why I got those errors last time I tried</li>
<li>I&rsquo;m not sure how to proceed without writing some script to parse and join the CSVs, and I don&rsquo;t think it&rsquo;s worth my time</li>
<li>I tested DSpace 5.8 in Tomcat 8.5.32 and it seems to work now, so I&rsquo;m not sure why I got those errors last time I tried</li>
<li>It could have been a configuration issue, though, as I also reconciled the <code>server.xml</code> with the one in <a href="https://github.com/ilri/rmg-ansible-public">our Ansible infrastructure scripts</a></li>
<li>But now I can start testing and preparing to move DSpace Test to Ubuntu 18.04 + Tomcat 8.5 + OpenJDK + PostgreSQL 9.6&hellip;</li>
<li>Actually, upon closer inspection, it seems that when you try to go to Listings and Reports under Tomcat 8.5.33 you are taken to the JSPUI login page despite having already logged in in XMLUI</li>
<li>If I type my username and password again it <em>does</em> take me to Listings and Reports, though&hellip;</li>
<li>I don't see anything interesting in the Catalina or DSpace logs, so I might have to file a bug with Atmire</li>
<li>For what it's worth, the Content and Usage (CUA) module does load, though I can't seem to get any results in the graph</li>
<li>I don&rsquo;t see anything interesting in the Catalina or DSpace logs, so I might have to file a bug with Atmire</li>
<li>For what it&rsquo;s worth, the Content and Usage (CUA) module does load, though I can&rsquo;t seem to get any results in the graph</li>
<li>I just checked to see if the Listings and Reports issue with using the CGSpace citation field was fixed as planned alongside the DSpace 5.8 upgrades (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">#589</a>)</li>
<li>I was able to create a new layout containing only the citation field, so I closed the ticket</li>
</ul>
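The CSV-joining problem mentioned above (each per-item export having different columns) could be handled by taking the union of all column headers and leaving missing cells empty. A minimal sketch using Python's <code>csv</code> module, assuming each export has a single header row; the column names in the example are made up for illustration and are not DSpace's exact export headers.

```python
import csv
import io


def merge_csvs(sources, out):
    """Merge CSVs whose headers differ into one CSV with the union of all columns.

    sources: iterable of readable text file objects, each starting with a header row.
    out: writable text file object. Cells a source does not have are left empty.
    Returns the merged header (columns in first-seen order).
    """
    rows = []
    fieldnames = []  # union of all headers, in first-seen order
    for src in sources:
        reader = csv.DictReader(src)
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        rows.extend(reader)  # each row is a dict keyed by that file's own header
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


if __name__ == "__main__":
    # Two toy "per-item exports" with only partially overlapping columns.
    first = io.StringIO("id,dc.title\n1,First item\n")
    second = io.StringIO("id,dc.subject\n2,GENDER\n")
    merged = io.StringIO()
    merge_csvs([first, second], merged)
    print(merged.getvalue())
```

For the real exports, each <code>/tmp/*.csv</code> file would be opened with <code>open(path, newline="", encoding="utf-8")</code> and passed in the <code>sources</code> list. Columns keep their first-seen order, so the merged header is stable for a given input order.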
<h2 id="2018-08-29">2018-08-29</h2>
<ul>
<li>Discuss <a href="https://copo-project.org/copo/">COPO</a> with Martin Mueller</li>
<li>He and the consortium's idea is to use this for metadata annotation (submission?) to all repositories</li>
<li>He and the consortium&rsquo;s idea is to use this for metadata annotation (submission?) to all repositories</li>
<li>It is somehow related to adding events as items in the repository, and then linking related papers, presentations, etc. to the event item using <code>dc.relation</code>, etc.</li>
<li>Discuss Linode server charges with Abenet, apparently we want to start charging these to Big Data</li>
</ul>