mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-22 14:45:03 +01:00
303 lines
11 KiB
HTML
<!DOCTYPE html>
<html lang="en">

<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

<meta property="og:title" content="August, 2018" />
<meta property="og:description" content="2018-08-01

DSpace Test had crashed at some point yesterday morning and I see the following in dmesg:

[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat’s
I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…
Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core
The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it

" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-08/" />

<meta property="article:published_time" content="2018-08-01T11:52:54+03:00"/>

<meta property="article:modified_time" content="2018-08-15T10:56:38+01:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="August, 2018"/>
<meta name="twitter:description" content="2018-08-01

DSpace Test had crashed at some point yesterday morning and I see the following in dmesg:

[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat’s
I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…
Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core
The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it

"/>
<meta name="generator" content="Hugo 0.46" />

<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "August, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-08/",
"wordCount": "649",
"datePublished": "2018-08-01T11:52:54+03:00",
"dateModified": "2018-08-15T10:56:38+01:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>

<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2018-08/">

<title>August, 2018 | CGSpace Notes</title>

<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-Upm5uY/SXdvbjuIGH6fBjF5vOYUr9DguqBskM+EQpLBzO9U+9fMVmWEt+TTlGrWQ" crossorigin="anonymous">

</head>

<body>

<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>

<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>

<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-08/">August, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-08-01T11:52:54+03:00">Wed Aug 01, 2018</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2018-08-01">2018-08-01</h2>

<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>

<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre>

<ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
<li>I ran all system updates on DSpace Test and rebooted it</li>
</ul>
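
<p>To put numbers on the memory reasoning above, here is a minimal Python sketch that parses the “Killed process” line from <code>dmesg</code> and compares the resident memory with the heap settings. The helper names <code>parse_killed_line</code> and <code>headroom_mib</code> are hypothetical, and treating “8GB” as 8192 MiB is an assumption. Note that the resident set of a JVM includes non-heap memory as well, so it can exceed the configured heap.</p>

```python
import re

# "Killed process" line copied from the dmesg output above.
DMESG_LINE = (
    "[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) "
    "total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB"
)

# Captures: pid, process name, total-vm in kB, anon-rss in kB.
KILLED_RE = re.compile(
    r"Killed process (\d+) \(([^)]+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB"
)


def parse_killed_line(line):
    """Return a dict describing the killed process, or None if no match."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    pid, name, total_vm_kb, anon_rss_kb = match.groups()
    return {
        "pid": int(pid),
        "name": name,
        "total_vm_kb": int(total_vm_kb),
        "anon_rss_kb": int(anon_rss_kb),
    }


def headroom_mib(total_ram_mib, heap_mib):
    """Memory left for the OS, PostgreSQL, and batch jobs (hypothetical helper)."""
    return total_ram_mib - heap_mib


info = parse_killed_line(DMESG_LINE)
rss_mib = info["anon_rss_kb"] / 1024

TOTAL_RAM_MIB = 8192  # assumption: "8GB" taken as 8192 MiB
print(f"{info['name']} (pid {info['pid']}) resident: {rss_mib:.0f} MiB")
for heap in (5120, 6144):
    print(f"heap {heap}m leaves {headroom_mib(TOTAL_RAM_MIB, heap)} MiB for everything else")
```

<p>The resident size at the time of the kill was already slightly above the 5120m heap, which is consistent with the kernel stepping in before the JVM itself ran out of heap, though that is only one possible reading.</p>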

<p></p>

<ul>
<li>I started looking over the latest round of IITA batch records from Sisay on DSpace Test: <a href="https://dspacetest.cgiar.org/handle/10568/103250">IITA July_30</a>

<ul>
<li>incorrect authorship types</li>
<li>dozens of inconsistencies, spelling mistakes, and white space in author affiliations</li>
<li>minor issues in countries (California is not a country)</li>
<li>minor issues in IITA subjects, ISBNs, languages, and AGROVOC subjects</li>
</ul></li>
</ul>

<h2 id="2018-08-02">2018-08-02</h2>

<ul>
<li>DSpace Test crashed again and the only error I see is this in <code>dmesg</code>:</li>
</ul>

<pre><code>[Thu Aug 2 00:00:12 2018] Out of memory: Kill process 1407 (java) score 787 or sacrifice child
[Thu Aug 2 00:00:12 2018] Killed process 1407 (java) total-vm:18876328kB, anon-rss:6323836kB, file-rss:0kB, shmem-rss:0kB
</code></pre>

<ul>
<li>I am still assuming that this is the Tomcat process that is dying, so maybe we actually need to reduce its memory instead of increasing it?</li>
<li>The risk we run there is that we’ll start getting OutOfMemory errors from Tomcat</li>
<li>So basically we need a new test server with more RAM very soon…</li>
<li>Abenet asked about the workflow statistics in the Atmire CUA module again</li>
<li>Last year Atmire told me that it’s disabled by default but you can enable it with <code>workflow.stats.enabled = true</code> in the CUA configuration file</li>
<li>There was a bug with adding users so they sent a patch, but I didn’t merge it because it was <a href="https://github.com/ilri/DSpace/pull/319">very dirty</a> and I wasn’t sure it actually fixed the problem</li>
<li>I just tried to enable the stats again on DSpace Test now that we’re on DSpace 5.8 with updated Atmire modules, but every user I search for shows “No data available”</li>
<li>As a test I submitted a new item and I was able to see it in the workflow statistics “data” tab, but not in the graph</li>
</ul>

<h2 id="2018-08-15">2018-08-15</h2>

<ul>
<li>Run through Peter’s list of author affiliations from earlier this month</li>
<li>I did some quick sanity checks and small cleanups in Open Refine, checking for spaces, weird accents, and encoding errors</li>
<li>Finally I did a test run with the <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-values.py</code></a> script:</li>
</ul>

<pre><code>$ ./fix-metadata-values.py -i 2018-08-15-Correct-1083-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t correct -m 211
$ ./delete-metadata-values.py -i 2018-08-15-Remove-11-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
</code></pre>
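
<p>As a rough illustration of the kind of sanity checks mentioned above (spaces, odd accents, encoding errors), here is a small Python sketch that flags suspicious values. It is not how Open Refine works internally and it is not part of the scripts used above; the function name <code>find_issues</code> and the sample values are hypothetical.</p>

```python
import unicodedata


def find_issues(value):
    """Return a list of problems found in a single metadata value."""
    issues = []
    if value != value.strip():
        issues.append("leading or trailing whitespace")
    if "  " in value:
        issues.append("consecutive spaces")
    if "\ufffd" in value:
        # U+FFFD usually means bytes were decoded with the wrong encoding.
        issues.append("Unicode replacement character")
    if unicodedata.normalize("NFC", value) != value:
        # For example, "e" followed by a combining accent instead of one code point.
        issues.append("not NFC-normalized")
    return issues


# Hypothetical sample values, for illustration only.
samples = [
    "International Livestock Research Institute",
    " International Livestock Research Institute",
    "International  Livestock Research Institute",
    "Universite\u0301 d'Exemple",
]

for sample in samples:
    print(repr(sample), find_issues(sample))
```

<p>Values that come back with an empty list pass these particular checks; anything else is worth a second look before running a bulk correction.</p>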

<h2 id="2018-08-16">2018-08-16</h2>

<ul>
<li>Generate a list of the top 1,500 authors on CGSpace for Sisay so he can create the controlled vocabulary:</li>
</ul>

<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc limit 1500) to /tmp/2018-08-16-top-1500-authors.csv with csv;
</code></pre>
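
<p>For readers less familiar with SQL, the query above groups identical author strings, counts them, and keeps the most frequent ones. The same idea can be sketched in Python with <code>collections.Counter</code>; the helper <code>top_values</code> and the sample names are hypothetical and only illustrate the grouping and ordering.</p>

```python
from collections import Counter


def top_values(values, limit):
    """Count identical values and return the `limit` most frequent as (value, count) pairs."""
    return Counter(values).most_common(limit)


# Hypothetical author strings, one entry per metadata value.
authors = ["Author A", "Author B", "Author A", "Author C", "Author A", "Author B"]

for name, count in top_values(authors, 2):
    print(f"{name},{count}")
```

<p>Because the counting is on exact strings, spelling variants of the same author are counted separately, which is part of why a controlled vocabulary is useful.</p>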

<ul>
<li>Start working on adding the ORCID metadata to a handful of CIAT authors as requested by Elizabeth earlier this month</li>
<li>I might need to overhaul the <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script to be a little more robust about author order and ORCID metadata that might have been altered manually by editors after submission, as this script was written without that consideration</li>
</ul>

<!-- vim: set sw=2 ts=2: -->

</article>

</div> <!-- /.blog-main -->

<aside class="col-sm-3 ml-auto blog-sidebar">

<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">

<li><a href="/cgspace-notes/2018-08/">August, 2018</a></li>

<li><a href="/cgspace-notes/2018-07/">July, 2018</a></li>

<li><a href="/cgspace-notes/2018-06/">June, 2018</a></li>

<li><a href="/cgspace-notes/2018-05/">May, 2018</a></li>

<li><a href="/cgspace-notes/2018-04/">April, 2018</a></li>

</ol>
</section>

<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">

<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>

<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>

<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>

</ol>
</section>

</aside>

</div> <!-- /.row -->
</div> <!-- /.container -->

<footer class="blog-footer">
<p>

Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.

</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>

</body>

</html>