mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
504 lines
17 KiB
HTML
<!DOCTYPE html>
<html lang="en">

<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

<meta property="og:title" content="Posts" />
<meta property="og:description" content="Documenting day-to-day work on the CGSpace repository (https://cgspace.cgiar.org)." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />

<meta property="og:updated_time" content="2019-02-01T21:37:30+02:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the CGSpace repository (https://cgspace.cgiar.org)."/>
<meta name="generator" content="Hugo 0.54.0" />

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Blog",
  "headline": "CGSpace Notes",
  "url" : "https://alanorth.github.io/cgspace-notes/posts/",
  "author": {
    "@type": "Person",
    "name": "Alan Orth"
  },
  "dateModified": "2019-02-01T21:37:30+02:00",
  "keywords": "notes",
  "description": "Documenting day-to-day work on the CGSpace repository (https://cgspace.cgiar.org)."
}
</script>

<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/posts/">

<title>CGSpace Notes</title>

<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-6+EGfPoOzk/n2DVJSlglKT8TV1TgIMvVcKI73IZgBswLasPBn94KommV6ilJqCXE" crossorigin="anonymous">

<!-- RSS 2.0 feed -->
<link href="https://alanorth.github.io/cgspace-notes/posts/index.xml" rel="alternate" type="application/rss+xml" title="CGSpace Notes" />

</head>

<body>

<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>

<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>

<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2019-02/">February, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-02-01T21:37:30+02:00">Fri Feb 01, 2019</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2019-02-01">2019-02-01</h2>

<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high, despite my increasing the alert threshold last week from 250% to 275%. I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
</ul>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    245 207.46.13.5
    332 54.70.40.11
    385 5.143.231.38
    405 207.46.13.173
    405 207.46.13.75
   1117 66.249.66.219
   1121 35.237.175.180
   1546 5.9.6.51
   2474 45.5.186.2
   5490 85.25.237.71
</code></pre>

<ul>
<li><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</li>
<li>The Solr statistics have been very high for the past few months, and I was wondering whether the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
</ul>

<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243

real 0m19.873s
user 0m22.203s
sys 0m1.979s
</code></pre>
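<p>To see whether the web server logs follow the same upward trend as the Solr statistics, the one-off count above could be repeated for each month. This is a minimal sketch, assuming the default nginx timestamp format (<code>dd/Mon/yyyy</code>) and that the rotated logs still cover the months being compared. <code>count_month</code> is a hypothetical helper name, not part of any CGSpace tooling:</p>

```shell
# count_month MON YEAR (hypothetical helper): count access-log lines on stdin
# whose timestamp falls in the given month, e.g. "count_month Jan 2019".
count_month() {
    # grep -c still prints 0 when nothing matches, but exits non-zero;
    # "|| true" keeps that case from looking like a failure to the caller.
    grep -cE "[0-9]{1,2}/$1/$2" || true
}
```

<p>Piping the same <code>zcat --force /var/log/nginx/*</code> output through <code>count_month Dec 2018</code> and then <code>count_month Jan 2019</code> would give two directly comparable numbers.</p>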
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
</article>

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2019-01/">January, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-01-02T09:48:30+02:00">Wed Jan 02, 2019</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2019-01-02">2019-01-02</h2>

<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don’t see anything interesting in the web server logs around that time though:</li>
</ul>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
     92 40.77.167.4
     99 210.7.29.100
    120 38.126.157.45
    177 35.237.175.180
    177 40.77.167.32
    216 66.249.75.219
    225 18.203.76.93
    261 46.101.86.248
    357 207.46.13.1
    903 54.70.40.11
</code></pre>
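<p>The same pipeline recurs in these notes with only the time window changing, so it could be wrapped in a small function. This is a minimal sketch, assuming the client IP is the first whitespace-separated field of each log line (as the <code>awk '{print $1}'</code> step above implies). <code>top_ips</code> is a hypothetical name used only for illustration:</p>

```shell
# top_ips PATTERN [N] (hypothetical helper): read access-log lines on stdin,
# keep those matching the extended regex PATTERN, and print the N busiest
# client IPs (default 10) in ascending order of request count.
top_ips() {
    grep -E "$1" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n "${2:-10}"
}
```

<p>The query above would then read <code>zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | top_ips "02/Jan/2019:0(1|2|3)"</code>.</p>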
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
</article>

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-12/">December, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-12-02T02:09:30+02:00">Sun Dec 02, 2018</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2018-12-01">2018-12-01</h2>

<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>

<h2 id="2018-12-02">2018-12-02</h2>

<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-12/'>Read more →</a>
</article>

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-11/">November, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-11-01T16:41:30+02:00">Thu Nov 01, 2018</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2018-11-01">2018-11-01</h2>

<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>

<h2 id="2018-11-03">2018-11-03</h2>

<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article>

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-10/">October, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-10-01T22:31:54+03:00">Mon Oct 01, 2018</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2018-10-01">2018-10-01</h2>

<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this (<a href="https://github.com/ilri/DSpace/issues/389">#389</a>) because I’m super busy in Nairobi right now</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a>
</article>

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-09/">September, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-09-02T09:55:54+03:00">Sun Sep 02, 2018</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2018-09-02">2018-09-02</h2>

<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I’ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
<li>Also, I’ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month</li>
<li>I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a>
</article>

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-08/">August, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-08-01T11:52:54+03:00">Wed Aug 01, 2018</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2018-08-01">2018-08-01</h2>

<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>

<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre>

<ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
<li>Anyway, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
<li>I ran all system updates on DSpace Test and rebooted it</li>
</ul>
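<p>The trade-off in the last few points is simple arithmetic: whatever the JVM heap takes is no longer available to the OS, PostgreSQL, and batch jobs. This is a rough sketch, assuming “8GB” means 8192 MB. <code>headroom_mb</code> is a hypothetical helper, not an existing tool:</p>

```shell
# headroom_mb TOTAL_MB HEAP_MB (hypothetical helper): print the memory, in MB,
# left for everything outside the JVM heap.
headroom_mb() {
    echo $(( $1 - $2 ))
}
```

<p><code>headroom_mb 8192 5120</code> prints 3072 and <code>headroom_mb 8192 6144</code> prints 2048. The JVM also uses memory outside the heap: the <code>anon-rss:5355528kB</code> in the <code>dmesg</code> output is about 5230 MB, already above the 5120m heap, so the real headroom is smaller than these figures.</p>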
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
</article>

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-07/">July, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-07-01T12:56:54+03:00">Sun Jul 01, 2018</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2018-07-01">2018-07-01</h2>

<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>

<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre>

<ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with Java running out of memory:</li>
</ul>

<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
</article>

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-06/">June, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-06-04T19:49:54-07:00">Mon Jun 04, 2018</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2018-06-04">2018-06-04</h2>

<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)

<ul>
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
</ul></li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
</ul>

<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre>

<ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li>Time to index ~70,000 items on CGSpace:</li>
</ul>

<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b

real 74m42.646s
user 8m5.056s
sys 2m7.289s
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
</article>

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-05/">May, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-05-01T16:43:54+03:00">Tue May 01, 2018</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2018-05-01">2018-05-01</h2>

<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:

<ul>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</a></li>
<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</a></li>
</ul></li>
<li>Then I reduced the JVM heap size from 6144m back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
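<p>The two Solr URLs above are XML update commands (delete everything, then commit) percent-encoded into the <code>stream.body</code> parameter. This is a minimal sketch of how such a URL can be assembled, assuming the core is reachable at the same <code>localhost:3000</code> address. <code>solr_update_url</code> is a hypothetical helper that only prints the URL and does not contact Solr:</p>

```shell
# solr_update_url CORE_URL XML_BODY (hypothetical helper): print an update URL
# that carries the XML command in the stream.body parameter. Only "<" and ">"
# are percent-encoded, which is enough for the two commands above; this is
# not a general-purpose URL encoder.
solr_update_url() {
    encoded=$(printf '%s' "$2" | sed -e 's/</%3C/g' -e 's/>/%3E/g')
    printf '%s/update?stream.body=%s\n' "$1" "$encoded"
}
```

<p>Because the first command deletes every document in the core, it is worth double-checking the core name in the printed URL before opening it.</p>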
<a href='https://alanorth.github.io/cgspace-notes/2018-05/'>Read more →</a>
</article>

<nav class="blog-pagination">

<a class="btn btn-outline-primary disabled" href="#" role="button" aria-disabled="true">Previous page</a>
<a class="btn btn-outline-primary" href="/cgspace-notes/posts/page/2/" rel="next" role="button">Next page</a>

</nav>

</div> <!-- /.blog-main -->

<aside class="col-sm-3 ml-auto blog-sidebar">

<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">

<li><a href="/cgspace-notes/2019-02/">February, 2019</a></li>

<li><a href="/cgspace-notes/2019-01/">January, 2019</a></li>

<li><a href="/cgspace-notes/2018-12/">December, 2018</a></li>

<li><a href="/cgspace-notes/2018-11/">November, 2018</a></li>

<li><a href="/cgspace-notes/2018-10/">October, 2018</a></li>

</ol>
</section>

<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">

<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>

<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>

<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>

</ol>
</section>

</aside>

</div> <!-- /.row -->
</div> <!-- /.container -->

<footer class="blog-footer">
<p>

Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.

</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>

</body>

</html>