mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-23 21:44:30 +01:00
<!DOCTYPE html>
|
||
<html lang="en" >
|
||
|
||
<head>
|
||
<meta charset="utf-8">
|
||
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
|
||
|
||
<meta property="og:title" content="Posts" />
|
||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||
<meta property="og:type" content="website" />
|
||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||
<meta property="og:updated_time" content="2019-12-01T11:22:30+02:00" />
|
||
|
||
<meta name="twitter:card" content="summary"/>
|
||
<meta name="twitter:title" content="Posts"/>
|
||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||
<meta name="generator" content="Hugo 0.62.0" />
|
||
|
||
|
||
|
||
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Blog",
  "headline": "CGSpace Notes",
  "url" : "https:\/\/alanorth.github.io\/cgspace-notes\/posts\/",
  "author": {
    "@type": "Person",
    "name": "Alan Orth"
  },
  "dateModified": "2019-12-01T11:22:30+02:00",
  "keywords": "notes,migration,notes,",
  "description": "Documenting day-to-day work on the [CGSpace](https:\/\/cgspace.cgiar.org) repository."
}
</script>
|
||
|
||
|
||
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/posts/">
|
||
|
||
<title>CGSpace Notes</title>
|
||
|
||
|
||
<!-- combined, minified CSS -->
|
||
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-G5B34w7DFTumWTswxYzTX7NWfbvQEg1HbFFEg6ItN03uTAAoS2qkPS/fu3LhuuSA" crossorigin="anonymous">
|
||
|
||
|
||
<!-- RSS 2.0 feed -->
|
||
<link rel="alternate" type="application/rss+xml" href="https://alanorth.github.io/cgspace-notes/posts/index.xml" title="CGSpace Notes" />
</head>
|
||
|
||
<body>
|
||
|
||
|
||
<div class="blog-masthead">
|
||
<div class="container">
|
||
<nav class="nav blog-nav">
|
||
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
|
||
</nav>
|
||
</div>
|
||
</div>
<header class="blog-header">
|
||
<div class="container">
|
||
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
|
||
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
|
||
</div>
|
||
</header>
<div class="container">
|
||
<div class="row">
|
||
<div class="col-sm-8 blog-main">
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-03/">March, 2019</a></h2>
|
||
<p class="blog-post-meta"><time datetime="2019-03-01T12:16:30+01:00">Fri Mar 01, 2019</time> by Alan Orth in
|
||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2019-03-01">2019-03-01</h2>
|
||
<ul>
|
||
<li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
|
||
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc…</li>
|
||
<li>Looking at the other half of Udana's WLE records from 2018-11
|
||
<ul>
|
||
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
|
||
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
|
||
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
|
||
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
|
||
<li>2003<EFBFBD>2013 instead of 2003–2013</li>
|
||
</ul>
|
||
</li>
|
||
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
|
||
</ul>
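<p>A minimal sketch of how abstracts with this kind of damage could be flagged automatically, assuming the broken characters are the Unicode replacement character (U+FFFD) as in the two examples above. The <code>find_encoding_errors</code> helper, the item identifiers, and the sample sentences are all hypothetical and only illustrate the check; they are not taken from the WLE records.</p>

```python
# Hypothetical helper: flag abstracts that contain U+FFFD, the Unicode
# replacement character that shows up when bytes could not be decoded.
REPLACEMENT = "\ufffd"


def find_encoding_errors(abstracts):
    """Return (item_id, snippet) pairs for abstracts containing U+FFFD.

    `abstracts` maps an illustrative item identifier to its abstract text.
    """
    hits = []
    for item_id, text in abstracts.items():
        pos = text.find(REPLACEMENT)
        if pos != -1:
            # Keep a little context on both sides of the broken character.
            start = max(0, pos - 10)
            hits.append((item_id, text[start:pos + 11]))
    return hits


# Made-up sample sentences built around the two patterns in the notes.
sample = {
    "item-1": "Mean recovery was 68.15% \ufffd 9.45 across sites.",
    "item-2": "Rainfall records for 2003\u20132013 were complete.",
}

for item_id, snippet in find_encoding_errors(sample):
    print(item_id, repr(snippet))
```

<p>Only <code>item-1</code> is reported here, because the second sample uses a real en dash rather than the replacement character.</p>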
|
||
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
|
||
</article>
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-02/">February, 2019</a></h2>
|
||
<p class="blog-post-meta"><time datetime="2019-02-01T21:37:30+02:00">Fri Feb 01, 2019</time> by Alan Orth in
|
||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2019-02-01">2019-02-01</h2>
|
||
<ul>
|
||
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
|
||
<li>The top IPs before, during, and after this latest alert tonight were:</li>
|
||
</ul>
|
||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
</code></pre><ul>
|
||
<li><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</li>
|
||
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
|
||
<li>There were just over 3 million accesses in the nginx logs last month:</li>
|
||
</ul>
|
||
<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243

real 0m19.873s
user 0m22.203s
sys 0m1.979s
</code></pre>
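<p>For comparison, the top-IPs pipeline above can be sketched in Python. This is only an illustration under stated assumptions: it assumes nginx's combined log format (client IP first, timestamp in square brackets), the <code>top_ips</code> function is hypothetical, and the sample lines use made-up documentation addresses rather than real CGSpace traffic. Unlike the <code>grep</code> in the shell version, it matches the date and hour only in the timestamp field, and it returns results in descending order.</p>

```python
import re
from collections import Counter

# Client IP (first field) plus the date and hour from the bracketed
# timestamp, e.g. [01/Feb/2019:17:05:12 +0100]. Assumes combined log format.
LINE_RE = re.compile(r"^(\S+) .*?\[(\d{2}/\w{3}/\d{4}):(\d{2}):")


def top_ips(lines, date, hours, n=10):
    """Count requests per client IP for one date and a set of hours."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(2) == date and m.group(3) in hours:
            counts[m.group(1)] += 1
    # most_common() sorts by count, highest first.
    return counts.most_common(n)


# Made-up log lines for illustration only.
sample = [
    '203.0.113.5 - - [01/Feb/2019:17:05:12 +0100] "GET / HTTP/1.1" 200 512',
    '203.0.113.5 - - [01/Feb/2019:18:10:00 +0100] "GET /a HTTP/1.1" 200 512',
    '198.51.100.7 - - [01/Feb/2019:19:00:01 +0100] "GET /b HTTP/1.1" 200 512',
    '198.51.100.7 - - [01/Feb/2019:22:00:01 +0100] "GET /c HTTP/1.1" 200 512',
    '192.0.2.9 - - [02/Feb/2019:17:00:01 +0100] "GET /d HTTP/1.1" 200 512',
]

for ip, count in top_ips(sample, "01/Feb/2019", {"17", "18", "19", "20", "21"}):
    print(count, ip)
```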
|
||
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
|
||
</article>
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-01/">January, 2020</a></h2>
|
||
<p class="blog-post-meta"><time datetime="2019-01-06T10:48:30+02:00">Sun Jan 06, 2019</time> by Alan Orth in
|
||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2020-01-06">2020-01-06</h2>
<ul>
<li>Open a ticket with Atmire to request a quote for the upgrade to DSpace 6</li>
<li>Last week Altmetric responded about the item that had a lower score than its DOI
<ul>
<li>The score is now linked to the DOI</li>
<li>Another item that had the same problem in 2019 is now also linked to the score for its DOI</li>
<li>Another item that had the same problem in 2019 has also been fixed</li>
</ul>
</li>
</ul>
<h2 id="2020-01-07">2020-01-07</h2>
<ul>
<li>Peter Ballantyne highlighted one more WLE item that is missing the Altmetric score that its DOI has
<ul>
<li>The DOI has a score of 259, but the Handle has no score at all</li>
<li>I tweeted the CGSpace repository link</li>
</ul>
</li>
</ul>
|
||
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
|
||
</article>
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-01/">January, 2019</a></h2>
|
||
<p class="blog-post-meta"><time datetime="2019-01-02T09:48:30+02:00">Wed Jan 02, 2019</time> by Alan Orth in
|
||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2019-01-02">2019-01-02</h2>
|
||
<ul>
|
||
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
|
||
<li>I don't see anything interesting in the web server logs around that time though:</li>
|
||
</ul>
|
||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
|
||
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
|
||
</article>
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-12/">December, 2018</a></h2>
|
||
<p class="blog-post-meta"><time datetime="2018-12-02T02:09:30+02:00">Sun Dec 02, 2018</time> by Alan Orth in
|
||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2018-12-01">2018-12-01</h2>
|
||
<ul>
|
||
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
|
||
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
|
||
<li>Then I ran all system updates and restarted the server</li>
|
||
</ul>
|
||
<h2 id="2018-12-02">2018-12-02</h2>
|
||
<ul>
|
||
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
|
||
</ul>
|
||
<a href='https://alanorth.github.io/cgspace-notes/2018-12/'>Read more →</a>
|
||
</article>
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-11/">November, 2018</a></h2>
|
||
<p class="blog-post-meta"><time datetime="2018-11-01T16:41:30+02:00">Thu Nov 01, 2018</time> by Alan Orth in
|
||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2018-11-01">2018-11-01</h2>
|
||
<ul>
|
||
<li>Finalize AReS Phase I and Phase II ToRs</li>
|
||
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
|
||
</ul>
|
||
<h2 id="2018-11-03">2018-11-03</h2>
|
||
<ul>
|
||
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
|
||
<li>Today these are the top 10 IPs:</li>
|
||
</ul>
|
||
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
|
||
</article>
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-10/">October, 2018</a></h2>
|
||
<p class="blog-post-meta"><time datetime="2018-10-01T22:31:54+03:00">Mon Oct 01, 2018</time> by Alan Orth in
|
||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2018-10-01">2018-10-01</h2>
|
||
<ul>
|
||
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
|
||
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
|
||
</ul>
|
||
<a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a>
|
||
</article>
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-09/">September, 2018</a></h2>
|
||
<p class="blog-post-meta"><time datetime="2018-09-02T09:55:54+03:00">Sun Sep 02, 2018</time> by Alan Orth in
|
||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2018-09-02">2018-09-02</h2>
|
||
<ul>
|
||
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
|
||
<li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
|
||
<li>Also, I'll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system's RAM, and we never re-ran them after migrating to larger Linodes last month</li>
|
||
<li>I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I'm getting those autowire errors in Tomcat 8.5.30 again:</li>
|
||
</ul>
|
||
<a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a>
|
||
</article>
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-08/">August, 2018</a></h2>
|
||
<p class="blog-post-meta"><time datetime="2018-08-01T11:52:54+03:00">Wed Aug 01, 2018</time> by Alan Orth in
|
||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2018-08-01">2018-08-01</h2>
|
||
<ul>
|
||
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
|
||
</ul>
|
||
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre><ul>
|
||
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
|
||
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat's</li>
|
||
<li>I'm not sure why Tomcat didn't crash with an OutOfMemoryError…</li>
|
||
<li>Anyway, perhaps I should increase the JVM heap from 5120m to 6144m, like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
|
||
<li>The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes</li>
|
||
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
||
</ul>
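<p>As a rough cross-check of the numbers in the <code>dmesg</code> output, the sketch below parses the "Killed process" line and converts the resident memory to MiB. The <code>parse_oom_kill</code> helper is hypothetical, and its regular expression only covers the line format shown above. The killed process had about 5230 MiB resident, slightly more than the 5120m heap, which would be consistent with a JVM whose heap was full, though that reading is an assumption rather than something the log proves.</p>

```python
import re

# Matches the kernel's "Killed process" line in the format shown above.
KILLED_RE = re.compile(
    r"Killed process (\d+) \((\S+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB"
)


def parse_oom_kill(line):
    """Extract pid, process name, and memory sizes (MiB) from an OOM kill line."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    pid, name, total_vm_kb, anon_rss_kb = m.groups()
    return {
        "pid": int(pid),
        "name": name,
        "total_vm_mib": int(total_vm_kb) / 1024,
        "anon_rss_mib": int(anon_rss_kb) / 1024,
    }


line = (
    "[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) "
    "total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB"
)
info = parse_oom_kill(line)
heap_mib = 5120  # the JVM heap size mentioned in the notes
print(info["name"], round(info["anon_rss_mib"]), "MiB resident,",
      round(info["anon_rss_mib"] - heap_mib), "MiB above the heap setting")
```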
|
||
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
|
||
</article>
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-07/">July, 2018</a></h2>
|
||
<p class="blog-post-meta"><time datetime="2018-07-01T12:56:54+03:00">Sun Jul 01, 2018</time> by Alan Orth in
|
||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2018-07-01">2018-07-01</h2>
|
||
<ul>
|
||
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
|
||
</ul>
|
||
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre><ul>
|
||
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
|
||
</ul>
|
||
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
|
||
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
|
||
</article>
<nav class="blog-pagination">
|
||
|
||
<a class="btn btn-outline-primary" href="/cgspace-notes/posts/" rel="prev" role="button">Previous page</a>
|
||
<a class="btn btn-outline-primary" href="/cgspace-notes/posts/page/3/" rel="next" role="button">Next page</a>
|
||
|
||
|
||
|
||
</nav>
</div> <!-- /.blog-main -->
|
||
|
||
<aside class="col-sm-3 ml-auto blog-sidebar">
|
||
|
||
|
||
|
||
<section class="sidebar-module">
|
||
<h4>Recent Posts</h4>
|
||
<ol class="list-unstyled">
|
||
|
||
|
||
<li><a href="/cgspace-notes/2019-12/">December, 2019</a></li>
|
||
|
||
<li><a href="/cgspace-notes/2019-11/">November, 2019</a></li>
|
||
|
||
<li><a href="/cgspace-notes/cgspace-cgcorev2-migration/">CGSpace CG Core v2 Migration</a></li>
|
||
|
||
<li><a href="/cgspace-notes/2019-10/">October, 2019</a></li>
|
||
|
||
<li><a href="/cgspace-notes/2019-09/">September, 2019</a></li>
|
||
|
||
</ol>
|
||
</section>
|
||
|
||
|
||
|
||
|
||
<section class="sidebar-module">
|
||
<h4>Links</h4>
|
||
<ol class="list-unstyled">
|
||
|
||
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
|
||
|
||
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
|
||
|
||
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
|
||
|
||
</ol>
|
||
</section>
|
||
|
||
</aside>
|
||
|
||
|
||
</div> <!-- /.row -->
|
||
</div> <!-- /.container -->
|
||
|
||
|
||
|
||
<footer class="blog-footer">
|
||
<p dir="auto">
|
||
|
||
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
|
||
|
||
</p>
|
||
<p>
|
||
<a href="#">Back to top</a>
|
||
</p>
|
||
</footer>
|
||
|
||
|
||
</body>
|
||
|
||
</html>
|