<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="September, 2019" />
<meta property="og:description" content="2019-09-01
Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-09/" />
<meta property="article:published_time" content="2019-09-01T10:17:51+03:00" />
<meta property="article:modified_time" content="2019-09-12T18:21:43+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="September, 2019"/>
<meta name="twitter:description" content="2019-09-01
Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
"/>
<meta name="generator" content="Hugo 0.58.1" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "September, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-09\/",
"wordCount": "1000",
"datePublished": "2019-09-01T10:17:51\x2b03:00",
"dateModified": "2019-09-12T18:21:43\x2b03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2019-09/">
<title>September, 2019 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-G5B34w7DFTumWTswxYzTX7NWfbvQEg1HbFFEg6ItN03uTAAoS2qkPS/fu3LhuuSA" crossorigin="anonymous">
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2019-09/">September, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-09-01T10:17:51&#43;03:00">Sun Sep 01, 2019</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li><p>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</p>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre></li>
</ul>
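<p>The counting pipeline above can also be expressed as a short script. This is a minimal Python sketch, assuming nginx&rsquo;s default combined log format (client IP in the first field). Like <code>zcat --force</code>, it reads both gzipped and plain rotated logs. The function names are illustrative and not part of any existing tool.</p>
<pre><code>import gzip
from collections import Counter


def open_log(path):
    """Open a log as text whether or not it is gzipped (like `zcat --force`)."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def iter_logs(paths):
    """Yield every line from each log file in turn."""
    for path in paths:
        with open_log(path) as fh:
            yield from fh


def top_ips(lines, pattern="01/Sep/2019:0", n=10):
    """Count the first field (client IP) of lines containing `pattern`.

    The default pattern selects requests from 00:00 to 09:59 on 2019-09-01.
    Returns the n busiest IPs in ascending order of hits, mirroring
    `sort | uniq -c | sort -n | tail -n 10`.
    """
    counts = Counter(line.split()[0] for line in lines if pattern in line)
    return list(reversed(counts.most_common(n)))
</code></pre>
<p>Calling <code>top_ips(iter_logs(["/var/log/nginx/access.log", "/var/log/nginx/access.log.1"]))</code> should give roughly the same list as the first command, although IPs with equal counts may be ordered differently than with <code>sort -n</code>.</p>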
<ul>
<li><code>3.94.211.189</code> is MauiBot, and most of its requests are to Discovery and get rate limited with HTTP 503</li>
<li><p><code>163.172.71.23</code> is some IP on Online SAS in France and its user agent is:</p>
<pre><code>Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)
</code></pre></li>
<li><p>It actually got mostly HTTP 200 responses:</p>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | grep 163.172.71.23 | awk '{print $9}' | sort | uniq -c
1775 200
703 499
72 503
</code></pre></li>
<li><p>And it was mostly requesting Discovery pages:</p>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | grep 163.172.71.23 | grep -o -E &quot;(bitstream|discover|handle)&quot; | sort | uniq -c
2350 discover
71 handle
</code></pre></li>
<li><p>I&rsquo;m not sure why the outbound traffic rate was so high&hellip;</p></li>
</ul>
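<p>The two follow-up checks on <code>163.172.71.23</code> can be sketched the same way. These are hypothetical helpers that again assume the combined log format, where the HTTP status is the ninth whitespace-separated field (the <code>$9</code> in the <code>awk</code> command). Unlike <code>grep 163.172.71.23</code>, which treats the dots as regex wildcards and matches the address anywhere in the line, they compare the IP against the first field only.</p>
<pre><code>import re
from collections import Counter

# The same alternation passed to `grep -o -E`; findall() returns every match,
# so a line mentioning both "handle" and "discover" counts once for each.
SECTIONS = re.compile(r"(bitstream|discover|handle)")


def lines_for_ip(lines, ip, pattern="01/Sep/2019:0"):
    """Yield lines in the time window whose client IP is exactly `ip`."""
    for line in lines:
        if pattern in line and line.split()[:1] == [ip]:
            yield line


def status_codes(lines, ip, pattern="01/Sep/2019:0"):
    """Tally HTTP status codes (field 9) for one client IP."""
    codes = Counter()
    for line in lines_for_ip(lines, ip, pattern):
        codes.update(line.split()[8:9])  # empty slice if the line is short
    return codes


def section_hits(lines, ip, pattern="01/Sep/2019:0"):
    """Count occurrences of bitstream/discover/handle in one IP's requests."""
    hits = Counter()
    for line in lines_for_ip(lines, ip, pattern):
        hits.update(SECTIONS.findall(line))
    return hits
</code></pre>
<p>Both functions accept any iterable of log lines, for example an open file object.</p>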
<h2 id="2019-09-02">2019-09-02</h2>
<ul>
<li>Follow up with Carol and Francesca from Bioversity, as they were on holiday during mid-to-late August
<ul>
<li>I told them to check the <a href="https://dspacetest.cgiar.org/handle/10568/103999">temporary collection on DSpace Test</a> where I uploaded the 1,427 items so they can see how it will look</li>
<li>Also, I told them to advise me about the strange file extensions (.7z, .zip, .lck)</li>
<li>Also, I reminded Abenet to check the metadata, as at least the institutional authors will need some modification</li>
</ul></li>
</ul>
<h2 id="2019-09-10">2019-09-10</h2>
<ul>
<li>Altmetric responded to say that they have fixed an issue with their badge code so now research outputs with multiple handles are showing badges!
<ul>
<li>See: <a href="https://hdl.handle.net/handle/10568/97825">https://hdl.handle.net/handle/10568/97825</a></li>
</ul></li>
<li>Follow up with Bosede about the mix-up with PDFs in the items uploaded in 2018-12 (aka Daniel1807.xls)
<ul>
<li>These are the same items Peter noticed last week, which Bosede and I discussed earlier this year but never sorted out</li>
<li>It looks like these items were uploaded by Sisay on 2018-12-19, so we can use the <a href="https://cgspace.cgiar.org/handle/10568/68616/discover?filtertype_1=dateAccessioned&amp;filter_relational_operator_1=contains&amp;filter_1=2018-12-19&amp;submit_apply_filter=&amp;query=">accession date as a filter</a> to narrow it down to 230 items (of which only 104 have PDFs, according to the Daniel1807.xls input file)</li>
<li>I checked a few manually and the PDFs are correct in the original input file, so something must have happened when Sisay was processing them for upload</li>
<li>I have asked Sisay to fix them&hellip;</li>
</ul></li>
<li>Continue working on CG Core v2 migration, focusing on the crosswalk mappings
<ul>
<li>I think we can skip the MODS crosswalk for now because it is only used in <a href="https://wiki.duraspace.org/display/DSDOC5x/DSpace+AIP+Format#DSpaceAIPFormat-MODSSchema">AIP exports that are meant for non-DSpace systems</a></li>
<li>We should probably do the QDC crosswalk as well as those in <code>xhtml-head-item.properties</code>&hellip;</li>
<li>Ouch, there is potentially a lot of work in the OAI metadata formats like DIM, METS, and QDC (see <code>dspace/config/crosswalks/oai/*.xsl</code>)</li>
<li>In general I think I should only modify the left side of the crosswalk mappings (i.e., where the metadata comes from) so we maintain the exact same output for search engines, etc.</li>
</ul></li>
</ul>
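<p>The accession-date filter mentioned above is just a set of query parameters on a collection&rsquo;s Discovery page. As a minimal sketch, that link can be rebuilt with Python&rsquo;s <code>urllib.parse.urlencode</code>; the parameter names are copied from the CGSpace link, and the helper&rsquo;s name is hypothetical.</p>
<pre><code>from urllib.parse import urlencode


def accession_filter_url(base, collection, date):
    """Build an XMLUI Discovery URL that filters `collection` by dateAccessioned.

    Parameter names are taken from the CGSpace link in these notes;
    "contains" matches any accession timestamp that includes `date`.
    """
    params = {
        "filtertype_1": "dateAccessioned",
        "filter_relational_operator_1": "contains",
        "filter_1": date,
        "submit_apply_filter": "",
        "query": "",
    }
    return f"{base}/handle/{collection}/discover?{urlencode(params)}"
</code></pre>
<p>For example, <code>accession_filter_url("https://cgspace.cgiar.org", "10568/68616", "2018-12-19")</code> reproduces the filter link used for Sisay&rsquo;s 2018-12-19 upload.</p>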
<h2 id="2019-09-11">2019-09-11</h2>
<ul>
<li>Maria Garruccio asked me to add two new Bioversity ORCID identifiers to CGSpace so I created a <a href="https://github.com/ilri/DSpace/pull/431">pull request</a></li>
<li>Marissa Van Epp asked me to add new CCAFS Phase II project tags to CGSpace so I created a <a href="https://github.com/ilri/DSpace/pull/432">pull request</a>
<ul>
<li>I will wait until I hear from her to merge it, because one tag seems to be a duplicate: its name (PII-WA_agrosylvopast) is similar to one that already exists (PII-WA_AgroSylvopastoralSystems)</li>
</ul></li>
<li>More work on the CG Core v2 migrations
<ul>
<li>I have updated my <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">notes on the possible changes</a> and done more work on the XMLUI replacements</li>
</ul></li>
</ul>
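<p>One quick way to catch near-duplicate project tags like the PII-WA pair above is a case-insensitive prefix comparison. The sketch below is hypothetical (it is not how the CCAFS tags are actually managed) and will produce false positives for very short tags.</p>
<pre><code>def possible_duplicates(new_tags, existing_tags):
    """Pair each new tag with existing tags it may duplicate.

    Two tags are flagged when, ignoring case, one is a prefix of the other.
    """
    existing = {tag.lower(): tag for tag in existing_tags}
    flagged = []
    for tag in new_tags:
        key = tag.lower()
        for other_key, other in existing.items():
            if other_key.startswith(key) or key.startswith(other_key):
                flagged.append((tag, other))
    return flagged
</code></pre>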
<h2 id="2019-09-12">2019-09-12</h2>
<ul>
<li>Deploy <a href="https://jdbc.postgresql.org/">PostgreSQL JDBC driver</a> version 42.2.7 on DSpace Test and update the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></li>
</ul>
<h2 id="2019-09-15">2019-09-15</h2>
<ul>
<li>Deploy Bioversity ORCID identifier updates to CGSpace</li>
<li>Deploy PostgreSQL JDBC driver 42.2.7 on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and restart the server
<ul>
<li>After restarting the system Tomcat came back up, but not all Solr statistics cores were loaded</li>
<li>I had to restart Tomcat once more before all the cores loaded (verified in the Solr admin UI)</li>
</ul></li>
<li>Update nginx TLS cipher suite to the latest <a href="https://ssl-config.mozilla.org/#server=nginx&amp;server-version=1.16.1&amp;config=intermediate&amp;openssl-version=1.0.2g">Mozilla intermediate recommendations for nginx 1.16.0 and openssl 1.0.2</a>
<ul>
<li>DSpace Test (linode19) is running Ubuntu 18.04 with nginx 1.17.x and openssl 1.1.1, so it can even use TLS v1.3 if we override the nginx SSL protocols in its Ansible host vars</li>
</ul></li>
<li><p>XMLUI item view pages are blank on CGSpace right now</p>
<ul>
<li><p>Like earlier this year, I see the following error in the Cocoon log while browsing:</p>
<pre><code>2019-09-15 15:32:18,137 WARN org.apache.cocoon.components.xslt.TraxErrorListener - Can not load requested doc: unknown protocol: cocoon at jndi:/localhost/themes/CIAT/xsl/../../0_CGIAR/xsl//aspect/artifactbrowser/common.xsl:141:90
</code></pre></li>
</ul></li>
<li><p>Around the same time I see the following in the DSpace log:</p>
<pre><code>2019-09-15 15:32:18,079 INFO org.dspace.usage.LoggerUsageEventListener @ aorth@blah:session_id=A11C362A7127004C24E77198AF9E4418:ip_addr=x.x.x.x:view_item:handle=10568/103644
2019-09-15 15:32:18,135 WARN org.dspace.core.PluginManager @ Cannot find named plugin for interface=org.dspace.content.crosswalk.DisseminationCrosswalk, name=&quot;METSRIGHTS&quot;
</code></pre></li>
<li><p>I see a lot of these errors today, but not earlier this month:</p>
<pre><code># grep -c 'Cannot find named plugin' dspace.log.2019-09-*
dspace.log.2019-09-01:0
dspace.log.2019-09-02:0
dspace.log.2019-09-03:0
dspace.log.2019-09-04:0
dspace.log.2019-09-05:0
dspace.log.2019-09-06:0
dspace.log.2019-09-07:0
dspace.log.2019-09-08:0
dspace.log.2019-09-09:0
dspace.log.2019-09-10:0
dspace.log.2019-09-11:0
dspace.log.2019-09-12:0
dspace.log.2019-09-13:0
dspace.log.2019-09-14:0
dspace.log.2019-09-15:808
</code></pre></li>
<li><p>Something must have happened when I restarted Tomcat a few hours ago, because earlier in the DSpace log I see a bunch of errors like this:</p>
<pre><code>2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class=&quot;org.dspace.content.crosswalk.METSRightsCrosswalk&quot;, name=&quot;METSRIGHTS&quot;
2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class=&quot;org.dspace.content.crosswalk.OREDisseminationCrosswalk&quot;, name=&quot;ore&quot;
2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class=&quot;org.dspace.content.crosswalk.DIMDisseminationCrosswalk&quot;, name=&quot;dim&quot;
</code></pre></li>
<li><p>I restarted Tomcat and the item views came back, but then the Solr statistics cores didn&rsquo;t all load properly</p>
<ul>
<li>After restarting Tomcat once again, both the item views and the Solr statistics cores all came back OK</li>
</ul></li>
</ul>
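<p>The per-day counts from <code>grep -c</code> make the jump on 2019-09-15 obvious. A minimal Python equivalent, assuming the DSpace logs are plain-text files named like <code>dspace.log.2019-09-15</code>, could look like this; the same function works for the <code>Name collision in named plugin</code> errors.</p>
<pre><code>import glob
import os


def count_matches(pattern, needle):
    """Count lines containing `needle` in each file matching the glob `pattern`.

    Like `grep -c`, files with no matches are reported with a count of 0.
    """
    counts = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "rt", errors="replace") as fh:
            counts[os.path.basename(path)] = sum(1 for line in fh if needle in line)
    return counts


def nonzero(counts):
    """Keep only the files that actually contain matches."""
    return {name: n for name, n in counts.items() if n}
</code></pre>
<p>Run from the DSpace log directory, <code>nonzero(count_matches("dspace.log.2019-09-*", "Cannot find named plugin"))</code> would list only the days with errors.</p>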
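<p>Instead of checking the Solr admin UI by hand after each restart, the statistics cores could be verified through Solr&rsquo;s CoreAdmin <code>STATUS</code> action. This is a sketch under assumptions: the Solr URL and the expected core names below are placeholders, not CGSpace&rsquo;s actual configuration.</p>
<pre><code>import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder: point this at the Solr instance that Tomcat serves for DSpace.
SOLR_URL = "http://localhost:8080/solr"


def fetch_core_status(solr_url=SOLR_URL):
    """Return the parsed JSON from the CoreAdmin STATUS action for all cores."""
    query = urlencode({"action": "STATUS", "wt": "json"})
    with urlopen(f"{solr_url}/admin/cores?{query}", timeout=30) as resp:
        return json.load(resp)


def missing_cores(status, expected):
    """Compare a STATUS response against the cores we expect to be loaded.

    Returns the sorted names of expected cores that are not loaded, plus
    the `initFailures` map Solr reports for cores that failed to start.
    """
    loaded = set(status.get("status", {}))
    failures = status.get("initFailures", {})
    return sorted(set(expected) - loaded), failures
</code></pre>
<p>For example, if <code>missing_cores(fetch_core_status(), ["statistics", "statistics-2018"])</code> returns an empty list and an empty failure map, all of the listed cores loaded.</p>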
<!-- vim: set sw=2 ts=2: -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/posts/">Posts</a></li>
<li><a href="/cgspace-notes/2019-09/">September, 2019</a></li>
<li><a href="/cgspace-notes/2019-08/">August, 2019</a></li>
<li><a href="/cgspace-notes/2019-07/">July, 2019</a></li>
<li><a href="/cgspace-notes/2019-06/">June, 2019</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p>
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>