mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-22 19:43:24 +01:00
649 lines
33 KiB
HTML
<!DOCTYPE html>
|
|
<html lang="en" >
|
|
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
|
|
|
|
|
|
<meta property="og:title" content="April, 2018" />
|
|
<meta property="og:description" content="2018-04-01
|
|
|
|
I tried to test something on DSpace Test but noticed that it’s down since god knows when
|
|
Catalina logs at least show some memory errors yesterday:
|
|
" />
|
|
<meta property="og:type" content="article" />
|
|
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-04/" />
|
|
<meta property="article:published_time" content="2018-04-01T16:13:54+02:00" />
|
|
<meta property="article:modified_time" content="2019-10-28T13:39:25+02:00" />
|
|
|
|
|
|
|
|
<meta name="twitter:card" content="summary"/>
|
|
<meta name="twitter:title" content="April, 2018"/>
|
|
<meta name="twitter:description" content="2018-04-01
|
|
|
|
I tried to test something on DSpace Test but noticed that it’s down since god knows when
|
|
Catalina logs at least show some memory errors yesterday:
|
|
"/>
|
|
<meta name="generator" content="Hugo 0.79.1" />
|
|
|
|
|
|
|
|
<script type="application/ld+json">
|
|
{
|
|
"@context": "http://schema.org",
|
|
"@type": "BlogPosting",
|
|
"headline": "April, 2018",
|
|
"url": "https://alanorth.github.io/cgspace-notes/2018-04/",
|
|
"wordCount": "3016",
|
|
"datePublished": "2018-04-01T16:13:54+02:00",
|
|
"dateModified": "2019-10-28T13:39:25+02:00",
|
|
"author": {
|
|
"@type": "Person",
|
|
"name": "Alan Orth"
|
|
},
|
|
"keywords": "Notes"
|
|
}
|
|
</script>
|
|
|
|
|
|
|
|
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2018-04/">
|
|
|
|
<title>April, 2018 | CGSpace Notes</title>
|
|
|
|
|
|
<!-- combined, minified CSS -->
|
|
|
|
<link href="https://alanorth.github.io/cgspace-notes/css/style.16633182cd803b52b9bf9e29ea1ef4b2e3d460deee0ded49466d7e16e449c158.css" rel="stylesheet" integrity="sha256-FmMxgs2AO1K5v54p6h70suPUYN7uDe1JRm1+FuRJwVg=" crossorigin="anonymous">
|
|
|
|
|
|
<!-- minified Font Awesome for SVG icons -->
|
|
|
|
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.4ed405d7c7002b970d34cbe6026ff44a556b0808cb98a9db4008752110ed964b.js" integrity="sha256-TtQF18cAK5cNNMvmAm/0SlVrCAjLmKnbQAh1IRDtlks=" crossorigin="anonymous"></script>
|
|
|
|
<!-- RSS 2.0 feed -->
|
|
|
|
|
|
|
|
|
|
</head>
|
|
|
|
<body>
|
|
|
|
|
|
<div class="blog-masthead">
|
|
<div class="container">
|
|
<nav class="nav blog-nav">
|
|
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
|
|
</nav>
|
|
</div>
|
|
</div>
|
|
|
|
|
|
|
|
|
|
<header class="blog-header">
|
|
<div class="container">
|
|
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
|
|
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
|
|
</div>
|
|
</header>
|
|
|
|
|
|
|
|
|
|
<div class="container">
|
|
<div class="row">
|
|
<div class="col-sm-8 blog-main">
|
|
|
|
|
|
|
|
|
|
<article class="blog-post">
|
|
<header>
|
|
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-04/">April, 2018</a></h2>
|
|
<p class="blog-post-meta">
|
|
<time datetime="2018-04-01T16:13:54+02:00">Sun Apr 01, 2018</time>
|
|
in
|
|
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
|
|
|
|
|
|
</p>
|
|
</header>
|
|
<h2 id="2018-04-01">2018-04-01</h2>
|
|
<ul>
|
|
<li>I tried to test something on DSpace Test but noticed that it has been down since god knows when</li>
|
|
<li>Catalina logs at least show some memory errors yesterday:</li>
|
|
</ul>
|
|
<pre><code>Mar 31, 2018 10:26:42 PM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor run
|
|
SEVERE: Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]]
|
|
java.lang.OutOfMemoryError: Java heap space
|
|
|
|
Exception in thread "ContainerBackgroundProcessor[StandardEngine[Catalina]]" java.lang.OutOfMemoryError: Java heap space
|
|
</code></pre><ul>
|
|
<li>So this is getting super annoying</li>
|
|
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
|
<li>For some reason Listings and Reports is not giving any results for any queries now…</li>
|
|
<li>I posted a message on Yammer to ask if people are using the Duplicate Check step from the Metadata Quality Module</li>
|
|
<li>Help Lili Szilagyi with a question about statistics on some CCAFS items</li>
|
|
</ul>
|
|
<h2 id="2018-04-04">2018-04-04</h2>
|
|
<ul>
|
|
<li>Peter noticed that there were still some old CRP names on CGSpace, because I hadn’t forced the Discovery index to be updated after I fixed the others last week</li>
|
|
<li>For completeness I re-ran the CRP corrections on CGSpace:</li>
|
|
</ul>
|
|
<pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu'
|
|
Fixed 1 occurences of: AGRICULTURE FOR NUTRITION AND HEALTH
|
|
</code></pre><ul>
|
|
<li>Then started a full Discovery index:</li>
|
|
</ul>
|
|
<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx1024m'
|
|
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
|
|
|
real 76m13.841s
|
|
user 8m22.960s
|
|
sys 2m2.498s
|
|
</code></pre><ul>
|
|
<li>Elizabeth from CIAT emailed to ask if I could help her by adding ORCID identifiers to all of Joseph Tohme’s items</li>
|
|
<li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
|
|
</ul>
|
|
<pre><code>$ ./add-orcid-identifiers-csv.py -i /tmp/jtohme-2018-04-04.csv -db dspace -u dspace -p 'fuuu'
|
|
</code></pre><ul>
|
|
<li>The CSV format of <code>jtohme-2018-04-04.csv</code> was:</li>
|
|
</ul>
|
|
<pre><code class="language-csv" data-lang="csv">dc.contributor.author,cg.creator.id
|
|
"Tohme, Joseph M.",Joe Tohme: 0000-0003-2765-7101
|
|
</code></pre><ul>
|
|
<li>There was a quoting error in my CRP CSV and the replacements for <code>Forests, Trees and Agroforestry</code> got messed up</li>
|
|
<li>So I fixed them and had to re-index again!</li>
|
|
<li>I started preparing the git branch for the DSpace 5.5→5.8 upgrade:</li>
|
|
</ul>
|
|
<pre><code>$ git checkout -b 5_x-dspace-5.8 5_x-prod
|
|
$ git reset --hard ilri/5_x-prod
|
|
$ git rebase -i dspace-5.8
|
|
</code></pre><ul>
|
|
<li>I was prepared to skip some commits that I had cherry picked from the upstream <code>dspace-5_x</code> branch when we did the DSpace 5.5 upgrade (see notes on 2016-10-19 and 2017-12-17):
|
|
<ul>
|
|
<li>[DS-3246] Improve cleanup in recyclable components (upstream commit on dspace-5_x: 9f0f5940e7921765c6a22e85337331656b18a403)</li>
|
|
<li>[DS-3250] applying patch provided by Atmire (upstream commit on dspace-5_x: c6fda557f731dbc200d7d58b8b61563f86fe6d06)</li>
|
|
<li>bump up to latest minor pdfbox version (upstream commit on dspace-5_x: b5330b78153b2052ed3dc2fd65917ccdbfcc0439)</li>
|
|
<li>DS-3583 Usage of correct Collection Array (#1731) (upstream commit on dspace-5_x: c8f62e6f496fa86846bfa6bcf2d16811087d9761)</li>
|
|
</ul>
|
|
</li>
|
|
<li>… but somehow git knew, and didn’t include them in my interactive rebase!</li>
|
|
<li>I need to send this branch to Atmire and also arrange payment (see <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket #560</a> in their tracker)</li>
|
|
<li>Fix Sisay’s SSH access to the new DSpace Test server (linode19)</li>
|
|
</ul>
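<ul>
<li>The quoting in the ORCID CSV earlier in this entry matters because the author name itself contains a comma. As a minimal sketch (this is not the <code>add-orcid-identifiers-csv.py</code> script, and the helper name <code>read_orcid_rows</code> is made up for illustration), Python’s standard <code>csv</code> module reads the file like this:</li>
</ul>
<pre><code class="language-python" data-lang="python">import csv
import io

# Sample content copied from the CSV shown earlier in this entry
SAMPLE = '''dc.contributor.author,cg.creator.id
"Tohme, Joseph M.",Joe Tohme: 0000-0003-2765-7101
'''


def read_orcid_rows(text):
    """Parse the two-column CSV and return (author, identifier) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["dc.contributor.author"], row["cg.creator.id"]) for row in reader]


rows = read_orcid_rows(SAMPLE)
print(rows)

# Without the quotes, the comma inside the name splits the row into three fields
unquoted = next(csv.reader(["Tohme, Joseph M.,Joe Tohme: 0000-0003-2765-7101"]))
print(len(unquoted))
</code></pre><ul>
<li>A missing quote shifts every later column, which is the kind of quoting error that can garble replacements for values containing commas, such as <code>Forests, Trees and Agroforestry</code></li>
</ul>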
|
|
<h2 id="2018-04-05">2018-04-05</h2>
|
|
<ul>
|
|
<li>Fix Sisay’s sudo access on the new DSpace Test server (linode19)</li>
|
|
<li>The reindexing process on DSpace Test took <em>forever</em> yesterday:</li>
|
|
</ul>
|
|
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
|
|
|
real 599m32.961s
|
|
user 9m3.947s
|
|
sys 2m52.585s
|
|
</code></pre><ul>
|
|
<li>So we really should not use this Linode block storage for Solr</li>
|
|
<li>Assetstore might be fine but would complicate things with configuration and deployment (ughhh)</li>
|
|
<li>Better to use Linode block storage only for backup</li>
|
|
<li>Help Peter with the GDPR compliance / reporting form for CGSpace</li>
|
|
<li>DSpace Test crashed due to memory issues again:</li>
|
|
</ul>
|
|
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
|
|
16
|
|
</code></pre><ul>
|
|
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
|
<li>Proof some records on DSpace Test for Udana from IWMI</li>
|
|
<li>He has done better with the small syntax and consistency issues but then there are larger concerns with not linking to DOIs, copying titles incorrectly, etc</li>
|
|
</ul>
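<ul>
<li>To put the two <code>real</code> times in perspective (76m13.841s for the CGSpace run on 2018-04-04 versus 599m32.961s on DSpace Test), here is a minimal Python sketch that converts them to seconds and compares them. The helper name <code>real_seconds</code> is made up for this example:</li>
</ul>
<pre><code class="language-python" data-lang="python">import re


def real_seconds(time_output):
    """Return the 'real' wall-clock value from `time` output, in seconds."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


cgspace = real_seconds("real 76m13.841s")
dspace_test = real_seconds("real 599m32.961s")

# How many times longer the DSpace Test run took
ratio = dspace_test / cgspace
print(round(ratio, 1))
</code></pre><ul>
<li>The ratio comes out at roughly 7.9, so the same indexing job took almost eight times as long on the block storage volume</li>
</ul>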
|
|
<h2 id="2018-04-10">2018-04-10</h2>
|
|
<ul>
|
|
<li>I got a notice that CGSpace CPU usage was very high this morning</li>
|
|
<li>Looking at the nginx logs, here are the top users today so far:</li>
|
|
</ul>
|
|
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Apr/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
|
282 207.46.13.112
|
|
286 54.175.208.220
|
|
287 207.46.13.113
|
|
298 66.249.66.153
|
|
322 207.46.13.114
|
|
780 104.196.152.243
|
|
3994 178.154.200.38
|
|
4295 70.32.83.92
|
|
4388 95.108.181.88
|
|
7653 45.5.186.2
|
|
</code></pre><ul>
|
|
<li>45.5.186.2 is of course CIAT</li>
|
|
<li>95.108.181.88 appears to be Yandex:</li>
|
|
</ul>
|
|
<pre><code>95.108.181.88 - - [09/Apr/2018:06:34:16 +0000] "GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1" 200 2638 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
|
|
</code></pre><ul>
|
|
<li>And for some reason Yandex created a lot of Tomcat sessions today:</li>
|
|
</ul>
|
|
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-04-10
|
|
4363
|
|
</code></pre><ul>
|
|
<li>70.32.83.92 appears to be some harvester we’ve seen before, but on a new IP</li>
|
|
<li>They are not creating new Tomcat sessions so there is no problem there</li>
|
|
<li>178.154.200.38 also appears to be Yandex, and is also creating many Tomcat sessions:</li>
|
|
</ul>
|
|
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=178.154.200.38' dspace.log.2018-04-10
|
|
3982
|
|
</code></pre><ul>
|
|
<li>I’m not sure why Yandex creates so many Tomcat sessions, as its user agent should match the Crawler Session Manager valve</li>
|
|
<li>Let’s try a manual request with and without their user agent:</li>
|
|
</ul>
|
|
<pre><code>$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg 'User-Agent:Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)'
|
|
GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1
|
|
Accept: */*
|
|
Accept-Encoding: gzip, deflate
|
|
Connection: keep-alive
|
|
Host: cgspace.cgiar.org
|
|
User-Agent: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
|
|
|
|
HTTP/1.1 200 OK
|
|
Connection: keep-alive
|
|
Content-Language: en-US
|
|
Content-Length: 2638
|
|
Content-Type: image/jpeg;charset=ISO-8859-1
|
|
Date: Tue, 10 Apr 2018 05:18:37 GMT
|
|
Expires: Tue, 10 Apr 2018 06:18:37 GMT
|
|
Last-Modified: Tue, 25 Apr 2017 07:05:54 GMT
|
|
Server: nginx
|
|
Strict-Transport-Security: max-age=15768000
|
|
Vary: User-Agent
|
|
X-Cocoon-Version: 2.2.0
|
|
X-Content-Type-Options: nosniff
|
|
X-Frame-Options: SAMEORIGIN
|
|
X-XSS-Protection: 1; mode=block
|
|
|
|
$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg
|
|
GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1
|
|
Accept: */*
|
|
Accept-Encoding: gzip, deflate
|
|
Connection: keep-alive
|
|
Host: cgspace.cgiar.org
|
|
User-Agent: HTTPie/0.9.9
|
|
|
|
HTTP/1.1 200 OK
|
|
Connection: keep-alive
|
|
Content-Language: en-US
|
|
Content-Length: 2638
|
|
Content-Type: image/jpeg;charset=ISO-8859-1
|
|
Date: Tue, 10 Apr 2018 05:20:08 GMT
|
|
Expires: Tue, 10 Apr 2018 06:20:08 GMT
|
|
Last-Modified: Tue, 25 Apr 2017 07:05:54 GMT
|
|
Server: nginx
|
|
Set-Cookie: JSESSIONID=31635DB42B66D6A4208CFCC96DD96875; Path=/; Secure; HttpOnly
|
|
Strict-Transport-Security: max-age=15768000
|
|
Vary: User-Agent
|
|
X-Cocoon-Version: 2.2.0
|
|
X-Content-Type-Options: nosniff
|
|
X-Frame-Options: SAMEORIGIN
|
|
X-XSS-Protection: 1; mode=block
|
|
</code></pre><ul>
|
|
<li>So it definitely looks like Yandex requests are getting assigned a session from the Crawler Session Manager valve</li>
|
|
<li>And if I look at the DSpace log I see its IP sharing a session with other crawlers like Google (66.249.66.153)</li>
|
|
<li>Indeed the number of Tomcat sessions appears to be normal:</li>
|
|
</ul>
|
|
<p><img src="/cgspace-notes/2018/04/jmx_dspace_sessions-week.png" alt="Tomcat sessions week"></p>
|
|
<ul>
|
|
<li>In other news, it looks like the number of total requests processed by nginx in March went down from the previous months:</li>
|
|
</ul>
|
|
<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Mar/2018"
|
|
2266594
|
|
|
|
real 0m13.658s
|
|
user 0m16.533s
|
|
sys 0m1.087s
|
|
</code></pre><ul>
|
|
<li>In other other news, the database cleanup script has an issue again:</li>
|
|
</ul>
|
|
<pre><code>$ dspace cleanup -v
|
|
...
|
|
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
|
|
Detail: Key (bitstream_id)=(151626) is still referenced from table "bundle".
|
|
</code></pre><ul>
|
|
<li>The solution is, as always:</li>
|
|
</ul>
|
|
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (151626);'
|
|
UPDATE 1
|
|
</code></pre><ul>
|
|
<li>Looking at abandoned connections in Tomcat:</li>
|
|
</ul>
|
|
<pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon'
|
|
2115
|
|
</code></pre><ul>
|
|
<li>Apparently from these stacktraces we should be able to see which code is not closing connections properly</li>
|
|
<li>Here’s a pretty good overview of days where we had database issues recently:</li>
|
|
</ul>
|
|
<pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' | awk '{print $1,$2, $3}' | sort | uniq -c | sort -n
|
|
1 Feb 18, 2018
|
|
1 Feb 19, 2018
|
|
1 Feb 20, 2018
|
|
1 Feb 24, 2018
|
|
2 Feb 13, 2018
|
|
3 Feb 17, 2018
|
|
5 Feb 16, 2018
|
|
5 Feb 23, 2018
|
|
5 Feb 27, 2018
|
|
6 Feb 25, 2018
|
|
40 Feb 14, 2018
|
|
63 Feb 28, 2018
|
|
154 Mar 19, 2018
|
|
202 Feb 21, 2018
|
|
264 Feb 26, 2018
|
|
268 Mar 21, 2018
|
|
524 Feb 22, 2018
|
|
570 Feb 15, 2018
|
|
</code></pre><ul>
|
|
<li>In Tomcat 8.5 the <code>removeAbandoned</code> property has been split into two: <code>removeAbandonedOnBorrow</code> and <code>removeAbandonedOnMaintenance</code></li>
|
|
<li>See: <a href="https://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP_2)_Configurations">https://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP_2)_Configurations</a></li>
|
|
<li>I assume we want <code>removeAbandonedOnBorrow</code> and make updates to the Tomcat 8 templates in Ansible</li>
|
|
<li>After reading more documentation I see that Tomcat 8.5’s default connection pool now seems to be Commons DBCP2 rather than the Tomcat JDBC pool</li>
|
|
<li>It can be overridden in Tomcat’s <em>server.xml</em> by setting <code>factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"</code> in the <code><Resource></code></li>
|
|
<li>I think we should use the default (Commons DBCP2), so we’ll need to remove some other settings that are specific to the Tomcat JDBC pool, like <code>jdbcInterceptors</code> and <code>abandonWhenPercentageFull</code></li>
|
|
<li>Merge the changes adding ORCID identifier to advanced search and Atmire Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/371">#371</a>)</li>
|
|
<li>Fix one more issue of missing XMLUI strings (for CRP subject when clicking “view more” in the Discovery sidebar)</li>
|
|
<li>I told Udana to fix the citation and abstract of the one item, and to correct the <code>dc.language.iso</code> for the five Spanish items in his Book Chapters collection</li>
|
|
<li>Then we can import the records to CGSpace</li>
|
|
</ul>
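<ul>
<li>The per-day tally of abandoned connections earlier in this entry is sorted by count, which makes the timeline hard to read. A minimal Python sketch of the same tally in chronological order follows; it assumes each log line starts with a date like <code>Feb 18, 2018</code> (as the <code>awk</code> output suggests), and the sample lines and the function name <code>abandoned_per_day</code> are made up for illustration:</li>
</ul>
<pre><code class="language-python" data-lang="python">from collections import Counter
from datetime import datetime

MARKER = "org.apache.tomcat.jdbc.pool.ConnectionPool abandon"


def abandoned_per_day(lines):
    """Count marker lines per calendar day, returned in chronological order."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        # The first three whitespace-separated fields are e.g. "Feb 18, 2018"
        day = datetime.strptime(" ".join(line.split()[:3]), "%b %d, %Y").date()
        counts[day] += 1
    return sorted(counts.items())


# Made-up sample lines laid out like catalina.out entries (assumption)
sample = [
    "Feb 15, 2018 9:01:02 AM org.apache.tomcat.jdbc.pool.ConnectionPool abandon",
    "Feb 13, 2018 4:10:11 PM org.apache.tomcat.jdbc.pool.ConnectionPool abandon",
    "Feb 15, 2018 9:05:40 AM org.apache.tomcat.jdbc.pool.ConnectionPool abandon",
    "Feb 15, 2018 9:06:00 AM org.apache.catalina.startup.Catalina start",
]

result = abandoned_per_day(sample)
for day, count in result:
    print(day.isoformat(), count)
</code></pre><ul>
<li>Parsing the date rather than sorting the text keeps February and March entries in calendar order</li>
</ul>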
|
|
<h2 id="2018-04-11">2018-04-11</h2>
|
|
<ul>
|
|
<li>DSpace Test (linode19) crashed again some time since yesterday:</li>
|
|
</ul>
|
|
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
|
|
168
|
|
</code></pre><ul>
|
|
<li>I ran all system updates and rebooted the server</li>
|
|
</ul>
|
|
<h2 id="2018-04-12">2018-04-12</h2>
|
|
<ul>
|
|
<li>I caught wind of an interesting XMLUI performance optimization coming in DSpace 6.3: <a href="https://jira.duraspace.org/browse/DS-3883">https://jira.duraspace.org/browse/DS-3883</a></li>
|
|
<li>I asked for it to be ported to DSpace 5.x</li>
|
|
</ul>
|
|
<h2 id="2018-04-13">2018-04-13</h2>
|
|
<ul>
|
|
<li>Add <code>PII-LAM_CSAGender</code> to CCAFS Phase II project tags in <code>input-forms.xml</code></li>
|
|
</ul>
|
|
<h2 id="2018-04-15">2018-04-15</h2>
|
|
<ul>
|
|
<li>While testing an XMLUI patch for <a href="https://jira.duraspace.org/browse/DS-3883">DS-3883</a> I noticed that there is still some remaining Authority / Solr configuration left that we need to remove:</li>
|
|
</ul>
|
|
<pre><code>2018-04-14 18:55:25,841 ERROR org.dspace.authority.AuthoritySolrServiceImpl @ Authority solr is not correctly configured, check "solr.authority.server" property in the dspace.cfg
|
|
java.lang.NullPointerException
|
|
</code></pre><ul>
|
|
<li>I assume we need to remove <code>authority</code> from the consumers in <code>dspace/config/dspace.cfg</code>:</li>
|
|
</ul>
|
|
<pre><code>event.dispatcher.default.consumers = authority, versioning, discovery, eperson, harvester, statistics,batchedit, versioningmqm
|
|
</code></pre><ul>
|
|
<li>I see the same error on DSpace Test so this is definitely a problem</li>
|
|
<li>After disabling the authority consumer I no longer see the error</li>
|
|
<li>I merged a pull request to the <code>5_x-prod</code> branch to clean that up (<a href="https://github.com/ilri/DSpace/pull/372">#372</a>)</li>
|
|
<li>File a ticket on DSpace’s Jira for the <code>target="_blank"</code> security and performance issue (<a href="https://jira.duraspace.org/browse/DS-3891">DS-3891</a>)</li>
|
|
<li>I re-deployed DSpace Test (linode19) and was surprised by how long it took the ant update to complete:</li>
|
|
</ul>
|
|
<pre><code>BUILD SUCCESSFUL
|
|
Total time: 4 minutes 12 seconds
|
|
</code></pre><ul>
|
|
<li>The Linode block storage is much slower than the instance storage</li>
|
|
<li>I ran all system updates and rebooted DSpace Test (linode19)</li>
|
|
</ul>
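<ul>
<li>Disabling the authority consumer amounts to dropping one name from the comma-separated <code>event.dispatcher.default.consumers</code> value. As a rough sketch only (the actual change in pull request #372 is not shown here, and the helper <code>remove_consumer</code> is hypothetical), the edit could be expressed in Python as:</li>
</ul>
<pre><code class="language-python" data-lang="python">def remove_consumer(line, name):
    """Drop one consumer from a 'key = a, b, c' properties line."""
    key, _, value = line.partition("=")
    consumers = [item.strip() for item in value.split(",")]
    kept = [item for item in consumers if item and item != name]
    return key.strip() + " = " + ", ".join(kept)


# The property line quoted earlier in this entry
before = ("event.dispatcher.default.consumers = authority, versioning, discovery, "
          "eperson, harvester, statistics,batchedit, versioningmqm")
after = remove_consumer(before, "authority")
print(after)
</code></pre><ul>
<li>Matching whole names after splitting on commas avoids accidentally touching other consumers whose names merely contain the same text</li>
</ul>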
|
|
<h2 id="2018-04-16">2018-04-16</h2>
|
|
<ul>
|
|
<li>Communicate with Bioversity about their project to migrate their e-Library (Typo3) and Sci-lit databases to CGSpace</li>
|
|
</ul>
|
|
<h2 id="2018-04-18">2018-04-18</h2>
|
|
<ul>
|
|
<li>IWMI people are asking about building a search query that outputs RSS for their reports</li>
|
|
<li>They want the same results as this Discovery query: <a href="https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&filter_relational_operator_1=contains&filter_1=2018&submit_apply_filter=&query=&scope=10568%2F16814&rpp=100&sort_by=dc.date.issued_dt&order=desc">https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&filter_relational_operator_1=contains&filter_1=2018&submit_apply_filter=&query=&scope=10568%2F16814&rpp=100&sort_by=dc.date.issued_dt&order=desc</a></li>
|
|
<li>They will need to use OpenSearch, but I can’t remember all the parameters</li>
|
|
<li>Apparently search sort options for OpenSearch are in <code>dspace.cfg</code>:</li>
|
|
</ul>
|
|
<pre><code>webui.itemlist.sort-option.1 = title:dc.title:title
|
|
webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
|
|
webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
|
|
webui.itemlist.sort-option.4 = type:dc.type:text
|
|
</code></pre><ul>
|
|
<li>They want items by issue date, so we need to use sort option 2</li>
|
|
<li>According to the DSpace Manual there are only the following parameters to OpenSearch: format, scope, rpp, start, and sort_by</li>
|
|
<li>The OpenSearch <code>query</code> parameter expects a Discovery search filter that is defined in <code>dspace/config/spring/api/discovery.xml</code></li>
|
|
<li>So for IWMI they should be able to use something like this: <a href="https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&scope=10568/16814&sort_by=2&order=DESC&format=rss">https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&scope=10568/16814&sort_by=2&order=DESC&format=rss</a></li>
|
|
<li>There are also <code>rpp</code> (results per page) and <code>start</code> parameters but in my testing now on DSpace 5.5 they behave very strangely</li>
|
|
<li>For example, set <code>rpp=1</code> and then check the results for <code>start</code> values of 0, 1, and 2 and they are all the same!</li>
|
|
<li>If I have time I will check if this behavior persists on DSpace 6.x on the official DSpace demo and file a bug</li>
|
|
<li>Also, the DSpace Manual as of 5.x has very poor documentation for OpenSearch</li>
|
|
<li>They don’t tell you to use Discovery search filters in the <code>query</code> (with format <code>query=dateIssued:2018</code>)</li>
|
|
<li>They don’t tell you that the sort options are actually defined in <code>dspace.cfg</code> (ie, you need to use <code>2</code> instead of <code>dc.date.issued_dt</code>)</li>
|
|
<li>They are missing the <code>order</code> parameter (ASC vs DESC)</li>
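<li>Putting those undocumented pieces together, here is a minimal Python sketch that builds the same OpenSearch URL as the one suggested for IWMI. The function name <code>opensearch_url</code> and its defaults are assumptions made for this example, not part of DSpace:</li>
</ul>
<pre><code class="language-python" data-lang="python">from urllib.parse import urlencode

BASE = "https://cgspace.cgiar.org/open-search/discover"


def opensearch_url(query, scope, sort_by, order="DESC", fmt="rss"):
    """Build an OpenSearch URL from a Discovery filter query and a sort option number."""
    params = {
        "query": query,      # Discovery search filter, e.g. dateIssued:2018
        "scope": scope,      # handle of the community or collection
        "sort_by": sort_by,  # index of webui.itemlist.sort-option.N in dspace.cfg
        "order": order,      # ASC or DESC
        "format": fmt,
    }
    # Keep ":" and "/" readable instead of percent-encoding them
    return BASE + "?" + urlencode(params, safe=":/")


url = opensearch_url("dateIssued:2018", "10568/16814", 2)
print(url)
</code></pre><ul>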
|
|
<li>I notice that DSpace Test has crashed again, due to memory:</li>
|
|
</ul>
|
|
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
|
|
178
|
|
</code></pre><ul>
|
|
<li>I will increase the JVM heap size from 5120M to 6144M, though we don’t have much room left to grow as DSpace Test (linode19) is using a smaller instance size than CGSpace</li>
|
|
<li>Gabriela from CIP asked if I could send her a list of all CIP authors so she can do some replacements on the name formats</li>
|
|
<li>I got a list of all the CIP collections manually and used the same query that I used in <a href="/cgspace-notes/2017-08">August, 2017</a>:</li>
|
|
</ul>
|
|
<pre><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/89347', '10568/88229', '10568/53086', '10568/53085', '10568/69069', '10568/53087', '10568/53088', '10568/53089', '10568/53090', '10568/53091', '10568/53092', '10568/70150', '10568/53093', '10568/64874', '10568/53094'))) group by text_value order by count desc) to /tmp/cip-authors.csv with csv;
|
|
</code></pre><h2 id="2018-04-19">2018-04-19</h2>
|
|
<ul>
|
|
<li>Run updates on DSpace Test (linode19) and reboot the server</li>
|
|
<li>Also try deploying updated GeoLite database during ant update while re-deploying code:</li>
|
|
</ul>
|
|
<pre><code>$ ant update update_geolite clean_backups
|
|
</code></pre><ul>
|
|
<li>I also re-deployed CGSpace (linode18) to make the ORCID search, authority cleanup, and CCAFS project tag <code>PII-LAM_CSAGender</code> live</li>
|
|
<li>When re-deploying I also updated the GeoLite databases so I hope the country stats become more accurate…</li>
|
|
<li>After re-deployment I ran all system updates on the server and rebooted it</li>
|
|
<li>After the reboot I forced a reïndexing of Discovery to populate the new ORCID index:</li>
|
|
</ul>
|
|
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
|
|
|
real 73m42.635s
|
|
user 8m15.885s
|
|
sys 2m2.687s
|
|
</code></pre><ul>
|
|
<li>This time is with about 70,000 items in the repository</li>
|
|
</ul>
|
|
<h2 id="2018-04-20">2018-04-20</h2>
|
|
<ul>
|
|
<li>Gabriela from CIP emailed to say that CGSpace was returning a white page, but I haven’t seen any emails from UptimeRobot</li>
|
|
<li>I confirm that it’s just giving a white page around 4:16</li>
|
|
<li>The DSpace logs show that there are no database connections:</li>
|
|
</ul>
|
|
<pre><code>org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-715] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000].
|
|
</code></pre><ul>
|
|
<li>And there have been shit tons of errors in today’s log (starting only 20 minutes ago luckily):</li>
|
|
</ul>
|
|
<pre><code># grep -c 'org.apache.tomcat.jdbc.pool.PoolExhaustedException' /home/cgspace.cgiar.org/log/dspace.log.2018-04-20
|
|
32147
|
|
</code></pre><ul>
|
|
<li>I can’t even log into PostgreSQL as the <code>postgres</code> user, WTF?</li>
|
|
</ul>
|
|
<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
|
|
^C
|
|
</code></pre><ul>
|
|
<li>Here are the most active IPs today:</li>
|
|
</ul>
|
|
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "20/Apr/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
|
917 207.46.13.182
|
|
935 213.55.99.121
|
|
970 40.77.167.134
|
|
978 207.46.13.80
|
|
1422 66.249.64.155
|
|
1577 50.116.102.77
|
|
2456 95.108.181.88
|
|
3216 104.196.152.243
|
|
4325 70.32.83.92
|
|
10718 45.5.184.2
|
|
</code></pre><ul>
|
|
<li>It doesn’t even seem like there is a lot of traffic compared to the previous days:</li>
|
|
</ul>
|
|
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "20/Apr/2018" | wc -l
|
|
74931
|
|
# zcat --force /var/log/nginx/*.log.1 /var/log/nginx/*.log.2.gz| grep -E "19/Apr/2018" | wc -l
|
|
91073
|
|
# zcat --force /var/log/nginx/*.log.2.gz /var/log/nginx/*.log.3.gz| grep -E "18/Apr/2018" | wc -l
|
|
93459
|
|
</code></pre><ul>
|
|
<li>I tried to restart Tomcat but <code>systemctl</code> hangs</li>
|
|
<li>I tried to reboot the server from the command line but after a few minutes it didn’t come back up</li>
|
|
<li>Looking at the Linode console I see that it is stuck trying to shut down</li>
|
|
<li>Even “Reboot” via Linode console doesn’t work!</li>
|
|
<li>After shutting it down a few times via the Linode console it finally rebooted</li>
|
|
<li>Everything is back but I have no idea what caused this—I suspect something with the hosting provider</li>
|
|
<li>Also super weird, the last entry in the DSpace log file is from <code>2018-04-20 16:35:09</code>, and then immediately it goes to <code>2018-04-20 19:15:04</code> (almost three hours later!):</li>
|
|
</ul>
|
|
<pre><code>2018-04-20 16:35:09,144 ERROR org.dspace.app.util.AbstractDSpaceWebapp @ Failed to record shutdown in Webapp table.
|
|
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000].
|
|
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:685)
|
|
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:187)
|
|
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:128)
|
|
at org.dspace.storage.rdbms.DatabaseManager.getConnection(DatabaseManager.java:632)
|
|
at org.dspace.core.Context.init(Context.java:121)
|
|
at org.dspace.core.Context.<init>(Context.java:95)
|
|
at org.dspace.app.util.AbstractDSpaceWebapp.deregister(AbstractDSpaceWebapp.java:97)
|
|
at org.dspace.app.util.DSpaceContextListener.contextDestroyed(DSpaceContextListener.java:146)
|
|
at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:5115)
|
|
at org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5779)
|
|
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:224)
|
|
at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1588)
|
|
at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1577)
|
|
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
|
|
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
|
|
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
|
|
at java.lang.Thread.run(Thread.java:748)
|
|
2018-04-20 19:15:04,006 INFO org.dspace.core.ConfigurationManager @ Loading from classloader: file:/home/cgspace.cgiar.org/config/dspace.cfg
|
|
</code></pre><ul>
|
|
<li>Very suspect!</li>
|
|
</ul>
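<ul>
<li>The “most active IPs” pipeline used earlier in this entry can also be sketched in Python. This assumes the client IP is the first whitespace-separated field of each nginx access log line; the sample lines use documentation-range addresses and are made up, as is the function name <code>top_clients</code>:</li>
</ul>
<pre><code class="language-python" data-lang="python">from collections import Counter


def top_clients(lines, day, limit=10):
    """Return (ip, hits) pairs for requests logged on one day, busiest first."""
    hits = Counter(line.split()[0] for line in lines if day in line)
    return hits.most_common(limit)


# Made-up sample lines in the combined log format (assumption)
sample = [
    '203.0.113.5 - - [20/Apr/2018:04:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.58.0"',
    '203.0.113.9 - - [20/Apr/2018:04:01:02 +0000] "GET /discover HTTP/1.1" 200 1024 "-" "curl/7.58.0"',
    '203.0.113.5 - - [20/Apr/2018:04:02:10 +0000] "GET /handle/10568/1 HTTP/1.1" 200 2048 "-" "curl/7.58.0"',
    '203.0.113.5 - - [19/Apr/2018:23:59:59 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.58.0"',
]

top = top_clients(sample, "20/Apr/2018")
for ip, count in top:
    print(count, ip)
</code></pre><ul>
<li>Unlike the shell version, this lists the busiest client first, and the request from the previous day is ignored</li>
</ul>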
|
|
<h2 id="2018-04-24">2018-04-24</h2>
|
|
<ul>
|
|
<li>I tested my Ansible playbooks with a clean and updated installation of Ubuntu 18.04 and fixed some issues that I hadn’t run into a few weeks ago</li>
|
|
<li>There seems to be a new issue with Java dependencies, though</li>
|
|
<li>The <code>default-jre</code> package is going to be Java 10 on Ubuntu 18.04, but I want to use <code>openjdk-8-jre-headless</code> (well, the JDK actually, but it uses this JRE)</li>
|
|
<li>Tomcat and Ant are fine with Java 8, but the <code>maven</code> package wants to pull in Java 10 for some reason</li>
|
|
<li>Looking closer, I see that <code>maven</code> depends on <code>java7-runtime-headless</code>, which is indeed provided by <code>openjdk-8-jre-headless</code></li>
|
|
<li>So it must be one of Maven’s dependencies…</li>
|
|
<li>I will watch it for a few days because it could be an issue that will be resolved before Ubuntu 18.04’s release</li>
|
|
<li>Otherwise I will post a bug to the ubuntu-release mailing list</li>
|
|
<li>Looks like the only way to fix this is to install <code>openjdk-8-jdk-headless</code> before (so it pulls in the JRE) in a separate transaction, or to manually install <code>openjdk-8-jre-headless</code> in the same apt transaction as <code>maven</code></li>
|
|
<li>Also, I started porting PostgreSQL 9.6 into the Ansible infrastructure scripts</li>
|
|
<li>This should be a drop-in replacement I believe, though I will definitely test it more locally as well as on DSpace Test once we move to DSpace 5.8 and Ubuntu 18.04 in the coming months</li>
|
|
</ul>
|
|
<h2 id="2018-04-25">2018-04-25</h2>
|
|
<ul>
|
|
<li>Still testing the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> for Ubuntu 18.04, Tomcat 8.5, and PostgreSQL 9.6</li>
|
|
<li>One other new thing I notice is that PostgreSQL 9.6 no longer uses <code>createuser</code> and <code>nocreateuser</code>, as those have actually meant <code>superuser</code> and <code>nosuperuser</code> and have been deprecated for <em>ten years</em></li>
|
|
<li>So when I’m importing a CGSpace database dump I need to amend my notes to give superuser permission to a user, rather than <code>createuser</code>:</li>
|
|
</ul>
|
|
<pre><code>$ psql dspacetest -c 'alter user dspacetest superuser;'
|
|
$ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-18.backup
|
|
</code></pre><ul>
|
|
<li>There’s another issue with Tomcat in Ubuntu 18.04:</li>
|
|
</ul>
|
|
<pre><code>25-Apr-2018 13:26:21.493 SEVERE [http-nio-127.0.0.1-8443-exec-1] org.apache.coyote.AbstractProtocol$ConnectionHandler.process Error reading request, ignored
|
|
java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
|
|
at org.apache.coyote.http11.Http11InputBuffer.init(Http11InputBuffer.java:688)
|
|
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:672)
|
|
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
|
|
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
|
|
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
|
|
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
|
|
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
|
|
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
|
|
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
|
|
at java.lang.Thread.run(Thread.java:748)
|
|
</code></pre><ul>
|
|
<li>There’s a <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=895866">Debian bug about this from a few weeks ago</a></li>
|
|
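<li>Apparently the Tomcat package was compiled with Java 9, so it doesn’t work with Java 8 (the <code>ByteBuffer.position(int)</code> variant that returns a <code>ByteBuffer</code> only exists in Java 9 and later)</li>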
|
|
</ul>
|
|
<h2 id="2018-04-29">2018-04-29</h2>
|
|
<ul>
|
|
<li>DSpace Test crashed again, looks like memory issues again</li>
|
|
<li>JVM heap size was last increased to 6144m but the system only has 8GB total so there’s not much we can do here other than get a bigger Linode instance or remove the massive Solr Statistics data</li>
|
|
</ul>
|
|
<h2 id="2018-04-30">2018-04-30</h2>
|
|
<ul>
|
|
<li>DSpace Test crashed again</li>
|
|
<li>I will email the CGSpace team to ask them whether or not we want to commit to having a public test server that accurately mirrors CGSpace (ie, to upgrade to the next largest Linode)</li>
|
|
</ul>
|
|
|
|
|
|
|
|
|
|
|
|
</article>
|
|
|
|
|
|
|
|
</div> <!-- /.blog-main -->
|
|
|
|
<aside class="col-sm-3 ml-auto blog-sidebar">
|
|
|
|
|
|
|
|
<section class="sidebar-module">
|
|
<h4>Recent Posts</h4>
|
|
<ol class="list-unstyled">
|
|
|
|
|
|
<li><a href="/cgspace-notes/2020-12/">December, 2020</a></li>
|
|
|
|
<li><a href="/cgspace-notes/cgspace-dspace6-upgrade/">CGSpace DSpace 6 Upgrade</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2020-11/">November, 2020</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
|
|
|
|
</ol>
|
|
</section>
|
|
|
|
|
|
|
|
|
|
<section class="sidebar-module">
|
|
<h4>Links</h4>
|
|
<ol class="list-unstyled">
|
|
|
|
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
|
|
|
|
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
|
|
|
|
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
|
|
|
|
</ol>
|
|
</section>
|
|
|
|
</aside>
|
|
|
|
|
|
</div> <!-- /.row -->
|
|
</div> <!-- /.container -->
|
|
|
|
|
|
|
|
<footer class="blog-footer">
|
|
<p dir="auto">
|
|
|
|
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
|
|
|
|
</p>
|
|
<p>
|
|
<a href="#">Back to top</a>
|
|
</p>
|
|
</footer>
|
|
|
|
|
|
</body>
|
|
|
|
</html>
|