<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="June, 2020" />
<meta property="og:description" content="2020-06-01
I tried to run the AtomicStatisticsUpdateCLI CUA migration script on DSpace Test (linode26) again and it is still going very slowly and has tons of errors like I noticed yesterday
I sent Atmire the dspace.log from today and told them to log into the server to debug the process
In other news, I checked the statistics API on DSpace 6 and it&rsquo;s working
I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-06/" />
<meta property="article:published_time" content="2020-06-01T13:55:39+03:00" />
<meta property="article:modified_time" content="2020-06-04T14:43:40+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="June, 2020"/>
<meta name="twitter:description" content="2020-06-01
I tried to run the AtomicStatisticsUpdateCLI CUA migration script on DSpace Test (linode26) again and it is still going very slowly and has tons of errors like I noticed yesterday
I sent Atmire the dspace.log from today and told them to log into the server to debug the process
In other news, I checked the statistics API on DSpace 6 and it&rsquo;s working
I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:
"/>
<meta name="generator" content="Hugo 0.72.0" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "June, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-06/",
"wordCount": "846",
"datePublished": "2020-06-01T13:55:39+03:00",
"dateModified": "2020-06-04T14:43:40+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2020-06/">
<title>June, 2020 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-06/">June, 2020</a></h2>
<p class="blog-post-meta"><time datetime="2020-06-01T13:55:39+03:00">Mon Jun 01, 2020</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2020-06-01">2020-06-01</h2>
<ul>
<li>I tried to run the <code>AtomicStatisticsUpdateCLI</code> CUA migration script on DSpace Test (linode26) again, and it is still running very slowly and producing tons of errors, as I noticed yesterday
<ul>
<li>I sent Atmire the dspace.log from today and told them to log into the server to debug the process</li>
</ul>
</li>
<li>In other news, I checked the statistics API on DSpace 6 and it&rsquo;s working</li>
<li>I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:</li>
</ul>
<pre><code>$ dspace oai import -c
OAI 2.0 manager action started
Loading @mire database changes for module MQM
Changes have been processed
Clearing index
Index cleared
Using full import.
Full import
java.lang.NullPointerException
at org.dspace.xoai.app.XOAI.willChangeStatus(XOAI.java:438)
at org.dspace.xoai.app.XOAI.index(XOAI.java:368)
at org.dspace.xoai.app.XOAI.index(XOAI.java:280)
at org.dspace.xoai.app.XOAI.indexAll(XOAI.java:227)
at org.dspace.xoai.app.XOAI.index(XOAI.java:134)
at org.dspace.xoai.app.XOAI.main(XOAI.java:560)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
</code></pre><h2 id="2020-06-02">2020-06-02</h2>
<ul>
<li>I noticed that I was able to do a partial OAI import (i.e., without <code>-c</code>)
<ul>
<li>Then I tried to clear the OAI Solr core and import, but I get the same error:</li>
</ul>
</li>
</ul>
<pre><code>$ curl http://localhost:8080/solr/oai/update -H &quot;Content-type: text/xml&quot; --data-binary '&lt;delete&gt;&lt;query&gt;*:*&lt;/query&gt;&lt;/delete&gt;'
$ curl http://localhost:8080/solr/oai/update -H &quot;Content-type: text/xml&quot; --data-binary '&lt;commit /&gt;'
$ ~/dspace63/bin/dspace oai import
OAI 2.0 manager action started
...
There are no indexed documents, using full import.
Full import
java.lang.NullPointerException
at org.dspace.xoai.app.XOAI.willChangeStatus(XOAI.java:438)
at org.dspace.xoai.app.XOAI.index(XOAI.java:368)
at org.dspace.xoai.app.XOAI.index(XOAI.java:280)
at org.dspace.xoai.app.XOAI.indexAll(XOAI.java:227)
at org.dspace.xoai.app.XOAI.index(XOAI.java:143)
at org.dspace.xoai.app.XOAI.main(XOAI.java:560)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
</code></pre><ul>
<li>I found a <a href="https://jira.lyrasis.org/browse/DS-4363">bug report on DSpace Jira</a> describing this issue affecting someone else running DSpace 6.3
<ul>
<li>They suspect it has to do with the item having some missing group names in its authorization policies</li>
<li>I added some debugging to <code>dspace-oai/src/main/java/org/dspace/xoai/app/XOAI.java</code> to print the Handle of the item that causes the crash and then I looked at its authorization policies</li>
<li>Indeed there are some blank group names:</li>
</ul>
</li>
</ul>
<p><img src="/cgspace-notes/2020/06/item-authorizations-dspace63.png" alt="Missing group names in DSpace 6.3 item authorization policy"></p>
<ul>
<li>The same item on CGSpace (DSpace 5.8) also has groups with no name:</li>
</ul>
<p><img src="/cgspace-notes/2020/06/item-authorizations-dspace58.png" alt="Missing group names in DSpace 5.8 item authorization policy"></p>
<ul>
<li>I added some debugging and found exactly where this happens
<ul>
<li>As it turns out, we can just check whether the policy&rsquo;s group is null there, which allows the OAI import to proceed</li>
<li>Aaaaand as it turns out, this was fixed in <code>dspace-6_x</code> in 2018 after DSpace 6.3 was released (see <a href="https://jira.lyrasis.org/browse/DS-4019">DS-4019</a>), so that was a waste of three hours.</li>
<li>I cherry-picked <code>150e83558103ed7f50e8f323b6407b9cbdf33717</code> into our current <code>6_x-dev-atmire-modules</code> branch</li>
</ul>
</li>
</ul>
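<p>The null check described above can be sketched roughly as follows. This is only an illustration in Python, not the actual change to <code>XOAI.java</code>: the <code>Group</code> and <code>ResourcePolicy</code> classes, the <code>is_anonymously_readable</code> function, and the <code>READ</code> and <code>Anonymous</code> values are all hypothetical stand-ins.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    """Hypothetical stand-in for a DSpace group."""
    name: str


@dataclass
class ResourcePolicy:
    """Hypothetical stand-in for an authorization policy; the group may be missing."""
    action: str
    group: Optional[Group] = None


def is_anonymously_readable(policies: List[ResourcePolicy]) -> bool:
    """Return True if any READ policy belongs to the (assumed) Anonymous group.

    Policies whose group is None are skipped rather than dereferenced,
    which is the guard that avoids crashing on blank group names.
    """
    for policy in policies:
        if policy.group is None:
            continue  # blank group: nothing to compare, so skip it
        if policy.action == "READ" and policy.group.name == "Anonymous":
            return True
    return False


policies = [
    ResourcePolicy(action="READ", group=None),  # the problematic blank entry
    ResourcePolicy(action="READ", group=Group("Anonymous")),
]
print(is_anonymously_readable(policies))
```

<p>Without the <code>None</code> guard, the first policy in this list would raise an error on <code>policy.group.name</code>, which is the Python analogue of the <code>NullPointerException</code> in the stack trace.</p>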
<h2 id="2020-06-04">2020-06-04</h2>
<ul>
<li>Maria was asking about some items they are trying to map from the CGIAR Big Data collection into their Alliance of Bioversity and CIAT journal articles collection, but for some reason the items don&rsquo;t show up in the item mapper
<ul>
<li>The items don&rsquo;t even show up in the XMLUI Discover advanced search, and actually I don&rsquo;t even see any recent items on the recently submitted part of the collection (but the item pages exist of course)</li>
<li>Perhaps I need to try a full Discovery re-index:</li>
</ul>
</li>
</ul>
<pre><code>$ time chrt -i 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 125m37.423s
user 11m20.312s
sys 3m19.965s
</code></pre><ul>
<li>I still don&rsquo;t see the item in XMLUI search or in the item mapper (and I made sure to clear the Cocoon cache)
<ul>
<li>I&rsquo;m starting to think it&rsquo;s something related to the database transaction issue&hellip;</li>
<li>I removed our custom JDBC driver from <code>/usr/local/apache-tomcat...</code> so that DSpace will use its own much older one, version 9.1-901-1.jdbc4</li>
<li>I ran all system updates on the server (linode18) and rebooted it</li>
<li>After it came back up I had to restart Tomcat five times before all Solr statistics cores came up properly</li>
<li>Unfortunately this means that the Tomcat JDBC pooling via JNDI doesn&rsquo;t work, so we&rsquo;re using only the 30 connections reserved for the DSpace CLI from DSpace&rsquo;s own internal pool</li>
<li>Perhaps our previous issues with the database pool from a few years ago will be less severe now that we have much more aggressive blocking and rate limiting of bots in nginx</li>
</ul>
</li>
<li>I will also import a fresh database snapshot from CGSpace and check if I can map the item in my local environment
<ul>
<li>After importing and forcing a full reindex locally I can see the item in search and in the item mapper</li>
</ul>
</li>
<li>Abenet sent another message about two users who are having issues with submission, and I see the number of locks in PostgreSQL has skyrocketed again as of a few days ago:</li>
</ul>
<p><img src="/cgspace-notes/2020/06/postgres_locks_ALL-week.png" alt="PostgreSQL locks week"></p>
<ul>
<li>As far as I can tell, this first started happening in April, for both connections and locks:</li>
</ul>
<p><img src="/cgspace-notes/2020/06/postgres_connections_ALL-year.png" alt="PostgreSQL connections year">
<img src="/cgspace-notes/2020/06/postgres_locks_ALL-year.png" alt="PostgreSQL locks year"></p>
<ul>
<li>I think I need to just leave this as is with the DSpace default JDBC driver for now, but perhaps I could also downgrade the Tomcat version (I deployed Tomcat 7.0.103 in March, so perhaps that&rsquo;s relevant)</li>
<li>Also, I&rsquo;ll start <em>another</em> full reindexing to see if the issue with mapping is somehow also resolved now that the database connections are working better
<ul>
<li>Perhaps it&rsquo;s related, but this one finished much faster:</li>
</ul>
</li>
</ul>
<pre><code>$ time chrt -i 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 101m41.195s
user 10m9.569s
sys 3m13.929s
</code></pre><ul>
<li>Unfortunately the item is still not showing up in the item mapper&hellip;</li>
<li>Something happened to AReS Explorer (linode20) so I ran all system updates and rebooted it</li>
</ul>
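<p>To put a number on &ldquo;much faster&rdquo;, the two <code>real</code> timings from the reindexing runs above can be compared with a small script. This is just a sketch; the <code>parse_time</code> helper is a hypothetical name and only handles the <code>NNNmSS.sss</code> format that the shell&rsquo;s <code>time</code> printed here.</p>

```python
import re


def parse_time(value: str) -> float:
    """Convert a shell `time` value such as '125m37.423s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


first = parse_time("125m37.423s")   # real time of the first full reindex
second = parse_time("101m41.195s")  # real time of the second full reindex

saved = first - second
print(f"saved {saved / 60:.1f} minutes ({saved / first:.0%} less wall-clock time)")
```

<p>Only the <code>real</code> values are compared, since the <code>user</code> and <code>sys</code> times of the two runs are nearly identical.</p>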
<!-- raw HTML omitted -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
<li><a href="/cgspace-notes/2020-04/">April, 2020</a></li>
<li><a href="/cgspace-notes/2020-03/">March, 2020</a></li>
<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>