<!DOCTYPE html> <html lang="en" > <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="March, 2021" /> <meta property="og:description" content="2021-03-01 Discuss some OpenRXV issues with Abdullah from CodeObia He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API Also, we found some issues building and running OpenRXV currently due to ecosystem shift in the Node.js dependencies " /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-03/" /> <meta property="article:published_time" content="2021-03-01T10:13:54+02:00" /> <meta property="article:modified_time" content="2021-04-13T21:13:08+03:00" /> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="March, 2021"/> <meta name="twitter:description" content="2021-03-01 Discuss some OpenRXV issues with Abdullah from CodeObia He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API Also, we found some issues building and running OpenRXV currently due to ecosystem shift in the Node.js dependencies "/> <meta name="generator" content="Hugo 0.105.0"> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "March, 2021", "url": "https://alanorth.github.io/cgspace-notes/2021-03/", "wordCount": "4453", "datePublished": "2021-03-01T10:13:54+02:00", "dateModified": "2021-04-13T21:13:08+03:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2021-03/"> <title>March, 2021 | CGSpace Notes</title> <!-- combined, minified CSS --> <link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" 
integrity="sha256-xrqAvFBmlVdkWr4F+GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous"> <!-- minified Font Awesome for SVG icons --> <script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz+lcnA=" crossorigin="anonymous"></script> <!-- RSS 2.0 feed --> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2021-03/">March, 2021</a></h2> <p class="blog-post-meta"> <time datetime="2021-03-01T10:13:54+02:00">Mon Mar 01, 2021</time> in <span class="fas fa-folder" aria-hidden="true"></span> <a href="/categories/notes/" rel="category tag">Notes</a> </p> </header> <h2 id="2021-03-01">2021-03-01</h2> <ul> <li>Discuss some OpenRXV issues with Abdullah from CodeObia <ul> <li>He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API</li> <li>Also, we found some issues building and running OpenRXV currently due to ecosystem shift in the Node.js dependencies</li> </ul> </li> </ul> <h2 id="2021-03-02">2021-03-02</h2> <ul> <li>I fixed three build and runtime issues in OpenRXV: <ul> <li><a href="https://github.com/ilri/OpenRXV/pull/80">fix highcharts-angular and ngx-tour-core build</a></li> <li><a 
href="https://github.com/ilri/OpenRXV/pull/82">frontend/package.json: Pin @types/ramda at 0.27.34</a></li> </ul> </li> <li>Then I merged a few fixes that Abdullah had worked on last week</li> </ul> <h2 id="2021-03-03">2021-03-03</h2> <ul> <li>I <a href="https://github.com/ilri/OpenRXV/issues/83">fixed another frontend build warning on OpenRXV</a></li> <li>Then I <a href="https://github.com/ilri/OpenRXV/pull/84">updated the frontend container to use Node.js 12 and Ubuntu 20.04</a></li> <li>Also, I <a href="https://github.com/ilri/OpenRXV/pull/85">added a GitHub Actions workflow to build the frontend</a></li> <li>I did some testing of Abdullah’s patch for the values mapping search on OpenRXV <ul> <li>It still doesn’t work with multi-word values, so I recorded a video with wf-recorder and uploaded it to <a href="https://github.com/ilri/OpenRXV/issues/43">the issue</a> for him to investigate</li> </ul> </li> </ul> <h2 id="2021-03-04">2021-03-04</h2> <ul> <li>Peter is having issues with the workflow since yesterday <ul> <li>I looked at the Munin stats and see a high number of database locks since yesterday</li> </ul> </li> </ul> <p><img src="/cgspace-notes/2021/03/postgres_locks_ALL-week.png" alt="PostgreSQL locks week"> <img src="/cgspace-notes/2021/03/postgres_connections_cgspace-week.png" alt="PostgreSQL connections week"></p> <ul> <li>I looked at the number of connections in PostgreSQL and it’s definitely high again:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>1020 </span></span></code></pre></div><ul> <li>I reported it to Atmire to take a look, on the <a 
href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=851">same issue</a> where we had been tracking this</li> <li>Abenet asked me to add a new ORCID for ILRI staff member Zoe Campbell</li> <li>I added it to the controlled vocabulary and then tagged her existing items on CGSpace using my <code>add-orcid-identifiers-csv.py</code> script:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ cat 2021-03-04-add-zoe-campbell-orcid.csv </span></span><span style="display:flex;"><span>dc.contributor.author,cg.creator.identifier </span></span><span style="display:flex;"><span>"Campbell, Zoë","Zoe Campbell: 0000-0002-4759-9976" </span></span><span style="display:flex;"><span>"Campbell, Zoe A.","Zoe Campbell: 0000-0002-4759-9976" </span></span><span style="display:flex;"><span>$ ./ilri/add-orcid-identifiers-csv.py -i 2021-03-04-add-zoe-campbell-orcid.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span> </span></span></code></pre></div><ul> <li>I still need to do cleanup on the journal articles metadata <ul> <li>Peter sent me some cleanups but I can’t use them in the search/replace format he gave</li> <li>I think it’s better to export the metadata values with IDs and import the cleaned-up ones as CSV</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT dspace_object_id AS id, text_value as "cg.journal" FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=251) to /tmp/2021-02-24-journals.csv WITH CSV HEADER; </span></span><span style="display:flex;"><span>COPY 32087 </span></span></code></pre></div><ul> <li>I used OpenRefine to 
remove all journal values that didn’t have one of these values: ; ( ) <ul> <li>Then I cloned the <code>cg.journal</code> field to <code>cg.volume</code> and <code>cg.issue</code></li> <li>I used some GREL expressions like these to extract the journal name, volume, and issue:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>value.partition(';')[0].trim() # to get journal names </span></span><span style="display:flex;"><span>value.partition(/[0-9]+\([0-9]+\)/)[1].replace(/^(\d+)\(\d+\)/,"$1") # to get journal volumes </span></span><span style="display:flex;"><span>value.partition(/[0-9]+\([0-9]+\)/)[1].replace(/^\d+\((\d+)\)/,"$1") # to get journal issues </span></span></code></pre></div><ul> <li>Then I uploaded the changes to CGSpace using <code>dspace metadata-import</code></li> <li>Margarita from CCAFS was asking about an error deleting some items that were showing up in Google and should have been private <ul> <li>The error was “Authorization denied for action OBSOLETE (DELETE) on BITSTREAM:bd157345-448e …”</li> <li>I searched the DSpace issue tracker and found several issues reporting this: <ul> <li><a href="https://jira.lyrasis.org/browse/DS-3985">DS-3985 Delete item fails</a></li> <li><a href="https://jira.lyrasis.org/browse/DS-4004">DS-4004 Authorization denied Exception when trying to delete permanently an item, collection or community as a non-Admin user</a></li> <li><a href="https://jira.lyrasis.org/browse/DS-4297">DS-4297 Authorization error when trying to delete item by submitter/administrator</a></li> </ul> </li> <li>The issue is apparently with non-admin users who are in the admin and submit groups of the owning collection…</li> <li>In this case the item was uploaded to the CCAFS Reports collection, and Margarita is a non-admin user who is a member of the 
collection’s admin and submit groups, exactly as the issue described</li> <li>I added a comment about our issue to <a href="https://jira.lyrasis.org/browse/DS-4297">DS-4297</a></li> </ul> </li> <li>Yesterday Abenet added me to the approver/editor steps of a WLE collection so we can try to figure out why Niroshini is having issues adding metadata to Udana’s submissions <ul> <li>I edited Udana’s submission to CGSpace: <ul> <li>corrected the title</li> <li>added language English</li> <li>changed the link to the external item page instead of PDF</li> <li>added SDGs from the external item page</li> <li>added AGROVOC subjects from the external item page</li> <li>added pagination (extent)</li> <li>changed the license to “other” because CC-BY-NC-ND is not printed anywhere in the PDF or external item page</li> </ul> </li> </ul> </li> </ul> <h2 id="2021-03-05">2021-03-05</h2> <ul> <li>I migrated the Docker bind mount for the AReS Elasticsearch container to a Docker volume:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ docker-compose -f docker/docker-compose.yml down </span></span><span style="display:flex;"><span>$ docker volume create docker_esData_7 </span></span><span style="display:flex;"><span>$ docker container create --name es_dummy -v docker_esData_7:/usr/share/elasticsearch/data:rw elasticsearch:7.6.2 </span></span><span style="display:flex;"><span>$ docker cp docker/esData_7/nodes es_dummy:/usr/share/elasticsearch/data </span></span><span style="display:flex;"><span>$ docker rm es_dummy </span></span><span style="display:flex;"><span># edit docker/docker-compose.yml to switch from bind mount to volume </span></span><span style="display:flex;"><span>$ docker-compose -f docker/docker-compose.yml up -d </span></span></code></pre></div><ul> <li>The trick is that when you create a volume like 
“myvolume” from a <code>docker-compose.yml</code> file, Docker will create it with the name “docker_myvolume” <ul> <li>If you create it manually on the command line with <code>docker volume create myvolume</code> then the name is literally “myvolume”</li> </ul> </li> <li>I still need to make the changes to git master and add these notes to the pull request so Moayad and others can benefit</li> <li>Delete the <code>openrxv-items-temp</code> index to test a fresh harvesting:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp'</span> </span></span></code></pre></div><h2 id="2021-03-05-1">2021-03-05</h2> <ul> <li>Check the results of the AReS harvesting from last night:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'</span> </span></span><span style="display:flex;"><span>{ </span></span><span style="display:flex;"><span> "count" : 101761, </span></span><span style="display:flex;"><span> "_shards" : { </span></span><span style="display:flex;"><span> "total" : 1, </span></span><span style="display:flex;"><span> "successful" : 1, </span></span><span style="display:flex;"><span> "skipped" : 0, </span></span><span style="display:flex;"><span> "failed" : 0 </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><ul> <li>Set the current items index to read only and make a backup:</li> </ul> <div class="highlight"><pre tabindex="0" 
style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">' {"settings": {"index.blocks.write":true}}'</span> </span></span><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2021-03-05 </span></span></code></pre></div><ul> <li>Delete the current items index and clone the temp one to it:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items'</span> </span></span><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-temp/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": true}}'</span> </span></span><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items </span></span></code></pre></div><ul> <li>Then delete the temp and backup:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp'</span> </span></span><span style="display:flex;"><span>{"acknowledged":true}% </span></span><span style="display:flex;"><span>$ curl -XDELETE <span 
style="color:#e6db74">'http://localhost:9200/openrxv-items-2021-03-05'</span> </span></span></code></pre></div><ul> <li>I made some pull requests to OpenRXV: <ul> <li><a href="https://github.com/ilri/OpenRXV/pull/86">docker/docker-compose.yml: Use docker volumes</a></li> <li><a href="https://github.com/ilri/OpenRXV/pull/87">docker/docker-compose.yml: Pin Redis to version 5</a></li> </ul> </li> <li>I deployed the latest changes from the last few days on AReS production</li> </ul> <h2 id="2021-03-07">2021-03-07</h2> <ul> <li>I realized there is something wrong with the Elasticsearch indexes on AReS <ul> <li>On a new test environment I see <code>openrxv-items</code> is correctly an alias of <code>openrxv-items-final</code>:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool | less </span></span><span style="display:flex;"><span>... </span></span><span style="display:flex;"><span> "openrxv-items-final": { </span></span><span style="display:flex;"><span> "aliases": { </span></span><span style="display:flex;"><span> "openrxv-items": {} </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span> }, </span></span></code></pre></div><ul> <li>But on AReS production <code>openrxv-items</code> has somehow become a concrete index:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool | less </span></span><span style="display:flex;"><span>... 
</span></span><span style="display:flex;"><span> "openrxv-items": { </span></span><span style="display:flex;"><span> "aliases": {} </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span> "openrxv-items-final": { </span></span><span style="display:flex;"><span> "aliases": {} </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span> "openrxv-items-temp": { </span></span><span style="display:flex;"><span> "aliases": {} </span></span><span style="display:flex;"><span> }, </span></span></code></pre></div><ul> <li>I fixed the issue on production by cloning the <code>openrxv-items</code> index to <code>openrxv-items-final</code>, deleting <code>openrxv-items</code>, and then re-creating it as an alias:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": true}}'</span> </span></span><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2021-03-07 </span></span><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final'</span> </span></span><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-final </span></span><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items'</span> </span></span><span style="display:flex;"><span>$ curl -s -X POST <span style="color:#e6db74">'http://localhost:9200/_aliases'</span> -H <span style="color:#e6db74">'Content-Type: 
application/json'</span> -d<span style="color:#e6db74">'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'</span> </span></span></code></pre></div><ul> <li>Delete backups and remove read-only mode on <code>openrxv-items</code>:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-2021-03-07'</span> </span></span><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": false}}'</span> </span></span></code></pre></div><ul> <li>Linode sent alerts about the CPU usage on CGSpace yesterday and the day before <ul> <li>Looking in the logs I see a few IPs making heavy usage on the REST API and XMLUI:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E <span style="color:#e6db74">'0[56]/Mar/2021'</span> | goaccess --log-format<span style="color:#f92672">=</span>COMBINED - </span></span></code></pre></div><ul> <li>I see the usual IPs for CCAFS and ILRI importer bots, but also <code>143.233.242.132</code> which appears to be for GARDIAN:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># zgrep 
<span style="color:#e6db74">'143.233.242.132'</span> /var/log/nginx/access.log.1 | grep -c Delphi </span></span><span style="display:flex;"><span>6237 </span></span><span style="display:flex;"><span># zgrep <span style="color:#e6db74">'143.233.242.132'</span> /var/log/nginx/access.log.1 | grep -c -v Delphi </span></span><span style="display:flex;"><span>6418 </span></span></code></pre></div><ul> <li>They seem to make requests twice, once with the Delphi user agent that we know and already mark as a bot, and once with a “normal” user agent <ul> <li>Looking in Solr I see they have been using this IP for a while, as they have 100,000 hits going back into 2020</li> <li>I will add this IP to the list of bots in nginx and purge it from Solr with my <code>check-spider-ip-hits.sh</code> script</li> </ul> </li> <li>I made a few changes to OpenRXV: <ul> <li><a href="https://github.com/ilri/OpenRXV/issues/89">Migrated away from links to use networks</a></li> <li><a href="https://github.com/ilri/OpenRXV/issues/68">Converted the backend container to use a custom image that includes <code>unoconv</code></a> so we don’t have to manually install it anymore</li> </ul> </li> </ul> <h2 id="2021-03-08">2021-03-08</h2> <ul> <li>I approved the WLE item that I edited last week, and all the metadata is there: <a href="https://hdl.handle.net/10568/111810">https://hdl.handle.net/10568/111810</a> <ul> <li>So I’m not sure what Niroshini’s issue with metadata is…</li> </ul> </li> <li>Peter sent a message yesterday saying that his item finally got committed <ul> <li>I looked at the Munin graphs and there was a MASSIVE spike in database activity two days ago, and now database locks are back down to normal levels (from 1000+):</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span 
style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>13 </span></span></code></pre></div><ul> <li>On 2021-03-03 the PostgreSQL transactions started rising:</li> </ul> <p><img src="/cgspace-notes/2021/03/postgres_querylength_ALL-week.png" alt="PostgreSQL query length week"></p> <ul> <li>After that the connections and locks started going up, peaking on 2021-03-06:</li> </ul> <p><img src="/cgspace-notes/2021/03/postgres_locks_ALL-week.png" alt="PostgreSQL locks week"> <img src="/cgspace-notes/2021/03/postgres_connections_ALL-week.png" alt="PostgreSQL connections week"></p> <ul> <li>I sent another message to Atmire to ask if they have time to look into this</li> <li>CIFOR is pressuring me to upload the batch items from last week <ul> <li>Vika sent me a final file from which some duplicates that Peter identified had been removed</li> <li>I extracted and re-applied my basic corrections from last week in OpenRefine, then ran the items through the <code>csv-metadata-quality</code> checker and uploaded them to CGSpace</li> <li>In total there are 1,088 items</li> </ul> </li> <li>Udana from IWMI emailed to ask about CGSpace thumbnails</li> <li>Udana from IWMI emailed to ask about an item uploaded recently that does not appear in AReS <ul> <li><a href="https://hdl.handle.net/10568/111794">The item</a> was added to the archive on 2021-03-05, and I last harvested on 2021-03-06, so this might be an issue of a missing item</li> </ul> </li> <li>Abenet got a quote from Atmire to buy 125 credits for 3750€</li> <li>Maria at Bioversity sent some feedback about duplicate items on AReS</li> <li>I’m wondering if the issue of the <code>openrxv-items-final</code> index not getting cleared after a successful harvest (which results in having 200,000, then 300,000, etc items) has to do with the alias issue I fixed yesterday <ul> <li>I will start a fresh harvest on AReS now to check, but first 
back up the current index just in case:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-final/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": true}}'</span> </span></span><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items-final/_clone/openrxv-items-final-2021-03-08 </span></span><span style="display:flex;"><span># start harvesting on AReS </span></span></code></pre></div><ul> <li>As I saw on my local test instance, even when you cancel a harvest, OpenRXV automatically replaces the <code>openrxv-items-final</code> index with whatever is in <code>openrxv-items-temp</code>, so I assume it will do the same now</li> </ul> <h2 id="2021-03-09">2021-03-09</h2> <ul> <li>The harvesting on AReS finished last night and everything worked as expected, with no manual intervention <ul> <li>This means that <a href="https://github.com/ilri/OpenRXV/issues/64">the issue</a> we were facing for a few months was due to the <code>openrxv-items</code> index being deleted and re-created as a standalone index instead of an alias of <code>openrxv-items-final</code></li> </ul> </li> <li>Talk to Moayad about OpenRXV development <ul> <li>We realized that the missing/duplicate items issue is probably due to the long harvesting time on the REST API: between starting the harvesting on page 0 and finishing on page 900 (in the CGSpace example), some items will have been added to the repository, which causes the pages to shift</li> <li>I proposed a solution in the <a href="https://github.com/ilri/OpenRXV/issues/67">GitHub issue</a>, where we consult the site’s XML sitemap after 
harvesting to see if we missed any items, and then we harvest them individually</li> </ul> </li> <li>Peter sent me a list of 356 DOIs from Altmetric that don’t have our Handles, so we need to Tweet them <ul> <li>I used my <code>doi-to-handle.py</code> script to generate a list of handles and titles for him:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/doi-to-handle.py -i /tmp/dois.txt -o /tmp/handles.txt -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span> </span></span></code></pre></div><h2 id="2021-03-10">2021-03-10</h2> <ul> <li>Colleagues from ICARDA asked about how we should handle ISI journals in CG Core, as CGSpace uses <code>cg.isijournal</code> and MELSpace uses <code>mel.impact-factor</code> <ul> <li>I filed <a href="https://github.com/AgriculturalSemantics/cg-core/issues/39">an issue</a> on the cg-core project to ask colleagues for ideas</li> </ul> </li> <li>Peter said he doesn’t see “Source Code” or “Software” in the <a href="https://cgspace.cgiar.org/handle/10568/1/search-filter?field=type">output type facet on the ILRI community</a>, but I see it on the home page, so I will try to do a full Discovery re-index:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ time chrt -b <span style="color:#ae81ff">0</span> ionice -c2 -n7 nice -n19 dspace index-discovery -b </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>real 318m20.485s </span></span><span style="display:flex;"><span>user 215m15.196s 
</span></span><span style="display:flex;"><span>sys 2m51.529s </span></span></code></pre></div><ul> <li>Now I see ten items for “Source Code” in the facets…</li> <li>Add GPL and MIT licenses to the list of licenses on CGSpace input form since we will start capturing more software and source code</li> <li>Added the ability to check <code>dcterms.license</code> values against the SPDX licenses in the csv-metadata-quality tool <ul> <li>Also, I made some other minor fixes and released <a href="https://github.com/ilri/csv-metadata-quality/releases/tag/v0.4.6">version 0.4.6</a> on GitHub</li> </ul> </li> <li>Proof and upload twenty-seven items to CGSpace for Peter Ballantyne <ul> <li>Mostly Ugandan outputs for CRP Livestock and Livestock and Fish</li> </ul> </li> </ul> <h2 id="2021-03-14">2021-03-14</h2> <ul> <li>Switch to linux-kvm kernel on linode20 and linode18:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># apt update <span style="color:#f92672">&&</span> apt full-upgrade </span></span><span style="display:flex;"><span># apt install linux-kvm </span></span><span style="display:flex;"><span># apt remove linux-generic linux-image-generic linux-headers-generic linux-firmware </span></span><span style="display:flex;"><span># apt autoremove <span style="color:#f92672">&&</span> apt autoclean </span></span><span style="display:flex;"><span># reboot </span></span></code></pre></div><ul> <li>Deploy latest changes from <code>6_x-prod</code> branch on CGSpace</li> <li>Deploy latest changes from OpenRXV <code>master</code> branch on AReS</li> <li>Last week Peter added OpenRXV to CGSpace: <a href="https://hdl.handle.net/10568/112982">https://hdl.handle.net/10568/112982</a></li> <li>Back up the current <code>openrxv-items-final</code> index on AReS to start a new harvest:</li> </ul> <div 
class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-final/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": true}}'</span> </span></span><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items-final/_clone/openrxv-items-final-2021-03-14 </span></span><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-final/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": false}}'</span> </span></span></code></pre></div><ul> <li>After the harvesting finished it seems the indexes got messed up again, as <code>openrxv-items</code> is an alias of <code>openrxv-items-temp</code> instead of <code>openrxv-items-final</code>:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool | less </span></span><span style="display:flex;"><span>... 
</span></span><span style="display:flex;"><span> "openrxv-items-final": { </span></span><span style="display:flex;"><span> "aliases": {} </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span> "openrxv-items-temp": { </span></span><span style="display:flex;"><span> "aliases": { </span></span><span style="display:flex;"><span> "openrxv-items": {} </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span> }, </span></span></code></pre></div><ul> <li>Anyway, the number of items in <code>openrxv-items</code> seems OK and the AReS Explorer UI is working fine <ul> <li>I will have to manually fix the indexes before the next harvest</li> </ul> </li> <li>Publish the web version of the DSpace CSV Metadata Quality checker tool that I wrote this weekend on GitHub: <a href="https://github.com/ilri/csv-metadata-quality-web">https://github.com/ilri/csv-metadata-quality-web</a> <ul> <li>Also, it is deployed on Heroku: <a href="https://fierce-ocean-30836.herokuapp.com/">https://fierce-ocean-30836.herokuapp.com/</a></li> <li>I was running it on Google App Engine originally, but they have <em>way</em> too aggressive caching of static assets</li> </ul> </li> </ul> <h2 id="2021-03-16">2021-03-16</h2> <ul> <li>Review ten items for Livestock and Fish and Dryland Systems from Peter <ul> <li>I told him to try the new web-based CSV Metadata Quality checker and he thought it was cool</li> <li>I found one exact duplicate item and it gave me an idea to try to detect this in the tool</li> </ul> </li> </ul> <h2 id="2021-03-17">2021-03-17</h2> <ul> <li>I added the ability to check for duplicate items to csv-metadata-quality</li> <li>I also made some minor optimizations in the Pandas code</li> <li>I <a href="https://github.com/ilri/csv-metadata-quality/releases/tag/v0.4.7">tagged version 0.4.7 of csv-metadata-quality on GitHub</a></li> </ul> <h2 id="2021-03-18">2021-03-18</h2> <ul> <li>I added the ability to 
check for, and fix, “mojibake” characters in csv-metadata-quality</li> </ul> <h2 id="2021-03-21">2021-03-21</h2> <ul> <li>Last week Atmire asked me which browser I was using to test the duplicate checker, which I had <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=934">reported</a> as not loading <ul> <li>I tried to load it in Chrome and it works… hmmm</li> </ul> </li> <li>Back up the current <code>openrxv-items-final</code> index to start a fresh AReS Harvest:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-final/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": true}}'</span> </span></span><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items-final/_clone/openrxv-items-final-2021-03-21 </span></span><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-final/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": false}}'</span> </span></span></code></pre></div><ul> <li>Then start harvesting in the AReS Explorer admin UI</li> </ul> <h2 id="2021-03-22">2021-03-22</h2> <ul> <li>The harvesting on AReS yesterday completed, but somehow I have twice the number of items:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final/_count?q=*&pretty'</span> </span></span><span 
style="display:flex;"><span>{ </span></span><span style="display:flex;"><span> "count" : 206204, </span></span><span style="display:flex;"><span> "_shards" : { </span></span><span style="display:flex;"><span> "total" : 1, </span></span><span style="display:flex;"><span> "successful" : 1, </span></span><span style="display:flex;"><span> "skipped" : 0, </span></span><span style="display:flex;"><span> "failed" : 0 </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><ul> <li>Hmmm and even my backup index has a strange number of items:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final-2021-03-21/_count?q=*&pretty'</span> </span></span><span style="display:flex;"><span>{ </span></span><span style="display:flex;"><span> "count" : 844, </span></span><span style="display:flex;"><span> "_shards" : { </span></span><span style="display:flex;"><span> "total" : 1, </span></span><span style="display:flex;"><span> "successful" : 1, </span></span><span style="display:flex;"><span> "skipped" : 0, </span></span><span style="display:flex;"><span> "failed" : 0 </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><ul> <li>I deleted all indexes and re-created the openrxv-items alias:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s -X POST <span style="color:#e6db74">'http://localhost:9200/_aliases'</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span 
style="color:#e6db74">'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'</span> </span></span><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool | less </span></span><span style="display:flex;"><span>... </span></span><span style="display:flex;"><span> "openrxv-items-temp": { </span></span><span style="display:flex;"><span> "aliases": {} </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span> "openrxv-items-final": { </span></span><span style="display:flex;"><span> "aliases": { </span></span><span style="display:flex;"><span> "openrxv-items": {} </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span> } </span></span></code></pre></div><ul> <li>Then I started a new harvest</li> <li>I switched the Node.js version in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to v12 since v10 will cease to be supported soon <ul> <li>I re-deployed DSpace Test (linode26) with Node.js 12 and restarted the server</li> </ul> </li> <li>The AReS harvest finally finished, with 1047 pages of items, but the <code>openrxv-items-final</code> index is empty and the <code>openrxv-items-temp</code> index has about 103,000 items:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'</span> </span></span><span style="display:flex;"><span>{ </span></span><span style="display:flex;"><span> "count" : 103162, </span></span><span style="display:flex;"><span> "_shards" : { </span></span><span style="display:flex;"><span> "total" : 1, </span></span><span style="display:flex;"><span> "successful" : 
1, </span></span><span style="display:flex;"><span> "skipped" : 0, </span></span><span style="display:flex;"><span> "failed" : 0 </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><ul> <li>I tried to clone the temp index to the final index, but got an error because the target index already exists:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-final </span></span><span style="display:flex;"><span>{"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"index [openrxv-items-final/LmxH-rQsTRmTyWex2d8jxw] already exists","index_uuid":"LmxH-rQsTRmTyWex2d8jxw","index":"openrxv-items-final"}],"type":"resource_already_exists_exception","reason":"index [openrxv-items-final/LmxH-rQsTRmTyWex2d8jxw] already exists","index_uuid":"LmxH-rQsTRmTyWex2d8jxw","index":"openrxv-items-final"},"status":400}% </span></span></code></pre></div><ul> <li>I looked in the Docker logs for Elasticsearch and saw a few memory errors:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>java.lang.OutOfMemoryError: Java heap space </span></span></code></pre></div><ul> <li>According to <code>/usr/share/elasticsearch/config/jvm.options</code> in the Elasticsearch container the default JVM heap is 1g <ul> <li>I see the running Java process has <code>-Xms 1g -Xmx 1g</code> in its process invocation so I guess it must indeed be using 1g</li> <li>We can <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html">change the heap size with the ES_JAVA_OPTS environment variable</a></li> <li>Or 
perhaps better, we should <a href="https://www.elastic.co/guide/en/elasticsearch/reference/master/jvm-options.html">use a jvm.options.d file</a> because if you use the environment variable it overrides all other JVM options from the default <code>jvm.options</code></li> <li>I tried to set memory to 1536m by binding an options file and restarting the container, but it didn’t seem to work</li> <li>Nevertheless, after restarting I see 103,000 items in the Explorer…</li> <li>But the indexes are still kinda messed up… the <code>openrxv-items</code> index is an alias of the wrong index!</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span> "openrxv-items-final": { </span></span><span style="display:flex;"><span> "aliases": {} </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span> "openrxv-items-temp": { </span></span><span style="display:flex;"><span> "aliases": { </span></span><span style="display:flex;"><span> "openrxv-items": {} </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span> }, </span></span></code></pre></div><h2 id="2021-03-23">2021-03-23</h2> <ul> <li>For reference you can also get the Elasticsearch JVM stats from the API:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/_nodes/jvm?human'</span> | python -m json.tool </span></span></code></pre></div><ul> <li>I re-deployed AReS with 1.5GB of heap using the <code>ES_JAVA_OPTS</code> environment variable <ul> <li>It turns out that this <em>is</em> the recommended way to set the heap: <a 
href="https://www.elastic.co/guide/en/elasticsearch/reference/7.6/jvm-options.html">https://www.elastic.co/guide/en/elasticsearch/reference/7.6/jvm-options.html</a></li> </ul> </li> <li>Then I fixed the aliases to make sure <code>openrxv-items</code> was an alias of <code>openrxv-items-final</code>, similar to what I did a few weeks ago</li> <li>I re-created the temp index:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -XPUT <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp'</span> </span></span></code></pre></div><h2 id="2021-03-24">2021-03-24</h2> <ul> <li>Atmire responded to the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=934">ticket about the Duplicate Checker</a> <ul> <li>He says it works for him in Firefox, so I checked and it seems to have been an issue with my LocalCDN addon</li> </ul> </li> <li>I re-deployed DSpace Test (linode26) from the latest CGSpace (linode18) data <ul> <li>I want to try to finish up processing the duplicates in Solr that <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=839">Atmire advised on last month</a></li> <li>The current statistics core is 57861236 kilobytes (about 55 GiB):</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># du -s /home/dspacetest.cgiar.org/solr/statistics </span></span><span style="display:flex;"><span>57861236 /home/dspacetest.cgiar.org/solr/statistics </span></span></code></pre></div><ul> <li>I applied their changes to <code>config/spring/api/atmire-cua-update.xml</code> and started the duplicate processor:</li> </ul> <div class="highlight"><pre tabindex="0" 
style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ export JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">'-Dfile.encoding=UTF-8 -Xmx4096m'</span> </span></span><span style="display:flex;"><span>$ chrt -b <span style="color:#ae81ff">0</span> dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -r <span style="color:#ae81ff">1000</span> -c statistics -t <span style="color:#ae81ff">12</span> </span></span></code></pre></div><ul> <li>The default number of records per query is 10,000, which caused memory issues, so I will try with 1000 (Atmire used 100, but that seems too low!)</li> <li>Hah, I still got a memory error after only a few minutes:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>... 
</span></span><span style="display:flex;"><span>Run 1 — 80% — 5,000/6,263 docs — 25s — 6m 31s </span></span><span style="display:flex;"><span>Exception: GC overhead limit exceeded </span></span><span style="display:flex;"><span>java.lang.OutOfMemoryError: GC overhead limit exceeded </span></span></code></pre></div><ul> <li>I guess we really do have to use <code>-r 100</code></li> <li>Now the thing runs for a few minutes and “finishes”:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ chrt -b <span style="color:#ae81ff">0</span> dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -r <span style="color:#ae81ff">100</span> -c statistics -t <span style="color:#ae81ff">12</span> </span></span><span style="display:flex;"><span>Loading @mire database changes for module MQM </span></span><span style="display:flex;"><span>Changes have been processed </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>************************* </span></span><span style="display:flex;"><span>* Update Script Started * </span></span><span style="display:flex;"><span>************************* </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Run 1 </span></span><span style="display:flex;"><span>Start updating Solr Storage Reports | Wed Mar 24 14:42:17 CET 2021 </span></span><span style="display:flex;"><span>Deleting old storage docs from Solr... 
| Wed Mar 24 14:42:17 CET 2021 </span></span><span style="display:flex;"><span>Done. | Wed Mar 24 14:42:17 CET 2021 </span></span><span style="display:flex;"><span>Processing storage reports for type: eperson | Wed Mar 24 14:42:17 CET 2021 </span></span><span style="display:flex;"><span>Done. | Wed Mar 24 14:42:41 CET 2021 </span></span><span style="display:flex;"><span>Processing storage reports for type: group | Wed Mar 24 14:42:41 CET 2021 </span></span><span style="display:flex;"><span>Done. | Wed Mar 24 14:45:46 CET 2021 </span></span><span style="display:flex;"><span>Processing storage reports for type: collection | Wed Mar 24 14:45:46 CET 2021 </span></span><span style="display:flex;"><span>Done. | Wed Mar 24 14:45:54 CET 2021 </span></span><span style="display:flex;"><span>Processing storage reports for type: community | Wed Mar 24 14:45:54 CET 2021 </span></span><span style="display:flex;"><span>Done. | Wed Mar 24 14:45:58 CET 2021 </span></span><span style="display:flex;"><span>Committing to Solr... | Wed Mar 24 14:45:58 CET 2021 </span></span><span style="display:flex;"><span>Done. 
| Wed Mar 24 14:45:59 CET 2021 </span></span><span style="display:flex;"><span>Successfully finished updating Solr Storage Reports | Wed Mar 24 14:45:59 CET 2021 </span></span><span style="display:flex;"><span>Run 1 — 2% — 100/4,824 docs — 3m 47s — 3m 47s </span></span><span style="display:flex;"><span>Run 1 — 4% — 200/4,824 docs — 2s — 3m 50s </span></span><span style="display:flex;"><span>Run 1 — 6% — 300/4,824 docs — 2s — 3m 53s </span></span><span style="display:flex;"><span>Run 1 — 8% — 400/4,824 docs — 2s — 3m 55s </span></span><span style="display:flex;"><span>Run 1 — 10% — 500/4,824 docs — 2s — 3m 58s </span></span><span style="display:flex;"><span>Run 1 — 12% — 600/4,824 docs — 2s — 4m 1s </span></span><span style="display:flex;"><span>Run 1 — 15% — 700/4,824 docs — 2s — 4m 3s </span></span><span style="display:flex;"><span>Run 1 — 17% — 800/4,824 docs — 2s — 4m 6s </span></span><span style="display:flex;"><span>Run 1 — 19% — 900/4,824 docs — 2s — 4m 9s </span></span><span style="display:flex;"><span>Run 1 — 21% — 1,000/4,824 docs — 2s — 4m 11s </span></span><span style="display:flex;"><span>Run 1 — 23% — 1,100/4,824 docs — 2s — 4m 14s </span></span><span style="display:flex;"><span>Run 1 — 25% — 1,200/4,824 docs — 2s — 4m 16s </span></span><span style="display:flex;"><span>Run 1 — 27% — 1,300/4,824 docs — 2s — 4m 19s </span></span><span style="display:flex;"><span>Run 1 — 29% — 1,400/4,824 docs — 2s — 4m 22s </span></span><span style="display:flex;"><span>Run 1 — 31% — 1,500/4,824 docs — 2s — 4m 24s </span></span><span style="display:flex;"><span>Run 1 — 33% — 1,600/4,824 docs — 2s — 4m 27s </span></span><span style="display:flex;"><span>Run 1 — 35% — 1,700/4,824 docs — 2s — 4m 29s </span></span><span style="display:flex;"><span>Run 1 — 37% — 1,800/4,824 docs — 2s — 4m 32s </span></span><span style="display:flex;"><span>Run 1 — 39% — 1,900/4,824 docs — 2s — 4m 35s </span></span><span style="display:flex;"><span>Run 1 — 41% — 2,000/4,824 docs — 2s — 4m 37s 
</span></span><span style="display:flex;"><span>Run 1 — 44% — 2,100/4,824 docs — 2s — 4m 40s </span></span><span style="display:flex;"><span>Run 1 — 46% — 2,200/4,824 docs — 2s — 4m 42s </span></span><span style="display:flex;"><span>Run 1 — 48% — 2,300/4,824 docs — 2s — 4m 45s </span></span><span style="display:flex;"><span>Run 1 — 50% — 2,400/4,824 docs — 2s — 4m 48s </span></span><span style="display:flex;"><span>Run 1 — 52% — 2,500/4,824 docs — 2s — 4m 50s </span></span><span style="display:flex;"><span>Run 1 — 54% — 2,600/4,824 docs — 2s — 4m 53s </span></span><span style="display:flex;"><span>Run 1 — 56% — 2,700/4,824 docs — 2s — 4m 55s </span></span><span style="display:flex;"><span>Run 1 — 58% — 2,800/4,824 docs — 2s — 4m 58s </span></span><span style="display:flex;"><span>Run 1 — 60% — 2,900/4,824 docs — 2s — 5m 1s </span></span><span style="display:flex;"><span>Run 1 — 62% — 3,000/4,824 docs — 2s — 5m 3s </span></span><span style="display:flex;"><span>Run 1 — 64% — 3,100/4,824 docs — 2s — 5m 6s </span></span><span style="display:flex;"><span>Run 1 — 66% — 3,200/4,824 docs — 3s — 5m 9s </span></span><span style="display:flex;"><span>Run 1 — 68% — 3,300/4,824 docs — 2s — 5m 12s </span></span><span style="display:flex;"><span>Run 1 — 70% — 3,400/4,824 docs — 2s — 5m 14s </span></span><span style="display:flex;"><span>Run 1 — 73% — 3,500/4,824 docs — 2s — 5m 17s </span></span><span style="display:flex;"><span>Run 1 — 75% — 3,600/4,824 docs — 2s — 5m 20s </span></span><span style="display:flex;"><span>Run 1 — 77% — 3,700/4,824 docs — 2s — 5m 22s </span></span><span style="display:flex;"><span>Run 1 — 79% — 3,800/4,824 docs — 2s — 5m 25s </span></span><span style="display:flex;"><span>Run 1 — 81% — 3,900/4,824 docs — 2s — 5m 27s </span></span><span style="display:flex;"><span>Run 1 — 83% — 4,000/4,824 docs — 2s — 5m 30s </span></span><span style="display:flex;"><span>Run 1 — 85% — 4,100/4,824 docs — 2s — 5m 33s </span></span><span 
style="display:flex;"><span>Run 1 — 87% — 4,200/4,824 docs — 2s — 5m 35s </span></span><span style="display:flex;"><span>Run 1 — 89% — 4,300/4,824 docs — 2s — 5m 38s </span></span><span style="display:flex;"><span>Run 1 — 91% — 4,400/4,824 docs — 2s — 5m 41s </span></span><span style="display:flex;"><span>Run 1 — 93% — 4,500/4,824 docs — 2s — 5m 43s </span></span><span style="display:flex;"><span>Run 1 — 95% — 4,600/4,824 docs — 2s — 5m 46s </span></span><span style="display:flex;"><span>Run 1 — 97% — 4,700/4,824 docs — 2s — 5m 49s </span></span><span style="display:flex;"><span>Run 1 — 100% — 4,800/4,824 docs — 2s — 5m 51s </span></span><span style="display:flex;"><span>Run 1 — 100% — 4,824/4,824 docs — 2s — 5m 53s </span></span><span style="display:flex;"><span>Run 1 took 5m 53s </span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>************************** </span></span><span style="display:flex;"><span>* Update Script Finished * </span></span><span style="display:flex;"><span>************************** </span></span></code></pre></div><ul> <li>If I run it again it finds the same 4,824 docs and processes them… <ul> <li>I asked Atmire for feedback on this: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=839">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=839</a></li> </ul> </li> </ul> <h2 id="2021-03-25">2021-03-25</h2> <ul> <li>Niroshini from IWMI is still having problems adding metadata during the edit step of the workflow on CGSpace <ul> <li>I told her to try to register using a private email account and we’ll add her to the WLE group so she can try that way</li> </ul> </li> </ul> <h2 id="2021-03-28">2021-03-28</h2> <ul> <li>Make a backup of the 
<code>openrxv-items-final</code> index on AReS Explorer and start a new harvest</li> </ul> <h2 id="2021-03-29">2021-03-29</h2> <ul> <li>The AReS harvest that I started yesterday finished successfully and all indexes look OK: <ul> <li><code>openrxv-items</code> is an alias of <code>openrxv-items-final</code> and has the correct number of items</li> </ul> </li> <li>Last week Bosede from IITA said she was trying to move an item from one collection to another and the system was “rolling” and never finished <ul> <li>I looked in Munin and didn’t see anything particularly wrong that day, so I told her to try again</li> </ul> </li> <li>Marianne Gadeberg asked about mapping an item last week <ul> <li>I searched for <a href="https://hdl.handle.net/10568/110633">the item</a>’s handle, the title, the title in quotes, the UUID, with pluses instead of spaces, etc. in the item mapper… but I can never find it in the results</li> <li>I see someone has reported this issue on Jira in DSpace 5.x’s XMLUI item mapper: <a href="https://jira.lyrasis.org/browse/DS-2761">https://jira.lyrasis.org/browse/DS-2761</a></li> <li>The Solr log shows that my query (with and without quotes, etc.) has 143 results:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>2021-03-29 08:55:40,073 INFO org.apache.solr.core.SolrCore @ [search] webapp=/solr path=/select params={q=Gender+mainstreaming+in+local+potato+seed+system+in+Georgia&fl=handle,search.resourcetype,search.resourceid,search.uniqueid&start=0&fq=NOT(withdrawn:true)&fq=NOT(discoverable:false)&fq=-location:l5308ea39-7c65-401b-890b-c2b93dad649a&wt=javabin&version=2} hits=143 status=0 QTime=0 </span></span></code></pre></div><ul> <li>But the item mapper only displays ten items, with no pagination <ul> <li>There is no way to search by handle or ID</li> <li>I mapped the 
item manually using a CSV</li> </ul> </li> </ul> <h2 id="2021-03-30">2021-03-30</h2> <ul> <li>I realized I never finished deleting all the old fields after our CG Core migration a few months ago <ul> <li>I found a few occurrences of old metadata so I had to move them where possible and delete them where not</li> </ul> </li> <li>I updated the <a href="/cgspace-notes/cgspace-cgcorev2-migration/">CG Core v2 migration page</a></li> <li>Marianne Gadeberg wrote to ask why the item she wanted to map a few days ago still doesn’t appear in the mapped collection <ul> <li>I looked on the item page itself and it lists the collection, but the item doesn’t appear in the collection’s item list</li> <li>I tried to forcibly reindex the collection and the item, but it didn’t seem to work</li> <li>Now I will try a complete Discovery re-index</li> </ul> </li> </ul> <h2 id="2021-03-31">2021-03-31</h2> <ul> <li>The Discovery re-index finished, but <a href="https://hdl.handle.net/10568/110633">the CIP item</a> still does not appear in the GENDER Platform grants collection <ul> <li>The item page itself DOES list the grants collection! 
WTF</li> <li>I sent a message to the dspace-tech mailing list to see if someone can comment</li> <li>I even tried unmapping and re-mapping, but it doesn’t change anything: the item still doesn’t appear in the collection, but I can see that it is mapped</li> </ul> </li> <li>I signed up for a SHERPA API key so I can try to write something to get journal names from ISSNs <ul> <li>This code seems to get a journal title, though I only tried it with a few ISSNs:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> requests </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>query_params <span style="color:#f92672">=</span> {<span style="color:#e6db74">'item-type'</span>: <span style="color:#e6db74">'publication'</span>, <span style="color:#e6db74">'format'</span>: <span style="color:#e6db74">'Json'</span>, <span style="color:#e6db74">'limit'</span>: <span style="color:#ae81ff">10</span>, <span style="color:#e6db74">'offset'</span>: <span style="color:#ae81ff">0</span>, <span style="color:#e6db74">'api-key'</span>: <span style="color:#e6db74">'blahhhahahah'</span>, <span style="color:#e6db74">'filter'</span>: <span style="color:#e6db74">'[["issn","equals","0011-183X"]]'</span>} </span></span><span style="display:flex;"><span><span style="color:#75715e"># Pass the query parameters, otherwise the API key and ISSN filter are never sent</span> </span></span><span style="display:flex;"><span>r <span style="color:#f92672">=</span> requests<span style="color:#f92672">.</span>get(<span style="color:#e6db74">'https://v2.sherpa.ac.uk/cgi/retrieve'</span>, params<span style="color:#f92672">=</span>query_params) </span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> r<span style="color:#f92672">.</span>status_code <span style="color:#f92672">==</span> <span style="color:#ae81ff">200</span> <span style="color:#f92672">and</span> len(r<span style="color:#f92672">.</span>json()[<span style="color:#e6db74">'items'</span>]) <span style="color:#f92672">></span> <span 
style="color:#ae81ff">0</span>: </span></span><span style="display:flex;"><span> r<span style="color:#f92672">.</span>json()[<span style="color:#e6db74">'items'</span>][<span style="color:#ae81ff">0</span>][<span style="color:#e6db74">'title'</span>][<span style="color:#ae81ff">0</span>][<span style="color:#e6db74">'title'</span>] </span></span></code></pre></div><ul> <li>I exported a list of all our ISSNs from CGSpace:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= > \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=253) to /tmp/2021-03-31-issns.csv; </span></span><span style="display:flex;"><span>COPY 3081 </span></span></code></pre></div><ul> <li>I wrote a script to check the ISSNs against Crossref’s API: <code>crossref-issn-lookup.py</code> <ul> <li>I suspect Crossref might have better data actually…</li> </ul> </li> </ul> <!-- raw HTML omitted --> </article> </div> <!-- /.blog-main --> <aside class="col-sm-3 ml-auto blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2022-11/">November, 2022</a></li> <li><a href="/cgspace-notes/2022-10/">October, 2022</a></li> <li><a href="/cgspace-notes/2022-09/">September, 2022</a></li> <li><a href="/cgspace-notes/2022-08/">August, 2022</a></li> <li><a href="/cgspace-notes/2022-07/">July, 2022</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> 
<p dir="auto"> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>. </p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>