mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-27 23:44:30 +01:00
438 lines
18 KiB
HTML
<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="March, 2021" />
<meta property="og:description" content="2021-03-01

Discuss some OpenRXV issues with Abdullah from CodeObia

He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API
Also, we found some issues building and running OpenRXV currently due to an ecosystem shift in the Node.js dependencies
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-03/" />
<meta property="article:published_time" content="2021-03-01T10:13:54+02:00" />
<meta property="article:modified_time" content="2021-03-06T13:35:20+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="March, 2021"/>
<meta name="twitter:description" content="2021-03-01

Discuss some OpenRXV issues with Abdullah from CodeObia

He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API
Also, we found some issues building and running OpenRXV currently due to an ecosystem shift in the Node.js dependencies
"/>
<meta name="generator" content="Hugo 0.81.0" />
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "BlogPosting",
  "headline": "March, 2021",
  "url": "https://alanorth.github.io/cgspace-notes/2021-03/",
  "wordCount": "1306",
  "datePublished": "2021-03-01T10:13:54+02:00",
  "dateModified": "2021-03-06T13:35:20+02:00",
  "author": {
    "@type": "Person",
    "name": "Alan Orth"
  },
  "keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2021-03/">
<title>March, 2021 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC+AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.ffbfea088a9a1666ec65c3a8cb4906e2a0e4f92dc70dbbf400a125ad2422123a.js" integrity="sha256-/7/qCIqaFmbsZcOoy0kG4qDk+S3HDbv0AKElrSQiEjo=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2021-03/">March, 2021</a></h2>
<p class="blog-post-meta">
<time datetime="2021-03-01T10:13:54+02:00">Mon Mar 01, 2021</time>
in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2021-03-01">2021-03-01</h2>
<ul>
<li>Discuss some OpenRXV issues with Abdullah from CodeObia
<ul>
<li>He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API</li>
<li>Also, we found some issues building and running OpenRXV currently due to an ecosystem shift in the Node.js dependencies</li>
</ul>
</li>
</ul>
<h2 id="2021-03-02">2021-03-02</h2>
<ul>
<li>I fixed three build and runtime issues in OpenRXV:
<ul>
<li><a href="https://github.com/ilri/OpenRXV/pull/80">fix highcharts-angular and ngx-tour-core build</a></li>
<li><a href="https://github.com/ilri/OpenRXV/pull/82">frontend/package.json: Pin @types/ramda at 0.27.34</a></li>
</ul>
</li>
<li>Then I merged a few fixes that Abdullah had worked on last week</li>
</ul>
<h2 id="2021-03-03">2021-03-03</h2>
<ul>
<li>I <a href="https://github.com/ilri/OpenRXV/issues/83">fixed another frontend build warning on OpenRXV</a></li>
<li>Then I <a href="https://github.com/ilri/OpenRXV/pull/84">updated the frontend container to use Node.js 12 and Ubuntu 20.04</a></li>
<li>Also, I <a href="https://github.com/ilri/OpenRXV/pull/85">added a GitHub Actions workflow to build the frontend</a></li>
<li>I did some testing of Abdullah’s patch for the values mapping search on OpenRXV
<ul>
<li>It still doesn’t work with multi-word values, so I recorded a video with wf-recorder and uploaded it to <a href="https://github.com/ilri/OpenRXV/issues/43">the issue</a> for him to investigate</li>
</ul>
</li>
</ul>
<h2 id="2021-03-04">2021-03-04</h2>
<ul>
<li>Peter has been having issues with the workflow since yesterday
<ul>
<li>I looked at the Munin stats and saw a high number of database locks since yesterday</li>
</ul>
</li>
</ul>
<p><img src="/cgspace-notes/2021/03/postgres_locks_ALL-week.png" alt="PostgreSQL locks week">
<img src="/cgspace-notes/2021/03/postgres_connections_cgspace-week.png" alt="PostgreSQL connections week"></p>
<ul>
<li>I looked at the number of locks in PostgreSQL and it’s definitely high again:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
1020
</code></pre><ul>
<li>I reported it to Atmire to take a look, on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=851">same issue</a> where we had been tracking this before</li>
<li>Abenet asked me to add a new ORCID for ILRI staff member Zoe Campbell</li>
<li>I added it to the controlled vocabulary and then tagged her existing items on CGSpace using my <code>add-orcid-identifiers-csv.py</code> script:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ cat 2021-03-04-add-zoe-campbell-orcid.csv
dc.contributor.author,cg.creator.identifier
"Campbell, Zoë","Zoe Campbell: 0000-0002-4759-9976"
"Campbell, Zoe A.","Zoe Campbell: 0000-0002-4759-9976"
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-03-04-add-zoe-campbell-orcid.csv -db dspace -u dspace -p 'fuuu'
</code></pre><ul>
<li>I still need to do cleanup on the journal articles metadata
<ul>
<li>Peter sent me some cleanups but I can’t use them in the search/replace format he gave</li>
<li>I think it’s better to export the metadata values with IDs and import the cleaned-up ones as CSV</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT dspace_object_id AS id, text_value as "cg.journal" FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=251) to /tmp/2021-02-24-journals.csv WITH CSV HEADER;
COPY 32087
</code></pre><ul>
<li>I used OpenRefine to remove all journal values that didn’t contain one of these characters: ; ( )
<ul>
<li>Then I cloned the <code>cg.journal</code> field to <code>cg.volume</code> and <code>cg.issue</code></li>
<li>I used some GREL expressions like these to extract the journal name, volume, and issue:</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">value.partition(';')[0].trim() # to get journal names
value.partition(/[0-9]+\([0-9]+\)/)[1].replace(/^(\d+)\(\d+\)/,"$1") # to get journal volumes
value.partition(/[0-9]+\([0-9]+\)/)[1].replace(/^\d+\((\d+)\)/,"$1") # to get journal issues
</code></pre><ul>
<li>Then I uploaded the changes to CGSpace using <code>dspace metadata-import</code></li>
<li>Margarita from CCAFS was asking about an error deleting some items that were showing up in Google and should have been private
<ul>
<li>The error was “Authorization denied for action OBSOLETE (DELETE) on BITSTREAM:bd157345-448e …”</li>
<li>I searched the DSpace issue tracker and found several issues reporting this:
<ul>
<li><a href="https://jira.lyrasis.org/browse/DS-3985">DS-3985 Delete item fails</a></li>
<li><a href="https://jira.lyrasis.org/browse/DS-4004">DS-4004 Authorization denied Exception when trying to delete permanently an item, collection or community as a non-Admin user</a></li>
<li><a href="https://jira.lyrasis.org/browse/DS-4297">DS-4297 Authorization error when trying to delete item by submitter/administrator</a></li>
</ul>
</li>
<li>The issue is apparently with non-admin users who are in the admin and submit groups of the owning collection…</li>
<li>In this case the item was uploaded to the CCAFS Reports collection, and Margarita is a non-admin user who is a member of the collection’s admin and submit groups, exactly as the issue described</li>
<li>I added a comment about our issue to <a href="https://jira.lyrasis.org/browse/DS-4297">DS-4297</a></li>
</ul>
</li>
<li>Yesterday Abenet added me to a WLE collection’s approver/editor steps so we can try to figure out why Niroshini is having issues adding metadata to Udana’s submissions
<ul>
<li>I edited Udana’s submission to CGSpace:
<ul>
<li>corrected the title</li>
<li>added language English</li>
<li>changed the link to point to the external item page instead of the PDF</li>
<li>added SDGs from the external item page</li>
<li>added AGROVOC subjects from the external item page</li>
<li>added pagination (extent)</li>
<li>changed the license to “other” because CC-BY-NC-ND is not printed anywhere in the PDF or external item page</li>
</ul>
</li>
</ul>
</li>
</ul>
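<p>For reference, the GREL expressions used for the journal cleanup earlier in this day’s notes can be mirrored in Python to spot-check values outside OpenRefine. This is only a sketch: the function name <code>split_journal</code> and the sample value are made up for illustration, and it assumes the same “name; volume(issue)” shape that the GREL patterns target.</p>

```python
import re

# Same "volume(issue)" pattern as the GREL expressions, e.g. "12(3)".
VOLUME_ISSUE = re.compile(r"([0-9]+)\(([0-9]+)\)")


def split_journal(value: str) -> dict:
    """Split a combined journal string into name, volume, and issue.

    Hypothetical helper: mirrors value.partition(';')[0].trim() for the
    name and the two regex replacements for the volume and issue.
    """
    # Everything before the first semicolon is treated as the journal name.
    name = value.partition(";")[0].strip()
    # The first "digits(digits)" token carries the volume and the issue.
    match = VOLUME_ISSUE.search(value)
    volume, issue = (match.group(1), match.group(2)) if match else ("", "")
    return {"cg.journal": name, "cg.volume": volume, "cg.issue": issue}


if __name__ == "__main__":
    # Made-up sample value, not taken from CGSpace.
    print(split_journal("Journal of Examples; 12(3): 45-67"))
```

<p>Values without a “volume(issue)” token come back with empty volume and issue, which matches what the GREL <code>partition</code> call yields when the pattern is not found.</p>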
<h2 id="2021-03-05">2021-03-05</h2>
<ul>
<li>I migrated the Docker bind mount for the AReS Elasticsearch container to a Docker volume:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ docker-compose -f docker/docker-compose.yml down
$ docker volume create docker_esData_7
$ docker container create --name es_dummy -v docker_esData_7:/usr/share/elasticsearch/data:rw elasticsearch:7.6.2
$ docker cp docker/esData_7/nodes es_dummy:/usr/share/elasticsearch/data
$ docker rm es_dummy
# edit docker/docker-compose.yml to switch from bind mount to volume
$ docker-compose -f docker/docker-compose.yml up -d
</code></pre><ul>
<li>The trick is that when you declare a volume like “myvolume” in a <code>docker-compose.yml</code> file, Docker Compose prefixes it with the project name, which defaults to the name of the directory containing the file, so here it is created as “docker_myvolume”
<ul>
<li>If you create it manually on the command line with <code>docker volume create myvolume</code> then the name is literally “myvolume”</li>
</ul>
</li>
<li>I still need to make the changes to git master and add these notes to the pull request so Moayad and others can benefit</li>
<li>Delete the <code>openrxv-items-temp</code> index to test a fresh harvest:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
</code></pre><h2 id="2021-03-05-1">2021-03-05</h2>
<ul>
<li>Check the results of the AReS harvesting from last night:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
{
  "count" : 101761,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
</code></pre><ul>
<li>Set the current items index to read only and make a backup:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -X PUT "localhost:9200/openrxv-items/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2021-03-05
</code></pre><ul>
<li>Delete the current items index and clone the temp one to it:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items'
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
</code></pre><ul>
<li>Then delete the temp index and the backup:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
{"acknowledged":true}
$ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-03-05'
</code></pre><ul>
<li>I made some pull requests to OpenRXV:
<ul>
<li><a href="https://github.com/ilri/OpenRXV/pull/86">docker/docker-compose.yml: Use docker volumes</a></li>
<li><a href="https://github.com/ilri/OpenRXV/pull/87">docker/docker-compose.yml: Pin Redis to version 5</a></li>
</ul>
</li>
<li>I deployed the latest changes from the last few days on AReS production</li>
</ul>
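<p>The index swap done with curl above (block writes, clone to a backup, delete, clone the temp index into place, clean up) could also be written down as an ordered plan before running anything. The sketch below only builds the list of requests and sends nothing; the function names <code>clone_steps</code> and <code>swap_steps</code> are hypothetical, while the paths and the settings body are the ones used in the commands above.</p>

```python
import json

# Settings body used above to make an index read-only before cloning it.
WRITE_BLOCK = {"settings": {"index.blocks.write": True}}


def clone_steps(source, target):
    """Requests needed to clone one index: block writes, then clone."""
    return [
        ("PUT", f"/{source}/_settings", WRITE_BLOCK),
        ("POST", f"/{source}/_clone/{target}", None),
    ]


def swap_steps(live, temp, backup):
    """Ordered (method, path, body) tuples mirroring the curl commands."""
    steps = clone_steps(live, backup)          # back up the live index
    steps.append(("DELETE", f"/{live}", None)) # remove the live index
    steps += clone_steps(temp, live)           # promote the temp index
    steps.append(("DELETE", f"/{temp}", None))
    steps.append(("DELETE", f"/{backup}", None))
    return steps


if __name__ == "__main__":
    plan = swap_steps("openrxv-items", "openrxv-items-temp", "openrxv-items-2021-03-05")
    for method, path, body in plan:
        print(method, path, json.dumps(body) if body else "")
```

<p>Keeping the plan as plain data makes it easy to review the order of the destructive steps, in particular that the backup clone comes before the first DELETE.</p>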
<h2 id="2021-03-07">2021-03-07</h2>
<ul>
<li>I realized there is something wrong with the Elasticsearch indexes on AReS
<ul>
<li>On a new test environment I see <code>openrxv-items</code> is correctly an alias of <code>openrxv-items-final</code>:</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool | less
...
    "openrxv-items-final": {
        "aliases": {
            "openrxv-items": {}
        }
    },
</code></pre><ul>
<li>But on AReS production <code>openrxv-items</code> has somehow become an index:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool | less
...
    "openrxv-items": {
        "aliases": {}
    },
    "openrxv-items-final": {
        "aliases": {}
    },
    "openrxv-items-temp": {
        "aliases": {}
    },
</code></pre><ul>
<li>I fixed the issue on production by cloning the <code>openrxv-items</code> index to <code>openrxv-items-final</code>, deleting <code>openrxv-items</code>, and then re-creating it as an alias:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -X PUT "localhost:9200/openrxv-items/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2021-03-07
$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-final
$ curl -XDELETE 'http://localhost:9200/openrxv-items'
$ curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'
</code></pre><ul>
<li>Delete backups and remove read-only mode on <code>openrxv-items</code>:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-03-07'
$ curl -X PUT "localhost:9200/openrxv-items/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
</code></pre><ul>
<li>Linode sent alerts about the CPU usage on CGSpace yesterday and the day before
<ul>
<li>Looking in the logs I see a few IPs making heavy use of the REST API and XMLUI:</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console"># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E '0[56]/Mar/2021' | goaccess --log-format=COMBINED -
</code></pre><ul>
<li>I see the usual IPs for CCAFS and ILRI importer bots, but also <code>143.233.242.132</code> which appears to be for GARDIAN:</li>
</ul>
<pre><code class="language-console" data-lang="console"># zgrep '143.233.242.132' /var/log/nginx/access.log.1 | grep -c Delphi
6237
# zgrep '143.233.242.132' /var/log/nginx/access.log.1 | grep -c -v Delphi
6418
</code></pre><ul>
<li>They seem to make requests twice, once with the Delphi user agent that we know and already mark as a bot, and once with a “normal” user agent
<ul>
<li>Looking in Solr I see they have been using this IP for a while, as they have 100,000 hits going back into 2020</li>
<li>I will add this IP to the list of bots in nginx and purge it from Solr with my <code>check-spider-ip-hits.sh</code> script</li>
</ul>
</li>
<li>I made a few changes to OpenRXV:
<ul>
<li><a href="https://github.com/ilri/OpenRXV/issues/89">Migrated away from links to use networks</a></li>
<li><a href="https://github.com/ilri/OpenRXV/issues/68">Converted the backend container to use a custom image that includes <code>unoconv</code></a> so we don’t have to manually install it anymore</li>
</ul>
</li>
</ul>
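<p>The alias problem found earlier today (<code>openrxv-items</code> turning into a real index instead of an alias of <code>openrxv-items-final</code>) can be checked mechanically from the <code>_alias</code> output shown above. This is a rough sketch that works on the already-parsed JSON; the function name <code>describe</code> is hypothetical and the two sample dictionaries are trimmed copies of the outputs above.</p>

```python
def describe(name, alias_response):
    """Say whether `name` is a concrete index, an alias, or absent.

    `alias_response` is the parsed JSON from GET /_alias/, which maps each
    index name to a dict holding its "aliases".
    """
    # Top-level keys of the response are concrete indexes.
    if name in alias_response:
        return "index"
    # Otherwise look for indexes that list `name` among their aliases.
    targets = sorted(
        index
        for index, info in alias_response.items()
        if name in info.get("aliases", {})
    )
    if targets:
        return "alias of " + ", ".join(targets)
    return "missing"


if __name__ == "__main__":
    test_env = {"openrxv-items-final": {"aliases": {"openrxv-items": {}}}}
    production = {
        "openrxv-items": {"aliases": {}},
        "openrxv-items-final": {"aliases": {}},
        "openrxv-items-temp": {"aliases": {}},
    }
    print(describe("openrxv-items", test_env))
    print(describe("openrxv-items", production))
```

<p>A check like this could run after each harvest to catch the alias being replaced by an index again.</p>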
<!-- raw HTML omitted -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2021-03/">March, 2021</a></li>
<li><a href="/cgspace-notes/2021-02/">February, 2021</a></li>
<li><a href="/cgspace-notes/2021-01/">January, 2021</a></li>
<li><a href="/cgspace-notes/2020-12/">December, 2020</a></li>
<li><a href="/cgspace-notes/cgspace-dspace6-upgrade/">CGSpace DSpace 6 Upgrade</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>