Add notes for 2021-11-08

2021-11-09 06:29:52 +02:00
parent b3df4ff58f
commit 9afe5c13f9
110 changed files with 1827 additions and 1737 deletions


@ -48,7 +48,7 @@ The syntax Moayad showed me last month doesn’t seem to honor the search qu
"/>
<meta name="generator" content="Hugo 0.88.1" />
<meta name="generator" content="Hugo 0.89.2" />
@ -154,9 +154,9 @@ The syntax Moayad showed me last month doesn&rsquo;t seem to honor the search qu
<ul>
<li>Update Docker images on AReS server (linode20) and rebuild OpenRXV:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">&#39;s/ \+/:/g&#39;</span> | cut -d: -f1,2 | xargs -L1 docker pull
$ docker-compose build
</code></pre><ul>
</code></pre></div><ul>
<li>Then run system updates and reboot the server
<ul>
<li>After the system came back up I started a fresh re-harvesting</li>
@ -201,8 +201,8 @@ $ docker-compose build
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ vipsthumbnail ARRTB2020ST.pdf -s x600 -o '%s.jpg[Q=85,optimize_coding,strip]'
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ vipsthumbnail ARRTB2020ST.pdf -s x600 -o <span style="color:#e6db74">&#39;%s.jpg[Q=85,optimize_coding,strip]&#39;</span>
</code></pre></div><ul>
<li>Looking at the PDF&rsquo;s metadata I see:
<ul>
<li>Producer: iLovePDF</li>
@ -236,11 +236,11 @@ $ docker-compose build
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat 2021-09-15-add-orcids.csv
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat 2021-09-15-add-orcids.csv
dc.contributor.author,cg.creator.identifier
&quot;Kotchofa, Pacem&quot;,&quot;Pacem Kotchofa: 0000-0002-1640-8807&quot;
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-09-15-add-orcids.csv -db dspace -u dspace -p 'fuuuu'
</code></pre><ul>
&#34;Kotchofa, Pacem&#34;,&#34;Pacem Kotchofa: 0000-0002-1640-8807&#34;
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-09-15-add-orcids.csv -db dspace -u dspace -p <span style="color:#e6db74">&#39;fuuuu&#39;</span>
</code></pre></div><ul>
<li>Meeting with Leroy Mwanzia and some other Alliance people about depositing to CGSpace via API
<ul>
<li>I gave them some technical information about the CGSpace API and links to the controlled vocabularies and metadata registries we are using</li>
@ -273,24 +273,24 @@ $ ./ilri/add-orcid-identifiers-csv.py -i 2021-09-15-add-orcids.csv -db dspace -u
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_stat_activity' | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c <span style="color:#e6db74">&#39;SELECT * FROM pg_stat_activity&#39;</span> | wc -l
63
</code></pre><ul>
</code></pre></div><ul>
<li>Load on the server is under 1.0, and there are only about 1,000 XMLUI sessions, which seems to be normal for this time of day according to Munin</li>
<li>But the DSpace log file shows tons of database issues:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ grep -c &quot;Timeout waiting for idle object&quot; dspace.log.2021-09-17
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -c <span style="color:#e6db74">&#34;Timeout waiting for idle object&#34;</span> dspace.log.2021-09-17
14779
</code></pre><ul>
</code></pre></div><ul>
<li>The earliest one I see is around midnight (now is 2PM):</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">2021-09-17 00:01:49,572 WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ SQL Error: 0, SQLState: null
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">2021-09-17 00:01:49,572 WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ SQL Error: 0, SQLState: null
2021-09-17 00:01:49,572 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ Cannot get a connection, pool error Timeout waiting for idle object
</code></pre><ul>
</code></pre></div><ul>
<li>But I was definitely logged into the site this morning so there were no issues then&hellip;</li>
<li>It seems that a few errors are normal, but there&rsquo;s obviously something wrong today:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ grep -c &quot;Timeout waiting for idle object&quot; dspace.log.2021-09-*
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -c <span style="color:#e6db74">&#34;Timeout waiting for idle object&#34;</span> dspace.log.2021-09-*
dspace.log.2021-09-01:116
dspace.log.2021-09-02:163
dspace.log.2021-09-03:77
@ -308,7 +308,7 @@ dspace.log.2021-09-14:102
dspace.log.2021-09-15:542
dspace.log.2021-09-16:368
dspace.log.2021-09-17:15235
</code></pre><ul>
</code></pre></div><ul>
<li>I restarted the server and DSpace came up fine&hellip; so it must have been some kind of fluke</li>
<li>Continue working on cleaning up and annotating the metadata registry on CGSpace
<ul>
@ -338,9 +338,9 @@ dspace.log.2021-09-17:15235
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">&#39;s/ \+/:/g&#39;</span> | cut -d: -f1,2 | xargs -L1 docker pull
$ docker-compose build
</code></pre><h2 id="2021-09-20">2021-09-20</h2>
</code></pre></div><h2 id="2021-09-20">2021-09-20</h2>
<ul>
<li>I synchronized the production CGSpace PostgreSQL, Solr, and Assetstore data with DSpace Test</li>
<li>Over the weekend a few users reported that they could not log into CGSpace
@ -349,10 +349,10 @@ $ docker-compose build
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;cgspace-ldap-account@cgiarad.org&quot; -W &quot;(sAMAccountName=someaccountnametocheck)&quot;
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b <span style="color:#e6db74">&#34;dc=cgiarad,dc=org&#34;</span> -D <span style="color:#e6db74">&#34;cgspace-ldap-account@cgiarad.org&#34;</span> -W <span style="color:#e6db74">&#34;(sAMAccountName=someaccountnametocheck)&#34;</span>
Enter LDAP Password:
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
</code></pre><ul>
ldap_sasl_bind(SIMPLE): Can&#39;t contact LDAP server (-1)
</code></pre></div><ul>
<li>I sent a message to CGNET to ask about the server settings and see if our IP is still whitelisted
<ul>
<li>It turns out that CGNET created a new Active Directory server (AZCGNEROOT3.cgiarad.org) and decommissioned the old one last week</li>
@ -361,8 +361,8 @@ ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
</li>
<li>Create another test account for Rafael from Bioversity-CIAT to submit some items to DSpace Test:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ dspace user -a -m tip-submit@cgiar.org -g CIAT -s Submit -p 'fuuuuuuuu'
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ dspace user -a -m tip-submit@cgiar.org -g CIAT -s Submit -p <span style="color:#e6db74">&#39;fuuuuuuuu&#39;</span>
</code></pre></div><ul>
<li>I added the account to the Alliance Admins group, which should allow him to submit to any Alliance collection
<ul>
<li>According to my notes from <a href="/cgspace-notes/2020-10/">2020-10</a> the account must be in the admin group in order to submit via the REST API</li>
@ -371,13 +371,13 @@ ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
<li>Run <code>dspace cleanup -v</code> process on CGSpace to clean up old bitstreams</li>
<li>Export lists of authors, donors, and affiliations for Peter Ballantyne to clean up:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &quot;dc.contributor.author&quot;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 3 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-authors.csv WITH CSV HEADER;
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &#34;dc.contributor.author&#34;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 3 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-authors.csv WITH CSV HEADER;
COPY 80901
localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &quot;cg.contributor.donor&quot;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-donors.csv WITH CSV HEADER;
localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &#34;cg.contributor.donor&#34;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-donors.csv WITH CSV HEADER;
COPY 1274
localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &quot;cg.contributor.affiliation&quot;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-affiliations.csv WITH CSV HEADER;
localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &#34;cg.contributor.affiliation&#34;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-affiliations.csv WITH CSV HEADER;
COPY 8091
</code></pre><h2 id="2021-09-23">2021-09-23</h2>
</code></pre></div><h2 id="2021-09-23">2021-09-23</h2>
<ul>
<li>Peter sent me back the corrections for the affiliations
<ul>
@ -386,24 +386,24 @@ COPY 8091
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csv-metadata-quality -i ~/Downloads/2021-09-20-affiliations.csv -o /tmp/affiliations.csv -x cg.contributor.affiliation
$ csvgrep -c 'correct' -m 'DELETE' /tmp/affiliations.csv &gt; /tmp/affiliations-delete.csv
$ csvgrep -c 'correct' -r '^.+$' /tmp/affiliations.csv | csvgrep -i -c 'correct' -m 'DELETE' &gt; /tmp/affiliations-fix.csv
$ ./ilri/fix-metadata-values.py -i /tmp/affiliations-fix.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t 'correct' -m 211
$ ./ilri/delete-metadata-values.py -i /tmp/affiliations-delete.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csv-metadata-quality -i ~/Downloads/2021-09-20-affiliations.csv -o /tmp/affiliations.csv -x cg.contributor.affiliation
$ csvgrep -c <span style="color:#e6db74">&#39;correct&#39;</span> -m <span style="color:#e6db74">&#39;DELETE&#39;</span> /tmp/affiliations.csv &gt; /tmp/affiliations-delete.csv
$ csvgrep -c <span style="color:#e6db74">&#39;correct&#39;</span> -r <span style="color:#e6db74">&#39;^.+$&#39;</span> /tmp/affiliations.csv | csvgrep -i -c <span style="color:#e6db74">&#39;correct&#39;</span> -m <span style="color:#e6db74">&#39;DELETE&#39;</span> &gt; /tmp/affiliations-fix.csv
$ ./ilri/fix-metadata-values.py -i /tmp/affiliations-fix.csv -db dspace -u dspace -p <span style="color:#e6db74">&#39;fuuu&#39;</span> -f cg.contributor.affiliation -t <span style="color:#e6db74">&#39;correct&#39;</span> -m <span style="color:#ae81ff">211</span>
$ ./ilri/delete-metadata-values.py -i /tmp/affiliations-delete.csv -db dspace -u dspace -p <span style="color:#e6db74">&#39;fuuu&#39;</span> -f cg.contributor.affiliation -m <span style="color:#ae81ff">211</span>
</code></pre></div><ul>
<li>Then I updated the controlled vocabulary for affiliations by exporting the top 1,000 used terms:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &quot;cg.contributor.affiliation&quot;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC LIMIT 1000) to /tmp/2021-09-23-affiliations.csv WITH CSV HEADER;
$ csvcut -c 1 /tmp/2021-09-23-affiliations.csv | sed 1d &gt; /tmp/affiliations.txt
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &#34;cg.contributor.affiliation&#34;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC LIMIT 1000) to /tmp/2021-09-23-affiliations.csv WITH CSV HEADER;
$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-09-23-affiliations.csv | sed 1d &gt; /tmp/affiliations.txt
</code></pre></div><ul>
<li>Peter also sent me 310 corrections and 234 deletions for donors so I applied those and updated the controlled vocabularies too</li>
<li>Move some One CGIAR-related collections around the CGSpace hierarchy for Peter Ballantyne</li>
<li>Mohammed Salem asked me for an ID to UUID mapping for CGSpace collections, so I generated one similar to the ID one I sent him in 2020-11:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT collection_id,uuid FROM collection WHERE collection_id IS NOT NULL) TO /tmp/2021-09-23-collection-id2uuid.csv WITH CSV HEADER;
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT collection_id,uuid FROM collection WHERE collection_id IS NOT NULL) TO /tmp/2021-09-23-collection-id2uuid.csv WITH CSV HEADER;
COPY 1139
</code></pre><h2 id="2021-09-24">2021-09-24</h2>
</code></pre></div><h2 id="2021-09-24">2021-09-24</h2>
<ul>
<li>Peter and Abenet agreed that we should consider converting more of our UPPER CASE metadata values to Title Case
<ul>
@ -435,33 +435,33 @@ COPY 1139
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=231;
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=231;
UPDATE 2903
localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &quot;cg.coverage.subregion&quot; FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 231) to /tmp/2021-09-24-subregions.txt;
localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &#34;cg.coverage.subregion&#34; FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 231) to /tmp/2021-09-24-subregions.txt;
COPY 1200
</code></pre><ul>
</code></pre></div><ul>
<li>Then I processed the list for matches with my <code>subdivision-lookup.py</code> script, and extracted only the values that matched:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/subdivision-lookup.py -i /tmp/2021-09-24-subregions.txt -o /tmp/subregions.csv
$ csvgrep -c matched -m 'true' /tmp/subregions.csv | csvcut -c 1 | sed 1d &gt; /tmp/subregions-matched.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/subdivision-lookup.py -i /tmp/2021-09-24-subregions.txt -o /tmp/subregions.csv
$ csvgrep -c matched -m <span style="color:#e6db74">&#39;true&#39;</span> /tmp/subregions.csv | csvcut -c <span style="color:#ae81ff">1</span> | sed 1d &gt; /tmp/subregions-matched.txt
$ wc -l /tmp/subregions-matched.txt
81 /tmp/subregions-matched.txt
</code></pre><ul>
</code></pre></div><ul>
<li>Then I updated the controlled vocabulary in the submission forms</li>
<li>I did the same for <code>dcterms.audience</code>, taking special care with a few all-caps values:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value != 'NGOS' AND text_value != 'CGIAR';
localhost/dspace63= &gt; UPDATE metadatavalue SET text_value='NGOs' WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value = 'NGOS';
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value != &#39;NGOS&#39; AND text_value != &#39;CGIAR&#39;;
localhost/dspace63= &gt; UPDATE metadatavalue SET text_value=&#39;NGOs&#39; WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value = &#39;NGOS&#39;;
</code></pre></div><ul>
<li>Update submission form comment for DOIs because it was still recommending people use the &ldquo;dx.doi.org&rdquo; format even though I batch updated all DOIs to the &ldquo;doi.org&rdquo; format a few times in the last year
<ul>
<li>Then I updated all existing metadata to the new format again:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, 'https://dx.doi.org', 'https://doi.org') WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 220 AND text_value LIKE 'https://dx.doi.org%';
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, &#39;https://dx.doi.org&#39;, &#39;https://doi.org&#39;) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 220 AND text_value LIKE &#39;https://dx.doi.org%&#39;;
UPDATE 49
</code></pre><h2 id="2021-09-26">2021-09-26</h2>
</code></pre></div><h2 id="2021-09-26">2021-09-26</h2>
<ul>
<li>Mohammed Salem told me last week that MELSpace and WorldFish have been upgraded to DSpace 6 so I updated the repository setup in AReS to use the UUID field instead of IDs
<ul>
@ -489,26 +489,26 @@ UPDATE 49
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvcut -c 'id,collection,dc.title[en_US]' ~/Downloads/10568-106990.csv &gt; /tmp/2021-09-28-alliance-reports.csv
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c <span style="color:#e6db74">&#39;id,collection,dc.title[en_US]&#39;</span> ~/Downloads/10568-106990.csv &gt; /tmp/2021-09-28-alliance-reports.csv
</code></pre></div><ul>
<li>She sent it back fairly quickly with a new column marked &ldquo;Move&rdquo; so I extracted those items that matched and set them to the new owning collection:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvgrep -c Move -m 'Yes' ~/Downloads/2021_28_09_alliance_reports_csv.csv | csvcut -c 1,2 | sed 's_10568/106990_10568/111506_' &gt; /tmp/alliance-move.csv
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvgrep -c Move -m <span style="color:#e6db74">&#39;Yes&#39;</span> ~/Downloads/2021_28_09_alliance_reports_csv.csv | csvcut -c 1,2 | sed <span style="color:#e6db74">&#39;s_10568/106990_10568/111506_&#39;</span> &gt; /tmp/alliance-move.csv
</code></pre></div><ul>
<li>Maria from the Alliance emailed us to say that approving submissions was slow on CGSpace
<ul>
<li>I looked at the PostgreSQL activity and it seems low:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_stat_activity' | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_stat_activity&#39; | wc -l
59
</code></pre><ul>
</code></pre></div><ul>
<li>Locks look high though:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | sort | uniq -c | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39; | sort | uniq -c | wc -l
1154
</code></pre><ul>
</code></pre></div><ul>
<li>Indeed it seems something started causing locks to increase yesterday:</li>
</ul>
<p><img src="/cgspace-notes/2021/09/postgres_locks_ALL-week.png" alt="PostgreSQL locks week"></p>
@ -520,9 +520,9 @@ UPDATE 49
<li>The number of DSpace sessions is normal, hovering around 1,000&hellip;</li>
<li>Looking closer at the PostgreSQL activity log, I see the locks are all held by the <code>dspaceCli</code> user&hellip; which seems weird:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c &quot;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE application_name='dspaceCli';&quot; | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c &#34;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE application_name=&#39;dspaceCli&#39;;&#34; | wc -l
1096
</code></pre><ul>
</code></pre></div><ul>
<li>Now I&rsquo;m wondering why there are no connections from <code>dspaceApi</code> or <code>dspaceWeb</code>. Could it be that our Tomcat JDBC pooling via JNDI isn&rsquo;t working?
<ul>
<li>I see the same thing on DSpace Test hmmmm</li>
@ -536,14 +536,14 @@ UPDATE 49
<ul>
<li>Export a list of ILRI subjects from CGSpace to validate against AGROVOC for Peter and Abenet:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 203) to /tmp/2021-09-29-ilri-subject.txt;
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 203) to /tmp/2021-09-29-ilri-subject.txt;
COPY 149
</code></pre><ul>
</code></pre></div><ul>
<li>Then validate and format the matches:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/agrovoc-lookup.py -i /tmp/2021-09-29-ilri-subject.txt -o /tmp/2021-09-29-ilri-subjects.csv -d
$ csvcut -c subject,'match type' /tmp/2021-09-29-ilri-subjects.csv | sed -e 's/match type/matched/' -e 's/\(alt\|pref\)Label/yes/' &gt; /tmp/2021-09-29-ilri-subjects2.csv
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/agrovoc-lookup.py -i /tmp/2021-09-29-ilri-subject.txt -o /tmp/2021-09-29-ilri-subjects.csv -d
$ csvcut -c subject,<span style="color:#e6db74">&#39;match type&#39;</span> /tmp/2021-09-29-ilri-subjects.csv | sed -e <span style="color:#e6db74">&#39;s/match type/matched/&#39;</span> -e <span style="color:#e6db74">&#39;s/\(alt\|pref\)Label/yes/&#39;</span> &gt; /tmp/2021-09-29-ilri-subjects2.csv
</code></pre></div><ul>
<li>I talked to Salem about depositing from MEL to CGSpace
<ul>
<li>He mentioned that the one issue is that when you deposit to a workflow you don&rsquo;t get a Handle or any kind of identifier back!</li>