mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-11-08
@ -48,7 +48,7 @@ The syntax Moayad showed me last month doesn’t seem to honor the search qu
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.88.1" />
|
||||
<meta name="generator" content="Hugo 0.89.2" />
|
||||
@ -154,9 +154,9 @@ The syntax Moayad showed me last month doesn’t seem to honor the search qu
|
||||
<ul>
|
||||
<li>Update Docker images on AReS server (linode20) and rebuild OpenRXV:</li>
|
||||
</ul>
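The image-refresh pipeline used for the AReS server can be tried offline on canned text. The sketch below runs the same filter steps on made-up `docker images` output (the repository names, tags, and IDs are illustrative assumptions) and leaves out the final `xargs -L1 docker pull`. It uses `sed -E 's/ +/:/g'`, which should behave like the GNU-style `s/ \+/:/g` in the notes.

```shell
# Made-up `docker images` output; names, tags, and IDs are illustrative only.
sample_output='REPOSITORY   TAG       IMAGE ID       CREATED        SIZE
redis        6.2       0123456789ab   2 weeks ago    105MB
node         14-alpine ba9876543210   3 weeks ago    118MB'

# Same steps as the pipeline in the notes, without the pull:
# 1. drop the header line
# 2. turn every run of spaces into a single ":"
# 3. keep the first two ":"-separated fields (repository and tag)
image_refs() {
    grep -v '^REPO' | sed -E 's/ +/:/g' | cut -d: -f1,2
}

printf '%s\n' "$sample_output" | image_refs
```

Because `cut` splits on `:`, a repository name that itself contains a colon (for example a registry host with a port) would likely be truncated by this approach.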
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | xargs -L1 docker pull
|
||||
$ docker-compose build
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Then run system updates and reboot the server
|
||||
<ul>
|
||||
<li>After the system came back up I started a fresh re-harvesting</li>
|
||||
@ -201,8 +201,8 @@ $ docker-compose build
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ vipsthumbnail ARRTB2020ST.pdf -s x600 -o '%s.jpg[Q=85,optimize_coding,strip]'
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ vipsthumbnail ARRTB2020ST.pdf -s x600 -o <span style="color:#e6db74">'%s.jpg[Q=85,optimize_coding,strip]'</span>
|
||||
</code></pre></div><ul>
|
||||
<li>Looking at the PDF’s metadata I see:
|
||||
<ul>
|
||||
<li>Producer: iLovePDF</li>
|
||||
@ -236,11 +236,11 @@ $ docker-compose build
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat 2021-09-15-add-orcids.csv
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat 2021-09-15-add-orcids.csv
|
||||
dc.contributor.author,cg.creator.identifier
|
||||
"Kotchofa, Pacem","Pacem Kotchofa: 0000-0002-1640-8807"
|
||||
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-09-15-add-orcids.csv -db dspace -u dspace -p 'fuuuu'
|
||||
</code></pre><ul>
|
||||
"Kotchofa, Pacem","Pacem Kotchofa: 0000-0002-1640-8807"
|
||||
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-09-15-add-orcids.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuuu'</span>
|
||||
</code></pre></div><ul>
|
||||
<li>Meeting with Leroy Mwanzia and some other Alliance people about depositing to CGSpace via API
|
||||
<ul>
|
||||
<li>I gave them some technical information about the CGSpace API and links to the controlled vocabularies and metadata registries we are using</li>
|
||||
@ -273,24 +273,24 @@ $ ./ilri/add-orcid-identifiers-csv.py -i 2021-09-15-add-orcids.csv -db dspace -u
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_stat_activity' | wc -l
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_stat_activity'</span> | wc -l
|
||||
63
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Load on the server is under 1.0, and there are only about 1,000 XMLUI sessions, which seems to be normal for this time of day according to Munin</li>
|
||||
<li>But the DSpace log file shows tons of database issues:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ grep -c "Timeout waiting for idle object" dspace.log.2021-09-17
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -c <span style="color:#e6db74">"Timeout waiting for idle object"</span> dspace.log.2021-09-17
|
||||
14779
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>The earliest one I see is around midnight (it is now 2PM):</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">2021-09-17 00:01:49,572 WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ SQL Error: 0, SQLState: null
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">2021-09-17 00:01:49,572 WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ SQL Error: 0, SQLState: null
|
||||
2021-09-17 00:01:49,572 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ Cannot get a connection, pool error Timeout waiting for idle object
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>But I was definitely logged into the site this morning so there were no issues then…</li>
|
||||
<li>It seems that a few errors are normal, but there’s obviously something wrong today:</li>
|
||||
</ul>
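When `grep -c` is given several files it prints one `file:count` line per file, so the per-day comparison can be sorted to surface the outlier. This is a minimal sketch on throwaway files with made-up contents. The `worst_day` helper is hypothetical and assumes the log file names contain no colon.

```shell
# Throwaway directory with made-up log files (contents are illustrative).
logdir=$(mktemp -d)
printf 'pool error Timeout waiting for idle object\nINFO ok\n' \
    > "$logdir/dspace.log.2021-09-16"
printf 'pool error Timeout waiting for idle object\npool error Timeout waiting for idle object\npool error Timeout waiting for idle object\n' \
    > "$logdir/dspace.log.2021-09-17"

# Count matching lines per file, sort numerically on the count field,
# and print the file with the most matches.
worst_day() {
    (
        cd "$1" || exit 1
        grep -c "Timeout waiting for idle object" dspace.log.* \
            | sort -t: -k2,2n \
            | tail -n 1
    )
}

worst_day "$logdir"
```

Note that `grep -c` counts matching lines rather than individual occurrences, which is the same thing here because each log entry is on its own line.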
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ grep -c "Timeout waiting for idle object" dspace.log.2021-09-*
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -c <span style="color:#e6db74">"Timeout waiting for idle object"</span> dspace.log.2021-09-*
|
||||
dspace.log.2021-09-01:116
|
||||
dspace.log.2021-09-02:163
|
||||
dspace.log.2021-09-03:77
|
||||
@ -308,7 +308,7 @@ dspace.log.2021-09-14:102
|
||||
dspace.log.2021-09-15:542
|
||||
dspace.log.2021-09-16:368
|
||||
dspace.log.2021-09-17:15235
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>I restarted the server and DSpace came up fine… so it must have been some kind of fluke</li>
|
||||
<li>Continue working on cleaning up and annotating the metadata registry on CGSpace
|
||||
<ul>
|
||||
@ -338,9 +338,9 @@ dspace.log.2021-09-17:15235
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | xargs -L1 docker pull
|
||||
$ docker-compose build
|
||||
</code></pre><h2 id="2021-09-20">2021-09-20</h2>
|
||||
</code></pre></div><h2 id="2021-09-20">2021-09-20</h2>
|
||||
<ul>
|
||||
<li>I synchronized the production CGSpace PostgreSQL, Solr, and Assetstore data with DSpace Test</li>
|
||||
<li>Over the weekend a few users reported that they could not log into CGSpace
|
||||
@ -349,10 +349,10 @@ $ docker-compose build
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b "dc=cgiarad,dc=org" -D "cgspace-ldap-account@cgiarad.org" -W "(sAMAccountName=someaccountnametocheck)"
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b <span style="color:#e6db74">"dc=cgiarad,dc=org"</span> -D <span style="color:#e6db74">"cgspace-ldap-account@cgiarad.org"</span> -W <span style="color:#e6db74">"(sAMAccountName=someaccountnametocheck)"</span>
|
||||
Enter LDAP Password:
|
||||
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
|
||||
</code></pre><ul>
|
||||
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
|
||||
</code></pre></div><ul>
|
||||
<li>I sent a message to CGNET to ask about the server settings and see if our IP is still whitelisted
|
||||
<ul>
|
||||
<li>It turns out that CGNET created a new Active Directory server (AZCGNEROOT3.cgiarad.org) and decommissioned the old one last week</li>
|
||||
@ -361,8 +361,8 @@ ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
|
||||
</li>
|
||||
<li>Create another test account for Rafael from Bioversity-CIAT to submit some items to DSpace Test:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ dspace user -a -m tip-submit@cgiar.org -g CIAT -s Submit -p 'fuuuuuuuu'
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ dspace user -a -m tip-submit@cgiar.org -g CIAT -s Submit -p <span style="color:#e6db74">'fuuuuuuuu'</span>
|
||||
</code></pre></div><ul>
|
||||
<li>I added the account to the Alliance Admins group, which should allow him to submit to any Alliance collection
|
||||
<ul>
|
||||
<li>According to my notes from <a href="/cgspace-notes/2020-10/">2020-10</a> the account must be in the admin group in order to submit via the REST API</li>
|
||||
@ -371,13 +371,13 @@ ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
|
||||
<li>Run <code>dspace cleanup -v</code> process on CGSpace to clean up old bitstreams</li>
|
||||
<li>Export lists of authors, donors, and affiliations for Peter Ballantyne to clean up:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "dc.contributor.author", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 3 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-authors.csv WITH CSV HEADER;
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "dc.contributor.author", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 3 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-authors.csv WITH CSV HEADER;
|
||||
COPY 80901
|
||||
localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.donor", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-donors.csv WITH CSV HEADER;
|
||||
localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.donor", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-donors.csv WITH CSV HEADER;
|
||||
COPY 1274
|
||||
localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-affiliations.csv WITH CSV HEADER;
|
||||
localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-09-20-affiliations.csv WITH CSV HEADER;
|
||||
COPY 8091
|
||||
</code></pre><h2 id="2021-09-23">2021-09-23</h2>
|
||||
</code></pre></div><h2 id="2021-09-23">2021-09-23</h2>
|
||||
<ul>
|
||||
<li>Peter sent me back the corrections for the affiliations
|
||||
<ul>
|
||||
@ -386,24 +386,24 @@ COPY 8091
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ csv-metadata-quality -i ~/Downloads/2021-09-20-affiliations.csv -o /tmp/affiliations.csv -x cg.contributor.affiliation
|
||||
$ csvgrep -c 'correct' -m 'DELETE' /tmp/affiliations.csv > /tmp/affiliations-delete.csv
|
||||
$ csvgrep -c 'correct' -r '^.+$' /tmp/affiliations.csv | csvgrep -i -c 'correct' -m 'DELETE' > /tmp/affiliations-fix.csv
|
||||
$ ./ilri/fix-metadata-values.py -i /tmp/affiliations-fix.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t 'correct' -m 211
|
||||
$ ./ilri/delete-metadata-values.py -i /tmp/affiliations-fix.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csv-metadata-quality -i ~/Downloads/2021-09-20-affiliations.csv -o /tmp/affiliations.csv -x cg.contributor.affiliation
|
||||
$ csvgrep -c <span style="color:#e6db74">'correct'</span> -m <span style="color:#e6db74">'DELETE'</span> /tmp/affiliations.csv > /tmp/affiliations-delete.csv
|
||||
$ csvgrep -c <span style="color:#e6db74">'correct'</span> -r <span style="color:#e6db74">'^.+$'</span> /tmp/affiliations.csv | csvgrep -i -c <span style="color:#e6db74">'correct'</span> -m <span style="color:#e6db74">'DELETE'</span> > /tmp/affiliations-fix.csv
|
||||
$ ./ilri/fix-metadata-values.py -i /tmp/affiliations-fix.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span> -f cg.contributor.affiliation -t <span style="color:#e6db74">'correct'</span> -m <span style="color:#ae81ff">211</span>
|
||||
$ ./ilri/delete-metadata-values.py -i /tmp/affiliations-fix.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span> -f cg.contributor.affiliation -m <span style="color:#ae81ff">211</span>
|
||||
</code></pre></div><ul>
|
||||
<li>Then I updated the controlled vocabulary for affiliations by exporting the top 1,000 used terms:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC LIMIT 1000) to /tmp/2021-09-23-affiliations.csv WITH CSV HEADER;
|
||||
$ csvcut -c 1 /tmp/2021-09-23-affiliations.csv | sed 1d > /tmp/affiliations.txt
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC LIMIT 1000) to /tmp/2021-09-23-affiliations.csv WITH CSV HEADER;
|
||||
$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-09-23-affiliations.csv | sed 1d > /tmp/affiliations.txt
|
||||
</code></pre></div><ul>
|
||||
<li>Peter also sent me 310 corrections and 234 deletions for donors so I applied those and updated the controlled vocabularies too</li>
|
||||
<li>Move some One CGIAR-related collections around the CGSpace hierarchy for Peter Ballantyne</li>
|
||||
<li>Mohammed Salem asked me for an ID to UUID mapping for CGSpace collections, so I generated one similar to the ID one I sent him in 2020-11:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT collection_id,uuid FROM collection WHERE collection_id IS NOT NULL) TO /tmp/2021-09-23-collection-id2uuid.csv WITH CSV HEADER;
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT collection_id,uuid FROM collection WHERE collection_id IS NOT NULL) TO /tmp/2021-09-23-collection-id2uuid.csv WITH CSV HEADER;
|
||||
COPY 1139
|
||||
</code></pre><h2 id="2021-09-24">2021-09-24</h2>
|
||||
</code></pre></div><h2 id="2021-09-24">2021-09-24</h2>
|
||||
<ul>
|
||||
<li>Peter and Abenet agreed that we should consider converting more of our UPPER CASE metadata values to Title Case
|
||||
<ul>
|
||||
@ -435,33 +435,33 @@ COPY 1139
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=231;
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=231;
|
||||
UPDATE 2903
|
||||
localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.coverage.subregion" FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 231) to /tmp/2021-09-24-subregions.txt;
|
||||
localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.coverage.subregion" FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 231) to /tmp/2021-09-24-subregions.txt;
|
||||
COPY 1200
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Then I processed the list for matches with my <code>subdivision-lookup.py</code> script, and extracted only the values that matched:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/subdivision-lookup.py -i /tmp/2021-09-24-subregions.txt -o /tmp/subregions.csv
|
||||
$ csvgrep -c matched -m 'true' /tmp/subregions.csv | csvcut -c 1 | sed 1d > /tmp/subregions-matched.txt
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/subdivision-lookup.py -i /tmp/2021-09-24-subregions.txt -o /tmp/subregions.csv
|
||||
$ csvgrep -c matched -m <span style="color:#e6db74">'true'</span> /tmp/subregions.csv | csvcut -c <span style="color:#ae81ff">1</span> | sed 1d > /tmp/subregions-matched.txt
|
||||
$ wc -l /tmp/subregions-matched.txt
|
||||
81 /tmp/subregions-matched.txt
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Then I updated the controlled vocabulary in the submission forms</li>
|
||||
<li>I did the same for <code>dcterms.audience</code>, taking special care with a few all-caps values:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value != 'NGOS' AND text_value != 'CGIAR';
|
||||
localhost/dspace63= > UPDATE metadatavalue SET text_value='NGOs' WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value = 'NGOS';
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value != 'NGOS' AND text_value != 'CGIAR';
|
||||
localhost/dspace63= > UPDATE metadatavalue SET text_value='NGOs' WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value = 'NGOS';
|
||||
</code></pre></div><ul>
|
||||
<li>Update submission form comment for DOIs because it was still recommending people use the “dx.doi.org” format even though I batch updated all DOIs to the “doi.org” format a few times in the last year
|
||||
<ul>
|
||||
<li>Then I updated all existing metadata to the new format again:</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
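The first argument of PostgreSQL's `regexp_replace` is a regular expression, so the unescaped dots in `'https://dx.doi.org'` match any character. The `LIKE 'https://dx.doi.org%'` filter keeps that harmless in the DOI update here. As a rough illustration of a stricter pattern, the sketch below does the same prefix rewrite with `sed`, which is a stand-in and not the PostgreSQL function. The `normalize_doi` helper and the DOI values are made up.

```shell
# Rewrite a leading "https://dx.doi.org" to "https://doi.org".
# The dots are escaped and the pattern is anchored to the start of the line.
normalize_doi() {
    sed -E 's#^https://dx\.doi\.org#https://doi.org#'
}

# Made-up DOI values: one in the old format, one already in the new format.
printf '%s\n' \
    'https://dx.doi.org/10.1234/example' \
    'https://doi.org/10.1234/other' \
    | normalize_doi
```

Values that already use the "doi.org" format pass through unchanged, so the rewrite can be repeated safely.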
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, 'https://dx.doi.org', 'https://doi.org') WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 220 AND text_value LIKE 'https://dx.doi.org%';
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, 'https://dx.doi.org', 'https://doi.org') WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 220 AND text_value LIKE 'https://dx.doi.org%';
|
||||
UPDATE 49
|
||||
</code></pre><h2 id="2021-09-26">2021-09-26</h2>
|
||||
</code></pre></div><h2 id="2021-09-26">2021-09-26</h2>
|
||||
<ul>
|
||||
<li>Mohammed Salem told me last week that MELSpace and WorldFish have been upgraded to DSpace 6 so I updated the repository setup in AReS to use the UUID field instead of IDs
|
||||
<ul>
|
||||
@ -489,26 +489,26 @@ UPDATE 49
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvcut -c 'id,collection,dc.title[en_US]' ~/Downloads/10568-106990.csv > /tmp/2021-09-28-alliance-reports.csv
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c <span style="color:#e6db74">'id,collection,dc.title[en_US]'</span> ~/Downloads/10568-106990.csv > /tmp/2021-09-28-alliance-reports.csv
|
||||
</code></pre></div><ul>
|
||||
<li>She sent it back fairly quickly with a new column marked “Move” so I extracted those items that matched and set them to the new owning collection:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvgrep -c Move -m 'Yes' ~/Downloads/2021_28_09_alliance_reports_csv.csv | csvcut -c 1,2 | sed 's_10568/106990_10568/111506_' > /tmp/alliance-move.csv
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvgrep -c Move -m <span style="color:#e6db74">'Yes'</span> ~/Downloads/2021_28_09_alliance_reports_csv.csv | csvcut -c 1,2 | sed <span style="color:#e6db74">'s_10568/106990_10568/111506_'</span> > /tmp/alliance-move.csv
|
||||
</code></pre></div><ul>
|
||||
<li>Maria from the Alliance emailed us to say that approving submissions was slow on CGSpace
|
||||
<ul>
|
||||
<li>I looked at the PostgreSQL activity and it seems low:</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_stat_activity' | wc -l
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_stat_activity' | wc -l
|
||||
59
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Locks look high though:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | sort | uniq -c | wc -l
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | sort | uniq -c | wc -l
|
||||
1154
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Indeed it seems something started causing locks to increase yesterday:</li>
|
||||
</ul>
|
||||
<p><img src="/cgspace-notes/2021/09/postgres_locks_ALL-week.png" alt="PostgreSQL locks week"></p>
|
||||
@ -520,9 +520,9 @@ UPDATE 49
|
||||
<li>The number of DSpace sessions is normal, hovering around 1,000…</li>
|
||||
<li>Looking closer at the PostgreSQL activity log, I see the locks are all held by the <code>dspaceCli</code> user… which seems weird:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c "SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE application_name='dspaceCli';" | wc -l
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c "SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE application_name='dspaceCli';" | wc -l
|
||||
1096
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Now I’m wondering why there are no connections from <code>dspaceApi</code> or <code>dspaceWeb</code>. Could it be that our Tomcat JDBC pooling via JNDI isn’t working?
|
||||
<ul>
|
||||
<li>I see the same thing on DSpace Test hmmmm</li>
|
||||
@ -536,14 +536,14 @@ UPDATE 49
|
||||
<ul>
|
||||
<li>Export a list of ILRI subjects from CGSpace to validate against AGROVOC for Peter and Abenet:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 203) to /tmp/2021-09-29-ilri-subject.txt;
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 203) to /tmp/2021-09-29-ilri-subject.txt;
|
||||
COPY 149
|
||||
</code></pre><ul>
|
||||
</code></pre></div><ul>
|
||||
<li>Then validate and format the matches:</li>
|
||||
</ul>
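The formatting step for the AGROVOC matches renames the "match type" header to "matched" and collapses the `altLabel` and `prefLabel` values to "yes". The sketch below applies the equivalent `sed` expressions to made-up rows. The sample subjects and the empty value for an unmatched term are assumptions about the lookup script's output, and `sed -E` with `(alt|pref)` stands in for the GNU-style `\(alt\|pref\)` used in the notes.

```shell
# Made-up rows with a "subject" column and a "match type" column.
sample_csv='subject,match type
LIVESTOCK,prefLabel
COWS,altLabel
FOOBAR,'

# Rename the header and collapse both label types to "yes".
relabel_matches() {
    sed -E -e 's/match type/matched/' -e 's/(alt|pref)Label/yes/'
}

printf '%s\n' "$sample_csv" | relabel_matches
```

The substitution is not column-aware, so a subject term that happened to contain "prefLabel" or "altLabel" would be rewritten too.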
|
||||
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/agrovoc-lookup.py -i /tmp/2021-09-29-ilri-subject.txt -o /tmp/2021-09-29-ilri-subjects.csv -d
|
||||
$ csvcut -c subject,'match type' /tmp/2021-09-29-ilri-subjects.csv | sed -e 's/match type/matched/' -e 's/\(alt\|pref\)Label/yes/' > /tmp/2021-09-29-ilri-subjects2.csv
|
||||
</code></pre><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/agrovoc-lookup.py -i /tmp/2021-09-29-ilri-subject.txt -o /tmp/2021-09-29-ilri-subjects.csv -d
|
||||
$ csvcut -c subject,<span style="color:#e6db74">'match type'</span> /tmp/2021-09-29-ilri-subjects.csv | sed -e <span style="color:#e6db74">'s/match type/matched/'</span> -e <span style="color:#e6db74">'s/\(alt\|pref\)Label/yes/'</span> > /tmp/2021-09-29-ilri-subjects2.csv
|
||||
</code></pre></div><ul>
|
||||
<li>I talked to Salem about depositing from MEL to CGSpace
|
||||
<ul>
|
||||
<li>He mentioned that the one issue is that when you deposit to a workflow you don’t get a Handle or any kind of identifier back!</li>
|
||||