@@ -50,7 +50,7 @@ DELETE 1
Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
"/>
<meta name="generator" content="Hugo 0.88.1" />
@@ -140,7 +140,7 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre tabindex="0"><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id
-------+---------------+---------
92551 | 313 | 80278
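-- (illustrative, not part of the original output) once the two mapping rows have been
-- compared, the erroneous one can be removed by its primary key:
-- dspace=# delete from collection2item where id = &lt;id-of-duplicate-row&gt;;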
@@ -166,7 +166,7 @@ DELETE 1
<li>The climate risk management one doesn’t exist, so I will have to ask Magdalena if they want me to add it to the input forms</li>
<li>Start testing the nearly 500 author corrections that CCAFS sent me:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/CCAFS-Authors-Feb-7.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
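# (illustrative sanity check, not in the original notes) confirm the size and shape of the
# corrections file before applying it:
$ wc -l /tmp/CCAFS-Authors-Feb-7.csv
$ head -n 3 /tmp/CCAFS-Authors-Feb-7.csv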
</code></pre><h2 id="2017-02-09">2017-02-09</h2>
<ul>
<li>More work on CCAFS Phase II stuff</li>
@@ -175,7 +175,7 @@ DELETE 1
<li>It’s not a very good way to manage the registry, though, as removing one there doesn’t cause it to be removed from the registry, and we always restore from database backups so there would never be a scenario when we needed these to be created</li>
<li>Testing some corrections on CCAFS Phase II flagships (<code>cg.subject.ccafs</code>):</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
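# note (my reading of the flags, not from the original notes): -t names the CSV column that
# holds the corrected value, and -m 210 is presumably the metadata_field_id for cg.subject.ccafs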
</code></pre><h2 id="2017-02-10">2017-02-10</h2>
<ul>
<li>CCAFS said they want to wait on the flagship updates (<code>cg.subject.ccafs</code>) on CGSpace, perhaps for a month or so</li>
@@ -215,46 +215,46 @@ DELETE 1
<li>Fix an issue with a duplicate declaration in the atmire-dspace-xmlui <code>pom.xml</code> (causing non-fatal warnings during the Maven build)</li>
<li>Experiment with making DSpace generate HTTPS handle links, first a change in dspace.cfg or the site’s properties file:</li>
</ul>
<pre tabindex="0"><code>handle.canonical.prefix = https://hdl.handle.net/
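# note (my addition): this only changes how new handle links are generated, so the SQL below
# is still needed for existing records, and Tomcat must be restarted for the dspace.cfg
# change to take effect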
</code></pre><ul>
<li>And then a SQL command to update existing records:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'uri');
UPDATE 58193
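-- (illustrative check, not in the original session) anything still carrying the old prefix
-- in the identifier.uri field should now be zero:
dspace=# select count(*) from metadatavalue where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'uri') and text_value like 'http://hdl.handle.net%';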
</code></pre><ul>
<li>Seems to work fine!</li>
<li>I noticed a few items that have incorrect DOI links (<code>dc.identifier.doi</code>), and after looking in the database I see there are over 100 that are missing the scheme or are just plain wrong:</li>
</ul>
<pre tabindex="0"><code>dspace=# select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value not like 'http%://%';
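-- (my suggestion, not from the original notes) the corrections below can be wrapped in a
-- transaction so the affected rows can be reviewed before committing:
-- dspace=# BEGIN;
-- ... run the individual UPDATEs ...
-- dspace=# ROLLBACK; -- or COMMIT; once the changes look right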
</code></pre><ul>
<li>This will replace any that begin with <code>10.</code> and change them to <code>https://dx.doi.org/10.</code>:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^10\..+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like '10.%';
</code></pre><ul>
<li>This will get any that begin with <code>doi:10.</code> and change them to <code>https://dx.doi.org/10.x</code>:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^doi:(10\..+$)', 'https://dx.doi.org/\1') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'doi:10%';
</code></pre><ul>
<li>Fix DOIs like <code>dx.doi.org/10.</code> to be <code>https://dx.doi.org/10.</code>:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^dx.doi.org/.+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org/%';
</code></pre><ul>
<li>Fix DOIs like <code>http//</code>:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^http//(dx.doi.org/.+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http//%';
</code></pre><ul>
<li>Fix DOIs like <code>dx.doi.org./</code>:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^dx.doi.org\./.+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org./%'
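-- note (my addition): the statement above is missing its terminating semicolon, so psql
-- will sit on a continuation prompt until a lone ; is entered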
</code></pre><ul>
<li>Delete some invalid DOIs:</li>
</ul>
<pre tabindex="0"><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value in ('DOI','CPWF Mekong','Bulawayo, Zimbabwe','bb');
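-- (illustrative, not in the original session) the same WHERE clause can be run as a SELECT
-- first to preview exactly which rows would be deleted:
dspace=# select resource_id, text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value in ('DOI','CPWF Mekong','Bulawayo, Zimbabwe','bb');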
</code></pre><ul>
<li>Fix some other random outliers:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1016/j.aquaculture.2015.09.003' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:/dx.doi.org/10.1016/j.aquaculture.2015.09.003';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.5337/2016.200' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'doi: https://dx.doi.org/10.5337/2016.200';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/doi:10.1371/journal.pone.0062898' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'Http://dx.doi.org/doi:10.1371/journal.pone.0062898';
dspace=# update metadatavalue set text_value = 'https://dx.doi.10.1016/j.cosust.2013.11.012' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:dx.doi.10.1016/j.cosust.2013.11.012';
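-- (illustrative audit query, not from the original notes) values that still do not look
-- like a normal DOI URL can be listed with a regular expression:
dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value !~ '^https?://(dx\.)?doi\.org/';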
@@ -263,13 +263,13 @@ dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.15446/agro
</code></pre><ul>
<li>And do another round of <code>http://</code> → <code>https://</code> cleanups:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http://dx.doi.org%';
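-- (illustrative final check, not in the original session) after all of the above, no DOI
-- value should still start with the plain http:// scheme:
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http://%';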
</code></pre><ul>
<li>Run all DOI corrections on CGSpace</li>
<li>Something to think about here is to write a <a href="https://wiki.lyrasis.org/display/DSDOC5x/Curation+System#CurationSystem-ScriptedTasks">Curation Task</a> in Java to do these sanity checks / corrections every night</li>
<li>Then we could add a cron job for them and run them from the command line like:</li>
</ul>
<pre tabindex="0"><code>[dspace]/bin/dspace curate -t noop -i 10568/79891
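# (illustrative, not from the original notes) a nightly crontab entry for such a task could
# look something like this, assuming a hypothetical "checkdois" curation task is registered:
# 0 3 * * * [dspace]/bin/dspace curate -t checkdois -i 10568/79891 >> /var/log/dspace-curate.log 2>&1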
</code></pre><h2 id="2017-02-20">2017-02-20</h2>
<ul>
<li>Run all system updates on DSpace Test and reboot the server</li>
@@ -280,7 +280,7 @@ dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.15446/agro
<li>Testing the <code>fix-metadata-values.py</code> script on macOS and it seems like we don’t need to use <code>.encode('utf-8')</code> anymore when printing strings to the screen</li>
<li>It seems this might have only been a temporary problem, as both Python 3.5.2 and 3.6.0 are able to print the problematic string “Entwicklung & Ländlicher Raum” without the <code>encode()</code> call, but print it as a bytes when it <em>is</em> used:</li>
</ul>
<pre tabindex="0"><code>$ python
Python 3.6.0 (default, Dec 25 2016, 17:30:53)
>>> print('Entwicklung & Ländlicher Raum')
Entwicklung & Ländlicher Raum
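>>> # (added for completeness) with encode() the bytes representation is printed instead:
>>> print('Entwicklung & Ländlicher Raum'.encode('utf-8'))
b'Entwicklung & L\xc3\xa4ndlicher Raum'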
@@ -294,7 +294,7 @@ b'Entwicklung & L\xc3\xa4ndlicher Raum'
<li>Testing regenerating PDF thumbnails, like I started in 2016-11</li>
<li>It seems there is a bug in <code>filter-media</code> that causes it to process formats that aren’t part of its configuration:</li>
</ul>
<pre tabindex="0"><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16856 -p "ImageMagick PDF Thumbnail"
File: earlywinproposal_esa_postharvest.pdf.jpg
FILTERED: bitstream 13787 (item: 10568/16881) and created 'earlywinproposal_esa_postharvest.pdf.jpg'
File: postHarvest.jpg.jpg
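# note (my addition): postHarvest.jpg is a JPEG, yet it is being picked up by this
# "ImageMagick PDF Thumbnail" run -- this is the bug described above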
@@ -302,7 +302,7 @@ FILTERED: bitstream 16524 (item: 10568/24655) and created 'postHarvest.jpg.jpg'
</code></pre><ul>
<li>According to <code>dspace.cfg</code> the ImageMagick PDF Thumbnail plugin should only process PDFs:</li>
</ul>
<pre tabindex="0"><code>filter.org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter.inputFormats = BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000
filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = Adobe PDF
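# note (my reading of the config above): JPEGs are listed only for the image thumbnail
# filter, so the PDF thumbnail filter processing a .jpg contradicts this configuration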
</code></pre><ul>
<li>I’ve sent a message to the mailing list and might file a Jira issue</li>
@@ -317,7 +317,7 @@ filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = A
<ul>
<li>Find all fields with “<a href="http://hdl.handle.net">http://hdl.handle.net</a>” values (most are in <code>dc.identifier.uri</code>, but some are in other URL-related fields like <code>cg.link.reference</code>, <code>cg.identifier.dataurl</code>, and <code>cg.identifier.url</code>):</li>
</ul>
<pre tabindex="0"><code>dspace=# select distinct metadata_field_id from metadatavalue where resource_type_id=2 and text_value like 'http://hdl.handle.net%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where resource_type_id=2 and metadata_field_id IN (25, 113, 179, 219, 220, 223) and text_value like 'http://hdl.handle.net%';
UPDATE 58633
</code></pre><ul>
@@ -328,7 +328,7 @@ UPDATE 58633
<ul>
<li>LDAP users cannot log in today, looks to be an issue with CGIAR’s LDAP server:</li>
</ul>
<pre tabindex="0"><code>$ openssl s_client -connect svcgroot2.cgiarad.org:3269
CONNECTED(00000003)
depth=0 CN = SVCGROOT2.CGIARAD.ORG
verify error:num=20:unable to get local issuer certificate
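# (illustrative follow-up, not from the original notes) the full chain the server presents
# can be dumped for inspection with:
$ openssl s_client -connect svcgroot2.cgiarad.org:3269 -showcerts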
@@ -345,7 +345,7 @@ Certificate chain
<li>For some reason it is now signed by a private certificate authority</li>
<li>This error seems to have started on 2017-02-25:</li>
</ul>
<pre tabindex="0"><code>$ grep -c "unable to find valid certification path" [dspace]/log/dspace.log.2017-02-*
[dspace]/log/dspace.log.2017-02-01:0
[dspace]/log/dspace.log.2017-02-02:0
[dspace]/log/dspace.log.2017-02-03:0
@@ -381,7 +381,7 @@ Certificate chain
<li>The problem likely lies in the logic of <code>ImageMagickThumbnailFilter.java</code>, as <code>ImageMagickPdfThumbnailFilter.java</code> extends it</li>
<li>Run CIAT corrections on CGSpace</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
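-- (illustrative check, not in the original session) count how many author rows now carry
-- the CIAT authority with full confidence:
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=3 and authority='3026b1de-9302-4f3e-85ab-ef48da024eb2' and confidence=600;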
</code></pre><ul>
<li>CGNET has fixed the certificate chain on their LDAP server</li>
<li>Redeploy CGSpace and DSpace Test on the latest <code>5_x-prod</code> branch with fixes for the LDAP bind user</li>
@@ -393,16 +393,16 @@ Certificate chain
<li>Ah, this is probably because some items have the <code>International Center for Tropical Agriculture</code> author twice, which I first noticed in 2016-12 but couldn’t figure out how to fix</li>
<li>I think I can do it by first exporting all metadatavalues that have the author <code>International Center for Tropical Agriculture</code></li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv;
COPY 1968
</code></pre><ul>
<li>And then use awk to print the duplicate lines to a separate file:</li>
</ul>
<pre tabindex="0"><code>$ awk -F',' 'seen[$1]++' /tmp/ciat.csv > /tmp/ciat-dupes.csv
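# how this works: seen[$1]++ evaluates to 0 (false) the first time a resource_id appears
# and non-zero afterwards, so only the second and later occurrences are printed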
</code></pre><ul>
<li>From that file I can create a list of 279 deletes and put them in a batch script like:</li>
</ul>
<pre tabindex="0"><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
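-- (illustrative, filename is hypothetical) the generated batch file could then be applied
-- in one go from psql with something like:
-- dspace=# \i /tmp/ciat-author-deletes.sql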
</code></pre>