mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2019-05-05
@ -33,7 +33,7 @@ This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRec
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
"/>
<meta name="generator" content="Hugo 0.55.5" />

@ -137,91 +137,88 @@ UPDATE 14

<ul>
<li>Testing the configuration and theme changes for the upcoming metadata migration, I found some issues with <code>cg.coverage.admin-unit</code></li>

<li><p>Seems that the Browse configuration in <code>dspace.cfg</code> can’t handle the ‘-’ in the field name:</p>

<pre><code>webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text
</code></pre></li>

<li><p>But actually, I think since DSpace 4 or 5 (we are on 5.1) the Browse indexes come from Discovery (defined in <code>discovery.xml</code>), so this is really just a parsing error</p></li>

<li><p>I’ve sent a message to the DSpace mailing list to ask about the Browse index definition</p></li>

<li><p>A user was having problems with submission, and from the stack trace it looks like a Sherpa/Romeo issue</p></li>

<li><p>I found a thread on the mailing list talking about it, and there is a bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></p></li>

<li><p>The patch applies successfully on DSpace 5.1, so I will try it later</p></li>
</ul>
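<p>A minimal Python sketch can show how a hyphen breaks this kind of definition. The regular expressions below are illustrative assumptions and are not the pattern DSpace actually uses. The strict field-name character class leaves out <code>-</code>, so it rejects the <code>cg.coverage.admin-unit</code> line. Adding <code>-</code> to the class makes the same line parse.</p>

```python
import re

# Illustrative patterns only (NOT DSpace's real parser) for the four-part
# "name:type:field:datatype" form. STRICT allows word characters, dots,
# commas and '*' in the field name; RELAXED additionally allows '-'.
STRICT = re.compile(r"^(\w+):(\w+):([\w.,*]+):(\w+)$")
RELAXED = re.compile(r"^(\w+):(\w+):([\w.,*-]+):(\w+)$")


def parse_browse_index(definition, pattern=STRICT):
    """Split a browse index definition into named parts, or return None."""
    match = pattern.match(definition.strip())
    if match is None:
        return None
    name, kind, field, datatype = match.groups()
    return {"name": name, "type": kind, "field": field, "datatype": datatype}


subregion = "subregion:metadata:cg.coverage.admin-unit:text"
author = "author:metadataAuthority:dc.contributor.author:authority"

print(parse_browse_index(subregion))           # None: '-' is not in the class
print(parse_browse_index(subregion, RELAXED))  # parses once '-' is allowed
print(parse_browse_index(author))
```

<p>Only the four-part form used in these notes is handled. Other definition forms would need a different pattern.</p>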

<h2 id="2016-06-03">2016-06-03</h2>

<ul>
<li>Investigating the CCAFS authority issue, I exported the metadata for the Videos collection</li>

<li><p>The top two authors are:</p>

<pre><code>CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500
CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600
</code></pre></li>

<li><p>So the only difference is the “confidence”</p></li>

<li><p>Ok, well THAT is interesting:</p>

<pre><code>dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence
------------+--------------------------------------+------------
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, Alan |                                      |         -1
 Orth, Alan |                                      |         -1
 Orth, Alan |                                      |         -1
 Orth, Alan |                                      |         -1
 Orth, A.   | 05c2c622-d252-4efb-b9ed-95a07d3adf11 |         -1
 Orth, A.   | 05c2c622-d252-4efb-b9ed-95a07d3adf11 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 |        600
 Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 |        600
(13 rows)
</code></pre></li>

<li><p>And now an actually relevant example:</p>

<pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500;
 count
-------
   707
(1 row)

dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500;
 count
-------
   253
(1 row)
</code></pre></li>

<li><p>Trying something experimental:</p>

<pre><code>dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
UPDATE 960
</code></pre></li>

<li><p>And then re-indexing authority and Discovery…?</p></li>

<li><p>After the Discovery re-index, the CCAFS authors are all grouped together in the Authors sidebar facet</p></li>

<li><p>The DSpace 5 docs for ORCiD and Authority control mention changing the browse indexes to use the Authority as well:</p>

<pre><code>webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority
</code></pre></li>

<li><p>That would only be for the “Browse by” function… so we’ll have to see what effect that has later</p></li>
</ul>
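<p>The same check can be run on a whole export with a small Python sketch. It groups the exported values by name and authority, then flags any pair that is stored with more than one confidence. The sketch assumes that each value is always written as <code>value::authority::confidence</code>, as in the two CCAFS lines above. The function names are hypothetical.</p>

```python
from collections import defaultdict


def parse_authority_value(raw):
    """Split a 'value::authority::confidence' string into its three parts."""
    # rsplit from the right so a '::' inside the value itself is left alone
    value, authority, confidence = raw.rsplit("::", 2)
    return value, authority, int(confidence)


def confidences_by_authority(raw_values):
    """Map each (value, authority) pair to the sorted confidences seen for it."""
    seen = defaultdict(set)
    for raw in raw_values:
        value, authority, confidence = parse_authority_value(raw)
        seen[(value, authority)].add(confidence)
    return {key: sorted(confs) for key, confs in seen.items()}


CCAFS = "CGIAR Research Program on Climate Change, Agriculture and Food Security"
UUID = "acd00765-02f1-4b5b-92fa-bfa3877229ce"
exported = [f"{CCAFS}::{UUID}::500", f"{CCAFS}::{UUID}::600"]

for (value, authority), confs in confidences_by_authority(exported).items():
    if len(confs) > 1:
        print(f"{value} ({authority}): confidences {confs}")
```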

<h2 id="2016-06-04">2016-06-04</h2>

@ -235,13 +232,11 @@ UPDATE 960
<h2 id="2016-06-07">2016-06-07</h2>

<ul>
<li><p>Figured out how to export a list of the unique values from a metadata field ordered by count:</p>

<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
</code></pre></li>

<li><p>Identified the next round of fields to migrate:</p>
<ul>
@ -266,17 +261,19 @@ UPDATE 960
<ul>
<li>Discuss controlled vocabularies for ~28 fields</li>
<li>Looks like this is all we need: <a href="https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies">https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies</a></li>

<li><p>I wrote an XPath expression to extract the ILRI subjects from <code>input-forms.xml</code> (uses xmlstarlet):</p>

<pre><code>$ xml sel -t -m '//value-pairs[@value-pairs-name="ilrisubject"]/pair/displayed-value/text()' -c '.' -n dspace/config/input-forms.xml
</code></pre></li>

<li><p>Write to Atmire about the use of <code>atmire.orcid.id</code> to see if we can change it</p></li>

<li><p>It seems to be a virtual field that is queried from the authority cache… hmm</p></li>

<li><p>In other news, I found out that the About page that we haven’t been using lives in <code>dspace/config/about.xml</code>, so now we can update the text</p></li>

<li><p>File a bug about the <code>closed="true"</code> attribute of controlled vocabularies not working: <a href="https://jira.duraspace.org/browse/DS-3238">https://jira.duraspace.org/browse/DS-3238</a></p></li>
</ul>
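<p>Python's standard library can do the same extraction without xmlstarlet, because ElementTree's limited XPath supports <code>[@attr='value']</code> predicates. The minimal sketch below runs against a made-up excerpt in the shape of <code>input-forms.xml</code>. The subject values and the function name are hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def displayed_values(xml_text, list_name):
    """Displayed values of the value-pairs list with the given name, in order."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports [@attr='value'] predicates.
    path = f".//value-pairs[@value-pairs-name='{list_name}']/pair/displayed-value"
    return [element.text for element in root.iterfind(path)]


# Made-up excerpt in the shape of input-forms.xml; real subjects will differ.
SAMPLE = """
<input-forms>
  <form-value-pairs>
    <value-pairs value-pairs-name="ilrisubject" dc-term="subject">
      <pair>
        <displayed-value>SUBJECT A</displayed-value>
        <stored-value>SUBJECT A</stored-value>
      </pair>
      <pair>
        <displayed-value>SUBJECT B</displayed-value>
        <stored-value>SUBJECT B</stored-value>
      </pair>
    </value-pairs>
  </form-value-pairs>
</input-forms>
"""

print(displayed_values(SAMPLE, "ilrisubject"))
```

<p>To run this on the real file, pass in its text, for example <code>pathlib.Path("dspace/config/input-forms.xml").read_text()</code>.</p>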

<h2 id="2016-06-09">2016-06-09</h2>

@ -292,24 +289,30 @@ UPDATE 960
<ul>
<li>Investigating authority confidences</li>
<li>It looks like the values are documented in <code>Choices.java</code></li>

<li><p>Experiment with setting all 960 CCAFS author values to be 500:</p>

<pre><code>dspacetest=# SELECT authority, confidence FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value = 'CGIAR Research Program on Climate Change, Agriculture and Food Security';

dspacetest=# UPDATE metadatavalue set confidence = 500 where resource_type_id=2 AND metadata_field_id=3 AND text_value = 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
UPDATE 960
</code></pre></li>

<li><p>After the database edit, I did a full Discovery re-index</p></li>

<li><p>And now there are exactly 960 items in the authors facet for ‘CGIAR Research Program on Climate Change, Agriculture and Food Security’</p></li>

<li><p>Then I ran the same on CGSpace</p></li>

<li><p>Merge controlled vocabulary functionality for animal breeds to <code>5_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/236">#236</a>)</p></li>

<li><p>Write a Python script to update metadata values in batch via PostgreSQL: <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></p></li>

<li><p>We need to use this to correct some pretty ugly values in fields like <code>dc.description.sponsorship</code></p></li>

<li><p>Merge item display tweaks from earlier this week (<a href="https://github.com/ilri/DSpace/pull/231">#231</a>)</p></li>

<li><p>Merge controlled vocabulary functionality for subregions (<a href="https://github.com/ilri/DSpace/pull/238">#238</a>)</p></li>
</ul>
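<p>For reference, here is a small Python lookup of the confidence constants as I recall them from DSpace 5's <code>org.dspace.content.authority.Choices</code>. They are worth double-checking against <code>Choices.java</code> in our own tree. In these terms, the CCAFS rows went from a mix of 500 (<code>CF_UNCERTAIN</code>) and other values to all 500. The earlier counts, 707 rows at 500 and 253 rows otherwise, add up to the 960 rows updated.</p>

```python
# Confidence constants as recalled from DSpace 5's
# org.dspace.content.authority.Choices; verify against your DSpace version.
CONFIDENCE_NAMES = {
    -1: "CF_UNSET",
    0: "CF_NOVALUE",
    100: "CF_REJECTED",
    200: "CF_FAILED",
    300: "CF_NOTFOUND",
    400: "CF_AMBIGUOUS",
    500: "CF_UNCERTAIN",
    600: "CF_ACCEPTED",
}


def describe_confidence(value):
    """Return the Choices constant name for a stored confidence value."""
    return CONFIDENCE_NAMES.get(value, f"unknown ({value})")


for value in (-1, 500, 600):
    print(value, describe_confidence(value))
```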

<h2 id="2016-06-11">2016-06-11</h2>

@ -355,35 +358,33 @@ UPDATE 960

<h2 id="2016-06-20">2016-06-20</h2>

<ul>
<li><p>CGSpace’s HTTPS certificate expired last night and I didn’t notice, so I had to renew it:</p>

<pre><code># /opt/letsencrypt/letsencrypt-auto renew --standalone --pre-hook "/usr/bin/service nginx stop" --post-hook "/usr/bin/service nginx start"
</code></pre></li>

<li><p>I really need to fix that cron job…</p></li>
</ul>
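<p>Because the expiry went unnoticed, a small check that warns before the certificate runs out could sit in cron alongside the renewal. This is a minimal Python sketch under some assumptions. The certificate's <code>notAfter</code> string is obtained separately, for example from <code>ssl.SSLSocket.getpeercert()</code>. The 14-day threshold and the function names are hypothetical.</p>

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter time (negative once expired)."""
    # ssl.cert_time_to_seconds parses the "%b %d %H:%M:%S %Y GMT" strings
    # found in the "notAfter" entry returned by SSLSocket.getpeercert().
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / SECONDS_PER_DAY


def needs_renewal_warning(not_after, threshold_days=14, now=None):
    """True when fewer than threshold_days remain (14 is an arbitrary choice)."""
    return days_until_expiry(not_after, now) < threshold_days


start = ssl.cert_time_to_seconds("Jun 20 00:00:00 2016 GMT")
print(days_until_expiry("Jul 20 00:00:00 2016 GMT", now=start))
print(needs_renewal_warning("Jun 25 00:00:00 2016 GMT", now=start))
```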

<h2 id="2016-06-24">2016-06-24</h2>

<ul>
<li><p>Run the replacements/deletes for <code>dc.description.sponsorship</code> (investors) on CGSpace:</p>

<pre><code>$ ./fix-metadata-values.py -i investors-not-blank-not-delete-85.csv -f dc.description.sponsorship -t 'correct investor' -m 29 -d cgspace -p 'fuuu' -u cgspace
$ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.sponsorship -m 29 -d cgspace -p 'fuuu' -u cgspace
</code></pre></li>

<li><p>The scripts for this are here:</p>

<ul>
<li><a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></li>
<li><a href="https://gist.github.com/alanorth/bd7d58c947f686401a2b1fadc78736be">delete-metadata-values.py</a></li>
</ul></li>

<li><p>Add new sponsors to controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/244">#244</a>)</p></li>

<li><p>Refine submission form labels and hints</p></li>
</ul>
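<p>The linked gists are the real tools. What follows is only a hypothetical Python sketch of the idea behind them, not their code. It makes three assumptions:</p>
<ul>
<li>The CSV has a column named after the field, matching <code>-f</code>.</li>
<li>For fixes, the CSV also has a column holding the corrected value, matching <code>-t</code>.</li>
<li><code>-m</code> is the <code>metadata_field_id</code>.</li>
</ul>
<p>An in-memory <code>sqlite3</code> table stands in for PostgreSQL so the sketch runs on its own. With psycopg2 the placeholders would be <code>%s</code> instead of <code>?</code>.</p>

```python
import csv
import io
import sqlite3


def fix_values(conn, rows, field, correction, field_id):
    """Replace values in column `field` with the value in column `correction`."""
    changed = 0
    for row in rows:
        old, new = row[field], row[correction].strip()
        if not new or new == old:
            continue  # no correction given for this value
        cur = conn.execute(
            "UPDATE metadatavalue SET text_value = ? "
            "WHERE metadata_field_id = ? AND text_value = ?",
            (new, field_id, old),
        )
        changed += cur.rowcount
    return changed


def delete_values(conn, rows, field, field_id):
    """Delete every occurrence of the values listed in column `field`."""
    deleted = 0
    for row in rows:
        cur = conn.execute(
            "DELETE FROM metadatavalue WHERE metadata_field_id = ? AND text_value = ?",
            (field_id, row[field]),
        )
        deleted += cur.rowcount
    return deleted


# In-memory stand-in for the metadatavalue table, with made-up values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadatavalue (metadata_field_id INTEGER, text_value TEXT)")
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?)",
    [(29, "example donor"), (29, "example donor"), (29, "N/A"), (3, "N/A")],
)

fix_csv = io.StringIO("dc.description.sponsorship,correct investor\nexample donor,Example Donor\n")
delete_csv = io.StringIO("dc.description.sponsorship\nN/A\n")

fixed = fix_values(conn, csv.DictReader(fix_csv), "dc.description.sponsorship", "correct investor", 29)
removed = delete_values(conn, csv.DictReader(delete_csv), "dc.description.sponsorship", 29)
conn.commit()
print(f"fixed {fixed}, deleted {removed}")
```

<p>Filtering on <code>metadata_field_id</code> as well as <code>text_value</code> keeps the same string in other fields untouched. Here, the <code>N/A</code> in field 3 survives the delete.</p>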

<h2 id="2016-06-28">2016-06-28</h2>

@ -391,21 +392,19 @@ $ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.spons
<ul>
<li>Testing the cleanup of <code>dc.contributor.corporate</code> with 13 deletions and 121 replacements</li>
<li>There are still ~97 fields that weren’t marked for any action</li>

<li><p>After the above deletions and replacements, I regenerated a CSV and sent it to Peter <em>et al</em> to have a look:</p>

<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=126 group by text_value order by count desc) to /tmp/contributors-june28.csv with csv;
</code></pre></li>

<li><p>Re-evaluate <code>dc.contributor.corporate</code>; it seems we will move it to <code>dc.contributor.author</code>, as this is more in line with how editors are actually using it</p></li>
</ul>
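<p>When the values come from somewhere other than the database, such as a spreadsheet or a DSpace metadata export, the same "unique values ordered by count" CSV can be built in Python. This is a minimal sketch with hypothetical sample values. One difference: <code>Counter.most_common()</code> keeps ties in first-seen order, while PostgreSQL leaves the order of ties unspecified.</p>

```python
import csv
import io
from collections import Counter


def unique_values_csv(values):
    """CSV text of (value, count) rows, most frequent first, like the psql export."""
    out = io.StringIO()
    writer = csv.writer(out)
    # most_common() sorts by count, descending; ties keep first-seen order.
    for value, count in Counter(values).most_common():
        writer.writerow([value, count])
    return out.getvalue()


sample = ["Org A", "Org B", "Org A", "Org C", "Org A", "Org B"]
print(unique_values_csv(sample), end="")
```

<p>The <code>csv</code> module quotes values that contain commas, which matters for names like the CCAFS program.</p>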

<h2 id="2016-06-29">2016-06-29</h2>

<ul>
<li><p>Test run of <code>migrate-fields.sh</code> with the following re-mappings:</p>

<pre><code>72 55 #dc.source
86 230 #cg.contributor.crp
@ -417,20 +416,18 @@ $ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.spons
74 220 #cg.identifier.doi
79 222 #cg.identifier.googleurl
89 223 #cg.identifier.dataurl
</code></pre></li>

<li><p>Run all cleanups and deletions of <code>dc.contributor.corporate</code> on CGSpace:</p>

<pre><code>$ ./fix-metadata-values.py -i Corporate-Authors-Fix-121.csv -f dc.contributor.corporate -t 'Correct style' -m 126 -d cgspace -u cgspace -p 'fuuu'
$ ./fix-metadata-values.py -i Corporate-Authors-Fix-PB.csv -f dc.contributor.corporate -t 'should be' -m 126 -d cgspace -u cgspace -p 'fuuu'
$ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-Delete-13.csv -m 126 -u cgspace -d cgspace -p 'fuuu'
</code></pre></li>

<li><p>Re-deploy CGSpace and DSpace Test with latest June changes</p></li>

<li><p>Now the sharing and Altmetric bits are more prominent:</p></li>
</ul>
<p><img src="/cgspace-notes/2016/06/xmlui-altmetric-sharing.png" alt="DSpace 5.1 XMLUI With Altmetric Badge" /></p>
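<p>The 2016-06-29 <code>migrate-fields.sh</code> re-mappings can be read as "old field ID, new field ID, comment". This hypothetical Python sketch turns such lines into parameterized <code>UPDATE</code> statements that move values by rewriting <code>metadata_field_id</code>. It rests on two assumptions about the script, which I have not confirmed:</p>
<ul>
<li>The first number is the source field and the second is the target.</li>
<li>The comment names the target field.</li>
</ul>
<p>Only the mappings visible in these notes are used.</p>

```python
def parse_field_mappings(text):
    """Parse 'from_id to_id #field.name' lines into (from_id, to_id, name) tuples."""
    mappings = []
    for line in text.splitlines():
        ids, _, comment = line.partition("#")
        if not ids.strip():
            continue  # skip blank or comment-only lines
        from_id, to_id = (int(part) for part in ids.split())
        mappings.append((from_id, to_id, comment.strip()))
    return mappings


def migration_statements(mappings):
    """Build one parameterized UPDATE per mapping (psycopg2-style %s placeholders)."""
    sql = "UPDATE metadatavalue SET metadata_field_id = %s WHERE metadata_field_id = %s"
    return [(sql, (to_id, from_id)) for from_id, to_id, _ in mappings]


# Only the mappings visible in these notes; the full list is longer.
MAPPINGS = """\
72 55 #dc.source
86 230 #cg.contributor.crp
74 220 #cg.identifier.doi
79 222 #cg.identifier.googleurl
89 223 #cg.identifier.dataurl
"""

mappings = parse_field_mappings(MAPPINGS)
for (from_id, to_id, name), (sql, params) in zip(mappings, migration_statements(mappings)):
    print(f"{from_id} -> {to_id} ({name}): {sql} {params}")
```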
@ -443,18 +440,16 @@ $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-D
<h2 id="2016-06-30">2016-06-30</h2>

<ul>
<li><p>Wow, there are 95 authors in the database who have ‘,’ at the end of their name:</p>

<pre><code># select text_value from metadatavalue where metadata_field_id=3 and text_value like '%,';
</code></pre></li>

<li><p>We need to use something like this to fix them, though I still need to write a proper regex:</p>

<pre><code># update metadatavalue set text_value = regexp_replace(text_value, '(Poole, J),', '\1') where metadata_field_id=3 and text_value = 'Poole, J,';
</code></pre></li>
</ul>
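<p>A general version of that fix anchors on the end of the value instead of naming each author. This minimal Python sketch removes one trailing comma and any spaces around it. The sample names other than <code>Poole, J,</code> are made up. The same pattern should also work in PostgreSQL, as <code>regexp_replace(text_value, '\s*,\s*$', '')</code>, but I have not tested that against the database.</p>

```python
import re

# One trailing comma, plus any whitespace around it, at the end of the value.
TRAILING_COMMA = re.compile(r"\s*,\s*$")


def strip_trailing_comma(value):
    """Remove a single trailing comma (and surrounding spaces) from a name."""
    return TRAILING_COMMA.sub("", value)


for name in ["Poole, J,", "Orth, Alan", "Smith, A. ,"]:
    print(repr(strip_trailing_comma(name)))
```

<p>The comma between surname and initials survives because the pattern only matches a comma at the very end of the string.</p>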