Add notes

2024-04-25 15:28:20 +03:00
parent 6db3da2739
commit 515cc0650f
152 changed files with 278 additions and 195 deletions


@@ -14,7 +14,7 @@ Work on CGSpace duplicate DOIs more
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2024-04/" />
<meta property="article:published_time" content="2024-04-04T10:23:00+03:00" />
<meta property="article:modified_time" content="2024-04-18T09:38:02+03:00" />
<meta property="article:modified_time" content="2024-04-18T17:00:25+03:00" />
@@ -24,7 +24,7 @@ Work on CGSpace duplicate DOIs more
Work on CGSpace duplicate DOIs more
"/>
<meta name="generator" content="Hugo 0.125.0">
<meta name="generator" content="Hugo 0.125.3">
@@ -34,9 +34,9 @@ Work on CGSpace duplicate DOIs more
"@type": "BlogPosting",
"headline": "April, 2024",
"url": "https://alanorth.github.io/cgspace-notes/2024-04/",
"wordCount": "456",
"wordCount": "711",
"datePublished": "2024-04-04T10:23:00+03:00",
"dateModified": "2024-04-18T09:38:02+03:00",
"dateModified": "2024-04-18T17:00:25+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -214,7 +214,52 @@ curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 4.764 total
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>map $request_uri $new_uri {
</span></span><span style="display:flex;"><span> /handle/10568/112821 /handle/10568/97605;
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><!-- raw HTML omitted -->
</span></span></code></pre></div><h2 id="2024-04-19">2024-04-19</h2>
<ul>
<li>Spend some time looking at duplicate DOIs again&hellip;</li>
<li>Refresh ORCID identifiers from ORCID API and update CGSpace metadata and controlled vocabulary</li>
</ul>
<h2 id="2024-04-20">2024-04-20</h2>
<ul>
<li>I read an <a href="https://github.com/greenelab/scihub/issues/9">interesting thread about DOI casing</a>
<ul>
<li>Apparently the DOI specification says ASCII characters in DOIs are case-insensitive</li>
<li>Indeed, <a href="https://www.crossref.org/documentation/member-setup/constructing-your-dois/">Crossref recommends lower case</a> for all DOIs</li>
<li>I was curious about the DOIs in our database, so I counted the distinct values before and after lowercasing:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace7= ☘ \COPY (SELECT DISTINCT(text_value) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=220 AND text_value IS NOT NULL AND text_value !=&#39;&#39;) TO /tmp/dois-sql-before.txt;
</span></span><span style="display:flex;"><span>COPY 25675
</span></span><span style="display:flex;"><span>localhost/dspace7= ☘ \COPY (SELECT DISTINCT(lower(text_value)) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=220 AND text_value IS NOT NULL AND text_value !=&#39;&#39;) TO /tmp/dois-sql-after.txt;
</span></span><span style="display:flex;"><span>COPY 25666
</span></span></code></pre></div><ul>
<li>I need to investigate options for lower casing these in the repository, for example in a curation task, and in all workflows around DSpace metadata&hellip;</li>
</ul>
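<p>The two counts above differ by nine (25675 versus 25666), so nine distinct values collapse into another value once case is ignored. To see which values those are, rather than only how many, a rough Python sketch like the following could be run over the exported list. The function name <code>case_variant_groups</code> and the sample DOIs are hypothetical and only for illustration.</p>

```python
from collections import defaultdict


def case_variant_groups(dois):
    """Group DOI strings that differ only by letter case."""
    groups = defaultdict(set)
    for doi in dois:
        if doi:  # skip NULL/empty values, like the SQL filter does
            groups[doi.lower()].add(doi)
    # Keep only lowercased keys that had more than one distinct spelling
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Made-up sample values, not taken from CGSpace
sample = [
    "https://doi.org/10.1000/ABC123",
    "https://doi.org/10.1000/abc123",
    "https://doi.org/10.1000/xyz789",
]
collisions = case_variant_groups(sample)
for lowered, variants in collisions.items():
    print(lowered, variants)
```

<p>Each key in the result is the lowercased form and each value lists the spellings that map to it, which is probably enough to decide which items need a closer look.</p>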
<h2 id="2024-04-23">2024-04-23</h2>
<ul>
<li>Spent some time writing a Java curation task to normalize DOIs in items when they enter the workflow edit step
<ul>
<li>The workflow curation tasks are not documented very well but I got a basic configuration working</li>
<li>I found a bug in DSpace curation tasks and discussed it on Slack</li>
<li>I finalized the <code>NormalizeDOIs</code> curation task and released v7.6.1.1 of the <a href="https://github.com/ilri/cgspace-java-helpers">cgspace-java-helpers</a> project</li>
</ul>
</li>
</ul>
<h2 id="2024-04-24">2024-04-24</h2>
<ul>
<li>A bit more testing of the curation tasks
<ul>
<li>I tested a patch by Mark Wood</li>
</ul>
</li>
<li>I added support for normalizing DOIs to this same format to my <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> project</li>
</ul>
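<p>As an illustration of the kind of normalization discussed in these notes, here is a minimal Python sketch. It is not the code from <code>NormalizeDOIs</code> or csv-metadata-quality. It assumes the target format is a lowercase <code>https://doi.org/</code> URI, and the names <code>normalize_doi</code>, <code>DOI_RE</code> and <code>ASCII_LOWER</code> as well as the simplified regular expression are hypothetical.</p>

```python
import re
import string

# Assumption: the target format is a lowercase "https://doi.org/..." URI.
# The pattern is a simplified illustration, not a full DOI grammar.
DOI_RE = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)

# Lowercase only ASCII letters; any other characters are left untouched.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def normalize_doi(value):
    """Return the DOI as a lowercase https://doi.org/ URI when recognized."""
    match = DOI_RE.match(value.strip())
    if match is None:
        return value  # leave unrecognized values alone for manual review
    return "https://doi.org/" + match.group(1).translate(ASCII_LOWER)


print(normalize_doi("http://dx.doi.org/10.1000/ABC123"))
print(normalize_doi("doi: 10.1000/XyZ"))
```

<p>Lowercasing only the ASCII letters follows the point noted earlier that the DOI specification treats ASCII characters as case-insensitive. Values that do not look like DOIs are returned unchanged instead of being guessed at.</p>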
<h2 id="2024-04-25">2024-04-25</h2>
<ul>
<li>I lowercased the remaining 3,900 DOIs on CGSpace that had uppercase ASCII characters</li>
</ul>
<!-- raw HTML omitted -->