mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2024-03-08
@@ -19,7 +19,7 @@ It might be this issue: https://github.com/DSpace/dspace-angular/issues/2808
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2024-03/" />
<meta property="article:published_time" content="2024-03-01T09:55:00+03:00" />
<meta property="article:modified_time" content="2024-03-01T09:55:00+03:00" />
<meta property="article:modified_time" content="2024-03-04T10:02:14+03:00" />
@@ -34,7 +34,7 @@ It might be this issue: https://github.com/DSpace/dspace-angular/issues/2808
"/>
<meta name="generator" content="Hugo 0.123.7">
<meta name="generator" content="Hugo 0.123.8">
@@ -44,9 +44,9 @@ It might be this issue: https://github.com/DSpace/dspace-angular/issues/2808
"@type": "BlogPosting",
"headline": "March, 2024",
"url": "https://alanorth.github.io/cgspace-notes/2024-03/",
"wordCount": "93",
"wordCount": "317",
"datePublished": "2024-03-01T09:55:00+03:00",
"dateModified": "2024-03-01T09:55:00+03:00",
"dateModified": "2024-03-04T10:02:14+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -141,6 +141,51 @@ It might be this issue: https://github.com/DSpace/dspace-angular/issues/2808
<li>I did some cleanups on abstracts, licenses, and dates from CrossRef</li>
<li>I also did some minor cleanups to affiliations because I saw some incorrect and duplicate ones in our list</li>
</ul>
<h2 id="2024-03-05">2024-03-05</h2>
<ul>
<li>I tried a new technique to get some affiliations from Crossref using OpenRefine
<ul>
<li>First I split them and clustered, resolving a few hundred clusters out of 1500 (!)</li>
<li>Then I used a custom text facet with a few dozen CGIAR and other large affiliations to reduce the work</li>
<li>Then I joined them with our affiliations, paying no attention to duplicates</li>
<li>Then I deduped them using the Jython technique I learned in 2023-02</li>
</ul>
</li>
</ul>
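<p>The final deduplication step above could look roughly like the sketch below. This is not the 2023-02 technique itself, which these notes do not show; the function name <code>dedupe_values</code> and the <code>||</code> multi-value separator are assumptions made for illustration, and the sample affiliations are placeholder values.</p>

```python
def dedupe_values(value, separator="||"):
    """Drop repeated entries from a multi-value cell, keeping first-seen order.

    The "||" separator is an assumption; adjust it to match the real data.
    """
    seen = set()
    unique = []
    for part in value.split(separator):
        part = part.strip()
        # Skip empty fragments and anything already emitted
        if part and part not in seen:
            seen.add(part)
            unique.append(part)
    return separator.join(unique)


# Placeholder input, not taken from the real affiliation list
print(dedupe_values("CIAT||ILRI||CIAT||IFPRI"))
```

<p>In an OpenRefine Jython transform, the body of the function would run against the cell's <code>value</code> and end with a <code>return</code>, as in the ORCID example in these notes.</p>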
<h2 id="2024-03-06">2024-03-06</h2>
<ul>
<li>Peter sent me some more corrections for the authors that I had sent him in 2023-12</li>
</ul>
<h2 id="2024-03-08">2024-03-08</h2>
<ul>
<li>IFPRI sent me their 2023 records from CONTENTdm so I started working on those
<ul>
<li>I found a way to match their ORCID identifiers in our list using Jython in OpenRefine:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">with</span> open(<span style="color:#e6db74">r</span><span style="color:#e6db74">"/tmp/cg-creator-identifier.txt"</span>,<span style="color:#e6db74">'r'</span>) <span style="color:#66d9ef">as</span> f :
</span></span><span style="display:flex;"><span>    orcid_ids <span style="color:#f92672">=</span> [orcid_id<span style="color:#f92672">.</span>strip() <span style="color:#66d9ef">for</span> orcid_id <span style="color:#f92672">in</span> f]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>matched <span style="color:#f92672">=</span> <span style="color:#66d9ef">False</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">for</span> orcid_id <span style="color:#f92672">in</span> orcid_ids:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> re<span style="color:#f92672">.</span>search(<span style="color:#e6db74">r</span><span style="color:#e6db74">'.+: </span><span style="color:#e6db74">{}</span><span style="color:#e6db74">'</span><span style="color:#f92672">.</span>format(value), orcid_id):
</span></span><span style="display:flex;"><span>        matched <span style="color:#f92672">=</span> <span style="color:#66d9ef">True</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">break</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> matched:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> orcid_id
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">else</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> value
</span></span></code></pre></div><ul>
<li>I realized that <a href="https://www.unicef.org/about-unicef/frequently-asked-questions#3">UNICEF was renamed to its current name in 1953</a> so I replaced all other variations in our vocabularies and metadata:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">UPDATE</span> metadatavalue <span style="color:#66d9ef">SET</span> text_value<span style="color:#f92672">=</span><span style="color:#e6db74">'United Nations Children''s Fund'</span> <span style="color:#66d9ef">WHERE</span> dspace_object_id <span style="color:#66d9ef">IN</span> (<span style="color:#66d9ef">SELECT</span> uuid <span style="color:#66d9ef">FROM</span> item) <span style="color:#66d9ef">AND</span> text_value <span style="color:#66d9ef">IN</span> (<span style="color:#e6db74">'United Nations International Children''s Emergency Fund'</span>, <span style="color:#e6db74">'United Nations International Children''s Emergency Fund'</span>, <span style="color:#e6db74">'UNICEF'</span>);
</span></span></code></pre></div><ul>
<li>Note the use of two single quotes to escape the one in the name</li>
</ul>
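<p>The quote-doubling rule noted above can be illustrated with a small helper. This is only a sketch: the function name <code>sql_string_literal</code> is hypothetical, and it handles nothing beyond the single-quote case.</p>

```python
def sql_string_literal(text):
    """Wrap text in single quotes, doubling each embedded single quote.

    Hypothetical helper for illustration only; it covers just the
    single-quote rule used in the UPDATE statement in these notes.
    """
    return "'" + text.replace("'", "''") + "'"


print(sql_string_literal("United Nations Children's Fund"))
```

<p>For one-off statements typed into a SQL shell, doubling the quote by hand as in the notes is enough. In application code, a parameterized query is generally the safer choice.</p>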
<!-- raw HTML omitted -->