Add notes

2024-04-18 09:38:02 +03:00
parent efd8eb7f79
commit 60b244486f
152 changed files with 244 additions and 199 deletions


@@ -14,7 +14,7 @@ Work on CGSpace duplicate DOIs more
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2024-04/" />
<meta property="article:published_time" content="2024-04-04T10:23:00+03:00" />
<meta property="article:modified_time" content="2024-04-12T20:40:52+03:00" />
<meta property="article:modified_time" content="2024-04-16T09:35:30+03:00" />
@@ -24,7 +24,7 @@ Work on CGSpace duplicate DOIs more
Work on CGSpace duplicate DOIs more
"/>
<meta name="generator" content="Hugo 0.124.1">
<meta name="generator" content="Hugo 0.125.0">
@@ -34,9 +34,9 @@ Work on CGSpace duplicate DOIs more
"@type": "BlogPosting",
"headline": "April, 2024",
"url": "https://alanorth.github.io/cgspace-notes/2024-04/",
"wordCount": "236",
"wordCount": "352",
"datePublished": "2024-04-04T10:23:00+03:00",
"dateModified": "2024-04-12T20:40:52+03:00",
"dateModified": "2024-04-16T09:35:30+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -155,7 +155,7 @@ Work on CGSpace duplicate DOIs more
</ul>
<pre tabindex="0"><code>$ time curl -s -o /dev/null &#39;https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&amp;scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&amp;page=0&amp;size=100&amp;embed=thumbnail,bundles/bitstreams&amp;sort=dcterms.issued,desc&#39;
curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 47.515 total
$ time curl -s -o /dev/null &#39;https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&amp;scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&amp;page=0&amp;size=100&amp;sort=dcterms.issued,desc&#39;
$ time curl -s -o /dev/null &#39;https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&amp;scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&amp;page=0&amp;size=100&amp;sort=dcterms.issued,desc&#39;
curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 4.764 total
</code></pre><ul>
<li>Finalize processing the remaining 206 items from the IFPRI 2022 batch set that already existed on CGSpace
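The curl comparison above (about 47.5 s with `embed=thumbnail,bundles/bitstreams` versus about 4.8 s without it) can be made repeatable with a small Python script. This is a minimal sketch that assumes only the endpoint and parameters visible in the curl commands; the helper names `build_search_url`, `time_request`, and `compare_embed_timings` are hypothetical. Like `time`, it measures total wall-clock time on the client, not server-side processing time.

```python
import time
import urllib.parse
import urllib.request

# Discover endpoint taken from the curl commands above
SEARCH_ENDPOINT = "https://cgspace.cgiar.org/server/api/discover/search/objects"


def build_search_url(query, scope, size=100, sort="dcterms.issued,desc", embed=None):
    """Build a discover/search/objects URL, optionally asking the API to embed related resources."""
    params = {"query": query, "scope": scope, "page": 0, "size": size}
    if embed:
        params["embed"] = embed
    params["sort"] = sort
    # Keep "*", "," and "/" literal so the URL matches the curl commands; ":" becomes %3A
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params, safe="*,/")


def time_request(url, timeout=120):
    """Fetch url, discard the body, and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def compare_embed_timings(query, scope, embed="thumbnail,bundles/bitstreams"):
    """Time the same search with and without embed; returns (with_embed, without_embed) in seconds."""
    with_embed = time_request(build_search_url(query, scope, embed=embed))
    without_embed = time_request(build_search_url(query, scope))
    return with_embed, without_embed
```

For example, `compare_embed_timings("cg.identifier.project:IFPRI*", "8f1e9650-fe87-4e6e-889a-1cacfb747408")` issues the same two requests as the curl commands. A single run of each is noisy, so it may be worth repeating each request a few times before drawing conclusions.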
@@ -168,6 +168,28 @@ curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 4.764 total
<h2 id="2024-04-16">2024-04-16</h2>
<ul>
<li>Spend some time looking at duplicate DOIs again&hellip;</li>
<li>Assist Deborah with an advanced query on CGSpace for biodiversity and health:</li>
</ul>
<pre tabindex="0"><code>dcterms.issued:[2010 TO 2024] AND dcterms.type:&#34;Journal Article&#34; AND (dc.title:&#34;biodiversity&#34; OR dcterms.subject:&#34;biodiversity&#34; OR dc.title:&#34;health&#34; OR dcterms.subject:&#34;health&#34;)
</code></pre><ul>
<li>Remove CIMMYT URLs and citations from 277 journal articles on CGSpace since it is a bit tacky
<ul>
<li>I used this Jython expression in OpenRefine with <a href="https://citation.crosscite.org/docs.html">Crossref&rsquo;s content negotiation</a> to get citations for all DOIs:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> urllib2
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>doi <span style="color:#f92672">=</span> cells[<span style="color:#e6db74">&#39;cg.identifier.doi[en_US]&#39;</span>]<span style="color:#f92672">.</span>value
</span></span><span style="display:flex;"><span>url <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://api.crossref.org/works/&#34;</span> <span style="color:#f92672">+</span> doi <span style="color:#f92672">+</span> <span style="color:#e6db74">&#34;/transform/text/x-bibliography&#34;</span>
</span></span><span style="display:flex;"><span>useragent <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Python (mailto:a.o@cgiar.org)&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>request <span style="color:#f92672">=</span> urllib2<span style="color:#f92672">.</span>Request(url<span style="color:#f92672">.</span>encode(<span style="color:#e6db74">&#34;utf-8&#34;</span>), headers<span style="color:#f92672">=</span>{<span style="color:#e6db74">&#34;User-Agent&#34;</span> : useragent})
</span></span><span style="display:flex;"><span>get <span style="color:#f92672">=</span> urllib2<span style="color:#f92672">.</span>urlopen(request)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">return</span> get<span style="color:#f92672">.</span>read()<span style="color:#f92672">.</span>decode(<span style="color:#e6db74">&#39;utf-8&#39;</span>)
</span></span></code></pre></div><ul>
<li>It took ten or so minutes to finish (note that this is Python 2 inside OpenRefine, so I had to be careful with Unicode), but it worked well!</li>
</ul>
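For comparison, here is a rough Python 3 sketch of the same Crossref request outside OpenRefine, where bytes and text are kept separate and Unicode handling is less error-prone. It reuses the endpoint and User-Agent from the Jython expression, but two behaviors are assumptions of this sketch rather than part of the original: it strips a leading `https://doi.org/` (or similar) prefix in case the DOI column holds resolver URLs, and it percent-encodes characters such as spaces or `#` that would otherwise break the URL. It returns `None` on HTTP errors instead of raising. The function names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request

# Same Crossref transform endpoint as the Jython expression above
CROSSREF_WORKS = "https://api.crossref.org/works/"
# Replace with your own contact address when querying Crossref
USER_AGENT = "Python (mailto:a.o@cgiar.org)"
# Assumption: the DOI field may contain full resolver URLs rather than bare DOIs
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def bare_doi(value):
    """Return the DOI without any leading resolver URL or "doi:" prefix."""
    value = value.strip()
    for prefix in DOI_PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def citation_url(doi):
    """Build the Crossref x-bibliography URL, percent-encoding everything except "/"."""
    return CROSSREF_WORKS + urllib.parse.quote(bare_doi(doi), safe="/") + "/transform/text/x-bibliography"


def fetch_citation(doi, timeout=30):
    """Return the formatted citation as text, or None if Crossref answers with an HTTP error."""
    request = urllib.request.Request(citation_url(doi), headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Decode the response bytes to str and strip surrounding whitespace
            return response.read().decode("utf-8").strip()
    except urllib.error.HTTPError:
        return None
```

Calling `fetch_citation()` once per row of a CSV export would give the same citations without the Python 2 Unicode pitfalls, at the cost of handling the CSV round trip outside OpenRefine.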
<!-- raw HTML omitted -->
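Deborah's biodiversity and health query can also be composed and counted programmatically. This is a sketch that assumes the REST API's `query` parameter accepts the same Lucene-style syntax as the CGSpace search box, and that the DSpace 7 search response reports the hit count at `_embedded.searchResult.page.totalElements`; `build_query`, `search_url`, `total_results`, and `count_matches` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

# Same discover endpoint as the IFPRI curl tests earlier in these notes
SEARCH_ENDPOINT = "https://cgspace.cgiar.org/server/api/discover/search/objects"


def build_query(start_year, end_year, item_type, terms):
    """Compose the query: a year range, an exact type, and each term in either title or subject."""
    term_clauses = " OR ".join(
        f'{field}:"{term}"' for term in terms for field in ("dc.title", "dcterms.subject")
    )
    return f'dcterms.issued:[{start_year} TO {end_year}] AND dcterms.type:"{item_type}" AND ({term_clauses})'


def search_url(query, size=1):
    """URL for one page of results; size=1 is enough when only the total is needed."""
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "size": size})


def total_results(response_json):
    """Assumed DSpace 7 response layout: _embedded.searchResult.page.totalElements."""
    return response_json["_embedded"]["searchResult"]["page"]["totalElements"]


def count_matches(query, timeout=120):
    """Run the search against CGSpace and return the reported number of hits."""
    with urllib.request.urlopen(search_url(query), timeout=timeout) as response:
        return total_results(json.load(response))
```

For example, `count_matches(build_query(2010, 2024, "Journal Article", ["biodiversity", "health"]))` rebuilds the exact query string used for Deborah and reports how many items match, which makes it easy to tweak the year range or terms later.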