mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes
@ -14,7 +14,7 @@ Work on CGSpace duplicate DOIs more
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2024-04/" />
<meta property="article:published_time" content="2024-04-04T10:23:00+03:00" />
<meta property="article:modified_time" content="2024-04-12T20:40:52+03:00" />
<meta property="article:modified_time" content="2024-04-16T09:35:30+03:00" />
@ -24,7 +24,7 @@ Work on CGSpace duplicate DOIs more
Work on CGSpace duplicate DOIs more
"/>
<meta name="generator" content="Hugo 0.124.1">
<meta name="generator" content="Hugo 0.125.0">
@ -34,9 +34,9 @@ Work on CGSpace duplicate DOIs more
"@type": "BlogPosting",
"headline": "April, 2024",
"url": "https://alanorth.github.io/cgspace-notes/2024-04/",
"wordCount": "236",
"wordCount": "352",
"datePublished": "2024-04-04T10:23:00+03:00",
"dateModified": "2024-04-12T20:40:52+03:00",
"dateModified": "2024-04-16T09:35:30+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -155,7 +155,7 @@ Work on CGSpace duplicate DOIs more
</ul>
<pre tabindex="0"><code>$ time curl -s -o /dev/null 'https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&page=0&size=100&embed=thumbnail,bundles/bitstreams&sort=dcterms.issued,desc'
curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 47.515 total
$ time curl -s -o /dev/null 'https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&page=0&size=100&sort=dcterms.issued,desc'
$ time curl -s -o /dev/null 'https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&page=0&size=100&sort=dcterms.issued,desc'
curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 4.764 total
</code></pre><ul>
<li>Finalize processing the remaining 206 items from the IFPRI 2022 batch set that already existed on CGSpace
@ -168,6 +168,28 @@ curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 4.764 total
<h2 id="2024-04-16">2024-04-16</h2>
<ul>
<li>Spend some time looking at duplicate DOIs again…</li>
<li>Assist Deborah with an advanced query on CGSpace for biodiversity and health:</li>
</ul>
<pre tabindex="0"><code>dcterms.issued:[2010 TO 2024] AND dcterms.type:"Journal Article" AND (dc.title:"biodiversity" OR dcterms.subject:"biodiversity" OR dc.title:"health" OR dcterms.subject:"health")
</code></pre><ul>
<li>Remove CIMMYT URLs and citations from 277 journal articles on CGSpace since it is a bit tacky
<ul>
<li>I used this Jython expression in OpenRefine with <a href="https://citation.crosscite.org/docs.html">Crossref’s content negotiation</a> to get citations for all DOIs:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> urllib2
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>doi <span style="color:#f92672">=</span> cells[<span style="color:#e6db74">'cg.identifier.doi[en_US]'</span>]<span style="color:#f92672">.</span>value
</span></span><span style="display:flex;"><span>url <span style="color:#f92672">=</span> <span style="color:#e6db74">"https://api.crossref.org/works/"</span> <span style="color:#f92672">+</span> doi <span style="color:#f92672">+</span> <span style="color:#e6db74">"/transform/text/x-bibliography"</span>
</span></span><span style="display:flex;"><span>useragent <span style="color:#f92672">=</span> <span style="color:#e6db74">"Python (mailto:a.o@cgiar.org)"</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>request <span style="color:#f92672">=</span> urllib2<span style="color:#f92672">.</span>Request(url<span style="color:#f92672">.</span>encode(<span style="color:#e6db74">"utf-8"</span>), headers<span style="color:#f92672">=</span>{<span style="color:#e6db74">"User-Agent"</span> : useragent})
</span></span><span style="display:flex;"><span>get <span style="color:#f92672">=</span> urllib2<span style="color:#f92672">.</span>urlopen(request)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">return</span> get<span style="color:#f92672">.</span>read()<span style="color:#f92672">.</span>decode(<span style="color:#e6db74">'utf-8'</span>)
</span></span></code></pre></div><ul>
<li>It took ten or so minutes for it to finish (and note this is Python 2 inside OpenRefine so I had to be careful with Unicode), but worked well!</li>
</ul>
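<p>The Jython expression above targets Python 2 inside OpenRefine. For running the same lookup outside OpenRefine, a rough Python 3 sketch might look like the following. The function names, the placeholder contact address, the timeout, and the percent-encoding of the DOI are assumptions for illustration and are not part of the original notes; only the Crossref URL pattern and the User-Agent header come from the expression above.</p>

```python
import urllib.parse
import urllib.request

# Hypothetical contact address (assumption); replace with a real one.
USER_AGENT = "Python (mailto:someone@example.org)"


def citation_url(doi):
    """Build the Crossref transform URL for a formatted citation.

    Percent-encoding the DOI is an assumption of this sketch; "/" is
    kept as-is because it separates the DOI prefix from the suffix.
    """
    encoded = urllib.parse.quote(doi.strip(), safe="/")
    return "https://api.crossref.org/works/" + encoded + "/transform/text/x-bibliography"


def build_request(doi):
    """Prepare the HTTP request with a User-Agent header, without sending it."""
    return urllib.request.Request(citation_url(doi), headers={"User-Agent": USER_AGENT})


def fetch_citation(doi, timeout=30):
    """Send the request and return the citation text (needs network access)."""
    with urllib.request.urlopen(build_request(doi), timeout=timeout) as response:
        # Python 3 returns bytes here, so decode explicitly.
        return response.read().decode("utf-8")
```

<p>Calling <code>fetch_citation()</code> with a DOI string should return the citation as text, provided the service is reachable and knows that DOI. Keeping the URL building separate from the network call makes it possible to check the generated URLs before sending a few hundred requests.</p>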
<!-- raw HTML omitted -->