<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="June, 2024" />
<meta property="og:description" content="2024-06-03
Working on IFPRI datasets
I noticed the licenses were missing from Nilam&rsquo;s original file so I found a way to check Dataverse&rsquo;s API for a persistent identifier
We have both Handles and DOIs for these datasets, both from Harvard&rsquo;s Dataverse
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2024-06/" />
<meta property="article:published_time" content="2024-06-03T14:14:00+03:00" />
<meta property="article:modified_time" content="2024-06-03T17:31:03+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="June, 2024"/>
<meta name="twitter:description" content="2024-06-03
Working on IFPRI datasets
I noticed the licenses were missing from Nilam&rsquo;s original file so I found a way to check Dataverse&rsquo;s API for a persistent identifier
We have both Handles and DOIs for these datasets, both from Harvard&rsquo;s Dataverse
"/>
<meta name="generator" content="Hugo 0.127.0">
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "June, 2024",
"url": "https://alanorth.github.io/cgspace-notes/2024-06/",
"wordCount": "126",
"datePublished": "2024-06-03T14:14:00+03:00",
"dateModified": "2024-06-03T17:31:03+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2024-06/">
<title>June, 2024 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" integrity="sha256-xrqAvFBmlVdkWr4F&#43;GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz&#43;lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2024-06/">June, 2024</a></h2>
<p class="blog-post-meta">
<time datetime="2024-06-03T14:14:00+03:00">Mon Jun 03, 2024</time>
in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2024-06-03">2024-06-03</h2>
<ul>
<li>Working on IFPRI datasets
<ul>
<li>I noticed the licenses were missing from Nilam&rsquo;s original file so I found a way to check <a href="https://guides.dataverse.org/en/latest/api/native-api.html#export-metadata-of-a-dataset-in-various-formats">Dataverse&rsquo;s API for a persistent identifier</a></li>
<li>We have both Handles and DOIs for these datasets, both from Harvard&rsquo;s Dataverse</li>
</ul>
</li>
</ul>
<ul>
<li>I used this GREL in OpenRefine to add a new column by fetching URLs built from the DOI (uppercasing the DOI for Dataverse):</li>
</ul>
<pre tabindex="0"><code>&#34;https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&amp;persistentId=doi:&#34; + value.split(&#39;https://doi.org/&#39;)[-1].toUppercase()
</code></pre><ul>
<li>Then I was able to extract the license text from the JSON response using:</li>
</ul>
<pre tabindex="0"><code>value.parseJson()[&#39;datasetVersion&#39;][&#39;termsOfUse&#39;]
</code></pre><ul>
<li>Similar for the Handle&hellip;</li>
</ul>
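<ul>
<li>For comparison, a minimal Python sketch of the same URL construction outside OpenRefine. This is only an illustration and not part of the original workflow: the function name <code>dataverse_export_url</code> and the sample DOI are hypothetical, and only the endpoint and query parameters come from the GREL expression above:</li>
</ul>
<pre tabindex="0"><code>
```python
from urllib.parse import urlencode

# Endpoint taken from the GREL expression in the notes.
EXPORT_ENDPOINT = "https://dataverse.harvard.edu/api/datasets/export"


def dataverse_export_url(doi_url):
    """Build a Dataverse JSON export URL from a https://doi.org/ link."""
    # Keep only the part after the resolver prefix, like split(...)[-1] in GREL,
    # and uppercase it as the notes do for Dataverse.
    doi = doi_url.split("https://doi.org/")[-1].upper()
    query = urlencode(
        {"exporter": "dataverse_json", "persistentId": "doi:" + doi},
        safe=":/",  # keep the DOI readable instead of percent-encoding it
    )
    return EXPORT_ENDPOINT + "?" + query


# Hypothetical DOI, used only to show the shape of the resulting URL.
print(dataverse_export_url("https://doi.org/10.7910/dvn/example"))
```
</code></pre>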
<h2 id="2024-06-04">2024-06-04</h2>
<ul>
<li>Some Dataverse entries have the license in <code>['datasetVersion']['license']</code> instead&hellip;</li>
<li>I finalized cleaning the 722 IFPRI datasets and uploaded them to CGSpace</li>
</ul>
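<ul>
<li>A rough Python sketch of checking both locations for the license, assuming the response is the <code>dataverse_json</code> export as text. The helper name <code>extract_license</code> and the sample payloads are hypothetical, and real responses may store the license in a different shape (for example as an object rather than a string):</li>
</ul>
<pre tabindex="0"><code>
```python
import json


def extract_license(response_text):
    """Return termsOfUse if present, else the license field, else None."""
    version = json.loads(response_text).get("datasetVersion", {})
    # Try the two locations mentioned in the notes, in that order.
    for key in ("termsOfUse", "license"):
        value = version.get(key)
        if value:
            # Returned as-is, whatever type the export used for it.
            return value
    return None


# Hypothetical, heavily trimmed payloads for illustration only.
sample_terms = '{"datasetVersion": {"termsOfUse": "CC-BY-4.0"}}'
sample_license = '{"datasetVersion": {"license": "CC0 1.0"}}'
print(extract_license(sample_terms))
print(extract_license(sample_license))
```
</code></pre>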
<h2 id="2024-06-14">2024-06-14</h2>
<ul>
<li>Minor cleanups on IFPRI&rsquo;s 2016&ndash;2019 batch migration file
<ul>
<li>I will start with duplicates on unique identifiers like DOIs</li>
</ul>
</li>
</ul>
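<ul>
<li>One possible way to flag duplicate DOIs, sketched in Python. This is an assumption about how such a check could look, not the method actually used: the <code>doi</code> column name, the list of prefixes to strip, and both function names are hypothetical:</li>
</ul>
<pre tabindex="0"><code>
```python
from collections import defaultdict


def normalize_doi(value):
    """Reduce a DOI or DOI URL to a bare, lowercase identifier."""
    doi = value.strip().lower()
    # Assumed set of common prefixes; extend as needed for the real data.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def find_duplicate_dois(rows):
    """Map each DOI that occurs more than once to the row indexes using it."""
    seen = defaultdict(list)
    for index, row in enumerate(rows):
        doi = normalize_doi(row.get("doi", ""))
        if doi:  # skip rows without a DOI
            seen[doi].append(index)
    return {doi: indexes for doi, indexes in seen.items() if len(indexes) > 1}


# Hypothetical rows, e.g. as read with csv.DictReader from the migration file.
rows = [
    {"doi": "https://doi.org/10.1234/ABC"},
    {"doi": "10.1234/abc"},
    {"doi": "10.1234/xyz"},
]
print(find_duplicate_dois(rows))
```
</code></pre>
<ul>
<li>Normalizing case and resolver prefixes first matters here because the same DOI can appear as a bare identifier in one row and as a URL in another.</li>
</ul>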
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2024-06/">June, 2024</a></li>
<li><a href="/cgspace-notes/2024-05/">May, 2024</a></li>
<li><a href="/cgspace-notes/2024-04/">April, 2024</a></li>
<li><a href="/cgspace-notes/2024-03/">March, 2024</a></li>
<li><a href="/cgspace-notes/2024-02/">February, 2024</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>