<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="October, 2023" />
<meta property="og:description" content="2023-10-02
Export CGSpace to check DOIs against Crossref
I found that Crossref&rsquo;s metadata is in the public domain under the CC0 license
One interesting thing is the abstracts, which are copyrighted by the copyright owner, meaning Crossref cannot waive the copyright under the terms of the CC0 license, because it is not theirs to waive
We can be on the safe side by using only abstracts for items that are licensed under Creative Commons
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-10/" />
<meta property="article:published_time" content="2023-10-02T09:05:36+03:00" />
<meta property="article:modified_time" content="2023-10-02T09:05:36+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2023"/>
<meta name="twitter:description" content="2023-10-02
Export CGSpace to check DOIs against Crossref
I found that Crossref&rsquo;s metadata is in the public domain under the CC0 license
One interesting thing is the abstracts, which are copyrighted by the copyright owner, meaning Crossref cannot waive the copyright under the terms of the CC0 license, because it is not theirs to waive
We can be on the safe side by using only abstracts for items that are licensed under Creative Commons
"/>
<meta name="generator" content="Hugo 0.119.0">
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "October, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-10/",
"wordCount": "286",
"datePublished": "2023-10-02T09:05:36+03:00",
"dateModified": "2023-10-02T09:05:36+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2023-10/">
<title>October, 2023 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" integrity="sha256-xrqAvFBmlVdkWr4F&#43;GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz&#43;lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-10/">October, 2023</a></h2>
<p class="blog-post-meta">
<time datetime="2023-10-02T09:05:36+03:00">Mon Oct 02, 2023</time>
in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-10-02">2023-10-02</h2>
<ul>
<li>Export CGSpace to check DOIs against Crossref
<ul>
<li>I found that <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/">Crossref&rsquo;s metadata is in the public domain under the CC0 license</a></li>
<li>One caveat is the abstracts: they remain copyrighted by their original owners, so Crossref cannot waive that copyright under the CC0 license because it is not theirs to waive</li>
<li>We can be on the safe side by using only abstracts for items that are licensed under Creative Commons</li>
</ul>
</li>
</ul>
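<p>A minimal sketch of the &ldquo;only reuse abstracts from Creative Commons items&rdquo; rule, assuming the license is read from the <code>license</code> list of a Crossref work record (each entry carrying a <code>URL</code>). The function names are hypothetical and this is not taken from <code>crossref_doi_lookup.py</code>; the same check could just as well be applied to the license recorded on the CGSpace item.</p>

```python
def is_creative_commons(work):
    """Return True if any license URL on the record points at creativecommons.org."""
    licenses = work.get("license", [])
    return any("creativecommons.org" in lic.get("URL", "") for lic in licenses)


def reusable_abstract(work):
    """Return the abstract only when the record carries a Creative Commons license."""
    if is_creative_commons(work):
        return work.get("abstract")
    # No license information, or a non-CC license: stay on the safe side
    return None
```

<p>Records without any license information are treated the same as non-CC ones, which errs on the side of not copying the abstract.</p>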
<ul>
<li>This GREL extracts only the <em>text</em> content of the <code>&lt;jats:p&gt;</code> tags (i.e., it drops nested JATS XML markup such as <code>&lt;jats:i&gt;</code>, <code>&lt;jats:sub&gt;</code>, etc.):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>forEach(value.parseXml().select(&#34;jats|p&#34;),i,i.xmlText()).join(&#34;&#34;)
</span></span></code></pre></div><ul>
<li>Note that we need to use <code>select(&quot;jats|p&quot;)</code> instead of <code>select(&quot;jats:p&quot;)</code> for OpenRefine&rsquo;s <code>parseXml()</code>, and we need to <code>join()</code> the resulting array at the end</li>
<li>I updated metadata for about 3,000 items using Crossref metadata
<ul>
<li>I stripped trailing periods from our titles where the Crossref titles did not have them</li>
<li>I copied abstracts for about 600 Creative Commons licensed items that were missing them</li>
<li>I updated publishers for a few thousand more where our value and Crossref&rsquo;s disagreed, checking a handful manually first</li>
</ul>
</li>
<li>I also added subjects to the <code>crossref_doi_lookup.py</code> script to see if they will be useful for us
<ul>
<li>When checking with csv-metadata-quality I can validate those subjects against AGROVOC and add them if they are valid</li>
</ul>
</li>
</ul>
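<p>For comparison, a rough Python equivalent of the GREL expression above, in case the same cleanup is ever needed outside OpenRefine. This is only a sketch: the class and function names are hypothetical, and it uses the standard library&rsquo;s lenient <code>html.parser</code> so that the undeclared <code>jats:</code> prefix in the abstract fragments does not cause a parse error.</p>

```python
from html.parser import HTMLParser


class JatsParagraphText(HTMLParser):
    """Collect the text found inside jats:p elements, dropping all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # number of jats:p elements we are currently inside
        self.chunks = []  # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if tag == "jats:p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "jats:p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while inside a paragraph; text of nested tags
        # such as jats:i or jats:sub is kept, the tags themselves are not
        if self.depth:
            self.chunks.append(data)


def jats_abstract_text(abstract):
    """Return the concatenated text of all jats:p elements in the fragment."""
    parser = JatsParagraphText()
    parser.feed(abstract)
    parser.close()
    return "".join(parser.chunks)
```

<p>As with the GREL version, text outside the paragraphs (for example a <code>&lt;jats:title&gt;</code> heading) is dropped and paragraphs are joined without a separator.</p>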
<h2 id="2023-10-03">2023-10-03</h2>
<ul>
<li>I added the item type to the collection subscription email on DSpace 6
<ul>
<li>It&rsquo;s done differently on DSpace 7 so I&rsquo;ll have to see how to do it there&hellip;</li>
</ul>
</li>
<li>Test a patch that fixes a bug that occurs when item versioning is disabled in DSpace 7
<ul>
<li>I hadn&rsquo;t realized that DSpace 7 defaulted to versioning being enabled, whereas we never used this in DSpace 6 (yet)</li>
</ul>
</li>
<li>Submit <a href="https://github.com/DSpace/DSpace/issues/9104">an issue regarding duplicate Discovery sort fields</a> in DSpace 7</li>
</ul>
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2023-10/">October, 2023</a></li>
<li><a href="/cgspace-notes/2023-09/">September, 2023</a></li>
<li><a href="/cgspace-notes/2023-08/">August, 2023</a></li>
<li><a href="/cgspace-notes/2023-07/">July, 2023</a></li>
<li><a href="/cgspace-notes/2023-06/">June, 2023</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>