<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="CGSpace Notes" />
<meta property="og:description" content="Documenting day-to-day work on the CGSpace repository (https://cgspace.cgiar.org)." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-10-04T09:24:33+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the CGSpace repository (https://cgspace.cgiar.org)."/>
<meta name="generator" content="Hugo 0.119.0">
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https://alanorth.github.io/cgspace-notes/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2023-10-02T09:05:36+03:00",
"keywords": "notes, migration",
"description":"Documenting day-to-day work on the CGSpace repository (https://cgspace.cgiar.org)."
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/">
<title>CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" integrity="sha256-xrqAvFBmlVdkWr4F&#43;GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz&#43;lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
<link rel="alternate" type="application/rss+xml" href="https://alanorth.github.io/cgspace-notes/index.xml" title="CGSpace Notes" />
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link active" href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-10/">October, 2023</a></h2>
<p class="blog-post-meta"><time datetime="2023-10-02T09:05:36+03:00">Mon Oct 02, 2023</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-10-02">2023-10-02</h2>
<ul>
<li>Export CGSpace to check DOIs against Crossref
<ul>
<li>I found that <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/">Crossref&rsquo;s metadata is in the public domain under the CC0 license</a></li>
<li>One interesting exception is the abstracts: they are copyrighted by their respective owners, so Crossref cannot waive that copyright under the terms of the CC0 license, because it is not theirs to waive</li>
<li>We can be on the safe side by using only abstracts for items that are licensed under Creative Commons</li>
</ul>
</li>
</ul>
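<p>The "safe side" rule for abstracts described above can be sketched in Python. This is only an illustration, not the script used on CGSpace: it assumes each record looks like the <code>message</code> object of a Crossref <code>/works/{doi}</code> response with optional <code>abstract</code> and <code>license</code> keys, and the helper names and sample records are hypothetical.</p>

```python
from urllib.parse import urlparse


def is_creative_commons(license_url: str) -> bool:
    """Return True if the license URL is hosted on creativecommons.org."""
    host = urlparse(license_url).hostname or ""
    return host == "creativecommons.org" or host.endswith(".creativecommons.org")


def usable_abstract(work: dict):
    """Return the abstract only when the work declares a Creative Commons license.

    Assumption: `work` has an optional "license" list whose entries carry a
    "URL" key, and an optional "abstract" string.
    """
    licenses = work.get("license", [])
    if any(is_creative_commons(item.get("URL", "")) for item in licenses):
        return work.get("abstract")
    return None


# Hypothetical records for illustration, not real Crossref data.
cc_work = {
    "abstract": "<jats:p>Example abstract.</jats:p>",
    "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
}
tdm_work = {
    "abstract": "<jats:p>Another abstract.</jats:p>",
    "license": [{"URL": "https://www.elsevier.com/tdm/userlicense/1.0/"}],
}

print(usable_abstract(cc_work))   # abstract is kept
print(usable_abstract(tdm_work))  # abstract is withheld
```

<p>Checking the hostname rather than searching the whole URL for a substring avoids accepting a publisher URL that merely mentions Creative Commons in its path.</p>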
<a href='https://alanorth.github.io/cgspace-notes/2023-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-09/">September, 2023</a></h2>
<p class="blog-post-meta"><time datetime="2023-09-02T17:29:36+03:00">Sat Sep 02, 2023</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-09-02">2023-09-02</h2>
<ul>
<li>Export CGSpace to check for missing Initiative collection mappings</li>
<li>Start a harvest on AReS</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2023-09/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-08/">August, 2023</a></h2>
<p class="blog-post-meta"><time datetime="2023-08-03T11:18:36+03:00">Thu Aug 03, 2023</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-08-03">2023-08-03</h2>
<ul>
<li>I finally got around to working on Peter&rsquo;s cleanups for affiliations, authors, and donors from last week
<ul>
<li>I did some minor cleanups myself and applied them to CGSpace</li>
</ul>
</li>
<li>Start working on some batch uploads for IFPRI</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2023-08/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-07/">July, 2023</a></h2>
<p class="blog-post-meta"><time datetime="2023-07-01T17:14:36+03:00">Sat Jul 01, 2023</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-07-01">2023-07-01</h2>
<ul>
<li>Export CGSpace to check for missing Initiative collection mappings</li>
<li>Start harvesting on AReS</li>
</ul>
<h2 id="2023-07-02">2023-07-02</h2>
<ul>
<li>Minor edits to the <code>crossref_doi_lookup.py</code> script while running some checks from 22,000 CGSpace DOIs</li>
</ul>
<h2 id="2023-07-03">2023-07-03</h2>
<ul>
<li>I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect
<ul>
<li>I took the more accurate ones from Crossref and updated the items on CGSpace</li>
<li>I took a few hundred ISBNs as well for where we were missing them</li>
<li>I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer</li>
<li>Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref)</li>
<li>I would be curious to write a script to check the Unpaywall API for open access status&hellip;</li>
<li>In the past I found that their license status was not very accurate, but the open access status might be more reliable</li>
</ul>
</li>
<li>More minor work on the DSpace 7 item views
<ul>
<li>I learned some new Angular template syntax</li>
<li>I created a custom component to show Creative Commons licenses on the simple item page</li>
<li>I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning</li>
</ul>
</li>
</ul>
<h2 id="2023-07-04">2023-07-04</h2>
<ul>
<li>Focus group meeting with CGSpace partners about DSpace 7</li>
<li>I added a themed file selection component to the CGSpace theme
<ul>
<li>It displays the bitstream description instead of the file name, just like we did in DSpace 6 XMLUI</li>
</ul>
</li>
<li>I added a custom component to show share icons</li>
</ul>
<h2 id="2023-07-05">2023-07-05</h2>
<ul>
<li>I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13
<ul>
<li>Most things work but there are some minor bugs it seems</li>
</ul>
</li>
<li>Mishell from CIP emailed me to say she was having problems approving an item on CGSpace
<ul>
<li>Looking at PostgreSQL I saw there were a dozen or so locks that were several hours and even over one day old so I killed those processes and told her to try again</li>
</ul>
</li>
</ul>
<h2 id="2023-07-06">2023-07-06</h2>
<ul>
<li>Types meeting</li>
<li>I wrote a Python script to check Unpaywall for some information about DOIs</li>
</ul>
<h2 id="2023-07-07">2023-07-07</h2>
<ul>
<li>Continue exploring Unpaywall data for some of our DOIs
<ul>
<li>In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher</li>
<li>Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website</li>
<li>I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses</li>
</ul>
</li>
<li>Delete duplicate metadata as described in my DSpace issue from last year: https://github.</li>
</ul>
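<p>The Unpaywall reasoning in these notes (trust the open access status most when the provider is the publisher and the version is the published one) could be expressed roughly as follows. This is a sketch under assumptions, not the script mentioned above: it assumes records shaped like Unpaywall API responses with <code>is_oa</code> and <code>best_oa_location</code> keys, and the function name, the three confidence labels, and the sample records are hypothetical.</p>

```python
def open_access_confidence(record: dict) -> str:
    """Classify an Unpaywall-style record as 'high', 'medium', or 'none'.

    Assumption: `record` carries "is_oa" and an optional "best_oa_location"
    dict with "host_type" and "version" keys.
    """
    if not record.get("is_oa"):
        return "none"
    location = record.get("best_oa_location") or {}
    from_publisher = location.get("host_type") == "publisher"
    published = location.get("version") == "publishedVersion"
    # Only the publisher's own published version counts as high confidence;
    # an accepted (author) version or a repository copy is treated as medium.
    return "high" if from_publisher and published else "medium"


# Hypothetical records for illustration only.
publisher_copy = {
    "is_oa": True,
    "best_oa_location": {"host_type": "publisher", "version": "publishedVersion"},
}
author_copy = {
    "is_oa": True,
    "best_oa_location": {"host_type": "repository", "version": "acceptedVersion"},
}
closed = {"is_oa": False, "best_oa_location": None}

for sample in (publisher_copy, author_copy, closed):
    print(open_access_confidence(sample))
```

<p>Items in the medium group would still need a manual look before changing their access status.</p>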
<a href='https://alanorth.github.io/cgspace-notes/2023-07/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-06/">June, 2023</a></h2>
<p class="blog-post-meta"><time datetime="2023-06-02T10:29:36+03:00">Fri Jun 02, 2023</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-06-02">2023-06-02</h2>
<ul>
<li>Spend some time testing my <code>post_bitstreams.py</code> script to update thumbnails for items on CGSpace
<ul>
<li>Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail&hellip;</li>
</ul>
</li>
<li>Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
<ul>
<li>They have experience with improving the MODS interface in MELSpace&rsquo;s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace</li>
<li>From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk</li>
</ul>
</li>
</ul>
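<p>One way to spot unexpected thumbnail formats like the JFIF and WebP files mentioned above is to look at the first bytes of each file instead of trusting its name. The following is a minimal sketch and is not taken from <code>post_bitstreams.py</code>; the function name and the returned labels are hypothetical.</p>

```python
def sniff_image_format(data: bytes) -> str:
    """Guess an image format from the leading magic bytes."""
    if data[:3] == b"\xff\xd8\xff":
        # JPEG stream; JFIF files carry an APP0 segment tagged "JFIF\0"
        # right after the two-byte marker and two-byte length.
        return "jpeg/jfif" if data[6:11] == b"JFIF\x00" else "jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        # WebP is a RIFF container with the form type "WEBP".
        return "webp"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    return "unknown"


# Synthetic headers for illustration; real files would be read with open(path, "rb").
jfif_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00" + b"\x00" * 8
webp_header = b"RIFF" + b"\x00\x00\x00\x00" + b"WEBP"

print(sniff_image_format(jfif_header))
print(sniff_image_format(webp_header))
```

<p>Only the first dozen bytes are needed, so a check like this can run over many bitstreams without loading whole files.</p>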
<a href='https://alanorth.github.io/cgspace-notes/2023-06/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-05/">May, 2023</a></h2>
<p class="blog-post-meta"><time datetime="2023-05-03T08:53:36+03:00">Wed May 03, 2023</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-05-03">2023-05-03</h2>
<ul>
<li>Alliance&rsquo;s TIP team emailed me to ask about issues authenticating on CGSpace
<ul>
<li>It seems their password expired, which is annoying</li>
</ul>
</li>
<li>I continued looking at the CGSpace subjects for the FAO / AGROVOC exercise that I started last week
<ul>
<li>There are many of our subjects that would match if they added a &ldquo;-&rdquo; like &ldquo;high yielding varieties&rdquo; or used singular&hellip;</li>
<li>Also I found at least two spelling mistakes, for example &ldquo;decison support systems&rdquo;, which would match if it was spelled correctly</li>
</ul>
</li>
<li>Work on cleaning, proofing, and uploading twenty-seven records for IFPRI to CGSpace</li>
</ul>
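<p>The near misses described above (a missing hyphen, plural versus singular, a small spelling mistake) can be caught with a crude comparison key plus fuzzy matching. This is a sketch using only the Python standard library; the vocabulary list is a hypothetical stand-in for AGROVOC terms, and the singularization rule is deliberately naive.</p>

```python
import difflib
import re


def match_key(term: str) -> str:
    """Build a crude comparison key: lowercase, hyphens as spaces, naive singular."""
    words = re.split(r"[\s-]+", term.strip().lower())
    singular = [
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    ]
    return " ".join(singular)


def suggest(term: str, vocabulary: list):
    """Return the vocabulary entry matching `term`, a close spelling, or None."""
    by_key = {match_key(entry): entry for entry in vocabulary}
    key = match_key(term)
    if key in by_key:
        return by_key[key]
    # Fall back to fuzzy matching to catch small spelling mistakes.
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=0.9)
    return by_key[close[0]] if close else None


# Hypothetical vocabulary entries standing in for AGROVOC terms.
vocabulary = ["high-yielding varieties", "decision support systems"]

print(suggest("high yielding varieties", vocabulary))
print(suggest("decison support systems", vocabulary))
```

<p>The high cutoff keeps the fuzzy step conservative, so suggestions should still be reviewed by a person before any subject is changed.</p>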
<a href='https://alanorth.github.io/cgspace-notes/2023-05/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-04/">April, 2023</a></h2>
<p class="blog-post-meta"><time datetime="2023-04-02T08:19:36+03:00">Sun Apr 02, 2023</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-04-02">2023-04-02</h2>
<ul>
<li>Run all system updates on CGSpace and reboot it</li>
<li>I exported CGSpace to CSV to check for any missing Initiative collection mappings
<ul>
<li>I also did a check for missing country/region mappings with csv-metadata-quality</li>
</ul>
</li>
<li>Start a harvest on AReS</li>
</ul>
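<p>A check for missing Initiative collection mappings in an exported CSV might look like the sketch below. Everything specific here is an assumption made for illustration: the column names <code>id</code>, <code>collection</code>, and <code>cg.contributor.initiative[en_US]</code>, the <code>||</code> separator for multiple values, and the Initiative name and handles in the mapping table.</p>

```python
import csv
import io

# Hypothetical mapping from Initiative name to its collection handle.
INITIATIVE_COLLECTIONS = {
    "Example Initiative": "10568/00001",
}


def missing_mappings(csv_text, initiative_column="cg.contributor.initiative[en_US]"):
    """Yield (item id, collection handle) pairs that should be mapped but are not."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Multiple values in one cell are assumed to be separated by "||".
        collections = set(filter(None, (row.get("collection") or "").split("||")))
        initiatives = filter(None, (row.get(initiative_column) or "").split("||"))
        for name in initiatives:
            handle = INITIATIVE_COLLECTIONS.get(name)
            if handle and handle not in collections:
                yield row["id"], handle


# Hypothetical export with one unmapped and one correctly mapped item.
sample = (
    "id,collection,cg.contributor.initiative[en_US]\n"
    "item-1,10568/99999,Example Initiative\n"
    "item-2,10568/99999||10568/00001,Example Initiative\n"
)

print(list(missing_mappings(sample)))
```

<p>The resulting pairs could then be written back as an updated <code>collection</code> column for the affected items.</p>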
<a href='https://alanorth.github.io/cgspace-notes/2023-04/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-03/">March, 2023</a></h2>
<p class="blog-post-meta"><time datetime="2023-03-01T07:58:36+03:00">Wed Mar 01, 2023</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-03-01">2023-03-01</h2>
<ul>
<li>Remove <code>cg.subject.wle</code> and <code>cg.identifier.wletheme</code> from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)</li>
<li><a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28">iso-codes 4.13.0 was released</a>, which incorporates my changes to the common names for Iran, Laos, and Syria</li>
<li>I finally got through with porting the input form from DSpace 6 to DSpace 7</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2023-03/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-02/">February, 2023</a></h2>
<p class="blog-post-meta"><time datetime="2023-02-01T10:57:36+03:00">Wed Feb 01, 2023</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-02-01">2023-02-01</h2>
<ul>
<li>Export CGSpace to cross check the DOI metadata with Crossref
<ul>
<li>I want to try to expand my use of their data to journals, publishers, volumes, issues, etc&hellip;</li>
</ul>
</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2023-02/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-01/">January, 2023</a></h2>
<p class="blog-post-meta"><time datetime="2023-01-01T08:44:36+03:00">Sun Jan 01, 2023</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-01-01">2023-01-01</h2>
<ul>
<li>Apply some more ORCID identifiers to items on CGSpace using my <code>2022-09-22-add-orcids.csv</code> file
<ul>
<li>I want to update all ORCID names and refresh them in the database</li>
<li>I see we have some new ones that aren&rsquo;t in our list if I combine with this file:</li>
</ul>
</li>
</ul>
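<p>Finding ORCID identifiers that appear in one file but not in an existing list comes down to extracting the identifiers and taking a set difference. The sketch below assumes both sources are plain text in which identifiers appear in the usual <code>0000-0000-0000-0000</code> form; the function names and the sample identifiers are hypothetical placeholders.</p>

```python
import re

# Four groups of four digits; the last character may be the checksum letter X.
ORCID_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b")


def extract_orcids(text: str) -> set:
    """Return the set of ORCID-shaped identifiers found in the text."""
    return set(ORCID_PATTERN.findall(text))


def new_orcids(known_text: str, candidate_text: str) -> list:
    """Return identifiers present in the candidates but absent from the known list."""
    return sorted(extract_orcids(candidate_text) - extract_orcids(known_text))


# Placeholder identifiers for illustration, not real researchers.
known = "Example Author: 0000-0000-0000-0001\n"
candidates = (
    "Example Author: 0000-0000-0000-0001\n"
    "Another Author: 0000-0000-0000-002X\n"
)

print(new_orcids(known, candidates))
```

<p>The pattern only checks the shape of an identifier, not its checksum, so new identifiers should still be resolved against ORCID before being added.</p>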
<a href='https://alanorth.github.io/cgspace-notes/2023-01/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary disabled" href="#" role="button" aria-disabled="true">Previous page</a>
<a class="btn btn-outline-primary" href="/cgspace-notes/page/2/" rel="next" role="button">Next page</a>
</nav>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2023-10/">October, 2023</a></li>
<li><a href="/cgspace-notes/2023-09/">September, 2023</a></li>
<li><a href="/cgspace-notes/2023-08/">August, 2023</a></li>
<li><a href="/cgspace-notes/2023-07/">July, 2023</a></li>
<li><a href="/cgspace-notes/2023-06/">June, 2023</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>