<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="June, 2023" />
<meta property="og:description" content="2023-06-02
Spend some time testing my post_bitstreams.py script to update thumbnails for items on CGSpace
Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail&hellip;
Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
They have experience with improving the MODS interface in MELSpace&rsquo;s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace
From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-06/" />
<meta property="article:published_time" content="2023-06-02T10:29:36+03:00" />
<meta property="article:modified_time" content="2023-06-08T17:04:20+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="June, 2023"/>
<meta name="twitter:description" content="2023-06-02
Spend some time testing my post_bitstreams.py script to update thumbnails for items on CGSpace
Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail&hellip;
Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
They have experience with improving the MODS interface in MELSpace&rsquo;s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace
From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk
"/>
<meta name="generator" content="Hugo 0.112.3">
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "June, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-06/",
"wordCount": "636",
"datePublished": "2023-06-02T10:29:36+03:00",
"dateModified": "2023-06-08T17:04:20+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2023-06/">
<title>June, 2023 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" integrity="sha256-xrqAvFBmlVdkWr4F&#43;GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz&#43;lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-06/">June, 2023</a></h2>
<p class="blog-post-meta">
<time datetime="2023-06-02T10:29:36+03:00">Fri Jun 02, 2023</time>
in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-06-02">2023-06-02</h2>
<ul>
<li>Spend some time testing my <code>post_bitstreams.py</code> script to update thumbnails for items on CGSpace
<ul>
<li>Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail&hellip;</li>
</ul>
</li>
<li>Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
<ul>
<li>They have experience with improving the MODS interface in MELSpace&rsquo;s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace</li>
<li>From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk</li>
</ul>
</li>
</ul>
<h2 id="2023-06-04">2023-06-04</h2>
<ul>
<li>Upgrade CGSpace to Ubuntu 22.04
<ul>
<li>The upgrade was mostly normal, but I had to unhold the openjdk package in order for <code>do-release-upgrade</code> to run:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># apt-mark unhold openjdk-8-jdk-headless:amd64 openjdk-8-jre-headless:amd64
</span></span></code></pre></div><ul>
<li>In <a href="/cgspace-notes/2022-11/">2022-11</a> an upstream Java update broke the DSpace 6 Handle server so we will have to pin this again after the upgrade to Ubuntu 22.04</li>
<li>After the upgrade I made sure CGSpace was working, then proceeded to upgrade PostgreSQL from 12 to 14, like I did on <a href="/cgspace-notes/2023-03/">DSpace Test in 2023-03</a></li>
<li>Then I had to downgrade OpenJDK to fix the Handle server, using the packages I had previously downloaded for Ubuntu 20.04 because they no longer exist on Launchpad:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># dpkg -i openjdk-8-j*8u342-b07*.deb
</span></span></code></pre></div><ul>
<li>Export CGSpace to fix missing Initiative collection mappings</li>
<li>Start a harvest on AReS</li>
<li>Work on the DSpace 7 migration a bit more
<ul>
<li>I decided to rebase and drop all the submission form edits because they conflict every time upstream changes!</li>
</ul>
</li>
</ul>
<h2 id="2023-06-06">2023-06-06</h2>
<ul>
<li>Fix some incorrect ORCID identifiers for an Alliance author on CGSpace</li>
<li>Export our list of ORCID identifiers, resolve them, and update the records in CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ cat dspace/config/controlled-vocabularies/cg-creator-identifier.xml 2022-09-22-add-orcids.csv| grep -oE <span style="color:#e6db74">&#39;[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}&#39;</span> | sort -u &gt; /tmp/2023-06-06-orcids.txt
</span></span><span style="display:flex;"><span>$ ./ilri/resolve_orcids.py -i /tmp/2023-06-06-orcids.txt -o /tmp/2023-06-06-orcids-names.txt -d
</span></span><span style="display:flex;"><span>$ ./ilri/update_orcids.py -i /tmp/2023-06-06-orcids-names.txt -db dspacetest -u dspace -p <span style="color:#e6db74">&#39;ffff&#39;</span> -m <span style="color:#ae81ff">247</span>
</span></span></code></pre></div><ul>
<li>Start working on updating the MODS schema in CGSpace from 3.1 to 3.8 based on Stefano and Salem&rsquo;s work last year</li>
</ul>
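<ul>
<li>As a rough illustration of the ORCID extraction step above, here is a minimal Python sketch that applies the same pattern as the <code>grep -oE</code> call and de-duplicates the matches, similar to <code>sort -u</code>
<ul>
<li>The function name and the sample identifiers are made up for this example, and this is not taken from <code>resolve_orcids.py</code> itself</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import re

# Same pattern as the grep -oE call above: four dash-separated groups of
# four uppercase letters or digits.
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcids(text):
    """Return the sorted, de-duplicated identifiers found in text."""
    return sorted(set(ORCID_PATTERN.findall(text)))


# Placeholder input standing in for the vocabulary XML and the CSV;
# these identifiers are invented and do not belong to real people.
sample = """
Example Author One: 0000-0000-0000-0001
"Author, Example One",0000-0000-0000-0001
Example Author Two: 0000-0000-0000-002X
"""

for orcid in extract_orcids(sample):
    print(orcid)
</code></pre></div>
<ul>
<li>The pattern is deliberately loose, so it only finds strings shaped like an ORCID identifier and does not check that they are valid</li>
</ul>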
<h2 id="2023-06-08">2023-06-08</h2>
<ul>
<li>Continue working on the MODS schema mapping</li>
<li>Export CGSpace to check and update <code>dcterms.extent</code> fields
<ul>
<li>I normalized about 1,500 to use either &ldquo;p. 1-6&rdquo; or &ldquo;5 p.&rdquo; format</li>
<li>Also, I used this GREL expression to extract missing pages from the citation field: <code>cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*(pp?\.\s?\d+[-]\d+).*/)[0]</code></li>
<li>This was over 4,000 items with a format like &ldquo;p. 1-6&rdquo; and &ldquo;pp. 1-6&rdquo; in the citation</li>
<li>I used another GREL expression to extract another 5,000: <code>cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*?(\d+\s+?[Pp]+\.).*/)[0]</code></li>
<li>This matched formats like &ldquo;1 p.&rdquo; (note the lazy <code>.*?</code> at the beginning, which protects against a greedy <code>.*</code> consuming digits of the page count)</li>
</ul>
</li>
<li>I also did some work to capture a handful of missing DOIs and ISSNs, but it was only about 100 items and I will have to wait until the 10,000+ above finish importing</li>
</ul>
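<ul>
<li>To show why the lazy quantifier matters in the page extraction above, here is a small Python sketch of the two GREL expressions
<ul>
<li>It assumes that GREL&rsquo;s <code>match(...)[0]</code> corresponds to the first capture group of a whole-string match; the helper name and the citations are invented for illustration</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import re

# Python equivalents of the two GREL expressions above.
PAGE_RANGE = re.compile(r".*(pp?\.\s?\d+[-]\d+).*")
PAGE_COUNT_LAZY = re.compile(r".*?(\d+\s+?[Pp]+\.).*")
# Greedy variant, included only to show what the lazy quantifier prevents.
PAGE_COUNT_GREEDY = re.compile(r".*(\d+\s+?[Pp]+\.).*")


def first_group(pattern, citation):
    """Return the first capture group if the whole citation matches, else None."""
    found = pattern.fullmatch(citation)
    return found.group(1) if found else None


# Invented citation ending in a page count.
citation = "Example Org. 2019. Example report. Nairobi, Kenya: Example Org. 12 p."

print(first_group(PAGE_COUNT_LAZY, citation))    # 12 p.
print(first_group(PAGE_COUNT_GREEDY, citation))  # 2 p.
print(first_group(PAGE_RANGE, "Example Journal 172: p. 1-6"))  # p. 1-6
</code></pre></div>
<ul>
<li>In this sketch the greedy leading <code>.*</code> of the page range pattern also means that a citation with &ldquo;pp. 10-25&rdquo; is captured as &ldquo;p. 10-25&rdquo;, which happens to agree with the normalized &ldquo;p. 1-6&rdquo; format</li>
</ul>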
<h2 id="2023-06-09">2023-06-09</h2>
<ul>
<li>I see there are ~200 users in CGSpace that have registered with their CGIAR email address using a password as opposed to using Active Directory:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">SELECT</span> <span style="color:#f92672">*</span> <span style="color:#66d9ef">FROM</span> eperson <span style="color:#66d9ef">WHERE</span> email <span style="color:#66d9ef">LIKE</span> <span style="color:#e6db74">&#39;%cgiar.org&#39;</span> <span style="color:#66d9ef">AND</span> netid <span style="color:#66d9ef">IS</span> <span style="color:#66d9ef">NOT</span> <span style="color:#66d9ef">NULL</span> <span style="color:#66d9ef">AND</span> password <span style="color:#66d9ef">IS</span> <span style="color:#66d9ef">NOT</span> <span style="color:#66d9ef">NULL</span>;
</span></span></code></pre></div><ul>
<li>I am wondering if I should delete their passwords and tell them to log in using LDAP
<ul>
<li>As an initial test I will reset a few accounts including my own that have passwords and salts:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">UPDATE</span> eperson <span style="color:#66d9ef">SET</span> password<span style="color:#f92672">=</span><span style="color:#66d9ef">DEFAULT</span>,salt<span style="color:#f92672">=</span><span style="color:#66d9ef">DEFAULT</span>,digest_algorithm<span style="color:#f92672">=</span><span style="color:#66d9ef">DEFAULT</span> <span style="color:#66d9ef">WHERE</span> netid <span style="color:#66d9ef">IN</span> (<span style="color:#e6db74">&#39;axxxx&#39;</span>, <span style="color:#e6db74">&#39;axxxx&#39;</span>, <span style="color:#e6db74">&#39;bxxxx&#39;</span>);
</span></span></code></pre></div><ul>
<li>I also decided to reset passwords/salts for CGIAR accounts that have not been active since 2021 (1.5 years ago):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">UPDATE</span> eperson <span style="color:#66d9ef">SET</span> password<span style="color:#f92672">=</span><span style="color:#66d9ef">DEFAULT</span>,salt<span style="color:#f92672">=</span><span style="color:#66d9ef">DEFAULT</span>,digest_algorithm<span style="color:#f92672">=</span><span style="color:#66d9ef">DEFAULT</span> <span style="color:#66d9ef">WHERE</span> email <span style="color:#66d9ef">LIKE</span> <span style="color:#e6db74">&#39;%cgiar.org&#39;</span> <span style="color:#66d9ef">AND</span> netid <span style="color:#66d9ef">IS</span> <span style="color:#66d9ef">NOT</span> <span style="color:#66d9ef">NULL</span> <span style="color:#66d9ef">AND</span> password <span style="color:#66d9ef">IS</span> <span style="color:#66d9ef">NOT</span> <span style="color:#66d9ef">NULL</span> <span style="color:#66d9ef">AND</span> salt <span style="color:#66d9ef">IS</span> <span style="color:#66d9ef">NOT</span> <span style="color:#66d9ef">NULL</span> <span style="color:#66d9ef">AND</span> last_active <span style="color:#f92672">&lt;</span> <span style="color:#e6db74">&#39;2022-01-01&#39;</span>::date;
</span></span></code></pre></div><ul>
<li>This was about 100 accounts&hellip;
<ul>
<li>I will wait some more time before I decide what to do about the more current ones</li>
</ul>
</li>
<li>Add a few more ORCID identifiers to my list and tag them on CGSpace</li>
</ul>
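<ul>
<li>One cautious way to run a bulk reset like the one above is to count the matching rows first and roll back unless the update touches exactly that many; a minimal Python sketch of the idea follows
<ul>
<li>It uses an in-memory SQLite database as a stand-in for PostgreSQL, a cut-down hypothetical <code>eperson</code> table with only the columns from the queries above, invented rows, and <code>NULL</code> in place of <code>DEFAULT</code> (assuming those columns default to <code>NULL</code>)</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import sqlite3

# Hypothetical, cut-down eperson table with only the columns used above.
SCHEMA = """
CREATE TABLE eperson (
    email TEXT, netid TEXT, password TEXT, salt TEXT,
    digest_algorithm TEXT, last_active TEXT
)
"""

# Same conditions as the UPDATE above, with the cutoff date as a parameter.
FILTER = (
    "email LIKE '%cgiar.org' AND netid IS NOT NULL "
    "AND password IS NOT NULL AND salt IS NOT NULL "
    "AND ? > last_active"
)


def reset_inactive(conn, cutoff, commit=False):
    """Clear password fields for matching accounts; roll back unless commit is True."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM eperson WHERE " + FILTER, (cutoff,))
    expected = cur.fetchone()[0]
    cur.execute(
        "UPDATE eperson SET password=NULL, salt=NULL, digest_algorithm=NULL "
        "WHERE " + FILTER,
        (cutoff,),
    )
    changed = cur.rowcount
    # Only keep the change when asked to and when the counts agree.
    if commit and changed == expected:
        conn.commit()
    else:
        conn.rollback()
    return expected, changed


def make_demo_db():
    """Build an in-memory database with three invented accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO eperson VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("a@cgiar.org", "a", "hash", "salt", "SHA-512", "2021-03-01"),
            ("b@cgiar.org", "b", "hash", "salt", "SHA-512", "2023-05-01"),
            ("c@example.org", None, "hash", "salt", "SHA-512", "2020-01-01"),
        ],
    )
    conn.commit()
    return conn


conn = make_demo_db()
print(reset_inactive(conn, "2022-01-01"))               # dry run, rolled back
print(reset_inactive(conn, "2022-01-01", commit=True))  # applied
</code></pre></div>
<ul>
<li>The dry run reports how many accounts would be affected without changing anything, which gives a chance to compare the number against expectations before committing</li>
</ul>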
<h2 id="2023-06-10">2023-06-10</h2>
<ul>
<li>Export CGSpace to check for missing Initiative mappings
<ul>
<li>Start a harvest on AReS</li>
</ul>
</li>
</ul>
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2023-06/">June, 2023</a></li>
<li><a href="/cgspace-notes/2023-05/">May, 2023</a></li>
<li><a href="/cgspace-notes/2023-04/">April, 2023</a></li>
<li><a href="/cgspace-notes/2023-03/">March, 2023</a></li>
<li><a href="/cgspace-notes/2023-02/">February, 2023</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>