cgspace-notes/docs/2023-04/index.html

806 lines
51 KiB
HTML

<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="April, 2023" />
<meta property="og:description" content="2023-04-02
Run all system updates on CGSpace and reboot it
I exported CGSpace to CSV to check for any missing Initiative collection mappings
I also did a check for missing country/region mappings with csv-metadata-quality
Start a harvest on AReS
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-04/" />
<meta property="article:published_time" content="2023-04-02T08:19:36+03:00" />
<meta property="article:modified_time" content="2023-05-04T14:44:51+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="April, 2023"/>
<meta name="twitter:description" content="2023-04-02
Run all system updates on CGSpace and reboot it
I exported CGSpace to CSV to check for any missing Initiative collection mappings
I also did a check for missing country/region mappings with csv-metadata-quality
Start a harvest on AReS
"/>
<meta name="generator" content="Hugo 0.123.0">
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "April, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-04/",
"wordCount": "2490",
"datePublished": "2023-04-02T08:19:36+03:00",
"dateModified": "2023-05-04T14:44:51+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2023-04/">
<title>April, 2023 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" integrity="sha256-xrqAvFBmlVdkWr4F&#43;GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz&#43;lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2023-04/">April, 2023</a></h2>
<p class="blog-post-meta">
<time datetime="2023-04-02T08:19:36+03:00">Sun Apr 02, 2023</time>
in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2023-04-02">2023-04-02</h2>
<ul>
<li>Run all system updates on CGSpace and reboot it</li>
<li>I exported CGSpace to CSV to check for any missing Initiative collection mappings
<ul>
<li>I also did a check for missing country/region mappings with csv-metadata-quality</li>
</ul>
</li>
<li>Start a harvest on AReS</li>
</ul>
<ul>
<li>I&rsquo;m starting to get annoyed at my shell script for doing ImageMagick tests and looking to re-write it in something object oriented like Python
<ul>
<li>There doesn&rsquo;t seem to be an official ImageMagick Python binding on pypi.org, perhaps I can use <a href="https://docs.wand-py.org">Wand</a>?</li>
</ul>
</li>
<li>Testing Wand in Python:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> wand.image <span style="color:#f92672">import</span> Image
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">with</span> Image(filename<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;data/10568-103447.pdf[0]&#39;</span>, resolution<span style="color:#f92672">=</span><span style="color:#ae81ff">144</span>) <span style="color:#66d9ef">as</span> first_page:
</span></span><span style="display:flex;"><span> print(first_page<span style="color:#f92672">.</span>height)
</span></span></code></pre></div><ul>
<li>I spent more time re-working my thumbnail scripts to compare the resized images and other minor changes
<ul>
<li>I am realizing that doing the thumbnails directly from the source improves the ssimulacra2 score by 1-3% points compared to DSpace&rsquo;s method of creating a lossy supersample followed by a lossy resized thumbnail</li>
</ul>
</li>
</ul>
<h2 id="2023-04-03">2023-04-03</h2>
<ul>
<li>The harvest on AReS that I started yesterday never finished, and actually seems to have died&hellip;
<ul>
<li>Also, Fabio and Patrizio from Alliance emailed me to ask if there is something wrong with the REST API because they are having problems</li>
<li>I stopped the harvest and started the plugins to get the remaining items via the sitemap&hellip;</li>
</ul>
</li>
</ul>
<h2 id="2023-04-04">2023-04-04</h2>
<ul>
<li>Presentation about CGSpace metadata, controlled vocabularies, and curation to Pooja&rsquo;s communications and development team at UNEP
<ul>
<li>I uploaded the presentation to CGSpace here: <a href="https://hdl.handle.net/10568/129896">https://hdl.handle.net/10568/129896</a></li>
</ul>
</li>
<li>Someone from the system organization contacted me to ask how to download a few thousand PDFs from a spreadsheet with DOIs and Handles</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c Handle ~/Downloads/2023-04-04-Donald.csv <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | sed \
</span></span><span style="display:flex;"><span> -e 1d \
</span></span><span style="display:flex;"><span> -e &#39;s_https://hdl.handle.net/__&#39; \
</span></span><span style="display:flex;"><span> -e &#39;s_https://cgspace.cgiar.org/handle/__&#39; \
</span></span><span style="display:flex;"><span> -e &#39;s_http://hdl.handle.net/__&#39; \
</span></span><span style="display:flex;"><span> | sort -u &gt; /tmp/handles.txt
</span></span></code></pre></div><ul>
<li>Then I used the <code>get_dspace_pdfs.py</code> script to download them</li>
</ul>
<h2 id="2023-04-05">2023-04-05</h2>
<ul>
<li>After some cleanup on Donald&rsquo;s DOIs I started the <code>get_scihub_pdfs.py</code> script</li>
</ul>
<h2 id="2023-04-06">2023-04-06</h2>
<ul>
<li>I did some more work to cleanup and streamline my next generation of DSpace thumbnail testing scripts
<ul>
<li>I think I found a bug in ImageMagick 7.1.1.5 where CMYK to sRGB conversion fails if we use image operations like <code>-density</code> or <code>-define</code> before reading the input file</li>
<li>I started <a href="https://github.com/ImageMagick/ImageMagick/discussions/6234">a discussion on the ImageMagick GitHub</a> to ask</li>
</ul>
</li>
<li>Yesterday I started downloading the rest of the PDFs from Donald, those that had DOIs
<ul>
<li>As a measure of caution, I extracted the list of DOIs and used my <code>crossref_doi_lookup.py</code> script to get their licenses from Crossref:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/crossref_doi_lookup.py -e xxxx@i.org -i /tmp/dois.txt -o /tmp/donald-crossref-dois.csv -d
</span></span></code></pre></div><ul>
<li>Then I did some CSV manipulation to extract the DOIs that were Creative Commons licensed, excluding any that were &ldquo;No Derivatives&rdquo;, and re-formatting the DOIs:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c doi,license /tmp/donald-crossref-dois.csv <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | csvgrep -c license -m &#39;creativecommons&#39; \
</span></span><span style="display:flex;"><span> | csvgrep -c license -i -r &#39;by-(nd|nc-nd)&#39; \
</span></span><span style="display:flex;"><span> | sed -e &#39;s_^10_https://doi.org/10_&#39; \
</span></span><span style="display:flex;"><span> -e &#39;s/\(am\|tdm\|unspecified\|vor\): //&#39; \
</span></span><span style="display:flex;"><span> | tee /tmp/donald-open-dois.csv \
</span></span><span style="display:flex;"><span> | wc -l
</span></span><span style="display:flex;"><span>4268
</span></span></code></pre></div><ul>
<li>From those I filtered for the DOIs for which I had downloaded PDFs, in the <code>filename</code> column of the Sci-Hub script and copied them to a separate directory:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ <span style="color:#66d9ef">for</span> file in <span style="color:#66d9ef">$(</span>csvjoin -c doi /tmp/donald-doi-pdfs.csv /tmp/donald-open-dois.csv | csvgrep -c filename -i -r <span style="color:#e6db74">&#39;^$&#39;</span> | csvcut -c filename | sed 1d<span style="color:#66d9ef">)</span>; <span style="color:#66d9ef">do</span> cp --reflink<span style="color:#f92672">=</span>always <span style="color:#e6db74">&#34;</span>$file<span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;creative-commons-licensed/</span>$file<span style="color:#e6db74">&#34;</span>; <span style="color:#66d9ef">done</span>
</span></span></code></pre></div><ul>
<li>I used BTRFS copy-on-write via reflinks to make sure I didn&rsquo;t duplicate the files :-D</li>
<li>I ran out of time and had to stop the process around 3,127 PDFs
<ul>
<li>I zipped them up and sent them to the others, along with a CSV of the DOIs, PDF filenames, and licenses</li>
</ul>
</li>
</ul>
<h2 id="2023-04-17">2023-04-17</h2>
<ul>
<li>Abenet noticed a weird issue with <a href="https://cgspace.cgiar.org/handle/10568/75611">this item</a>
<ul>
<li>The item has metadata, but the page is blank</li>
<li>When I try to edit the item&rsquo;s authorization policies in XMLUI I get a nullPointerException:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code>Java stacktrace: java.lang.NullPointerException
at org.dspace.app.xmlui.aspect.administrative.authorization.EditItemPolicies.addBody(EditItemPolicies.java:166)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:234)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy201.startElement(Unknown Source)
at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy203.startElement(Unknown Source)
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy203.startElement(Unknown Source)
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy203.startElement(Unknown Source)
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
at org.apache.cocoon.components.sax.AbstractXMLByteStreamInterpreter.parse(AbstractXMLByteStreamInterpreter.java:117)
at org.apache.cocoon.components.sax.XMLByteStreamInterpreter.deserialize(XMLByteStreamInterpreter.java:44)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:324)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:326)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:326)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:439)
at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode.invoke(SerializeNode.java:147)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.servlet.RequestProcessor.process(RequestProcessor.java:351)
at org.apache.cocoon.servlet.RequestProcessor.service(RequestProcessor.java:169)
at org.apache.cocoon.sitemap.SitemapServlet.service(SitemapServlet.java:84)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:468)
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at com.sun.proxy.$Proxy186.service(Unknown Source)
at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1216)
at org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1001)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:945)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:853)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:113)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:160)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:235)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:451)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1201)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:750)
</code></pre><ul>
<li>I don&rsquo;t see anything on the DSpace issue tracker or mailing list so I asked about it on the DSpace Slack&hellip;</li>
<li>Peter said CGSpace was slow and I see a lot of locks from the XMLUI
<ul>
<li>I looked and found many locks that were many hours and days old so I killed some:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql &lt; locks-age.sql | grep -E <span style="color:#e6db74">&#34;[[:digit:]] days&#34;</span> | awk -F<span style="color:#ae81ff">\|</span> <span style="color:#e6db74">&#39;{print $10}&#39;</span> | sort -u
</span></span><span style="display:flex;"><span> 1050672
</span></span><span style="display:flex;"><span> 1053773
</span></span><span style="display:flex;"><span> 1054602
</span></span><span style="display:flex;"><span> 1054702
</span></span><span style="display:flex;"><span> 1056782
</span></span><span style="display:flex;"><span> 1057629
</span></span><span style="display:flex;"><span> 1057630
</span></span><span style="display:flex;"><span>$ psql &lt; locks-age.sql | grep -E <span style="color:#e6db74">&#34;[[:digit:]] days&#34;</span> | awk -F<span style="color:#ae81ff">\|</span> <span style="color:#e6db74">&#39;{print $10}&#39;</span> | sort -u | xargs kill
</span></span></code></pre></div><ul>
<li>I&rsquo;m also running a <code>dspace cleanup -v</code>, but it doesn&rsquo;t seem to be finishing
<ul>
<li>I recall something like there being errors in the logs rather than on the command line in DSpace 6&hellip;</li>
<li>I found it in the DSpace log:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>2023-04-17 21:09:46,004 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ ERROR: update or delete on table &#34;bitstream&#34; violates foreign key constraint &#34;bundle_primary_bitstream_id_fkey&#34; on table &#34;bundle&#34;
</span></span><span style="display:flex;"><span> Detail: Key (uuid)=(a7ddf477-1c04-4de0-9c7a-4d3c84a875bc) is still referenced from table &#34;bundle&#34;.
</span></span></code></pre></div><ul>
<li>If I mark the primary bitstream as null manually the cleanup script continues until it finds a few more
<ul>
<li>I ended up with a long list of UUIDs to fix before the script would complete:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -d dspace -c <span style="color:#e6db74">&#34;update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (&#39;a7ddf477-1c04-4de0-9c7a-4d3c84a875bc&#39;, &#39;9582b661-9c2d-4c86-be22-c3b0942b646a&#39;, &#39;210a4d5d-3af9-46f0-84cc-682dd1431762&#39;, &#39;51115f07-0a60-4988-8536-b9ebd2a5e15e&#39;, &#39;0fc5021d-3264-413a-b2e2-74bda38a394e&#39;, &#39;4704fa62-b8ab-4dfe-b7aa-0e4905f8412a&#39;)&#34;</span>
</span></span></code></pre></div><ul>
<li>This process ended up taking a few days because each iteration ran for over four hours before failing on the next UUID, sighhhhh</li>
</ul>
<h2 id="2023-04-18">2023-04-18</h2>
<ul>
<li>Regarding the item Abenet noticed yesterday that has a blank page and a nullPointerException
<ul>
<li>It appears OK on DSpace Test! <a href="https://dspacetest.cgiar.org/handle/10568/75611">https://dspacetest.cgiar.org/handle/10568/75611</a></li>
<li>And according to the REST API on CGSpace the item was modified on 2023-04-11, so last week&hellip;</li>
<li>According to the DSpace logs it was Francesca who edited the item last week, so I asked her for more information before I troubleshoot more</li>
</ul>
</li>
</ul>
<h2 id="2023-04-19">2023-04-19</h2>
<ul>
<li>I fixed the Bioversity item by deleting the <code>9781138781276.jpg</code> bitstream via the REST API
<ul>
<li>I <em>think</em> Francesca might have changed the &ldquo;format&rdquo; of it?</li>
<li>Anyway, this item has a PDF so we have a proper thumbnail and don&rsquo;t need that other journal cover one</li>
</ul>
</li>
<li>I noticed a URL for this <a href="https://hdl.handle.net/10568/89049">Bioversity item</a> redirects incorrectly
<ul>
<li>I had mentioned this to Maria and Francesca a few months ago but it seems to never have been resolved</li>
</ul>
</li>
<li>The <code>dspace cleanup -v</code> finally finished after a few days of running and stopping&hellip;</li>
<li>I decided to update the thumbnails in the Bioversity books collection because I saw a few old ones suffering from the CropBox issue</li>
<li>Also, all day there&rsquo;s been a high load on CGSpace, with lots of locks in PostgreSQL
<ul>
<li>I had been waiting until the bitstream cleanup finished&hellip; now I might need to restart PostgreSQL to kill some old locks as something needs to give</li>
<li>I restarted PostgreSQL, but DSpace was still hanging on simple XMLUI options so I ended up restarting Tomcat</li>
</ul>
</li>
<li>Tag 544 ORCID identifiers with my script</li>
<li>I updated my <code>generation-loss.sh</code> and <code>improved-dspace-thumbnails</code> scripts to include thirty-five PDFs from CGSpace (up from twenty-four) to get a larger sample
<ul>
<li>Now starting to get some numbers comparing JPEG, WebP, and AVIF</li>
<li>First, out of curiousity, I checked the average ssimulacra2 scores at Q75, Q80, and Q92 for each format:</li>
</ul>
</li>
</ul>
<table>
<thead>
<tr>
<th></th>
<th>Q75</th>
<th>Q80</th>
<th>Q92</th>
</tr>
</thead>
<tbody>
<tr>
<td>JPEG</td>
<td>71</td>
<td>74</td>
<td>88</td>
</tr>
<tr>
<td>WebP</td>
<td>74</td>
<td>77</td>
<td>82</td>
</tr>
<tr>
<td>AVIF</td>
<td>82</td>
<td>83</td>
<td>86</td>
</tr>
</tbody>
</table>
<ul>
<li>Then I checked the quality and file size (bytes) needed to hit an average ssimulacra2 score of 80 with each format:
<ul>
<li><strong>JPEG</strong>: Q89, 124923 bytes</li>
<li><strong>WebP</strong>: Q86, 84662 bytes (33% smaller than JPEG size)</li>
<li><strong>AVIF</strong>: Q65, 67597 bytes (56% smaller than JPEG size)</li>
</ul>
</li>
<li><a href="https://developers.google.com/speed/webp/docs/webp_study">Google&rsquo;s original WebP study</a> uses this technique to compare WebP to JPEG too
<ul>
<li>As the quality settings are not comparable between formats, we need to compare the formats at matching perceptual scores (ssimulacra2 in this case)</li>
<li>I used a ssimulacra2 score of 80 because that&rsquo;s the about the highest score I see with WebP using my samples, though JPEG and AVIF do go higher</li>
<li>Also, according to current ssimulacra2 (v2.1), a score of 70 is &ldquo;high quality&rdquo; and a score of 90 is &ldquo;very high quality&rdquo;, so 80 should be reasonably high enough&hellip;</li>
</ul>
</li>
<li>Here is a plot of the qualities and ssimulacra2 scores:</li>
</ul>
<p><img src="/cgspace-notes/2023/04/quality-vs-score-ssimulacra-v2.1.png" alt="Quality vs Score"></p>
<ul>
<li>Export CGSpace to check for missing Initiatives mappings</li>
</ul>
<h2 id="2023-04-22">2023-04-22</h2>
<ul>
<li>Export the Initiatives collection to run it through csv-metadata-quality
<ul>
<li>I wanted to make sure all the Initiatives items had correct regions</li>
<li>I had to manually fix a few license identifiers and ISSNs</li>
<li>Also, I found a few items submitted by MEL that had dates in DD/MM/YYYY format, so I sent them to Salem for him to investigate</li>
</ul>
</li>
<li>Start a harvest on AReS</li>
</ul>
<h2 id="2023-04-26">2023-04-26</h2>
<ul>
<li>Begin working on the list of non-AGROVOC CGSpace subjects for FAO
<ul>
<li>The last time I did this was in 2022-06</li>
<li>I used the following SQL query to dump values from all subject fields, lower case them, and group by counts:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspacetest= ☘ \COPY (SELECT DISTINCT(lower(text_value)) AS &#34;subject&#34;, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (187, 120, 210, 122, 215, 127, 208, 124, 128, 123, 125, 135, 203, 236, 238, 119) GROUP BY &#34;subject&#34; ORDER BY count DESC) to /tmp/2023-04-26-cgspace-subjects.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>COPY 26315
</span></span><span style="display:flex;"><span>Time: 2761.981 ms (00:02.762)
</span></span></code></pre></div><ul>
<li>Then I extracted the subjects and looked them up against AGROVOC:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c subject /tmp/2023-04-26-cgspace-subjects.csv | sed <span style="color:#e6db74">&#39;1d&#39;</span> &gt; /tmp/2023-04-26-cgspace-subjects.txt
</span></span><span style="display:flex;"><span>$ ./ilri/agrovoc_lookup.py -i /tmp/2023-04-26-cgspace-subjects.txt -o /tmp/2023-04-26-cgspace-subjects-results.csv
</span></span></code></pre></div><h2 id="2023-04-27">2023-04-27</h2>
<ul>
<li>The AGROVOC lookup from yesterday finished, so I extracted all terms that did not match and joined them with the original CSV so I can see the counts:
<ul>
<li>(I also note that the <code>agrovoc_lookup.py</code> script didn&rsquo;t seem to be caching properly, as it had to look up everything again the next time I ran it despite the requests cache being 174MB!)</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>csvgrep -c &#39;number of matches&#39; -r &#39;^0$&#39; /tmp/2023-04-26-cgspace-subjects-results.csv \
</span></span><span style="display:flex;"><span> | csvcut -c subject \
</span></span><span style="display:flex;"><span> | csvjoin -c subject /tmp/2023-04-26-cgspace-subjects.csv - \
</span></span><span style="display:flex;"><span> &gt; /tmp/2023-04-26-cgspace-non-agrovoc.csv
</span></span></code></pre></div><ul>
<li>I filtered for only those terms that had counts larger than fifty
<ul>
<li>I also removed terms like &ldquo;forages&rdquo;, &ldquo;policy&rdquo;, &ldquo;pests and diseases&rdquo; because those exist as singular or separate terms in AGROVOC</li>
<li>I also removed ambiguous terms like &ldquo;cocoa&rdquo;, &ldquo;diversity&rdquo;, &ldquo;resistance&rdquo; etc because there are various other preferred terms for those in AGROVOC</li>
<li>I also removed spelling mistakes like &ldquo;modeling&rdquo; and &ldquo;savanas&rdquo; because those exist in their correct form in AGROVOC</li>
<li>I also removed internal CGIAR terms like &ldquo;tac&rdquo;, &ldquo;crp&rdquo;, &ldquo;internal review&rdquo; etc (note: these are mostly from CGIAR System Office&rsquo;s subjects&hellip; perhaps I exclude those next time?)</li>
</ul>
</li>
<li>I note that many of <em>our</em> terms would match if they were singular, plural, or split up into separate terms, so perhaps we should pair this with an excercise to review our own terms</li>
<li>I couldn&rsquo;t finish the work locally yet so I uploaded my list to Google Docs to continue later</li>
</ul>
<h2 id="2023-04-28">2023-04-28</h2>
<ul>
<li>The ImageMagick CMYK issue is bothering me still
<ul>
<li>I am on a plane currently, but I have a Docker image of ImageMagick 7.1.1-3 and I compared the output of all CMYK PDFs using the same command on my local machine</li>
<li>The images from the Docker environment are correct with <em>only</em> <code>-colorspace sRGB</code> (no profiles!) as the commenters on GitHub said</li>
<li>This leads me to believe something wrong in my own environment, perhaps Ghostscript&hellip;?</li>
<li>The container has Ghostscript 9.53.3~dfsg-7+deb11u2 from Debian 11, while my Arch Linux system has Ghostscript 10.01.1-1</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2024-01/">February, 2024</a></li>
<li><a href="/cgspace-notes/2024-01/">January, 2024</a></li>
<li><a href="/cgspace-notes/2023-12/">December, 2023</a></li>
<li><a href="/cgspace-notes/2023-11/">November, 2023</a></li>
<li><a href="/cgspace-notes/2023-10/">October, 2023</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>