cgspace-notes/content/posts/2023-04.md
Alan Orth b4a5ec05e7
content/posts/2023-04.md: update image format scores
After re-calculation with ssimulacra2 v2.1.
2023-05-04 14:44:51 +03:00

546 lines
39 KiB
Markdown

---
title: "April, 2023"
date: 2023-04-02T08:19:36+03:00
author: "Alan Orth"
categories: ["Notes"]
---
## 2023-04-02
- Run all system updates on CGSpace and reboot it
- I exported CGSpace to CSV to check for any missing Initiative collection mappings
- I also did a check for missing country/region mappings with csv-metadata-quality
- Start a harvest on AReS
<!--more-->
- I'm starting to get annoyed at my shell script for doing ImageMagick tests and looking to re-write it in something object oriented like Python
- There doesn't seem to be an official ImageMagick Python binding on pypi.org, perhaps I can use [Wand](https://docs.wand-py.org)?
- Testing Wand in Python:
```python
from wand.image import Image
with Image(filename='data/10568-103447.pdf[0]', resolution=144) as first_page:
print(first_page.height)
```
- I spent more time re-working my thumbnail scripts to compare the resized images and other minor changes
- I am realizing that doing the thumbnails directly from the source improves the ssimulacra2 score by 1-3% points compared to DSpace's method of creating a lossy supersample followed by a lossy resized thumbnail
## 2023-04-03
- The harvest on AReS that I started yesterday never finished, and actually seems to have died...
- Also, Fabio and Patrizio from Alliance emailed me to ask if there is something wrong with the REST API because they are having problems
- I stopped the harvest and started the plugins to get the remaining items via the sitemap...
## 2023-04-04
- Presentation about CGSpace metadata, controlled vocabularies, and curation to Pooja's communications and development team at UNEP
- I uploaded the presentation to CGSpace here: https://hdl.handle.net/10568/129896
- Someone from the system organization contacted me to ask how to download a few thousand PDFs from a spreadsheet with DOIs and Handles
```console
$ csvcut -c Handle ~/Downloads/2023-04-04-Donald.csv \
| sed \
-e 1d \
-e 's_https://hdl.handle.net/__' \
-e 's_https://cgspace.cgiar.org/handle/__' \
-e 's_http://hdl.handle.net/__' \
| sort -u > /tmp/handles.txt
```
- Then I used the `get_dspace_pdfs.py` script to download them
## 2023-04-05
- After some cleanup on Donald's DOIs I started the `get_scihub_pdfs.py` script
## 2023-04-06
- I did some more work to cleanup and streamline my next generation of DSpace thumbnail testing scripts
- I think I found a bug in ImageMagick 7.1.1.5 where CMYK to sRGB conversion fails if we use image operations like `-density` or `-define` before reading the input file
- I started [a discussion on the ImageMagick GitHub](https://github.com/ImageMagick/ImageMagick/discussions/6234) to ask
- Yesterday I started downloading the rest of the PDFs from Donald, those that had DOIs
- As a measure of caution, I extracted the list of DOIs and used my `crossref_doi_lookup.py` script to get their licenses from Crossref:
```console
$ ./ilri/crossref_doi_lookup.py -e xxxx@i.org -i /tmp/dois.txt -o /tmp/donald-crossref-dois.csv -d
```
- Then I did some CSV manipulation to extract the DOIs that were Creative Commons licensed, excluding any that were "No Derivatives", and re-formatting the DOIs:
```console
$ csvcut -c doi,license /tmp/donald-crossref-dois.csv \
| csvgrep -c license -m 'creativecommons' \
| csvgrep -c license -i -r 'by-(nd|nc-nd)' \
| sed -e 's_^10_https://doi.org/10_' \
-e 's/\(am\|tdm\|unspecified\|vor\): //' \
| tee /tmp/donald-open-dois.csv \
| wc -l
4268
```
- From those I filtered for the DOIs for which I had downloaded PDFs, in the `filename` column of the Sci-Hub script and copied them to a separate directory:
```console
$ for file in $(csvjoin -c doi /tmp/donald-doi-pdfs.csv /tmp/donald-open-dois.csv | csvgrep -c filename -i -r '^$' | csvcut -c filename | sed 1d); do cp --reflink=always "$file" "creative-commons-licensed/$file"; done
```
- I used BTRFS copy-on-write via reflinks to make sure I didn't duplicate the files :-D
- I ran out of time and had to stop the process around 3,127 PDFs
- I zipped them up and sent them to the others, along with a CSV of the DOIs, PDF filenames, and licenses
## 2023-04-17
- Abenet noticed a weird issue with [this item](https://cgspace.cgiar.org/handle/10568/75611)
- The item has metadata, but the page is blank
- When I try to edit the item's authorization policies in XMLUI I get a nullPointerException:
```
Java stacktrace: java.lang.NullPointerException
at org.dspace.app.xmlui.aspect.administrative.authorization.EditItemPolicies.addBody(EditItemPolicies.java:166)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:234)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy201.startElement(Unknown Source)
at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy203.startElement(Unknown Source)
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy203.startElement(Unknown Source)
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy203.startElement(Unknown Source)
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
at org.apache.cocoon.components.sax.AbstractXMLByteStreamInterpreter.parse(AbstractXMLByteStreamInterpreter.java:117)
at org.apache.cocoon.components.sax.XMLByteStreamInterpreter.deserialize(XMLByteStreamInterpreter.java:44)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:324)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:326)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:326)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:439)
at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode.invoke(SerializeNode.java:147)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.servlet.RequestProcessor.process(RequestProcessor.java:351)
at org.apache.cocoon.servlet.RequestProcessor.service(RequestProcessor.java:169)
at org.apache.cocoon.sitemap.SitemapServlet.service(SitemapServlet.java:84)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:468)
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at com.sun.proxy.$Proxy186.service(Unknown Source)
at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1216)
at org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1001)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:945)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:853)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:113)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:160)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:235)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:451)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1201)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:750)
```
- I don't see anything on the DSpace issue tracker or mailing list so I asked about it on the DSpace Slack...
- Peter said CGSpace was slow and I see a lot of locks from the XMLUI
- I looked and found many locks that were many hours and days old so I killed some:
```console
$ psql < locks-age.sql | grep -E "[[:digit:]] days" | awk -F\| '{print $10}' | sort -u
1050672
1053773
1054602
1054702
1056782
1057629
1057630
$ psql < locks-age.sql | grep -E "[[:digit:]] days" | awk -F\| '{print $10}' | sort -u | xargs kill
```
- I'm also running a `dspace cleanup -v`, but it doesn't seem to be finishing
- I recall something like there being errors in the logs rather than on the command line in DSpace 6...
- I found it in the DSpace log:
```console
2023-04-17 21:09:46,004 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (uuid)=(a7ddf477-1c04-4de0-9c7a-4d3c84a875bc) is still referenced from table "bundle".
```
- If I mark the primary bitstream as null manually the cleanup script continues until it finds a few more
- I ended up with a long list of UUIDs to fix before the script would complete:
```console
$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762', '51115f07-0a60-4988-8536-b9ebd2a5e15e', '0fc5021d-3264-413a-b2e2-74bda38a394e', '4704fa62-b8ab-4dfe-b7aa-0e4905f8412a')"
```
- This process ended up taking a few days because each iteration ran for over four hours before failing on the next UUID, sighhhhh
## 2023-04-18
- Regarding the item Abenet noticed yesterday that has a blank page and a nullPointerException
- It appears OK on DSpace Test! https://dspacetest.cgiar.org/handle/10568/75611
- And according to the REST API on CGSpace the item was modified on 2023-04-11, so last week...
- According to the DSpace logs it was Francesca who edited the item last week, so I asked her for more information before I troubleshoot more
## 2023-04-19
- I fixed the Bioversity item by deleting the `9781138781276.jpg` bitstream via the REST API
- I *think* Francesca might have changed the "format" of it?
- Anyway, this item has a PDF so we have a proper thumbnail and don't need that other journal cover one
- I noticed a URL for this [Bioversity item](https://hdl.handle.net/10568/89049) redirects incorrectly
- I had mentioned this to Maria and Francesca a few months ago but it seems to never have been resolved
- The `dspace cleanup -v` finally finished after a few days of running and stopping...
- I decided to update the thumbnails in the Bioversity books collection because I saw a few old ones suffering from the CropBox issue
- Also, all day there's been a high load on CGSpace, with lots of locks in PostgreSQL
- I had been waiting until the bitstream cleanup finished... now I might need to restart PostgreSQL to kill some old locks as something needs to give
- I restarted PostgreSQL, but DSpace was still hanging on simple XMLUI options so I ended up restarting Tomcat
- Tag 544 ORCID identifiers with my script
- I updated my `generation-loss.sh` and `improved-dspace-thumbnails` scripts to include thirty-five PDFs from CGSpace (up from twenty-four) to get a larger sample
- Now starting to get some numbers comparing JPEG, WebP, and AVIF
- First, out of curiousity, I checked the average ssimulacra2 scores at Q75, Q80, and Q92 for each format:
| | Q75 | Q80 | Q92 |
|------|-----|-----|-----|
| JPEG | 71 | 74 | 88 |
| WebP | 74 | 77 | 82 |
| AVIF | 82 | 83 | 86 |
- Then I checked the quality and file size (bytes) needed to hit an average ssimulacra2 score of 80 with each format:
- **JPEG**: Q89, 124923 bytes
- **WebP**: Q86, 84662 bytes (33% smaller than JPEG size)
- **AVIF**: Q65, 67597 bytes (56% smaller than JPEG size)
- [Google's original WebP study](https://developers.google.com/speed/webp/docs/webp_study) uses this technique to compare WebP to JPEG too
- As the quality settings are not comparable between formats, we need to compare the formats at matching perceptual scores (ssimulacra2 in this case)
- I used a ssimulacra2 score of 80 because that's the about the highest score I see with WebP using my samples, though JPEG and AVIF do go higher
- Also, according to current ssimulacra2 (v2.1), a score of 70 is "high quality" and a score of 90 is "very high quality", so 80 should be reasonably high enough...
- Here is a plot of the qualities and ssimulacra2 scores:
![Quality vs Score](/cgspace-notes/2023/04/quality-vs-score-ssimulacra-v2.1.png)
- Export CGSpace to check for missing Initiatives mappings
## 2023-04-22
- Export the Initiatives collection to run it through csv-metadata-quality
- I wanted to make sure all the Initiatives items had correct regions
- I had to manually fix a few license identifiers and ISSNs
- Also, I found a few items submitted by MEL that had dates in DD/MM/YYYY format, so I sent them to Salem for him to investigate
- Start a harvest on AReS
## 2023-04-26
- Begin working on the list of non-AGROVOC CGSpace subjects for FAO
- The last time I did this was in 2022-06
- I used the following SQL query to dump values from all subject fields, lower case them, and group by counts:
```console
localhost/dspacetest= ☘ \COPY (SELECT DISTINCT(lower(text_value)) AS "subject", count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (187, 120, 210, 122, 215, 127, 208, 124, 128, 123, 125, 135, 203, 236, 238, 119) GROUP BY "subject" ORDER BY count DESC) to /tmp/2023-04-26-cgspace-subjects.csv WITH CSV HEADER;
COPY 26315
Time: 2761.981 ms (00:02.762)
```
- Then I extracted the subjects and looked them up against AGROVOC:
```console
$ csvcut -c subject /tmp/2023-04-26-cgspace-subjects.csv | sed '1d' > /tmp/2023-04-26-cgspace-subjects.txt
$ ./ilri/agrovoc_lookup.py -i /tmp/2023-04-26-cgspace-subjects.txt -o /tmp/2023-04-26-cgspace-subjects-results.csv
```
## 2023-04-27
- The AGROVOC lookup from yesterday finished, so I extracted all terms that did not match and joined them with the original CSV so I can see the counts:
- (I also note that the `agrovoc_lookup.py` script didn't seem to be caching properly, as it had to look up everything again the next time I ran it despite the requests cache being 174MB!)
```console
csvgrep -c 'number of matches' -r '^0$' /tmp/2023-04-26-cgspace-subjects-results.csv \
| csvcut -c subject \
| csvjoin -c subject /tmp/2023-04-26-cgspace-subjects.csv - \
> /tmp/2023-04-26-cgspace-non-agrovoc.csv
```
- I filtered for only those terms that had counts larger than fifty
- I also removed terms like "forages", "policy", "pests and diseases" because those exist as singular or separate terms in AGROVOC
- I also removed ambiguous terms like "cocoa", "diversity", "resistance" etc because there are various other preferred terms for those in AGROVOC
- I also removed spelling mistakes like "modeling" and "savanas" because those exist in their correct form in AGROVOC
- I also removed internal CGIAR terms like "tac", "crp", "internal review" etc (note: these are mostly from CGIAR System Office's subjects... perhaps I exclude those next time?)
- I note that many of *our* terms would match if they were singular, plural, or split up into separate terms, so perhaps we should pair this with an excercise to review our own terms
- I couldn't finish the work locally yet so I uploaded my list to Google Docs to continue later
## 2023-04-28
- The ImageMagick CMYK issue is bothering me still
- I am on a plane currently, but I have a Docker image of ImageMagick 7.1.1-3 and I compared the output of all CMYK PDFs using the same command on my local machine
- The images from the Docker environment are correct with *only* `-colorspace sRGB` (no profiles!) as the commenters on GitHub said
- This leads me to believe something wrong in my own environment, perhaps Ghostscript...?
- The container has Ghostscript 9.53.3~dfsg-7+deb11u2 from Debian 11, while my Arch Linux system has Ghostscript 10.01.1-1
<!-- vim: set sw=2 ts=2: -->