<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
< meta property = "og:title" content = "April, 2023" / >
< meta property = "og:description" content = "2023-04-02
Run all system updates on CGSpace and reboot it
I exported CGSpace to CSV to check for any missing Initiative collection mappings
I also did a check for missing country/region mappings with csv-metadata-quality
Start a harvest on AReS
" />
< meta property = "og:type" content = "article" / >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/2023-04/" / >
< meta property = "article:published_time" content = "2023-04-02T08:19:36+03:00" / >
< meta property = "article:modified_time" content = "2023-05-04T14:44:51+03:00" / >
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:title" content = "April, 2023" / >
< meta name = "twitter:description" content = "2023-04-02
Run all system updates on CGSpace and reboot it
I exported CGSpace to CSV to check for any missing Initiative collection mappings
I also did a check for missing country/region mappings with csv-metadata-quality
Start a harvest on AReS
"/>
< meta name = "generator" content = "Hugo 0.111.3" >
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "April, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-04/",
"wordCount": "2490",
"datePublished": "2023-04-02T08:19:36+03:00",
"dateModified": "2023-05-04T14:44:51+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/2023-04/" >
< title > April, 2023 | CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel = "stylesheet" integrity = "sha256-xrqAvFBmlVdkWr4F+GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin = "anonymous" >
<!-- minified Font Awesome for SVG icons -->
< script defer src = "https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity = "sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz+lcnA=" crossorigin = "anonymous" > < / script >
<!-- RSS 2.0 feed -->
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" dir = "auto" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2023-04/" > April, 2023< / a > < / h2 >
< p class = "blog-post-meta" >
< time datetime = "2023-04-02T08:19:36+03:00" > Sun Apr 02, 2023< / time >
in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2023-04-02" > 2023-04-02< / h2 >
< ul >
< li > Run all system updates on CGSpace and reboot it< / li >
< li > I exported CGSpace to CSV to check for any missing Initiative collection mappings
< ul >
< li > I also did a check for missing country/region mappings with csv-metadata-quality< / li >
< / ul >
< / li >
< li > Start a harvest on AReS< / li >
< / ul >
< ul >
< li > I’m starting to get annoyed at my shell script for doing ImageMagick tests and am looking to rewrite it in something object-oriented like Python
< ul >
< li > There doesn’t seem to be an official ImageMagick Python binding on pypi.org; perhaps I can use < a href = "https://docs.wand-py.org" > Wand< / a > ?< / li >
< / ul >
< / li >
< li > Testing Wand in Python:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-python" data-lang = "python" > < span style = "display:flex;" > < span > < span style = "color:#f92672" > from< / span > wand.image < span style = "color:#f92672" > import< / span > Image
< / span > < / span > < span style = "display:flex;" > < span >
< / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > with< / span > Image(filename< span style = "color:#f92672" > =< / span > < span style = "color:#e6db74" > 'data/10568-103447.pdf[0]'< / span > , resolution< span style = "color:#f92672" > =< / span > < span style = "color:#ae81ff" > 144< / span > ) < span style = "color:#66d9ef" > as< / span > first_page:
< / span > < / span > < span style = "display:flex;" > < span >     print(first_page< span style = "color:#f92672" > .< / span > height)
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > I spent more time re-working my thumbnail scripts to compare the resized images, along with other minor changes
< ul >
< li > I am realizing that generating the thumbnails directly from the source improves the ssimulacra2 score by 1–3 percentage points compared to DSpace’s method of creating a lossy supersample followed by a lossy resized thumbnail< / li >
< / ul >
< / li >
< / ul >
< h2 id = "2023-04-03" > 2023-04-03< / h2 >
< ul >
< li > The harvest on AReS that I started yesterday never finished, and actually seems to have died…
< ul >
< li > Also, Fabio and Patrizio from Alliance emailed me to ask if there is something wrong with the REST API because they are having problems< / li >
< li > I stopped the harvest and started the plugins to get the remaining items via the sitemap… < / li >
< / ul >
< / li >
< / ul >
< h2 id = "2023-04-04" > 2023-04-04< / h2 >
< ul >
< li > Presentation about CGSpace metadata, controlled vocabularies, and curation to Pooja’s communications and development team at UNEP
< ul >
< li > I uploaded the presentation to CGSpace here: < a href = "https://hdl.handle.net/10568/129896" > https://hdl.handle.net/10568/129896< / a > < / li >
< / ul >
< / li >
< li > Someone from the System Organization contacted me to ask how to download a few thousand PDFs from a spreadsheet with DOIs and Handles< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > $ csvcut -c Handle ~/Downloads/2023-04-04-Donald.csv < span style = "color:#ae81ff" > \
< / span > < / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#ae81ff" > < / span > | sed \
< / span > < / span > < span style = "display:flex;" > < span > -e 1d \
< / span > < / span > < span style = "display:flex;" > < span > -e 's_https://hdl.handle.net/__' \
< / span > < / span > < span style = "display:flex;" > < span > -e 's_https://cgspace.cgiar.org/handle/__' \
< / span > < / span > < span style = "display:flex;" > < span > -e 's_http://hdl.handle.net/__' \
< / span > < / span > < span style = "display:flex;" > < span > | sort -u > /tmp/handles.txt
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > Then I used the < code > get_dspace_pdfs.py< / code > script to download them< / li >
< / ul >
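<ul>
<li>A minimal Python sketch of the same Handle clean-up, in case the pipeline needs to live inside a script. The function name <code>normalize_handles</code> is hypothetical, and unlike the <code>sed</code> expressions it only strips a prefix at the start of a value and drops empty values:</li>
</ul>

```python
# Resolver prefixes stripped by the sed expressions above
PREFIXES = (
    "https://hdl.handle.net/",
    "https://cgspace.cgiar.org/handle/",
    "http://hdl.handle.net/",
)


def normalize_handles(values):
    """Strip known resolver prefixes, then de-duplicate and sort (like sort -u).

    Assumes the CSV header row has already been removed from `values`.
    """
    handles = set()
    for value in values:
        value = value.strip()
        for prefix in PREFIXES:
            if value.startswith(prefix):
                value = value[len(prefix):]
                break
        if value:  # skip blank cells instead of keeping an empty line
            handles.add(value)
    return sorted(handles)


if __name__ == "__main__":
    print(normalize_handles([
        "https://hdl.handle.net/10568/129896",
        "http://hdl.handle.net/10568/75611",
        "https://cgspace.cgiar.org/handle/10568/75611",
    ]))
```

<ul>
<li>The two different URLs for <code>10568/75611</code> collapse into a single entry, which is the point of the <code>sort -u</code> step.</li>
</ul>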
< h2 id = "2023-04-05" > 2023-04-05< / h2 >
< ul >
< li > After some cleanup on Donald’s DOIs I started the < code > get_scihub_pdfs.py< / code > script< / li >
< / ul >
< h2 id = "2023-04-06" > 2023-04-06< / h2 >
< ul >
< li > I did some more work to clean up and streamline my next generation of DSpace thumbnail testing scripts
< ul >
< li > I think I found a bug in ImageMagick 7.1.1.5 where CMYK to sRGB conversion fails if we use image operations like < code > -density< / code > or < code > -define< / code > before reading the input file< / li >
< li > I started < a href = "https://github.com/ImageMagick/ImageMagick/discussions/6234" > a discussion on the ImageMagick GitHub< / a > to ask about it< / li >
< / ul >
< / li >
< li > Yesterday I started downloading the rest of the PDFs from Donald, those that had DOIs
< ul >
< li > As a measure of caution, I extracted the list of DOIs and used my < code > crossref_doi_lookup.py< / code > script to get their licenses from Crossref:< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > $ ./ilri/crossref_doi_lookup.py -e xxxx@i.org -i /tmp/dois.txt -o /tmp/donald-crossref-dois.csv -d
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > Then I did some CSV manipulation to extract the DOIs that were Creative Commons licensed, excluding any that were “No Derivatives”, and reformatting the DOIs:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > $ csvcut -c doi,license /tmp/donald-crossref-dois.csv < span style = "color:#ae81ff" > \
< / span > < / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#ae81ff" > < / span > | csvgrep -c license -m 'creativecommons' \
< / span > < / span > < span style = "display:flex;" > < span > | csvgrep -c license -i -r 'by-(nd|nc-nd)' \
< / span > < / span > < span style = "display:flex;" > < span > | sed -e 's_^10_https://doi.org/10_' \
< / span > < / span > < span style = "display:flex;" > < span > -e 's/\(am\|tdm\|unspecified\|vor\): //' \
< / span > < / span > < span style = "display:flex;" > < span > | tee /tmp/donald-open-dois.csv \
< / span > < / span > < span style = "display:flex;" > < span > | wc -l
< / span > < / span > < span style = "display:flex;" > < span > 4268
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > From those I filtered for the DOIs for which I had downloaded PDFs (those with a non-empty < code > filename< / code > column in the output of the Sci-Hub script) and copied the files to a separate directory:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > $ < span style = "color:#66d9ef" > for< / span > file in < span style = "color:#66d9ef" > $(< / span > csvjoin -c doi /tmp/donald-doi-pdfs.csv /tmp/donald-open-dois.csv | csvgrep -c filename -i -r < span style = "color:#e6db74" > '^$'< / span > | csvcut -c filename | sed 1d< span style = "color:#66d9ef" > )< / span > ; < span style = "color:#66d9ef" > do< / span > cp --reflink< span style = "color:#f92672" > =< / span > always < span style = "color:#e6db74" > "< / span >$file< span style = "color:#e6db74" >"< / span > < span style = "color:#e6db74" > "creative-commons-licensed/< / span >$file< span style = "color:#e6db74" >"< / span > ; < span style = "color:#66d9ef" > done< / span >
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > I used BTRFS copy-on-write via reflinks to make sure I didn’t duplicate the files :-D< / li >
< li > I ran out of time and had to stop the process after around 3,127 PDFs
< ul >
< li > I zipped them up and sent them to the others, along with a CSV of the DOIs, PDF filenames, and licenses< / li >
< / ul >
< / li >
< / ul >
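<ul>
<li>The Creative Commons filtering step above could also be sketched in Python with only the standard library. This is an illustration under assumptions, not what I actually ran: the function name <code>open_dois</code> and the sample DOIs are hypothetical, and the input is assumed to have <code>doi</code> and <code>license</code> columns as in the <code>csvcut</code> call:</li>
</ul>

```python
import csv
import io
import re

# Same exclusion as the csvgrep regex above: BY-ND and BY-NC-ND licenses
NO_DERIVATIVES = re.compile(r"by-(nd|nc-nd)")
# Content-version prefixes removed by the second sed expression
VERSION_PREFIX = re.compile(r"(am|tdm|unspecified|vor): ")


def open_dois(csv_text):
    """Return (doi_url, license) pairs for CC licenses that allow derivatives."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        license_ = row["license"]
        if "creativecommons" not in license_:
            continue  # not a Creative Commons license
        if NO_DERIVATIVES.search(license_):
            continue  # "No Derivatives" variants are excluded
        doi = row["doi"]
        if doi.startswith("10"):
            doi = "https://doi.org/" + doi
        rows.append((doi, VERSION_PREFIX.sub("", license_, count=1)))
    return rows


if __name__ == "__main__":
    # Hypothetical sample rows, only to show the shape of the data
    sample = (
        "doi,license\n"
        "10.1234/a,vor: http://creativecommons.org/licenses/by/4.0/\n"
        "10.1234/b,am: http://creativecommons.org/licenses/by-nc-nd/4.0/\n"
        "10.1234/c,tdm: https://example.org/tdm\n"
    )
    for doi, license_ in open_dois(sample):
        print(doi, license_)
```

<ul>
<li>Unlike the shell pipeline piped into <code>wc -l</code>, this returns only data rows, so the CSV header is not counted.</li>
</ul>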
< h2 id = "2023-04-17" > 2023-04-17< / h2 >
< ul >
< li > Abenet noticed a weird issue with < a href = "https://cgspace.cgiar.org/handle/10568/75611" > this item< / a >
< ul >
< li > The item has metadata, but the page is blank< / li >
< li > When I try to edit the item’s authorization policies in XMLUI I get a NullPointerException:< / li >
< / ul >
< / li >
< / ul >
< pre tabindex = "0" > < code > Java stacktrace: java.lang.NullPointerException
at org.dspace.app.xmlui.aspect.administrative.authorization.EditItemPolicies.addBody(EditItemPolicies.java:166)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:234)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy201.startElement(Unknown Source)
at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy203.startElement(Unknown Source)
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy203.startElement(Unknown Source)
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy203.startElement(Unknown Source)
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
at org.apache.cocoon.components.sax.AbstractXMLByteStreamInterpreter.parse(AbstractXMLByteStreamInterpreter.java:117)
at org.apache.cocoon.components.sax.XMLByteStreamInterpreter.deserialize(XMLByteStreamInterpreter.java:44)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:324)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:326)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:326)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy198.generate(Unknown Source)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:439)
at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy191.process(Unknown Source)
at org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode.invoke(SerializeNode.java:147)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.servlet.RequestProcessor.process(RequestProcessor.java:351)
at org.apache.cocoon.servlet.RequestProcessor.service(RequestProcessor.java:169)
at org.apache.cocoon.sitemap.SitemapServlet.service(SitemapServlet.java:84)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:468)
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at com.sun.proxy.$Proxy186.service(Unknown Source)
at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1216)
at org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1001)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:945)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:853)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:113)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:160)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:235)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:451)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1201)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:750)
< / code > < / pre > < ul >
< li > I don’t see anything on the DSpace issue tracker or mailing list, so I asked about it on the DSpace Slack…< / li >
< li > Peter said CGSpace was slow and I see a lot of locks from the XMLUI
< ul >
< li > I looked and found many locks that were many hours and days old so I killed some:< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > $ psql < locks-age.sql | grep -E < span style = "color:#e6db74" > " [[:digit:]] days" < / span > | awk -F< span style = "color:#ae81ff" > \|< / span > < span style = "color:#e6db74" > ' {print $10}' < / span > | sort -u
< / span > < / span > < span style = "display:flex;" > < span > 1050672
< / span > < / span > < span style = "display:flex;" > < span > 1053773
< / span > < / span > < span style = "display:flex;" > < span > 1054602
< / span > < / span > < span style = "display:flex;" > < span > 1054702
< / span > < / span > < span style = "display:flex;" > < span > 1056782
< / span > < / span > < span style = "display:flex;" > < span > 1057629
< / span > < / span > < span style = "display:flex;" > < span > 1057630
< / span > < / span > < span style = "display:flex;" > < span > $ psql < locks-age.sql | grep -E < span style = "color:#e6db74" > " [[:digit:]] days" < / span > | awk -F< span style = "color:#ae81ff" > \|< / span > < span style = "color:#e6db74" > ' {print $10}' < / span > | sort -u | xargs kill
< / span > < / span > < / code > < / pre > < / div > < ul >
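< li > < p > For reference, a minimal Python sketch of the same filtering step as the shell pipeline above. It assumes the output of < code > locks-age.sql< / code > is pipe-delimited with the PID in the tenth column, as the < code > awk< / code > call implies; the function name and the sample row are hypothetical. Unlike the < code > grep< / code > pattern, it also matches an age of exactly "1 day". < / p >

```python
import re

# Matches ages such as "2 days" or "12 days" (and also "1 day")
DAYS_OLD = re.compile(r"\b\d+ days?\b")


def stale_lock_pids(psql_output, pid_column=10):
    """Return the sorted, de-duplicated PIDs of rows whose age is measured in days.

    pid_column is 1-based, like awk's $10. The column layout is an assumption
    about what locks-age.sql prints.
    """
    pids = set()
    for line in psql_output.splitlines():
        if not DAYS_OLD.search(line):
            continue
        fields = line.split("|")
        if pid_column > len(fields):
            continue
        pid = fields[pid_column - 1].strip()
        if pid.isdigit():
            pids.add(int(pid))
    return sorted(pids)


if __name__ == "__main__":
    # Hypothetical row: nine filler columns, then the PID in column ten
    sample = "|".join(["x"] * 8 + [" 2 days 03:00:00 ", " 1050672 ", "x"])
    print(stale_lock_pids(sample))
```

< p > Printing the PIDs first and reviewing them before passing them to < code > kill< / code > is a cheap safety check. < / p > < / li >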
< li > I’m also running a < code > dspace cleanup -v< / code >, but it doesn’t seem to be finishing
< ul >
< li > I recall that in DSpace 6 errors like this end up in the logs rather than on the command line…< / li >
< li > I found it in the DSpace log:< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > 2023-04-17 21:09:46,004 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ ERROR: update or delete on table " bitstream" violates foreign key constraint " bundle_primary_bitstream_id_fkey" on table " bundle"
< / span > < / span > < span style = "display:flex;" > < span > Detail: Key (uuid)=(a7ddf477-1c04-4de0-9c7a-4d3c84a875bc) is still referenced from table " bundle" .
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > If I mark the primary bitstream as null manually the cleanup script continues until it finds a few more
< ul >
< li > I ended up with a long list of UUIDs to fix before the script would complete:< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > $ psql -d dspace -c < span style = "color:#e6db74" > " update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (' a7ddf477-1c04-4de0-9c7a-4d3c84a875bc' , ' 9582b661-9c2d-4c86-be22-c3b0942b646a' , ' 210a4d5d-3af9-46f0-84cc-682dd1431762' , ' 51115f07-0a60-4988-8536-b9ebd2a5e15e' , ' 0fc5021d-3264-413a-b2e2-74bda38a394e' , ' 4704fa62-b8ab-4dfe-b7aa-0e4905f8412a' )" < / span >
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > This process ended up taking a few days because each iteration ran for over four hours before failing on the next UUID, sighhhhh< / li >
< / ul >
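< p > Since each cleanup run only reports one failing UUID, collecting them from the log and generating the statement can be scripted. The following is a rough sketch, assuming the log lines look like the excerpt above; the function names are hypothetical, and the resulting SQL would still be run manually with < code > psql< / code > . < / p >

```python
import re
import uuid

# Matches the "Detail:" line PostgreSQL emits for the foreign key violation
BLOCKING_KEY = re.compile(r"Key \(uuid\)=\(([0-9a-fA-F-]{36})\) is still referenced")


def extract_blocking_uuids(log_text):
    """Return the unique bitstream UUIDs named in foreign key errors, in order."""
    seen = []
    for match in BLOCKING_KEY.finditer(log_text):
        value = match.group(1).lower()
        if value not in seen:
            seen.append(value)
    return seen


def build_clear_primary_bitstream_sql(bitstream_ids):
    """Return an UPDATE that sets bundle.primary_bitstream_id to NULL for the IDs."""
    if not bitstream_ids:
        raise ValueError("no bitstream UUIDs given")
    # uuid.UUID() rejects malformed values, so only real UUIDs reach the SQL text
    normalized = [str(uuid.UUID(value.strip())) for value in bitstream_ids]
    quoted = ", ".join(f"'{value}'" for value in normalized)
    return (
        "UPDATE bundle SET primary_bitstream_id = NULL "
        f"WHERE primary_bitstream_id IN ({quoted});"
    )


if __name__ == "__main__":
    log = (
        "Detail: Key (uuid)=(a7ddf477-1c04-4de0-9c7a-4d3c84a875bc) "
        'is still referenced from table "bundle".'
    )
    print(build_clear_primary_bitstream_sql(extract_blocking_uuids(log)))
```

< p > This does not avoid the slow iterations, because the cleanup still stops at the first violation it hits, but it keeps the list of UUIDs in one place. < / p >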
< h2 id = "2023-04-18" > 2023-04-18< / h2 >
< ul >
< li > Regarding the item Abenet noticed yesterday that has a blank page and a NullPointerException
< ul >
< li > It appears OK on DSpace Test! < a href = "https://dspacetest.cgiar.org/handle/10568/75611" > https://dspacetest.cgiar.org/handle/10568/75611< / a > < / li >
< li > And according to the REST API on CGSpace the item was modified on 2023-04-11, so last week… < / li >
< li > According to the DSpace logs it was Francesca who edited the item last week, so I asked her for more information before I troubleshoot more< / li >
< / ul >
< / li >
< / ul >
< h2 id = "2023-04-19" > 2023-04-19< / h2 >
< ul >
< li > I fixed the Bioversity item by deleting the < code > 9781138781276.jpg< / code > bitstream via the REST API
< ul >
< li > I < em > think< / em > Francesca might have changed the “format” of it?< / li >
< li > Anyway, this item has a PDF so we have a proper thumbnail and don’t need that other journal cover one< / li >
< / ul >
< / li >
< li > I noticed a URL for this < a href = "https://hdl.handle.net/10568/89049" > Bioversity item< / a > redirects incorrectly
< ul >
< li > I had mentioned this to Maria and Francesca a few months ago but it seems to never have been resolved< / li >
< / ul >
< / li >
< li > The < code > dspace cleanup -v< / code > finally finished after a few days of running and stopping… < / li >
< li > I decided to update the thumbnails in the Bioversity books collection because I saw a few old ones suffering from the CropBox issue< / li >
< li > Also, all day there’s been a high load on CGSpace, with lots of locks in PostgreSQL
< ul >
< li > I had been waiting until the bitstream cleanup finished… now I might need to restart PostgreSQL to kill some old locks as something needs to give< / li >
< li > I restarted PostgreSQL, but DSpace was still hanging on simple XMLUI options so I ended up restarting Tomcat< / li >
< / ul >
< / li >
< li > Tag 544 ORCID identifiers with my script< / li >
< li > I updated my < code > generation-loss.sh< / code > and < code > improved-dspace-thumbnails< / code > scripts to include thirty-five PDFs from CGSpace (up from twenty-four) to get a larger sample
< ul >
< li > Now starting to get some numbers comparing JPEG, WebP, and AVIF< / li >
< li > First, out of curiosity, I checked the average ssimulacra2 scores at Q75, Q80, and Q92 for each format:< / li >
< / ul >
< / li >
< / ul >
< table >
< thead >
< tr >
< th > < / th >
< th > Q75< / th >
< th > Q80< / th >
< th > Q92< / th >
< / tr >
< / thead >
< tbody >
< tr >
< td > JPEG< / td >
< td > 71< / td >
< td > 74< / td >
< td > 88< / td >
< / tr >
< tr >
< td > WebP< / td >
< td > 74< / td >
< td > 77< / td >
< td > 82< / td >
< / tr >
< tr >
< td > AVIF< / td >
< td > 82< / td >
< td > 83< / td >
< td > 86< / td >
< / tr >
< / tbody >
< / table >
< ul >
< li > Then I checked the quality and file size (bytes) needed to hit an average ssimulacra2 score of 80 with each format:
< ul >
< li > < strong > JPEG< / strong > : Q89, 124923 bytes< / li >
< li > < strong > WebP< / strong > : Q86, 84662 bytes (32% smaller than JPEG size)< / li >
< li > < strong > AVIF< / strong > : Q65, 67597 bytes (46% smaller than JPEG size)< / li >
< / ul >
< / li >
< li > < a href = "https://developers.google.com/speed/webp/docs/webp_study" > Google’s original WebP study< / a > uses this technique to compare WebP to JPEG too
< ul >
< li > As the quality settings are not comparable between formats, we need to compare the formats at matching perceptual scores (ssimulacra2 in this case)< / li >
< li > I used a ssimulacra2 score of 80 because that’s about the highest score I see with WebP using my samples, though JPEG and AVIF do go higher< / li >
< li > Also, according to current ssimulacra2 (v2.1), a score of 70 is “high quality” and a score of 90 is “very high quality”, so 80 should be reasonably high enough…< / li >
< / ul >
< / li >
< li > Here is a plot of the qualities and ssimulacra2 scores:< / li >
< / ul >
< p > < img src = "/cgspace-notes/2023/04/quality-vs-score-ssimulacra-v2.1.png" alt = "Quality vs Score" > < / p >
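< p > As a quick sanity check, the relative savings can be derived directly from the byte counts listed above. This is only a sketch of the arithmetic; the function name is hypothetical. < / p >

```python
def percent_smaller(reference_bytes, candidate_bytes):
    """Return how much smaller candidate is than reference, as a percentage."""
    return (reference_bytes - candidate_bytes) / reference_bytes * 100


if __name__ == "__main__":
    # File sizes (bytes) needed to reach an average ssimulacra2 score of 80
    sizes = {"JPEG": 124923, "WebP": 84662, "AVIF": 67597}
    for fmt in ("WebP", "AVIF"):
        saving = percent_smaller(sizes["JPEG"], sizes[fmt])
        print(f"{fmt}: {saving:.1f}% smaller than JPEG")
```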
< ul >
< li > Export CGSpace to check for missing Initiatives mappings< / li >
< / ul >
< h2 id = "2023-04-22" > 2023-04-22< / h2 >
< ul >
< li > Export the Initiatives collection to run it through csv-metadata-quality
< ul >
< li > I wanted to make sure all the Initiatives items had correct regions< / li >
< li > I had to manually fix a few license identifiers and ISSNs< / li >
< li > Also, I found a few items submitted by MEL that had dates in DD/MM/YYYY format, so I sent them to Salem for him to investigate< / li >
< / ul >
< / li >
< li > Start a harvest on AReS< / li >
< / ul >
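< p > Dates in DD/MM/YYYY format are easy to flag before upload. Below is a minimal sketch, assuming the values really are day-first (a value like 04/05/2023 is ambiguous and cannot be verified from the string alone); the function name is hypothetical and this is not how csv-metadata-quality itself checks dates. < / p >

```python
import re
from datetime import datetime

# Exactly two digits, two digits, four digits separated by slashes
DMY_PATTERN = re.compile(r"^\d{2}/\d{2}/\d{4}$")


def dmy_to_iso(value):
    """Convert a DD/MM/YYYY string to YYYY-MM-DD.

    Returns None when the value is not in that format or is not a real date.
    """
    value = value.strip()
    if not DMY_PATTERN.match(value):
        return None
    try:
        return datetime.strptime(value, "%d/%m/%Y").date().isoformat()
    except ValueError:
        # For example 31/02/2023, which matches the pattern but does not exist
        return None


if __name__ == "__main__":
    for candidate in ["22/04/2023", "2023-04-22", "31/02/2023"]:
        print(candidate, "->", dmy_to_iso(candidate))
```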
< h2 id = "2023-04-26" > 2023-04-26< / h2 >
< ul >
< li > Begin working on the list of non-AGROVOC CGSpace subjects for FAO
< ul >
< li > The last time I did this was in 2022-06< / li >
< li > I used the following SQL query to dump values from all subject fields, lower case them, and group by counts:< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > localhost/dspacetest= ☘ \COPY (SELECT DISTINCT(lower(text_value)) AS " subject" , count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (187, 120, 210, 122, 215, 127, 208, 124, 128, 123, 125, 135, 203, 236, 238, 119) GROUP BY " subject" ORDER BY count DESC) to /tmp/2023-04-26-cgspace-subjects.csv WITH CSV HEADER;
< / span > < / span > < span style = "display:flex;" > < span > COPY 26315
< / span > < / span > < span style = "display:flex;" > < span > Time: 2761.981 ms (00:02.762)
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > Then I extracted the subjects and looked them up against AGROVOC:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > $ csvcut -c subject /tmp/2023-04-26-cgspace-subjects.csv | sed < span style = "color:#e6db74" > ' 1d' < / span > > /tmp/2023-04-26-cgspace-subjects.txt
< / span > < / span > < span style = "display:flex;" > < span > $ ./ilri/agrovoc_lookup.py -i /tmp/2023-04-26-cgspace-subjects.txt -o /tmp/2023-04-26-cgspace-subjects-results.csv
< / span > < / span > < / code > < / pre > < / div > < h2 id = "2023-04-27" > 2023-04-27< / h2 >
< ul >
< li > The AGROVOC lookup from yesterday finished, so I extracted all terms that did not match and joined them with the original CSV so I can see the counts:
< ul >
< li > (I also note that the < code > agrovoc_lookup.py< / code > script didn’t seem to be caching properly, as it had to look up everything again the next time I ran it despite the requests cache being 174MB!)< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > csvgrep -c ' number of matches' -r ' ^0$' /tmp/2023-04-26-cgspace-subjects-results.csv \
< / span > < / span > < span style = "display:flex;" > < span > | csvcut -c subject \
< / span > < / span > < span style = "display:flex;" > < span > | csvjoin -c subject /tmp/2023-04-26-cgspace-subjects.csv - \
< / span > < / span > < span style = "display:flex;" > < span > > /tmp/2023-04-26-cgspace-non-agrovoc.csv
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > I filtered for only those terms that had counts larger than fifty
< ul >
< li > I also removed terms like “forages”, “policy”, and “pests and diseases” because those exist as singular or separate terms in AGROVOC< / li >
< li > I also removed ambiguous terms like “cocoa”, “diversity”, “resistance”, etc. because there are various other preferred terms for those in AGROVOC< / li >
< li > I also removed spelling mistakes like “modeling” and “savanas” because those exist in their correct form in AGROVOC< / li >
< li > I also removed internal CGIAR terms like “tac”, “crp”, “internal review”, etc. (note: these are mostly from CGIAR System Office’s subjects… perhaps I should exclude those next time?)< / li >
< / ul >
< / li >
< li > I note that many of < em > our< / em > terms would match if they were singular, plural, or split up into separate terms, so perhaps we should pair this with an exercise to review our own terms< / li >
< li > I couldn’t finish the work locally yet so I uploaded my list to Google Docs to continue later< / li >
< / ul >
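< p > The filter and join steps above can also be sketched in Python with only the standard library. This assumes the column names used in the commands above (< code > subject< / code > , < code > count< / code > , and < code > number of matches< / code > ); the function name and the sample rows are hypothetical. < / p >

```python
import csv
import io


def non_agrovoc_terms(counts_csv, results_csv, min_count=0):
    """Return (subject, count) pairs for subjects with zero AGROVOC matches.

    counts_csv and results_csv are open text files (or file-like objects).
    Only subjects with a count larger than min_count are kept.
    """
    unmatched = {
        row["subject"]
        for row in csv.DictReader(results_csv)
        if row["number of matches"].strip() == "0"
    }
    rows = []
    for row in csv.DictReader(counts_csv):
        count = int(row["count"])
        if row["subject"] in unmatched and count > min_count:
            rows.append((row["subject"], count))
    # Most frequent terms first, like ORDER BY count DESC in the SQL export
    return sorted(rows, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two CSV files
    counts = io.StringIO("subject,count\nforages,120\nmaize,900\ntac,60\nsavanas,12\n")
    results = io.StringIO(
        "subject,number of matches\nforages,0\nmaize,1\ntac,0\nsavanas,0\n"
    )
    for subject, count in non_agrovoc_terms(counts, results, min_count=50):
        print(subject, count)
```

< p > The manual review steps (removing plurals, ambiguous terms, and internal CGIAR terms) still need a human, so this only covers the mechanical part. < / p >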
< h2 id = "2023-04-28" > 2023-04-28< / h2 >
< ul >
< li > The ImageMagick CMYK issue is bothering me still
< ul >
< li > I am on a plane currently, but I have a Docker image of ImageMagick 7.1.1-3 and I compared the output of all CMYK PDFs using the same command on my local machine< / li >
< li > The images from the Docker environment are correct with < em > only< / em > < code > -colorspace sRGB< / code > (no profiles!) as the commenters on GitHub said< / li >
< li > This leads me to believe something is wrong in my own environment, perhaps Ghostscript…?< / li >
< li > The container has Ghostscript 9.53.3~dfsg-7+deb11u2 from Debian 11, while my Arch Linux system has Ghostscript 10.01.1-1< / li >
< / ul >
< / li >
< / ul >
<!-- raw HTML omitted -->
< / article >
< / div > <!-- /.blog - main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2023-05/" > May, 2023< / a > < / li >
< li > < a href = "/cgspace-notes/2023-04/" > April, 2023< / a > < / li >
< li > < a href = "/cgspace-notes/2023-03/" > March, 2023< / a > < / li >
< li > < a href = "/cgspace-notes/2023-02/" > February, 2023< / a > < / li >
< li > < a href = "/cgspace-notes/2023-01/" > January, 2023< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p dir = "auto" >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >