mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-18 19:22:18 +01:00
341 lines
15 KiB
HTML
<!DOCTYPE html>
<html lang="en">

<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

<meta property="og:title" content="December, 2018" />
<meta property="og:description" content="2018-12-01


Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK
I manually installed OpenJDK, then removed Oracle JDK, then re-ran the Ansible playbook to update all configuration files, etc
Then I ran all system updates and restarted the server


2018-12-02


I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-12/" /><meta property="article:published_time" content="2018-12-02T02:09:30+02:00"/>
<meta property="article:modified_time" content="2018-12-02T17:55:32+02:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="December, 2018"/>
<meta name="twitter:description" content="2018-12-01


Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK
I manually installed OpenJDK, then removed Oracle JDK, then re-ran the Ansible playbook to update all configuration files, etc
Then I ran all system updates and restarted the server


2018-12-02


I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
"/>
<meta name="generator" content="Hugo 0.52" />

<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "December, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-12/",
"wordCount": "875",
"datePublished": "2018-12-02T02:09:30+02:00",
"dateModified": "2018-12-02T17:55:32+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>

<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2018-12/">

<title>December, 2018 | CGSpace Notes</title>

<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-Upm5uY/SXdvbjuIGH6fBjF5vOYUr9DguqBskM+EQpLBzO9U+9fMVmWEt+TTlGrWQ" crossorigin="anonymous">

</head>

<body>

<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>

<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>

<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">

<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-12/">December, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-12-02T02:09:30+02:00">Sun Dec 02, 2018</time> by Alan Orth in

<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
</header>
<h2 id="2018-12-01">2018-12-01</h2>

<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>

<h2 id="2018-12-02">2018-12-02</h2>

<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>

<ul>
<li>I get this error when I try to manually run the media filter for one item from the command line:</li>
</ul>

<pre><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d" "-f/tmp/magick-129895Bmp44lvUfxo" "-f/tmp/magick-12989C0QFG51fktLF"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d" "-f/tmp/magick-129895Bmp44lvUfxo" "-f/tmp/magick-12989C0QFG51fktLF"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
at org.im4java.core.Info.getBaseInfo(Info.java:360)
at org.im4java.core.Info.<init>(Info.java:151)
at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142)
at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24)
at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
</code></pre>

<ul>
<li>A comment on a <a href="https://stackoverflow.com/questions/53560755/ghostscript-9-26-update-breaks-imagick-readimage-for-multipage-pdf">StackOverflow question</a> from yesterday suggests it might be a bug with the <code>pngalpha</code> device in Ghostscript and <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">links to an upstream bug</a></li>
<li>I think we need to wait for a fix from Ubuntu</li>
<li>For what it’s worth, I get the same error on my local Arch Linux environment with Ghostscript 9.26:</li>
</ul>

<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match
zsh: segmentation fault (core dumped) gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000
</code></pre>

<ul>
<li>When I replace the <code>pngalpha</code> device with <code>png16m</code> as suggested in the StackOverflow comments it works:</li>
</ul>

<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match
</code></pre>
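
<ul>
<li>A minimal Python sketch of that device swap, assuming the same flags as the command above. The helper names <code>build_gs_command</code> and <code>render_first_page</code> are hypothetical (they are not part of DSpace or ImageMagick): the first only assembles the argument list, the second tries <code>pngalpha</code> and falls back to <code>png16m</code> when Ghostscript exits abnormally:</li>
</ul>

```python
import subprocess


def build_gs_command(pdf_path, output_pattern, device="pngalpha"):
    """Return the gs argument list for rendering page 1 of pdf_path.

    The flags mirror the command in the notes; only the device varies.
    """
    return [
        "gs", "-q", "-dQUIET", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dNOPROMPT",
        "-dMaxBitmap=500000000", "-dAlignToPixels=0", "-dGridFitTT=2",
        "-sDEVICE=" + device,
        "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4", "-r72x72",
        "-dFirstPage=1", "-dLastPage=1",
        "-sOutputFile=" + output_pattern,
        "-f" + pdf_path,
    ]


def render_first_page(pdf_path, output_pattern, devices=("pngalpha", "png16m")):
    """Try each device in order; return the first that exits with status 0, else None."""
    for device in devices:
        result = subprocess.run(build_gs_command(pdf_path, output_pattern, device))
        if result.returncode == 0:
            return device
    return None
```

<ul>
<li>A crash such as the segmentation fault above shows up as a non-zero return code from <code>subprocess.run</code>, which is what triggers the fallback in this sketch. Note that <code>png16m</code> has no alpha channel, so any transparency in the page is not preserved.</li>
</ul>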

<ul>
<li>Start proofing the latest round of 226 IITA archive records that Bosede sent last week and Sisay uploaded to DSpace Test this weekend (<a href="https://dspacetest.cgiar.org/handle/10568/108298">IITA_Dec_1_1997 aka Daniel1807</a>)

<ul>
<li>One item was missing the authorship type</li>
<li>Some invalid countries (smart quotes, misspellings)</li>
<li>Added countries to some items that mentioned research in particular countries in their abstracts</li>
<li>One item had “MADAGASCAR” for ISI Journal</li>
<li>Minor corrections in IITA subject (LIVELIHOOD→LIVELIHOODS)</li>
<li>Trim whitespace in abstract field</li>
<li>Fix some sponsors (though I’m not sure why some, like “Governments of Canada”, are plural)</li>
<li>Eighteen items had <code>en||fr</code> for the language, but the content was only in French so changed them to just <code>fr</code></li>
<li>Six items had encoding errors in French text so I will ask Bosede to re-do them carefully</li>
<li>Correct and normalize a few AGROVOC subjects</li>
</ul></li>
<li>Expand my “encoding error” detection GREL to include <code>~</code> as I saw a lot of that in some copy-pasted French text recently:</li>
</ul>

<pre><code>or(
  isNotNull(value.match(/.*\uFFFD.*/)),
  isNotNull(value.match(/.*\u00A0.*/)),
  isNotNull(value.match(/.*\u200A.*/)),
  isNotNull(value.match(/.*\u2019.*/)),
  isNotNull(value.match(/.*\u00b4.*/)),
  isNotNull(value.match(/.*\u007e.*/))
)
</code></pre>
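
<ul>
<li>For checking values outside OpenRefine, the same test can be sketched in Python. This is only an approximation of the GREL above, with the hypothetical function name <code>has_encoding_error</code>; it flags the same six code points using a single character class:</li>
</ul>

```python
import re

# Same code points as the GREL expression:
# U+FFFD replacement character, U+00A0 no-break space, U+200A hair space,
# U+2019 right single quotation mark, U+00B4 acute accent, U+007E tilde.
SUSPICIOUS = re.compile("[\uFFFD\u00A0\u200A\u2019\u00B4\u007E]")


def has_encoding_error(value):
    """Return True if value contains any of the suspicious characters."""
    return SUSPICIOUS.search(value) is not None
```

<ul>
<li>GREL’s <code>match</code> has to match the whole value, which is why each pattern there is wrapped in <code>.*</code>; <code>search</code> in Python looks anywhere in the string, so no wrapping is needed.</li>
</ul>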

<h2 id="2018-12-03">2018-12-03</h2>

<ul>
<li>I looked at the DSpace Ghostscript issue more and it seems to only affect certain PDFs…</li>
<li>I can successfully generate a thumbnail for another recent item (<a href="https://hdl.handle.net/10568/98394"><sup>10568</sup>⁄<sub>98394</sub></a>), but not for <a href="https://hdl.handle.net/10568/98390"><sup>10568</sup>⁄<sub>98930</sub></a></li>
<li>Even manually on my Arch Linux desktop with ghostscript 9.26-1 and the <code>pngalpha</code> device, I can generate a thumbnail for the first one (<sup>10568</sup>⁄<sub>98394</sub>):</li>
</ul>

<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf
</code></pre>

<ul>
<li>So it seems to be something about the PDFs themselves, perhaps related to alpha support?</li>
<li>The first item (<sup>10568</sup>⁄<sub>98394</sub>) has the following information:</li>
</ul>

<pre><code>$ identify Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf\[0\]
Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf[0]=>Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf PDF 595x841 595x841+0+0 16-bit sRGB 107443B 0.000u 0:00.000
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
</code></pre>

<ul>
<li>And wow, I can’t even run ImageMagick’s <code>identify</code> on the first page of the second item (<sup>10568</sup>⁄<sub>98930</sub>):</li>
</ul>

<pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\]
</code></pre>

<ul>
<li>But with GraphicsMagick’s <code>identify</code> it works:</li>
</ul>

<pre><code>$ gm identify Food\ safety\ Kenya\ fruits.pdf\[0\]
DEBUG: FC_WEIGHT didn't match
Food safety Kenya fruits.pdf PDF 612x792+0+0 DirectClass 8-bit 1.4Mi 0.000u 0m:0.000002s
</code></pre>

<ul>
<li>Interesting that ImageMagick’s <code>identify</code> <em>does</em> work if you do not specify a page, perhaps as <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">alluded to in the recent Ghostscript bug report</a>:</li>
</ul>

<pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf
Food safety Kenya fruits.pdf[0] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[1] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[2] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[3] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[4] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
</code></pre>
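
<ul>
<li>Since the whole-document form of <code>identify</code> works, its output could be used to get the page count and dimensions without asking for a specific page. A rough Python sketch, assuming the per-page line format shown above (the function name <code>parse_identify_pages</code> and the dictionary keys are my own, and other output variants, such as the <code>[0]=></code> form seen earlier, are not handled):</li>
</ul>

```python
import re

# One line per page, e.g.
# "Food safety Kenya fruits.pdf[0] PDF 612x792 612x792+0+0 16-bit sRGB ..."
# Groups: file name, page index, format, width, height.
PAGE_LINE = re.compile(r"^(.+)\[(\d+)\] (\S+) (\d+)x(\d+) ")


def parse_identify_pages(output):
    """Return one dict per page line; warning lines and anything else are skipped."""
    pages = []
    for line in output.splitlines():
        match = PAGE_LINE.match(line)
        if match:
            name, page, fmt, width, height = match.groups()
            pages.append({
                "name": name,
                "page": int(page),
                "format": fmt,
                "width": int(width),
                "height": int(height),
            })
    return pages
```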

<ul>
<li>As I expected, ImageMagick cannot generate a thumbnail, but GraphicsMagick can (though it looks like crap):</li>
</ul>

<pre><code>$ convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
zsh: abort (core dumped) convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten
$ gm convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
DEBUG: FC_WEIGHT didn't match
</code></pre>
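
<ul>
<li>The two attempts above could be chained so that GraphicsMagick is only used when ImageMagick fails. A small Python sketch with hypothetical helper names (<code>thumbnail_commands</code>, <code>make_thumbnail</code>), using exactly the arguments from the commands above; this is not how DSpace’s media filter works, just an illustration of the fallback:</li>
</ul>

```python
import subprocess


def thumbnail_commands(pdf_path, jpg_path, size="600x600"):
    """Return the ImageMagick command followed by the GraphicsMagick equivalent."""
    args = [pdf_path + "[0]", "-thumbnail", size, "-flatten", jpg_path]
    return [["convert"] + args, ["gm", "convert"] + args]


def make_thumbnail(pdf_path, jpg_path):
    """Run each command in turn; return the name of the tool that succeeded, else None."""
    for cmd in thumbnail_commands(pdf_path, jpg_path):
        try:
            result = subprocess.run(cmd)
        except FileNotFoundError:
            # Tool is not installed, so try the next one.
            continue
        if result.returncode == 0:
            return cmd[0]
    return None
```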

<ul>
<li>I inspected the troublesome PDF using <a href="http://jhove.openpreservation.org/">jhove</a> and noticed that it is using <code>ISO PDF/A-1, Level B</code> and the other one doesn’t list a profile, though I don’t think this is relevant</li>
</ul>

<!-- vim: set sw=2 ts=2: -->

</article>

</div> <!-- /.blog-main -->

<aside class="col-sm-3 ml-auto blog-sidebar">

<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">

<li><a href="/cgspace-notes/2018-12/">December, 2018</a></li>

<li><a href="/cgspace-notes/2018-11/">November, 2018</a></li>

<li><a href="/cgspace-notes/2018-10/">October, 2018</a></li>

<li><a href="/cgspace-notes/2018-09/">September, 2018</a></li>

<li><a href="/cgspace-notes/2018-08/">August, 2018</a></li>

</ol>
</section>

<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">

<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>

<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>

<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>

</ol>
</section>

</aside>

</div> <!-- /.row -->
</div> <!-- /.container -->

<footer class="blog-footer">
<p>

Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.

</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>

</body>

</html>