Add notes for 2020-01-27

2020-01-27 16:20:44 +02:00
parent 207ace0883
commit 8feb93be39
112 changed files with 11466 additions and 5158 deletions

@ -33,7 +33,7 @@ Then I ran all system updates and restarted the server
I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
"/>
<meta name="generator" content="Hugo 0.62.2" />
<meta name="generator" content="Hugo 0.63.1" />
@ -63,7 +63,7 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.a20c1a4367639632cdb341d23c27ca44fedcc75b0f8b3cbea6203010da153d3c.css" rel="stylesheet" integrity="sha256-ogwaQ2djljLNs0HSPCfKRP7cx1sPizy&#43;piAwENoVPTw=" crossorigin="anonymous">
<link href="https://alanorth.github.io/cgspace-notes/css/style.23e2c3298bcc8c1136c19aba330c211ec94c36f7c4454ea15cf4d3548370042a.css" rel="stylesheet" integrity="sha256-I&#43;LDKYvMjBE2wZq6MwwhHslMNvfERU6hXPTTVINwBCo=" crossorigin="anonymous">
<!-- RSS 2.0 feed -->
@ -110,7 +110,7 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-12/">December, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-12-02T02:09:30&#43;02:00">Sun Dec 02, 2018</time> by Alan Orth in
<i class="fa fa-folder" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
</p>
@ -148,7 +148,7 @@ org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.c
</code></pre><ul>
<li>A comment on <a href="https://stackoverflow.com/questions/53560755/ghostscript-9-26-update-breaks-imagick-readimage-for-multipage-pdf">StackOverflow question</a> from yesterday suggests it might be a bug with the <code>pngalpha</code> device in Ghostscript and <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">links to an upstream bug</a></li>
<li>I think we need to wait for a fix from Ubuntu</li>
<li>For what it's worth, I get the same error on my local Arch Linux environment with Ghostscript 9.26:</li>
<li>For what it&rsquo;s worth, I get the same error on my local Arch Linux environment with Ghostscript 9.26:</li>
</ul>
<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match
@ -167,7 +167,7 @@ DEBUG: FC_WEIGHT didn't match
<li>One item had &ldquo;MADAGASCAR&rdquo; for ISI Journal</li>
<li>Minor corrections in IITA subject (LIVELIHOOD→LIVELIHOODS)</li>
<li>Trim whitespace in abstract field</li>
<li>Fix some sponsors (though some with &ldquo;Governments of Canada&rdquo; etc I'm not sure why those are plural)</li>
<li>Fix some sponsors (though some with &ldquo;Governments of Canada&rdquo; etc I&rsquo;m not sure why those are plural)</li>
<li>Eighteen items had <code>en||fr</code> for the language, but the content was only in French so changed them to just <code>fr</code></li>
<li>Six items had encoding errors in French text so I will ask Bosede to re-do them carefully</li>
<li>Correct and normalize a few AGROVOC subjects</li>
@ -198,18 +198,18 @@ DEBUG: FC_WEIGHT didn't match
Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf[0]=&gt;Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf PDF 595x841 595x841+0+0 16-bit sRGB 107443B 0.000u 0:00.000
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
</code></pre><ul>
<li>And wow, I can't even run ImageMagick's <code>identify</code> on the first page of the second item (10568/98930):</li>
<li>And wow, I can&rsquo;t even run ImageMagick&rsquo;s <code>identify</code> on the first page of the second item (10568/98930):</li>
</ul>
<pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\]
</code></pre><ul>
<li>But with GraphicsMagick's <code>identify</code> it works:</li>
<li>But with GraphicsMagick&rsquo;s <code>identify</code> it works:</li>
</ul>
<pre><code>$ gm identify Food\ safety\ Kenya\ fruits.pdf\[0\]
DEBUG: FC_WEIGHT didn't match
Food safety Kenya fruits.pdf PDF 612x792+0+0 DirectClass 8-bit 1.4Mi 0.000u 0m:0.000002s
</code></pre><ul>
<li>Interesting that ImageMagick's <code>identify</code> <em>does</em> work if you do not specify a page, perhaps as <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">alluded to in the recent Ghostscript bug report</a>:</li>
<li>Interesting that ImageMagick&rsquo;s <code>identify</code> <em>does</em> work if you do not specify a page, perhaps as <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">alluded to in the recent Ghostscript bug report</a>:</li>
</ul>
<pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf
Food safety Kenya fruits.pdf[0] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
@ -226,7 +226,7 @@ zsh: abort (core dumped) convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnai
$ gm convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
DEBUG: FC_WEIGHT didn't match
</code></pre><ul>
<li>I inspected the troublesome PDF using <a href="http://jhove.openpreservation.org/">jhove</a> and noticed that it is using <code>ISO PDF/A-1, Level B</code> and the other one doesn't list a profile, though I don't think this is relevant</li>
<li>I inspected the troublesome PDF using <a href="http://jhove.openpreservation.org/">jhove</a> and noticed that it is using <code>ISO PDF/A-1, Level B</code> and the other one doesn&rsquo;t list a profile, though I don&rsquo;t think this is relevant</li>
<li>I found another item that fails when generating a thumbnail (<a href="https://hdl.handle.net/10568/98391">10568/98391</a>, DSpace complains:</li>
</ul>
<pre><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
@ -256,16 +256,16 @@ Caused by: org.im4java.core.CommandException: identify: FailedToExecuteCommand `
at org.im4java.core.ImageCommand.run(ImageCommand.java:215)
... 15 more
</code></pre><ul>
<li>And on my Arch Linux environment ImageMagick's <code>convert</code> also segfaults:</li>
<li>And on my Arch Linux environment ImageMagick&rsquo;s <code>convert</code> also segfaults:</li>
</ul>
<pre><code>$ convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
zsh: abort (core dumped) convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] x60
</code></pre><ul>
<li>But GraphicsMagick's <code>convert</code> works:</li>
<li>But GraphicsMagick&rsquo;s <code>convert</code> works:</li>
</ul>
<pre><code>$ gm convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
</code></pre><ul>
<li>So far the only thing that stands out is that the two files that don't work were created with Microsoft Office 2016:</li>
<li>So far the only thing that stands out is that the two files that don&rsquo;t work were created with Microsoft Office 2016:</li>
</ul>
<pre><code>$ pdfinfo bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf | grep -E '^(Creator|Producer)'
Creator: Microsoft® Word 2016
@ -285,14 +285,14 @@ Producer: Microsoft® Word for Office 365
<pre><code>$ inkscape Food\ safety\ Kenya\ fruits.pdf -z --export-dpi=72 --export-area-drawing --export-png='cover.png'
$ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
</code></pre><ul>
<li>I've tried a few times this week to register for the <a href="https://www.evisa.gov.et/">Ethiopian eVisa website</a>, but it is never successful</li>
<li>I&rsquo;ve tried a few times this week to register for the <a href="https://www.evisa.gov.et/">Ethiopian eVisa website</a>, but it is never successful</li>
<li>In the end I tried one last time to just apply without registering and it was apparently successful</li>
<li>Testing DSpace 5.8 (<code>5_x-prod</code> branch) in an Ubuntu 18.04 VM with Tomcat 8.5 and had some issues:
<ul>
<li>JSPUI shows an internal error (log shows something about tag cloud, though, so might be unrelated)</li>
<li>Atmire Listings and Reports, which use JSPUI, asks you to log in again and then doesn't work</li>
<li>Content and Usage Analysis doesn't show up in the sidebar after logging in</li>
<li>I can navigate to <a href="https://dspacetest.cgiar.org/atmire/reporting-suite/usage-graph-editor">/atmire/reporting-suite/usage-graph-editor</a>, but it's only the Atmire theme and a &ldquo;page not found&rdquo; message</li>
<li>Atmire Listings and Reports, which use JSPUI, asks you to log in again and then doesn&rsquo;t work</li>
<li>Content and Usage Analysis doesn&rsquo;t show up in the sidebar after logging in</li>
<li>I can navigate to <a href="https://dspacetest.cgiar.org/atmire/reporting-suite/usage-graph-editor">/atmire/reporting-suite/usage-graph-editor</a>, but it&rsquo;s only the Atmire theme and a &ldquo;page not found&rdquo; message</li>
<li>Related messages from dspace.log:</li>
</ul>
</li>
@ -311,7 +311,7 @@ $ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
</ul>
<h2 id="2018-12-04">2018-12-04</h2>
<ul>
<li>Last night Linode sent a message that the load on CGSpace (linode18) was too high, here's a list of the top users at the time and throughout the day:</li>
<li>Last night Linode sent a message that the load on CGSpace (linode18) was too high, here&rsquo;s a list of the top users at the time and throughout the day:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Dec/2018:1(5|6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
225 40.77.167.142
@ -336,14 +336,14 @@ $ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
3210 2a01:4f8:140:3192::2
4190 35.237.175.180
</code></pre><ul>
<li><code>35.237.175.180</code> is known to us (CCAFS?), and I've already added it to the list of bot IPs in nginx, which appears to be working:</li>
<li><code>35.237.175.180</code> is known to us (CCAFS?), and I&rsquo;ve already added it to the list of bot IPs in nginx, which appears to be working:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03
4772
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03 | sort | uniq | wc -l
630
</code></pre><ul>
<li>I haven't seen <code>2a01:4f8:140:3192::2</code> before. Its user agent is some new bot:</li>
<li>I haven&rsquo;t seen <code>2a01:4f8:140:3192::2</code> before. Its user agent is some new bot:</li>
</ul>
<pre><code>Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
</code></pre><ul>
@ -366,7 +366,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a01:4f8:140:3192::2' dspace.log.2
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03 | sort | uniq | wc -l
1
</code></pre><ul>
<li>In other news, it's good to see my re-work of the database connectivity in the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> actually caused a reduction of persistent database connections (from 1 to 0, but still!):</li>
<li>In other news, it&rsquo;s good to see my re-work of the database connectivity in the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> actually caused a reduction of persistent database connections (from 1 to 0, but still!):</li>
</ul>
<p><img src="/cgspace-notes/2018/12/postgres_connections_db-month.png" alt="PostgreSQL connections day"></p>
<h2 id="2018-12-05">2018-12-05</h2>
@ -376,7 +376,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03
<h2 id="2018-12-06">2018-12-06</h2>
<ul>
<li>Linode sent a message that the CPU usage of CGSpace (linode18) is too high last night</li>
<li>I looked in the logs and there's nothing particular going on:</li>
<li>I looked in the logs and there&rsquo;s nothing particular going on:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;05/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1225 157.55.39.177
@ -402,8 +402,8 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
1156
</code></pre><ul>
<li><code>2a01:7e00::f03c:91ff:fe0a:d645</code> appears to be the CKM dev server where Danny is testing harvesting via Drupal</li>
<li>It seems they are hitting the XMLUI's OpenSearch a bit, but mostly on the REST API so no issues here yet</li>
<li><code>Drupal</code> is already in the Tomcat Crawler Session Manager Valve's regex so that's good!</li>
<li>It seems they are hitting the XMLUI&rsquo;s OpenSearch a bit, but mostly on the REST API so no issues here yet</li>
<li><code>Drupal</code> is already in the Tomcat Crawler Session Manager Valve&rsquo;s regex so that&rsquo;s good!</li>
</ul>
<h2 id="2018-12-10">2018-12-10</h2>
<ul>
@ -414,7 +414,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
<li>It sounds kinda crazy, but she said when she talked to Altmetric about their Twitter harvesting they said their coverage is not perfect, so it might be some kinda prioritization thing where they only do it for popular items?</li>
<li>I am testing this by <a href="https://twitter.com/mralanorth/status/1072153586342211584">tweeting</a> one <a href="https://cgspace.cgiar.org/handle/10568/98380">WLE item from CGSpace</a> that currently has no Altmetric score</li>
<li>Interestingly, after about an hour I see it has already been <a href="https://cgspace.altmetric.com/details/50160871/twitter">picked up by Altmetric</a> and has my tweet as well as some other tweet from over a month ago&hellip;</li>
<li>I <a href="https://twitter.com/mralanorth/status/1072198292182892545">tweeted a link to the item's DOI</a> to see if Altmetric will notice it, hopefully associated with the Handle I tweeted earlier</li>
<li>I <a href="https://twitter.com/mralanorth/status/1072198292182892545">tweeted a link to the item&rsquo;s DOI</a> to see if Altmetric will notice it, hopefully associated with the Handle I tweeted earlier</li>
</ul>
</li>
</ul>
@ -429,9 +429,9 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
</ul>
<h2 id="2018-12-13">2018-12-13</h2>
<ul>
<li>Oh this is very interesting: <a href="https://digitalarchive.worldfishcenter.org">WorldFish's repository is live now</a></li>
<li>It's running DSpace 5.9-SNAPSHOT running on KnowledgeArc and the OAI and REST interfaces are active at least</li>
<li>Also, I notice they ended up registering a Handle (they had been considering taking KnowledgeArc's advice to <em>not</em> use Handles!)</li>
<li>Oh this is very interesting: <a href="https://digitalarchive.worldfishcenter.org">WorldFish&rsquo;s repository is live now</a></li>
<li>It&rsquo;s running DSpace 5.9-SNAPSHOT running on KnowledgeArc and the OAI and REST interfaces are active at least</li>
<li>Also, I notice they ended up registering a Handle (they had been considering taking KnowledgeArc&rsquo;s advice to <em>not</em> use Handles!)</li>
<li>Did some coordination work on the hotel bookings for the January AReS workshop in Amman</li>
</ul>
<h2 id="2018-12-17">2018-12-17</h2>
@ -479,7 +479,7 @@ $ ls -lh cgspace_2018-12-19.backup*
-rw-r--r-- 1 aorth aorth 94M Dec 20 11:36 cgspace_2018-12-19.backup.gz
-rw-r--r-- 1 aorth aorth 93M Dec 20 11:35 cgspace_2018-12-19.backup.xz
</code></pre><ul>
<li>Looks like it's really not worth it&hellip;</li>
<li>Looks like it&rsquo;s really not worth it&hellip;</li>
<li>Peter pointed out that Discovery filters for CTA subjects on item pages were not working</li>
<li>It looks like there were some mismatches in the Discovery index names and the XMLUI configuration, so I fixed them (<a href="https://github.com/ilri/DSpace/pull/406">#406</a>)</li>
<li>Peter asked if we could create a controlled vocabulary for publishers (<code>dc.publisher</code>)</li>
@ -491,7 +491,7 @@ $ ls -lh cgspace_2018-12-19.backup*
3522
(1 row)
</code></pre><ul>
<li>I reverted the metadata changes related to &ldquo;Unrestricted Access&rdquo; and &ldquo;Restricted Access&rdquo; on DSpace Test because we're not pushing forward with the new status terms for now</li>
<li>I reverted the metadata changes related to &ldquo;Unrestricted Access&rdquo; and &ldquo;Restricted Access&rdquo; on DSpace Test because we&rsquo;re not pushing forward with the new status terms for now</li>
<li>Purge remaining Oracle Java 8 stuff from CGSpace (linode18) since we migrated to OpenJDK a few months ago:</li>
</ul>
<pre><code># dpkg -P oracle-java8-installer oracle-java8-set-default
@ -514,7 +514,7 @@ Fixed 466 occurences of: Copyrighted; Any re-use allowed
# pg_dropcluster 9.5 main
# dpkg -l | grep postgresql | grep 9.5 | awk '{print $2}' | xargs dpkg -r
</code></pre><ul>
<li>I've been running PostgreSQL 9.6 for months on my local development and public DSpace Test (linode19) environments</li>
<li>I&rsquo;ve been running PostgreSQL 9.6 for months on my local development and public DSpace Test (linode19) environments</li>
<li>Run all system updates on CGSpace (linode18) and restart the server</li>
<li>Try to run the DSpace cleanup script on CGSpace (linode18), but I get some errors about foreign key constraints:</li>
</ul>
@ -564,7 +564,7 @@ UPDATE 1
1253 54.70.40.11
</code></pre><ul>
<li>All these look ok (<code>54.70.40.11</code> is known to us from earlier this month and should be reusing its Tomcat sessions)</li>
<li>So I'm not sure what was going on last night&hellip;</li>
<li>So I&rsquo;m not sure what was going on last night&hellip;</li>
</ul>
<!-- raw HTML omitted -->