Add notes for 2021-09-13

2021-09-13 16:21:16 +03:00
parent 8b487a4a77
commit c05c7213c2
109 changed files with 2627 additions and 2530 deletions

@ -36,7 +36,7 @@ Then I ran all system updates and restarted the server
I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
"/>
<meta name="generator" content="Hugo 0.87.0" />
<meta name="generator" content="Hugo 0.88.1" />
@ -135,7 +135,7 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
<ul>
<li>The error when I try to manually run the media filter for one item from the command line:</li>
</ul>
<pre><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d&quot; &quot;-f/tmp/magick-129895Bmp44lvUfxo&quot; &quot;-f/tmp/magick-12989C0QFG51fktLF&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
<pre tabindex="0"><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d&quot; &quot;-f/tmp/magick-129895Bmp44lvUfxo&quot; &quot;-f/tmp/magick-12989C0QFG51fktLF&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d&quot; &quot;-f/tmp/magick-129895Bmp44lvUfxo&quot; &quot;-f/tmp/magick-12989C0QFG51fktLF&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
at org.im4java.core.Info.getBaseInfo(Info.java:360)
at org.im4java.core.Info.&lt;init&gt;(Info.java:151)
@ -157,13 +157,13 @@ org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.c
<li>I think we need to wait for a fix from Ubuntu</li>
<li>For what it&rsquo;s worth, I get the same error on my local Arch Linux environment with Ghostscript 9.26:</li>
</ul>
<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
<pre tabindex="0"><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match
zsh: segmentation fault (core dumped) gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000
</code></pre><ul>
<li>When I replace the <code>pngalpha</code> device with <code>png16m</code> as suggested in the StackOverflow comments it works:</li>
</ul>
<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
<pre tabindex="0"><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match
</code></pre><ul>
<li>Start proofing the latest round of 226 IITA archive records that Bosede sent last week and Sisay uploaded to DSpace Test this weekend (<a href="https://dspacetest.cgiar.org/handle/10568/108298">IITA_Dec_1_1997 aka Daniel1807</a>)
@ -182,7 +182,7 @@ DEBUG: FC_WEIGHT didn't match
</li>
<li>Expand my &ldquo;encoding error&rdquo; detection GREL to include <code>~</code> as I saw a lot of that in some copy pasted French text recently:</li>
</ul>
<pre><code>or(
<pre tabindex="0"><code>or(
isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)),
@ -196,29 +196,29 @@ DEBUG: FC_WEIGHT didn't match
<li>I can successfully generate a thumbnail for another recent item (<a href="https://hdl.handle.net/10568/98394">10568/98394</a>), but not for <a href="https://hdl.handle.net/10568/98390">10568/98930</a></li>
<li>Even manually on my Arch Linux desktop with ghostscript 9.26-1 and the <code>pngalpha</code> device, I can generate a thumbnail for the first one (10568/98394):</li>
</ul>
<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf
<pre tabindex="0"><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf
</code></pre><ul>
<li>So it seems to be something about the PDFs themselves, perhaps related to alpha support?</li>
<li>The first item (10568/98394) has the following information:</li>
</ul>
<pre><code>$ identify Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf\[0\]
<pre tabindex="0"><code>$ identify Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf\[0\]
Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf[0]=&gt;Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf PDF 595x841 595x841+0+0 16-bit sRGB 107443B 0.000u 0:00.000
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
</code></pre><ul>
<li>And wow, I can&rsquo;t even run ImageMagick&rsquo;s <code>identify</code> on the first page of the second item (10568/98930):</li>
</ul>
<pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
<pre tabindex="0"><code>$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\]
</code></pre><ul>
<li>But with GraphicsMagick&rsquo;s <code>identify</code> it works:</li>
</ul>
<pre><code>$ gm identify Food\ safety\ Kenya\ fruits.pdf\[0\]
<pre tabindex="0"><code>$ gm identify Food\ safety\ Kenya\ fruits.pdf\[0\]
DEBUG: FC_WEIGHT didn't match
Food safety Kenya fruits.pdf PDF 612x792+0+0 DirectClass 8-bit 1.4Mi 0.000u 0m:0.000002s
</code></pre><ul>
<li>Interesting that ImageMagick&rsquo;s <code>identify</code> <em>does</em> work if you do not specify a page, perhaps as <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">alluded to in the recent Ghostscript bug report</a>:</li>
</ul>
<pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf
<pre tabindex="0"><code>$ identify Food\ safety\ Kenya\ fruits.pdf
Food safety Kenya fruits.pdf[0] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[1] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[2] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
@ -228,7 +228,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
</code></pre><ul>
<li>As I expected, ImageMagick cannot generate a thumbnail, but GraphicsMagick can (though it looks like crap):</li>
</ul>
<pre><code>$ convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
<pre tabindex="0"><code>$ convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
zsh: abort (core dumped) convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten
$ gm convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
DEBUG: FC_WEIGHT didn't match
@ -236,7 +236,7 @@ DEBUG: FC_WEIGHT didn't match
<li>I inspected the troublesome PDF using <a href="http://jhove.openpreservation.org/">jhove</a> and noticed that it is using <code>ISO PDF/A-1, Level B</code> and the other one doesn&rsquo;t list a profile, though I don&rsquo;t think this is relevant</li>
<li>I found another item that fails when generating a thumbnail (<a href="https://hdl.handle.net/10568/98391">10568/98391</a>); DSpace complains:</li>
</ul>
<pre><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
<pre tabindex="0"><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `&quot;gs&quot; -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=4 -dGraphicsAlphaBits=4 &quot;-r72x72&quot; -dFirstPage=1 -dLastPage=1 &quot;-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d&quot; &quot;-f/tmp/magick-14296Q0rJjfCeIj3w&quot; &quot;-f/tmp/magick-14296k_K6MWqwvpDm&quot;' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
at org.im4java.core.Info.getBaseInfo(Info.java:360)
at org.im4java.core.Info.&lt;init&gt;(Info.java:151)
@ -265,16 +265,16 @@ Caused by: org.im4java.core.CommandException: identify: FailedToExecuteCommand `
</code></pre><ul>
<li>And on my Arch Linux environment ImageMagick&rsquo;s <code>convert</code> also segfaults:</li>
</ul>
<pre><code>$ convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
<pre tabindex="0"><code>$ convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
zsh: abort (core dumped) convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] x60
</code></pre><ul>
<li>But GraphicsMagick&rsquo;s <code>convert</code> works:</li>
</ul>
<pre><code>$ gm convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
<pre tabindex="0"><code>$ gm convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
</code></pre><ul>
<li>So far the only thing that stands out is that the two files that don&rsquo;t work were created with Microsoft Office 2016:</li>
</ul>
<pre><code>$ pdfinfo bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf | grep -E '^(Creator|Producer)'
<pre tabindex="0"><code>$ pdfinfo bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf | grep -E '^(Creator|Producer)'
Creator: Microsoft® Word 2016
Producer: Microsoft® Word 2016
$ pdfinfo Food\ safety\ Kenya\ fruits.pdf | grep -E '^(Creator|Producer)'
@ -283,13 +283,13 @@ Producer: Microsoft® Word 2016
</code></pre><ul>
<li>And the one that works was created with Office 365:</li>
</ul>
<pre><code>$ pdfinfo Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf | grep -E '^(Creator|Producer)'
<pre tabindex="0"><code>$ pdfinfo Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf | grep -E '^(Creator|Producer)'
Creator: Microsoft® Word for Office 365
Producer: Microsoft® Word for Office 365
</code></pre><ul>
<li>I remembered an old technique I was using to generate thumbnails in 2015 using Inkscape followed by ImageMagick or GraphicsMagick:</li>
</ul>
<pre><code>$ inkscape Food\ safety\ Kenya\ fruits.pdf -z --export-dpi=72 --export-area-drawing --export-png='cover.png'
<pre tabindex="0"><code>$ inkscape Food\ safety\ Kenya\ fruits.pdf -z --export-dpi=72 --export-area-drawing --export-png='cover.png'
$ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
</code></pre><ul>
<li>I&rsquo;ve tried a few times this week to register on the <a href="https://www.evisa.gov.et/">Ethiopian eVisa website</a>, but it never succeeds</li>
@ -304,7 +304,7 @@ $ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
</ul>
</li>
</ul>
<pre><code>2018-12-03 15:44:00,030 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-datatables not found
<pre tabindex="0"><code>2018-12-03 15:44:00,030 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-datatables not found
2018-12-03 15:44:03,390 ERROR com.atmire.app.webui.servlet.ExportServlet @ Error converter plugin not found: interface org.infoCon.ConverterPlugin
...
2018-12-03 15:45:01,667 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-listing-and-reports not found
@ -312,7 +312,7 @@ $ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
<li>I tested it on my local environment with Tomcat 8.5.34 and the JSPUI application still has an error (again, the logs show something about tag cloud, so it may be unrelated), and the Listings and Reports still asks you to log in again, despite already being logged in to XMLUI, but it does appear to work (I generated a report and exported a PDF)</li>
<li>I think the errors about missing Atmire components must be important, here on my local machine as well (though not the one about atmire-listings-and-reports):</li>
</ul>
<pre><code>2018-12-03 16:44:00,009 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-datatables not found
<pre tabindex="0"><code>2018-12-03 16:44:00,009 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-datatables not found
</code></pre><ul>
<li>This has got to be part Ubuntu Tomcat packaging, and part DSpace 5.x Tomcat 8.5 readiness&hellip;?</li>
</ul>
@ -320,7 +320,7 @@ $ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
<ul>
<li>Last night Linode sent a message that the load on CGSpace (linode18) was too high; here&rsquo;s a list of the top users at the time and throughout the day:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Dec/2018:1(5|6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Dec/2018:1(5|6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
225 40.77.167.142
226 66.249.64.63
232 46.101.86.248
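For readers who want to reproduce the per-client tally outside a shell, here is a minimal Python sketch of the same idea as the zcat/grep/awk/sort pipeline in this hunk: keep the lines whose timestamp matches a pattern, take the first whitespace-separated field as the client address, and list the busiest clients last. The function name `top_clients`, the `sample_log` lines, and the documentation-range addresses are hypothetical and not taken from the CGSpace logs; tie ordering may differ from `sort -n`.

```python
import re
from collections import Counter


def top_clients(lines, when_pattern, limit=10):
    """Count requests per client for lines whose text matches when_pattern."""
    when = re.compile(when_pattern)
    counts = Counter()
    for line in lines:
        if not when.search(line):
            continue
        fields = line.split()
        if fields:
            # First field of an access log line is assumed to be the client address
            counts[fields[0]] += 1
    # Ascending by count so the busiest clients come last, like `sort -n | tail`
    ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return ranked[-limit:]


# Hypothetical access log lines, for illustration only
sample_log = [
    '192.0.2.1 - - [03/Dec/2018:15:01:02 +0000] "GET / HTTP/1.1" 200 512',
    '192.0.2.1 - - [03/Dec/2018:16:10:00 +0000] "GET /a HTTP/1.1" 200 128',
    '192.0.2.2 - - [03/Dec/2018:17:30:45 +0000] "GET /b HTTP/1.1" 200 256',
    '192.0.2.3 - - [03/Dec/2018:09:00:00 +0000] "GET /c HTTP/1.1" 200 64',
]

for address, hits in top_clients(sample_log, r"03/Dec/2018:1(5|6|7|8)"):
    print(f"{hits:7d} {address}")
```

Reading real logs would still need something like `gzip.open` for rotated files, which this sketch leaves out.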
@ -345,30 +345,30 @@ $ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
</code></pre><ul>
<li><code>35.237.175.180</code> is known to us (CCAFS?), and I&rsquo;ve already added it to the list of bot IPs in nginx, which appears to be working:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03
<pre tabindex="0"><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03
4772
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03 | sort | uniq | wc -l
630
</code></pre><ul>
<li>I haven&rsquo;t seen <code>2a01:4f8:140:3192::2</code> before. Its user agent is some new bot:</li>
</ul>
<pre><code>Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
<pre tabindex="0"><code>Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
</code></pre><ul>
<li>At least it seems the Tomcat Crawler Session Manager Valve is working to re-use the common bot XMLUI sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a01:4f8:140:3192::2' dspace.log.2018-12-03
<pre tabindex="0"><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a01:4f8:140:3192::2' dspace.log.2018-12-03
5111
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a01:4f8:140:3192::2' dspace.log.2018-12-03 | sort | uniq | wc -l
419
</code></pre><ul>
<li><code>78.46.79.71</code> is another host on Hetzner with the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
<pre tabindex="0"><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre><ul>
<li>This is not the first time a host on Hetzner has used a &ldquo;normal&rdquo; user agent to make thousands of requests</li>
<li>At least it is re-using its Tomcat sessions somehow:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03
<pre tabindex="0"><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03
2044
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03 | sort | uniq | wc -l
1
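The paired grep commands in this hunk compare two numbers for one client: how many log lines mention it with a session ID, and how many distinct session IDs those lines contain. A small gap between many lines and few sessions suggests the sessions are being re-used. A minimal Python sketch of that comparison follows; the function name `session_stats`, the `sample_lines`, and the addresses are hypothetical, and the log line layout is only assumed to contain the `session_id=...:ip_addr=...` fragment that the grep patterns rely on.

```python
import re


def session_stats(lines, ip_addr):
    """Return (matching line count, distinct session ID count) for one client."""
    pattern = re.compile(
        r"session_id=([A-Z0-9]{32}):ip_addr=" + re.escape(ip_addr)
    )
    matching_lines = 0
    sessions = set()
    for line in lines:
        found = pattern.findall(line)
        if found:
            matching_lines += 1  # same notion as `grep -c`: lines, not matches
            sessions.update(found)
    return matching_lines, len(sessions)


# Hypothetical log fragments, for illustration only
sample_lines = [
    "INFO anonymous:session_id=" + "A" * 32 + ":ip_addr=198.51.100.7:view_item",
    "INFO anonymous:session_id=" + "A" * 32 + ":ip_addr=198.51.100.7:search",
    "INFO anonymous:session_id=" + "B" * 32 + ":ip_addr=198.51.100.7:view_item",
    "INFO anonymous:session_id=" + "C" * 32 + ":ip_addr=203.0.113.9:view_item",
]

total, unique = session_stats(sample_lines, "198.51.100.7")
print(f"{total} lines, {unique} distinct sessions")
```

Unlike the grep patterns, this sketch escapes the dots in the address so they only match literal dots.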
@ -385,7 +385,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03
<li>Linode sent a message that the CPU usage of CGSpace (linode18) was too high last night</li>
<li>I looked in the logs and there&rsquo;s nothing particular going on:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;05/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;05/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1225 157.55.39.177
1240 207.46.13.12
1261 207.46.13.101
@ -399,11 +399,11 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03
</code></pre><ul>
<li><code>54.70.40.11</code> is some new bot with the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)
<pre tabindex="0"><code>Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)
</code></pre><ul>
<li>But Tomcat is forcing them to re-use their Tomcat sessions with the Crawler Session Manager valve:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
<pre tabindex="0"><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
6980
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05 | sort | uniq | wc -l
1156
@ -446,7 +446,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
<li>Linode alerted me twice today that the load on CGSpace (linode18) was very high</li>
<li>Looking at the nginx logs I see a few new IPs in the top 10:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;17/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;17/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
927 157.55.39.81
975 54.70.40.11
2090 50.116.102.77
@ -460,7 +460,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
</code></pre><ul>
<li><code>94.71.244.172</code> and <code>143.233.227.216</code> are both in Greece and use the following user agent:</li>
</ul>
<pre><code>Mozilla/3.0 (compatible; Indy Library)
<pre tabindex="0"><code>Mozilla/3.0 (compatible; Indy Library)
</code></pre><ul>
<li>I see that I added this bot to the Tomcat Crawler Session Manager valve in 2017-12 so its XMLUI sessions are getting re-used</li>
<li><code>2a01:4f8:173:1e85::2</code> is some new bot called <code>BLEXBot/1.0</code> which should be matching the existing &ldquo;bot&rdquo; pattern in the Tomcat Crawler Session Manager regex</li>
@ -477,7 +477,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
<ul>
<li>Testing compression of PostgreSQL backups with xz and gzip:</li>
</ul>
<pre><code>$ time xz -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.xz
<pre tabindex="0"><code>$ time xz -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.xz
xz -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.xz 48.29s user 0.19s system 99% cpu 48.579 total
$ time gzip -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.gz
gzip -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.gz 2.78s user 0.09s system 99% cpu 2.899 total
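The xz versus gzip timing above can also be approximated from Python's standard library, which may be handy when the comparison needs to be repeated on other dumps. This is only a sketch under assumptions: `compare_compressors` is a hypothetical helper, the `sample_dump` bytes are made up and are not the CGSpace backup, and both compressors are set to level 6 on the understanding that this matches the command-line defaults. Absolute timings will differ from the CLI tools.

```python
import gzip
import lzma
import time


def compare_compressors(data):
    """Compress data with xz (LZMA) and gzip; report output size and wall time."""
    compressors = (
        ("xz", lambda raw: lzma.compress(raw, preset=6)),
        ("gzip", lambda raw: gzip.compress(raw, compresslevel=6)),
    )
    results = {}
    for name, compress in compressors:
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = {"bytes": len(packed), "seconds": elapsed}
    return results


# Made-up, repetitive stand-in for a database dump
sample_dump = b"INSERT INTO metadatavalue VALUES (1, 'example text');\n" * 5000

for name, stats in compare_compressors(sample_dump).items():
    print(f"{name}: {stats['bytes']} bytes in {stats['seconds']:.3f}s")
```

For a real backup file it would be better to stream the input in chunks rather than load it into memory at once.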
@ -492,7 +492,7 @@ $ ls -lh cgspace_2018-12-19.backup*
<li>Peter asked if we could create a controlled vocabulary for publishers (<code>dc.publisher</code>)</li>
<li>I see we have about 3500 distinct publishers:</li>
</ul>
<pre><code># SELECT COUNT(DISTINCT(text_value)) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=39;
<pre tabindex="0"><code># SELECT COUNT(DISTINCT(text_value)) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=39;
count
-------
3522
@ -501,17 +501,17 @@ $ ls -lh cgspace_2018-12-19.backup*
<li>I reverted the metadata changes related to &ldquo;Unrestricted Access&rdquo; and &ldquo;Restricted Access&rdquo; on DSpace Test because we&rsquo;re not pushing forward with the new status terms for now</li>
<li>Purge remaining Oracle Java 8 stuff from CGSpace (linode18) since we migrated to OpenJDK a few months ago:</li>
</ul>
<pre><code># dpkg -P oracle-java8-installer oracle-java8-set-default
<pre tabindex="0"><code># dpkg -P oracle-java8-installer oracle-java8-set-default
</code></pre><ul>
<li>Update usage rights on CGSpace as we agreed with Maria Garruccio and Peter last month:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-11-27-update-rights.csv -f dc.rights -t correct -m 53 -db dspace -u dspace -p 'fuu' -d
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-11-27-update-rights.csv -f dc.rights -t correct -m 53 -db dspace -u dspace -p 'fuu' -d
Connected to database.
Fixed 466 occurences of: Copyrighted; Any re-use allowed
</code></pre><ul>
<li>Upgrade PostgreSQL on CGSpace (linode18) from 9.5 to 9.6:</li>
</ul>
<pre><code># apt install postgresql-9.6 postgresql-client-9.6 postgresql-contrib-9.6 postgresql-server-dev-9.6
<pre tabindex="0"><code># apt install postgresql-9.6 postgresql-client-9.6 postgresql-contrib-9.6 postgresql-server-dev-9.6
# pg_ctlcluster 9.5 main stop
# tar -cvzpf var-lib-postgresql-9.5.tar.gz /var/lib/postgresql/9.5
# tar -cvzpf etc-postgresql-9.5.tar.gz /etc/postgresql/9.5
@ -525,7 +525,7 @@ Fixed 466 occurences of: Copyrighted; Any re-use allowed
<li>Run all system updates on CGSpace (linode18) and restart the server</li>
<li>Try to run the DSpace cleanup script on CGSpace (linode18), but I get some errors about foreign key constraints:</li>
</ul>
<pre><code>$ dspace cleanup -v
<pre tabindex="0"><code>$ dspace cleanup -v
- Deleting bitstream information (ID: 158227)
- Deleting bitstream record from database (ID: 158227)
Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
@ -534,7 +534,7 @@ Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign k
</code></pre><ul>
<li>As always, the solution is to delete those IDs manually in PostgreSQL:</li>
</ul>
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (158227, 158251);'
<pre tabindex="0"><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (158227, 158251);'
UPDATE 1
</code></pre><ul>
<li>After all that I started a full Discovery reindex to get the index name changes and rights updates</li>
@ -544,7 +544,7 @@ UPDATE 1
<li>CGSpace went down today for a few minutes while I was at dinner and I quickly restarted Tomcat</li>
<li>The top IP addresses as of this evening are:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;29/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;29/Dec/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
963 40.77.167.152
987 35.237.175.180
1062 40.77.167.55
@ -558,7 +558,7 @@ UPDATE 1
</code></pre><ul>
<li>And just around the time of the alert:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log.1 /var/log/nginx/*.log.2.gz | grep -E &quot;29/Dec/2018:1(6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log.1 /var/log/nginx/*.log.2.gz | grep -E &quot;29/Dec/2018:1(6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
115 66.249.66.223
118 207.46.13.14
123 34.218.226.147