---
title: "December, 2018"
date: 2018-12-02T02:09:30+02:00
author: "Alan Orth"
tags: ["Notes"]
---

## 2018-12-01

- Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK
- I manually installed OpenJDK, then removed Oracle JDK, then re-ran the [Ansible playbook](http://github.com/ilri/rmg-ansible-public) to update all configuration files, etc.
- Then I ran all system updates and restarted the server

## 2018-12-02

- I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another [Ghostscript vulnerability last week](https://usn.ubuntu.com/3831-1/)
- The error when I try to manually run the media filter for one item from the command line:

```
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d" "-f/tmp/magick-129895Bmp44lvUfxo" "-f/tmp/magick-12989C0QFG51fktLF"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d" "-f/tmp/magick-129895Bmp44lvUfxo" "-f/tmp/magick-12989C0QFG51fktLF"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
        at org.im4java.core.Info.getBaseInfo(Info.java:360)
        at org.im4java.core.Info.<init>(Info.java:151)
        at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142)
        at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24)
        at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
```

- A comment on a [StackOverflow question](https://stackoverflow.com/questions/53560755/ghostscript-9-26-update-breaks-imagick-readimage-for-multipage-pdf) from yesterday suggests it might be a bug with the `pngalpha` device in Ghostscript and [links to an upstream bug](https://bugs.ghostscript.com/show_bug.cgi?id=699815)
- I think we need to wait for a fix from Ubuntu
- For what it's worth, I get the same error on my local Arch Linux environment with Ghostscript 9.26:

```
$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match
zsh: segmentation fault (core dumped) gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000
```

- When I replace the `pngalpha` device with `png16m` as suggested in the StackOverflow comments it works:

```
$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match
```

- Start proofing the latest round of 226 IITA archive records that Bosede sent last week and Sisay uploaded to DSpace Test this weekend ([IITA_Dec_1_1997 aka Daniel1807](https://dspacetest.cgiar.org/handle/10568/108298))
  - One item missing the authorship type
  - Some invalid countries (smart quotes, misspellings)
  - Added countries to some items that mentioned research in particular countries in their abstracts
  - One item had "MADAGASCAR" for ISI Journal
  - Minor corrections in IITA subject (LIVELIHOOD→LIVELIHOODS)
  - Trim whitespace in abstract field
  - Fix some sponsors (though for some, like "Governments of Canada", I'm not sure why they are plural)
  - Eighteen items had `en||fr` for the language, but the content was only in French, so I changed them to just `fr`
  - Six items had encoding errors in French text, so I will ask Bosede to re-do them carefully
  - Correct and normalize a few AGROVOC subjects
- Expand my "encoding error" detection GREL to include `~` as I saw a lot of that in some copy-pasted French text recently:

```
or(
  isNotNull(value.match(/.*\uFFFD.*/)),
  isNotNull(value.match(/.*\u00A0.*/)),
  isNotNull(value.match(/.*\u200A.*/)),
  isNotNull(value.match(/.*\u2019.*/)),
  isNotNull(value.match(/.*\u00b4.*/)),
  isNotNull(value.match(/.*\u007e.*/))
)
```

## 2018-12-03

- I looked at the DSpace Ghostscript issue more and it seems to only affect certain PDFs...
- I can successfully generate a thumbnail for another recent item ([10568/98394](https://hdl.handle.net/10568/98394)), but not for [10568/98930](https://hdl.handle.net/10568/98390)
- Even manually on my Arch Linux desktop with ghostscript 9.26-1 and the `pngalpha` device, I can generate a thumbnail for the first one (10568/98394):

```
$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf
```

- So it seems to be something about the PDFs themselves, perhaps related to alpha support?
- The first item (10568/98394) has the following information:

```
$ identify Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf\[0\]
Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf[0]=>Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf PDF 595x841 595x841+0+0 16-bit sRGB 107443B 0.000u 0:00.000
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
```

- And wow, I can't even run ImageMagick's `identify` on the first page of the second item (10568/98930):

```
$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\]
```

- But with GraphicsMagick's `identify` it works:

```
$ gm identify Food\ safety\ Kenya\ fruits.pdf\[0\]
DEBUG: FC_WEIGHT didn't match
Food safety Kenya fruits.pdf PDF 612x792+0+0 DirectClass 8-bit 1.4Mi 0.000u 0m:0.000002s
```

- Interestingly, ImageMagick's `identify` *does* work if you do not specify a page, perhaps as [alluded to in the recent Ghostscript bug report](https://bugs.ghostscript.com/show_bug.cgi?id=699815):

```
$ identify Food\ safety\ Kenya\ fruits.pdf
Food safety Kenya fruits.pdf[0] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[1] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[2] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[3] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
Food safety Kenya fruits.pdf[4] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
```

- As I expected, ImageMagick cannot generate a thumbnail, but GraphicsMagick can (though it looks like crap):

```
$ convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
zsh: abort (core dumped) convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten
$ gm convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
DEBUG: FC_WEIGHT didn't match
```

- I inspected the troublesome PDF using [jhove](http://jhove.openpreservation.org/) and noticed that it is using `ISO PDF/A-1, Level B` and the other one doesn't list a profile, though I don't think this is relevant
- I found another item that fails when generating a thumbnail ([10568/98391](https://hdl.handle.net/10568/98391)); DSpace complains:

```
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d" "-f/tmp/magick-14296Q0rJjfCeIj3w" "-f/tmp/magick-14296k_K6MWqwvpDm"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d" "-f/tmp/magick-14296Q0rJjfCeIj3w" "-f/tmp/magick-14296k_K6MWqwvpDm"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
        at org.im4java.core.Info.getBaseInfo(Info.java:360)
        at org.im4java.core.Info.<init>(Info.java:151)
        at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142)
        at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24)
        at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
Caused by: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d" "-f/tmp/magick-14296Q0rJjfCeIj3w" "-f/tmp/magick-14296k_K6MWqwvpDm"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
        at org.im4java.core.ImageCommand.run(ImageCommand.java:219)
        at org.im4java.core.Info.getBaseInfo(Info.java:342)
        ... 14 more
Caused by: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d" "-f/tmp/magick-14296Q0rJjfCeIj3w" "-f/tmp/magick-14296k_K6MWqwvpDm"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
        at org.im4java.core.ImageCommand.finished(ImageCommand.java:253)
        at org.im4java.process.ProcessStarter.run(ProcessStarter.java:314)
        at org.im4java.core.ImageCommand.run(ImageCommand.java:215)
        ... 15 more
```

- And on my Arch Linux environment ImageMagick's `convert` also crashes:

```
$ convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
zsh: abort (core dumped) convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] x60
```

- But GraphicsMagick's `convert` works:

```
$ gm convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
```

- So far the only thing that stands out is that the two files that don't work were created with Microsoft Office 2016:

```
$ pdfinfo bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf | grep -E '^(Creator|Producer)'
Creator: Microsoft® Word 2016
Producer: Microsoft® Word 2016
$ pdfinfo Food\ safety\ Kenya\ fruits.pdf | grep -E '^(Creator|Producer)'
Creator: Microsoft® Word 2016
Producer: Microsoft® Word 2016
```

- And the one that works was created with Office 365:

```
$ pdfinfo Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf | grep -E '^(Creator|Producer)'
Creator: Microsoft® Word for Office 365
Producer: Microsoft® Word for Office 365
```
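With only three files the Office 2016 pattern could easily be a coincidence. To check it against a larger batch, something like the following Python sketch could print each PDF's `Creator`/`Producer` next to whether Ghostscript's `pngalpha` device manages to render the first page. This is a hypothetical helper, not part of DSpace: it assumes Poppler's `pdfinfo` and `gs` are on the `PATH`, the function names are made up for illustration, and the Ghostscript flags are copied from the commands above.

```python
"""Hypothetical helper: compare PDF Creator/Producer with Ghostscript pngalpha success."""
import subprocess
import sys

# Ghostscript options copied from the pngalpha commands in these notes
GS_PNGALPHA = [
    "gs", "-q", "-dQUIET", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dNOPROMPT",
    "-dMaxBitmap=500000000", "-dAlignToPixels=0", "-dGridFitTT=2",
    "-sDEVICE=pngalpha", "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4",
    "-r72x72", "-dFirstPage=1", "-dLastPage=1",
]


def parse_pdfinfo(text: str) -> dict[str, str]:
    """Turn pdfinfo's "Key:   value" lines into a dict, splitting on the first colon only."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def creator_producer(pdf_path: str) -> tuple[str, str]:
    """Run pdfinfo on one PDF and return its (Creator, Producer) fields."""
    result = subprocess.run(
        ["pdfinfo", pdf_path],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    info = parse_pdfinfo(result.stdout)
    return info.get("Creator", ""), info.get("Producer", "")


def renders_with_pngalpha(pdf_path: str, out_pattern: str = "/tmp/out%d") -> bool:
    """Render page 1 with pngalpha; a non-zero exit (negative if gs died on a signal) means failure."""
    result = subprocess.run(
        GS_PNGALPHA + [f"-sOutputFile={out_pattern}", pdf_path],
        capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    for path in sys.argv[1:]:
        creator, producer = creator_producer(path)
        status = "ok" if renders_with_pngalpha(path) else "FAILED"
        print(f"{status}\t{creator}\t{producer}\t{path}")
```

Saved as, say, `check_pdfs.py`, it would be run as `python3 check_pdfs.py *.pdf`. Calling `gs` directly rather than through ImageMagick keeps the test focused on the `pngalpha` device, since a segfault shows up in Python as a negative return code instead of killing the script.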