diff --git a/content/posts/2018-12.md b/content/posts/2018-12.md
index 1a670c3d5..f47d18202 100644
--- a/content/posts/2018-12.md
+++ b/content/posts/2018-12.md
@@ -80,4 +80,61 @@ or(
 )
 ```
+## 2018-12-03
+
+- I looked at the DSpace Ghostscript issue more and it seems to only affect certain PDFs...
+- I can successfully generate a thumbnail for another recent item ([10568/98394](https://hdl.handle.net/10568/98394)), but not for [10568/98930](https://hdl.handle.net/10568/98390)
+- Even manually on my Arch Linux desktop with ghostscript 9.26-1 and the `pngalpha` device, I can generate a thumbnail for the first one (10568/98394):
+
+```
+$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf
+```
+
+- So it seems to be something about the PDFs themselves, perhaps related to alpha support?
+- The first item (10568/98394) has the following information:
+
+```
+$ identify Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf\[0\]
+Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf[0]=>Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf PDF 595x841 595x841+0+0 16-bit sRGB 107443B 0.000u 0:00.000
+identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
+```
+
+- And wow, I can't even run ImageMagick's `identify` on the first page of the second item (10568/98930):
+
+```
+$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
+zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\]
+```
+
+- But with GraphicsMagick's `identify` it works:
+
+```
+$ gm identify Food\ safety\ Kenya\ fruits.pdf\[0\]
+DEBUG: FC_WEIGHT didn't match
+Food safety Kenya fruits.pdf PDF 612x792+0+0 DirectClass 8-bit 1.4Mi 0.000u 0m:0.000002s
+```
+
+- Interesting that ImageMagick's `identify` *does* work if you do not specify a page, perhaps as [alluded to in the recent Ghostscript bug report](https://bugs.ghostscript.com/show_bug.cgi?id=699815):
+
+```
+$ identify Food\ safety\ Kenya\ fruits.pdf
+Food safety Kenya fruits.pdf[0] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
+Food safety Kenya fruits.pdf[1] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
+Food safety Kenya fruits.pdf[2] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
+Food safety Kenya fruits.pdf[3] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
+Food safety Kenya fruits.pdf[4] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
+identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
+```
+
+- As I expected, ImageMagick cannot generate a thumbnail, but GraphicsMagick can (though it looks like crap):
+
+```
+$ convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
+zsh: abort (core dumped) convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten
+$ gm convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
+DEBUG: FC_WEIGHT didn't match
+```
+
+- I inspected the troublesome PDF using [jhove](http://jhove.openpreservation.org/) and noticed that it is using `ISO PDF/A-1, Level B` and the other one doesn't list a profile, though I don't think this is relevant
+
diff --git a/docs/2018-12/index.html b/docs/2018-12/index.html
index 35f0cbdca..ae6e47f88 100644
--- a/docs/2018-12/index.html
+++ b/docs/2018-12/index.html
@@ -21,7 +21,7 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
 " />
-
+
@@ -48,9 +48,9 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
   "@type": "BlogPosting",
   "headline": "December, 2018",
   "url": "https://alanorth.github.io/cgspace-notes/2018-12/",
-  "wordCount": "463",
+  "wordCount": "875",
   "datePublished": "2018-12-02T02:09:30+02:00",
-  "dateModified": "2018-12-02T10:57:41+02:00",
+  "dateModified": "2018-12-02T17:55:32+02:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -200,6 +200,71 @@ DEBUG: FC_WEIGHT didn't match
 )
+
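
The behaviour recorded in the notes (ImageMagick aborts on the first page of one PDF while GraphicsMagick succeeds) suggests a simple fallback. The following is only a sketch under stated assumptions: `make_thumbnail` is a hypothetical helper name that does not appear in the notes or in DSpace, and it assumes `convert` and `gm` are on the `PATH` and accept the same arguments used in the commands above.

```shell
# Hypothetical helper (not part of the original notes or of DSpace):
# try ImageMagick first, then fall back to GraphicsMagick.
make_thumbnail() {
    local pdf="$1"   # path to the source PDF
    local out="$2"   # path of the thumbnail to write

    # "[0]" selects the first page, as in the commands from the notes.
    # A crash (for example SIGABRT, exit status 134) counts as a failure here.
    if convert "${pdf}[0]" -thumbnail 600x600 -flatten "$out" 2>/dev/null; then
        echo "imagemagick"
    elif gm convert "${pdf}[0]" -thumbnail 600x600 -flatten "$out" 2>/dev/null; then
        echo "graphicsmagick"
    else
        echo "failed"
        return 1
    fi
}
```

Calling `make_thumbnail "Food safety Kenya fruits.pdf" /tmp/thumb.jpg` would print which tool produced the thumbnail, which makes it easy to log how often the fallback is needed.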