diff --git a/content/posts/2022-10.md b/content/posts/2022-10.md
index 2ae9e5010..efb3bb6b9 100644
--- a/content/posts/2022-10.md
+++ b/content/posts/2022-10.md
@@ -555,4 +555,74 @@ $ ./ilri/fix-metadata-values.py -i 2022-10-18-update-initiatives.csv -db dspace
 - I created a new "TIP test" collection under Alliance's community and added the users accordingly
 - I think I'll be able to just add these two submit/approve users to the Alliance Admins and Alliance Editors groups once we're ready
+## 2022-10-19
+
+- I submitted a [bug report for the two-page portrait layout of some PDF thumbnails](https://bugs.ghostscript.com/show_bug.cgi?id=705994) on Ghostscript's bug tracker
+  - For reference, the thumbnail for PDFs like in [10568/116598](https://hdl.handle.net/10568/116598) looks like this:
+
+![gs thumbnail](/cgspace-notes/2022/10/gs-10568-116598.pdf.jpg)
+
+- In other news, I see `pdftocairo` from the poppler package produces a similar, though slightly prettier version of the thumbnail of that PDF:
+
+![pdftocairo thumbnail](/cgspace-notes/2022/10/pdftocairo-10568-116598.pdf.jpg)
+
+- I used the command:
+
+```console
+$ pdftocairo -jpeg -singlefile -f 1 -l 1 -scale-to-x 640 -scale-to-y -1 10568-116598.pdf thumb
+```
+
+- The Ghostscript developers responded in a few minutes (!) and explained that PDFs can contain many different "boxes":
+
+> PDF files can have multiple different 'Box' values; ArtBox, BleedBox, CropBox, MediaBox and TrimBox. The MediaBox is required the other boxes are optional, a given PDF page description must contain the MediaBox and may contain any or all of the others.
+>
+> By default Ghostscript uses the MediaBox to determine the size of the media. Other PDF consumers may exhibit other behaviours.
+>
+> The pages in your PDF file contain all of the Boxes. In the majority of cases the Boxes all contain the same values (which makes their inclusion pointless of course). But for page 1 they differ:
+>
+> /CropBox[594.375 0.0 1190.55 839.176]
+> /MediaBox[0.0 0.0 1190.55 841.89]
+>
+> You can tell Ghostscript to use a different Box value for the media by using one of -dUseArtBox, -dUseBleedBox, -dUseCropBox, -dUseTrimBox. If I specify -dUseCropBox then the file is rendered as you expect.
+
+- I confirm that adding `-define pdf:use-cropbox=true` to the ImageMagick command produces a better thumbnail in this case
+  - We can check the boxes in a PDF using `pdfinfo` from the poppler package:
+
+```console
+$ pdfinfo -box data/10568-116598.pdf
+Creator:         Adobe InDesign 17.0 (Macintosh)
+Producer:        Adobe PDF Library 16.0.3
+CreationDate:    Tue Dec  7 12:44:46 2021 EAT
+ModDate:         Tue Dec  7 15:37:58 2021 EAT
+Custom Metadata: no
+Metadata Stream: yes
+Tagged:          no
+UserProperties:  no
+Suspects:        no
+Form:            none
+JavaScript:      no
+Pages:           17
+Encrypted:       no
+Page size:       596.175 x 839.176 pts
+Page rot:        0
+MediaBox:            0.00     0.00  1190.55   841.89
+CropBox:           594.38     0.00  1190.55   839.18
+BleedBox:          594.38     0.00  1190.55   839.18
+TrimBox:           594.38     0.00  1190.55   839.18
+ArtBox:            594.38     0.00  1190.55   839.18
+File size:       572600 bytes
+Optimized:       no
+PDF version:     1.6
+```
+
+- In this case the MediaBox is a strange size, and we should use the CropBox
+  - I wonder if we can check that from DSpace...
+- Apply some corrections from Peter on CGSpace
+- Meeting with Leroy, Daniel, Francesca, and Maria from Alliance to review their TIP tool and talk about next steps
+  - We asked them to do some real submissions (as opposed to "I like coffee" etc) to test the full breadth of the metadata and controlled vocabularies
+- Minor work on the CG Core Types spreadsheet to clear up some of the actions and incorporate some of Peter's feedback
+- After looking at the request patterns in nginx on CGSpace for the past few weeks I see nothing that would explain the high loads we see several times per week (especially Sundays!)
+  - So I suspect there is a noisy neighbor, and actually I do see some non-trivial amount of CPU steal in my Munin graphs and `iostat`
+  - I asked Linode to move the instance elsewhere
+
diff --git a/docs/2022-10/index.html b/docs/2022-10/index.html
index 13c1e138d..bb3e6867a 100644
--- a/docs/2022-10/index.html
+++ b/docs/2022-10/index.html
@@ -20,7 +20,7 @@ I filed an issue to ask about Java 11+ support
-
+
@@ -46,9 +46,9 @@ I filed an issue to ask about Java 11+ support
   "@type": "BlogPosting",
   "headline": "October, 2022",
   "url": "https://alanorth.github.io/cgspace-notes/2022-10/",
-  "wordCount": "2595",
+  "wordCount": "3107",
   "datePublished": "2022-10-01T19:45:36+03:00",
-  "dateModified": "2022-10-17T15:58:02+03:00",
+  "dateModified": "2022-10-18T22:12:42+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -721,6 +721,85 @@ I filed an issue to ask about Java 11+ support
+

2022-10-19

+ +

gs thumbnail

+ +

pdftocairo thumbnail

+ +
$ pdftocairo -jpeg -singlefile -f 1 -l 1 -scale-to-x 640 -scale-to-y -1 10568-116598.pdf thumb
+
+
+

PDF files can have multiple different ‘Box’ values; ArtBox, BleedBox, CropBox, MediaBox and TrimBox. The MediaBox is required the other boxes are optional, a given PDF page description must contain the MediaBox and may contain any or all of the others.

+

By default Ghostscript uses the MediaBox to determine the size of the media. Other PDF consumers may exhibit other behaviours.

+

The pages in your PDF file contain all of the Boxes. In the majority of cases the Boxes all contain the same values (which makes their inclusion pointless of course). But for page 1 they differ:

+

/CropBox[594.375 0.0 1190.55 839.176] +/MediaBox[0.0 0.0 1190.55 841.89]

+

You can tell Ghostscript to use a different Box value for the media by using one of -dUseArtBox, -dUseBleedBox, -dUseCropBox, -dUseTrimBox. If I specify -dUseCropBox then the file is rendered as you expect.

+
+ +
$ pdfinfo -box data/10568-116598.pdf
+Creator:         Adobe InDesign 17.0 (Macintosh)
+Producer:        Adobe PDF Library 16.0.3
+CreationDate:    Tue Dec  7 12:44:46 2021 EAT
+ModDate:         Tue Dec  7 15:37:58 2021 EAT
+Custom Metadata: no
+Metadata Stream: yes
+Tagged:          no
+UserProperties:  no
+Suspects:        no
+Form:            none
+JavaScript:      no
+Pages:           17
+Encrypted:       no
+Page size:       596.175 x 839.176 pts
+Page rot:        0
+MediaBox:            0.00     0.00  1190.55   841.89
+CropBox:           594.38     0.00  1190.55   839.18
+BleedBox:          594.38     0.00  1190.55   839.18
+TrimBox:           594.38     0.00  1190.55   839.18
+ArtBox:            594.38     0.00  1190.55   839.18
+File size:       572600 bytes
+Optimized:       no
+PDF version:     1.6
+
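A minimal sketch of how the MediaBox and CropBox from the `pdfinfo -box` output above could be compared before deciding whether to render a thumbnail with the CropBox. It assumes Python 3 and plain-text `pdfinfo -box` output as input. The helper names `parse_boxes` and `cropbox_differs` and the one-point tolerance are hypothetical and not part of DSpace, poppler, or Ghostscript:

```python
import re

# Sample lines copied from the `pdfinfo -box` output above.
SAMPLE = """\
Page size:       596.175 x 839.176 pts
MediaBox:            0.00     0.00  1190.55   841.89
CropBox:           594.38     0.00  1190.55   839.18
"""

# Matches the five box lines that `pdfinfo -box` prints.
BOX_RE = re.compile(r"^(MediaBox|CropBox|BleedBox|TrimBox|ArtBox):\s+(.+)$")


def parse_boxes(pdfinfo_output):
    """Return a dict mapping each box name to a tuple of four floats."""
    boxes = {}
    for line in pdfinfo_output.splitlines():
        match = BOX_RE.match(line.strip())
        if match:
            name, values = match.groups()
            boxes[name] = tuple(float(v) for v in values.split())
    return boxes


def cropbox_differs(boxes, tolerance=1.0):
    """True when any CropBox coordinate is more than `tolerance` points
    away from the corresponding MediaBox coordinate (hypothetical threshold)."""
    media = boxes.get("MediaBox")
    crop = boxes.get("CropBox")
    if media is None or crop is None:
        return False
    return any(abs(m - c) > tolerance for m, c in zip(media, crop))


boxes = parse_boxes(SAMPLE)
print(cropbox_differs(boxes))
```

For this PDF the left edge of the CropBox (594.38) is far from the left edge of the MediaBox (0.00), so the check would flag it as a candidate for the `-dUseCropBox` / `pdf:use-cropbox=true` rendering path.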
diff --git a/docs/2022/10/gs-10568-116598.pdf.jpg b/docs/2022/10/gs-10568-116598.pdf.jpg
new file mode 100644
index 000000000..b5e5402c3
Binary files /dev/null and b/docs/2022/10/gs-10568-116598.pdf.jpg differ
diff --git a/docs/2022/10/pdftocairo-10568-116598.pdf.jpg b/docs/2022/10/pdftocairo-10568-116598.pdf.jpg
new file mode 100644
index 000000000..70ab06642
Binary files /dev/null and b/docs/2022/10/pdftocairo-10568-116598.pdf.jpg differ
diff --git a/docs/categories/index.html b/docs/categories/index.html
index c6327dd5f..6e24970fd 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 7bfce0ff2..44e23d386 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 5aafb0331..6199d785a 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index e13fe7980..8dd31944a 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 8e86c3d73..d9eb9f5b6 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index a80df9d9e..a57d6a589 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index 47ce060f8..d665d9ac8 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index bde39b343..6361bc740 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 53fb6e5c4..18dac51d4 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 23f17d808..5c2c18ff2 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 8e2aebcfd..39422034f 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 903a634b6..2f5149df3 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 68d245d2c..85ab4f3ba 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index aee8e1f1d..df2a36c2e 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 7da062ab0..d12ea00eb 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 0a63eaf0b..3b760cd92 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index b10770f4c..38b60b55c 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 9d00497f3..1b587caa2 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 10ccd9a94..0db98e95e 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index d9e5a804b..1b43395d3 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index a52725503..69e687a9d 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 596f12d53..b24674334 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 8a9109f19..6abe4aab8 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 6f5cc7a94..a8772c931 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index aa94dc934..5e3be77f9 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 7bb51d74c..537ad35db 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 169d6d878..85b4bcfe8 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml">
 https://alanorth.github.io/cgspace-notes/categories/
- 2022-10-17T15:58:02+03:00
+ 2022-10-18T22:12:42+03:00
 https://alanorth.github.io/cgspace-notes/
- 2022-10-17T15:58:02+03:00
+ 2022-10-18T22:12:42+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-10-17T15:58:02+03:00
+ 2022-10-18T22:12:42+03:00
 https://alanorth.github.io/cgspace-notes/2022-10/
- 2022-10-17T15:58:02+03:00
+ 2022-10-18T22:12:42+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2022-10-17T15:58:02+03:00
+ 2022-10-18T22:12:42+03:00
 https://alanorth.github.io/cgspace-notes/2022-09/
 2022-09-30T17:29:50+03:00
diff --git a/static/2022/10/gs-10568-116598.pdf.jpg b/static/2022/10/gs-10568-116598.pdf.jpg
new file mode 100644
index 000000000..b5e5402c3
Binary files /dev/null and b/static/2022/10/gs-10568-116598.pdf.jpg differ
diff --git a/static/2022/10/pdftocairo-10568-116598.pdf.jpg b/static/2022/10/pdftocairo-10568-116598.pdf.jpg
new file mode 100644
index 000000000..70ab06642
Binary files /dev/null and b/static/2022/10/pdftocairo-10568-116598.pdf.jpg differ