diff --git a/content/posts/2022-10.md b/content/posts/2022-10.md
index 2ae9e5010..efb3bb6b9 100644
--- a/content/posts/2022-10.md
+++ b/content/posts/2022-10.md
@@ -555,4 +555,74 @@ $ ./ilri/fix-metadata-values.py -i 2022-10-18-update-initiatives.csv -db dspace
- I created a new "TIP test" collection under Alliance's community and added the users accordingly
- I think I'll be able to just add these two submit/approve users to the Alliance Admins and Alliance Editors groups once we're ready
+## 2022-10-19
+
+- I submitted a [bug report for the two-page portrait layout of some PDF thumbnails](https://bugs.ghostscript.com/show_bug.cgi?id=705994) on Ghostscript's bug tracker
+ - For reference, the thumbnail for PDFs like in [10568/116598](https://hdl.handle.net/10568/116598) looks like this:
+
+![gs thumbnail](/cgspace-notes/2022/10/gs-10568-116598.pdf.jpg)
+
+- In other news, I see `pdftocairo` from the poppler package produces a similar, though slightly prettier, version of the thumbnail of that PDF:
+
+![pdftocairo thumbnail](/cgspace-notes/2022/10/pdftocairo-10568-116598.pdf.jpg)
+
+- I used the command:
+
+```console
+$ pdftocairo -jpeg -singlefile -f 1 -l 1 -scale-to-x 640 -scale-to-y -1 10568-116598.pdf thumb
+```
+
+- The Ghostscript developers responded in a few minutes (!) and explained that PDFs can contain many different "boxes":
+
+> PDF files can have multiple different 'Box' values; ArtBox, BleedBox, CropBox, MediaBox and TrimBox. The MediaBox is required the other boxes are optional, a given PDF page description must contain the MediaBox and may contain any or all of the others.
+>
+> By default Ghostscript uses the MediaBox to determine the size of the media. Other PDF consumers may exhibit other behaviours.
+>
+> The pages in your PDF file contain all of the Boxes. In the majority of cases the Boxes all contain the same values (which makes their inclusion pointless of course). But for page 1 they differ:
+>
+> /CropBox[594.375 0.0 1190.55 839.176]
+> /MediaBox[0.0 0.0 1190.55 841.89]
+>
+> You can tell Ghostscript to use a different Box value for the media by using one of -dUseArtBox, -dUseBleedBox, -dUseCropBox, -dUseTrimBox. If I specify -dUseCropBox then the file is rendered as you expect.
+
+- I confirmed that adding `-define pdf:use-cropbox=true` to the ImageMagick command produces a better thumbnail in this case
+ - We can check the boxes in a PDF using `pdfinfo` from the poppler package:
+
+```console
+$ pdfinfo -box data/10568-116598.pdf
+Creator: Adobe InDesign 17.0 (Macintosh)
+Producer: Adobe PDF Library 16.0.3
+CreationDate: Tue Dec 7 12:44:46 2021 EAT
+ModDate: Tue Dec 7 15:37:58 2021 EAT
+Custom Metadata: no
+Metadata Stream: yes
+Tagged: no
+UserProperties: no
+Suspects: no
+Form: none
+JavaScript: no
+Pages: 17
+Encrypted: no
+Page size: 596.175 x 839.176 pts
+Page rot: 0
+MediaBox: 0.00 0.00 1190.55 841.89
+CropBox: 594.38 0.00 1190.55 839.18
+BleedBox: 594.38 0.00 1190.55 839.18
+TrimBox: 594.38 0.00 1190.55 839.18
+ArtBox: 594.38 0.00 1190.55 839.18
+File size: 572600 bytes
+Optimized: no
+PDF version: 1.6
+```
+
+- In this case the MediaBox is a strange size, and we should use the CropBox
+ - I wonder if we can check that from DSpace...
+- Apply some corrections from Peter on CGSpace
+- Meeting with Leroy, Daniel, Francesca, and Maria from Alliance to review their TIP tool and talk about next steps
+ - We asked them to do some real submissions (as opposed to "I like coffee" etc) to test the full breadth of the metadata and controlled vocabularies
+- Minor work on the CG Core Types spreadsheet to clear up some of the actions and incorporate some of Peter's feedback
+- After looking at the request patterns in nginx on CGSpace for the past few weeks I see nothing that would explain the high loads we see several times per week (especially Sundays!)
+ - So I suspect there is a noisy neighbor, and actually I do see some non-trivial amount of CPU steal in my Munin graphs and `iostat`
+ - I asked Linode to move the instance elsewhere
+
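As a follow-up to the `pdfinfo -box` output above, here is a minimal sketch of how the MediaBox/CropBox mismatch could be detected automatically. It is not something DSpace does, and not necessarily how it should be done there. The function names `parse_boxes` and `crop_differs_from_media` and the one-point `tolerance` are hypothetical, chosen for illustration. The parser only handles the single-page form of the output shown above:

```python
import re

# Matches pdfinfo -box lines such as "CropBox:   594.38  0.00  1190.55  839.18"
BOX_LINE = re.compile(r"^(MediaBox|CropBox|BleedBox|TrimBox|ArtBox):\s+(.+)$")


def parse_boxes(pdfinfo_output):
    """Return a dict mapping each box name to a tuple of (x0, y0, x1, y1) floats."""
    boxes = {}
    for line in pdfinfo_output.splitlines():
        match = BOX_LINE.match(line.strip())
        if match:
            name, values = match.groups()
            boxes[name] = tuple(float(v) for v in values.split())
    return boxes


def crop_differs_from_media(boxes, tolerance=1.0):
    """True if any CropBox coordinate is more than `tolerance` points from the MediaBox."""
    media, crop = boxes.get("MediaBox"), boxes.get("CropBox")
    if media is None or crop is None:
        return False
    return any(abs(m - c) > tolerance for m, c in zip(media, crop))


# Box lines copied from the pdfinfo output for 10568-116598.pdf
sample = """\
MediaBox:           0.00     0.00  1190.55   841.89
CropBox:          594.38     0.00  1190.55   839.18
"""

boxes = parse_boxes(sample)
print(crop_differs_from_media(boxes))  # True
```

For this PDF the CropBox starts at x = 594.38 while the MediaBox starts at 0, so the check flags it as a file whose thumbnail should probably be rendered from the CropBox.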
diff --git a/docs/2022-10/index.html b/docs/2022-10/index.html
index 13c1e138d..bb3e6867a 100644
--- a/docs/2022-10/index.html
+++ b/docs/2022-10/index.html
@@ -20,7 +20,7 @@ I filed an issue to ask about Java 11+ support
-
+
@@ -46,9 +46,9 @@ I filed an issue to ask about Java 11+ support
"@type": "BlogPosting",
"headline": "October, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-10/",
- "wordCount": "2595",
+ "wordCount": "3107",
"datePublished": "2022-10-01T19:45:36+03:00",
- "dateModified": "2022-10-17T15:58:02+03:00",
+ "dateModified": "2022-10-18T22:12:42+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -721,6 +721,85 @@ I filed an issue to ask about Java 11+ support
+
2022-10-19
+
+
+
+- In other news, I see
pdftocairo
from the poppler package produces a similar, though slightly prettier, version of the thumbnail of that PDF:
+
+
+
+$ pdftocairo -jpeg -singlefile -f 1 -l 1 -scale-to-x 640 -scale-to-y -1 10568-116598.pdf thumb
+
+- The Ghostscript developers responded in a few minutes (!) and explained that PDFs can contain many different “boxes”:
+
+
+PDF files can have multiple different ‘Box’ values; ArtBox, BleedBox, CropBox, MediaBox and TrimBox. The MediaBox is required the other boxes are optional, a given PDF page description must contain the MediaBox and may contain any or all of the others.
+By default Ghostscript uses the MediaBox to determine the size of the media. Other PDF consumers may exhibit other behaviours.
+The pages in your PDF file contain all of the Boxes. In the majority of cases the Boxes all contain the same values (which makes their inclusion pointless of course). But for page 1 they differ:
+/CropBox[594.375 0.0 1190.55 839.176]
+/MediaBox[0.0 0.0 1190.55 841.89]
+You can tell Ghostscript to use a different Box value for the media by using one of -dUseArtBox, -dUseBleedBox, -dUseCropBox, -dUseTrimBox. If I specify -dUseCropBox then the file is rendered as you expect.
+
+
+- I confirmed that adding
-define pdf:use-cropbox=true
to the ImageMagick command produces a better thumbnail in this case
+
+- We can check the boxes in a PDF using
pdfinfo
from the poppler package:
+
+
+
+$ pdfinfo -box data/10568-116598.pdf
+Creator: Adobe InDesign 17.0 (Macintosh)
+Producer: Adobe PDF Library 16.0.3
+CreationDate: Tue Dec 7 12:44:46 2021 EAT
+ModDate: Tue Dec 7 15:37:58 2021 EAT
+Custom Metadata: no
+Metadata Stream: yes
+Tagged: no
+UserProperties: no
+Suspects: no
+Form: none
+JavaScript: no
+Pages: 17
+Encrypted: no
+Page size: 596.175 x 839.176 pts
+Page rot: 0
+MediaBox: 0.00 0.00 1190.55 841.89
+CropBox: 594.38 0.00 1190.55 839.18
+BleedBox: 594.38 0.00 1190.55 839.18
+TrimBox: 594.38 0.00 1190.55 839.18
+ArtBox: 594.38 0.00 1190.55 839.18
+File size: 572600 bytes
+Optimized: no
+PDF version: 1.6
+
+- In this case the MediaBox is a strange size, and we should use the CropBox
+
+- I wonder if we can check that from DSpace…
+
+
+- Apply some corrections from Peter on CGSpace
+- Meeting with Leroy, Daniel, Francesca, and Maria from Alliance to review their TIP tool and talk about next steps
+
+- We asked them to do some real submissions (as opposed to “I like coffee” etc) to test the full breadth of the metadata and controlled vocabularies
+
+
+- Minor work on the CG Core Types spreadsheet to clear up some of the actions and incorporate some of Peter’s feedback
+- After looking at the request patterns in nginx on CGSpace for the past few weeks I see nothing that would explain the high loads we see several times per week (especially Sundays!)
+
+- So I suspect there is a noisy neighbor, and actually I do see some non-trivial amount of CPU steal in my Munin graphs and
iostat
+- I asked Linode to move the instance elsewhere
+
+
+
diff --git a/docs/2022/10/gs-10568-116598.pdf.jpg b/docs/2022/10/gs-10568-116598.pdf.jpg
new file mode 100644
index 000000000..b5e5402c3
Binary files /dev/null and b/docs/2022/10/gs-10568-116598.pdf.jpg differ
diff --git a/docs/2022/10/pdftocairo-10568-116598.pdf.jpg b/docs/2022/10/pdftocairo-10568-116598.pdf.jpg
new file mode 100644
index 000000000..70ab06642
Binary files /dev/null and b/docs/2022/10/pdftocairo-10568-116598.pdf.jpg differ
diff --git a/docs/categories/index.html b/docs/categories/index.html
index c6327dd5f..6e24970fd 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 7bfce0ff2..44e23d386 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 5aafb0331..6199d785a 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index e13fe7980..8dd31944a 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 8e86c3d73..d9eb9f5b6 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index a80df9d9e..a57d6a589 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index 47ce060f8..d665d9ac8 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index bde39b343..6361bc740 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 53fb6e5c4..18dac51d4 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 23f17d808..5c2c18ff2 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 8e2aebcfd..39422034f 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 903a634b6..2f5149df3 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 68d245d2c..85ab4f3ba 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index aee8e1f1d..df2a36c2e 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 7da062ab0..d12ea00eb 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 0a63eaf0b..3b760cd92 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index b10770f4c..38b60b55c 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 9d00497f3..1b587caa2 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 10ccd9a94..0db98e95e 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index d9e5a804b..1b43395d3 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index a52725503..69e687a9d 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 596f12d53..b24674334 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 8a9109f19..6abe4aab8 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 6f5cc7a94..a8772c931 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index aa94dc934..5e3be77f9 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 7bb51d74c..537ad35db 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 169d6d878..85b4bcfe8 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2022-10-17T15:58:02+03:00
+ 2022-10-18T22:12:42+03:00
https://alanorth.github.io/cgspace-notes/
- 2022-10-17T15:58:02+03:00
+ 2022-10-18T22:12:42+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-10-17T15:58:02+03:00
+ 2022-10-18T22:12:42+03:00
https://alanorth.github.io/cgspace-notes/2022-10/
- 2022-10-17T15:58:02+03:00
+ 2022-10-18T22:12:42+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2022-10-17T15:58:02+03:00
+ 2022-10-18T22:12:42+03:00
https://alanorth.github.io/cgspace-notes/2022-09/
2022-09-30T17:29:50+03:00
diff --git a/static/2022/10/gs-10568-116598.pdf.jpg b/static/2022/10/gs-10568-116598.pdf.jpg
new file mode 100644
index 000000000..b5e5402c3
Binary files /dev/null and b/static/2022/10/gs-10568-116598.pdf.jpg differ
diff --git a/static/2022/10/pdftocairo-10568-116598.pdf.jpg b/static/2022/10/pdftocairo-10568-116598.pdf.jpg
new file mode 100644
index 000000000..70ab06642
Binary files /dev/null and b/static/2022/10/pdftocairo-10568-116598.pdf.jpg differ