Add notes for 2022-10-19

Alan Orth 2022-10-19 21:32:01 +03:00
parent 7713ecefa8
commit 46a9178bdb
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
33 changed files with 183 additions and 34 deletions

View File

@ -555,4 +555,74 @@ $ ./ilri/fix-metadata-values.py -i 2022-10-18-update-initiatives.csv -db dspace
- I created a new "TIP test" collection under Alliance's community and added the users accordingly
- I think I'll be able to just add these two submit/approve users to the Alliance Admins and Alliance Editors groups once we're ready
## 2022-10-19
- I submitted a [bug report for the two-page portrait layout of some PDF thumbnails](https://bugs.ghostscript.com/show_bug.cgi?id=705994) on Ghostscript's bug tracker
- For reference, the thumbnail for PDFs like in [10568/116598](https://hdl.handle.net/10568/116598) looks like this:
![gs thumbnail](/cgspace-notes/2022/10/gs-10568-116598.pdf.jpg)
- In other news, I see `pdftocairo` from the poppler package produces a similar, though slightly prettier version of the thumbnail of that PDF:
![pdftocairo thumbnail](/cgspace-notes/2022/10/pdftocairo-10568-116598.pdf.jpg)
- I used the command:
```console
$ pdftocairo -jpeg -singlefile -f 1 -l 1 -scale-to-x 640 -scale-to-y -1 10568-116598.pdf thumb
```
- The Ghostscript developers responded in a few minutes (!) and explained that PDFs can contain many different "boxes":
> PDF files can have multiple different 'Box' values; ArtBox, BleedBox, CropBox, MediaBox and TrimBox. The MediaBox is required the other boxes are optional, a given PDF page description must contain the MediaBox and may contain any or all of the others.
>
> By default Ghostscript uses the MediaBox to determine the size of the media. Other PDF consumers may exhibit other behaviours.
>
> The pages in your PDF file contain all of the Boxes. In the majority of cases the Boxes all contain the same values (which makes their inclusion pointless of course). But for page 1 they differ:
>
> /CropBox[594.375 0.0 1190.55 839.176]
> /MediaBox[0.0 0.0 1190.55 841.89]
>
> You can tell Ghostscript to use a different Box value for the media by using one of -dUseArtBox, -dUseBleedBox, -dUseCropBox, -dUseTrimBox. If I specify -dUseCropBox then the file is rendered as you expect.
- I confirm that adding `-define pdf:use-cropbox=true` to the ImageMagick command produces a better thumbnail in this case
- We can check the boxes in a PDF using `pdfinfo` from the poppler package:
```console
$ pdfinfo -box data/10568-116598.pdf
Creator: Adobe InDesign 17.0 (Macintosh)
Producer: Adobe PDF Library 16.0.3
CreationDate: Tue Dec 7 12:44:46 2021 EAT
ModDate: Tue Dec 7 15:37:58 2021 EAT
Custom Metadata: no
Metadata Stream: yes
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 17
Encrypted: no
Page size: 596.175 x 839.176 pts
Page rot: 0
MediaBox: 0.00 0.00 1190.55 841.89
CropBox: 594.38 0.00 1190.55 839.18
BleedBox: 594.38 0.00 1190.55 839.18
TrimBox: 594.38 0.00 1190.55 839.18
ArtBox: 594.38 0.00 1190.55 839.18
File size: 572600 bytes
Optimized: no
PDF version: 1.6
```
- In this case the MediaBox is a strange size, and we should use the CropBox
- I wonder if we can check that from DSpace...
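As a rough sketch of how that check could be automated outside of DSpace, the Python below parses the output of `pdfinfo -box` and flags PDFs whose CropBox differs from the MediaBox. The function names, the `SAMPLE` text (trimmed from the output above), and the one-point tolerance are my own assumptions for illustration, not anything DSpace provides:

```python
import re
import subprocess

# Matches the box lines printed by `pdfinfo -box` for a single page,
# e.g. "MediaBox:  0.00 0.00 1190.55 841.89"
BOX_LINE = re.compile(r"^(MediaBox|CropBox|BleedBox|TrimBox|ArtBox):\s+(.+)$")

# Trimmed sample of the pdfinfo output for 10568/116598
SAMPLE = """\
Page size: 596.175 x 839.176 pts
MediaBox: 0.00 0.00 1190.55 841.89
CropBox: 594.38 0.00 1190.55 839.18
"""


def parse_boxes(pdfinfo_output: str) -> dict[str, tuple[float, ...]]:
    """Extract the box rectangles from the text printed by `pdfinfo -box`."""
    boxes = {}
    for line in pdfinfo_output.splitlines():
        match = BOX_LINE.match(line.strip())
        if match:
            values = tuple(float(v) for v in match.group(2).split())
            boxes[match.group(1)] = values
    return boxes


def cropbox_differs(boxes: dict[str, tuple[float, ...]], tolerance: float = 1.0) -> bool:
    """Return True if any CropBox edge is more than `tolerance` points away
    from the corresponding MediaBox edge (tolerance is an arbitrary choice)."""
    media = boxes.get("MediaBox")
    crop = boxes.get("CropBox")
    if media is None or crop is None:
        return False
    return any(abs(m - c) > tolerance for m, c in zip(media, crop))


def check_pdf(path: str) -> bool:
    """Run `pdfinfo -box` (from poppler) on a file and compare its boxes."""
    result = subprocess.run(
        ["pdfinfo", "-box", path], capture_output=True, text=True, check=True
    )
    return cropbox_differs(parse_boxes(result.stdout))
```

With the sample above, `cropbox_differs(parse_boxes(SAMPLE))` is true because the left edge of the CropBox starts at 594.38 rather than 0. PDFs flagged this way would be candidates for `-define pdf:use-cropbox=true`.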
- Apply some corrections from Peter on CGSpace
- Meeting with Leroy, Daniel, Francesca, and Maria from Alliance to review their TIP tool and talk about next steps
- We asked them to do some real submissions (as opposed to "I like coffee" etc) to test the full breadth of the metadata and controlled vocabularies
- Minor work on the CG Core Types spreadsheet to clear up some of the actions and incorporate some of Peter's feedback
- After looking at the request patterns in nginx on CGSpace for the past few weeks I see nothing that would explain the high loads we see several times per week (especially Sundays!)
- So I suspect there is a noisy neighbor, and actually I do see some non-trivial amount of CPU steal in my Munin graphs and `iostat`
- I asked Linode to move the instance elsewhere
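For reference, CPU steal can also be measured directly from `/proc/stat` on Linux, independent of Munin or `iostat`. This is only a minimal sketch: the function names and the five-second default interval are assumptions of mine, and it relies on steal being the eighth counter on the aggregate `cpu` line:

```python
import time


def parse_cpu_line(line: str) -> list[int]:
    """Return the first eight counters of the aggregate `cpu` line:
    user, nice, system, idle, iowait, irq, softirq, steal."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:9]]


def steal_percent(before: list[int], after: list[int]) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def sample_steal(interval: float = 5.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart, and report steal."""

    def read() -> list[int]:
        with open("/proc/stat") as f:
            return parse_cpu_line(f.readline())

    first = read()
    time.sleep(interval)
    return steal_percent(first, read())
```

Calling `sample_steal()` during one of the high-load periods would give a number to compare against the Munin graphs.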
<!-- vim: set sw=2 ts=2: -->

View File

@ -20,7 +20,7 @@ I filed an issue to ask about Java 11&#43; support
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-10/" />
<meta property="article:published_time" content="2022-10-01T19:45:36+03:00" />
<meta property="article:modified_time" content="2022-10-17T15:58:02+03:00" />
<meta property="article:modified_time" content="2022-10-18T22:12:42+03:00" />
@ -46,9 +46,9 @@ I filed an issue to ask about Java 11&#43; support
"@type": "BlogPosting",
"headline": "October, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-10/",
"wordCount": "2595",
"wordCount": "3107",
"datePublished": "2022-10-01T19:45:36+03:00",
"dateModified": "2022-10-17T15:58:02+03:00",
"dateModified": "2022-10-18T22:12:42+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -721,6 +721,85 @@ I filed an issue to ask about Java 11&#43; support
</ul>
</li>
</ul>
<h2 id="2022-10-19">2022-10-19</h2>
<ul>
<li>I submitted a <a href="https://bugs.ghostscript.com/show_bug.cgi?id=705994">bug report for the two-page portrait layout of some PDF thumbnails</a> on Ghostscript&rsquo;s bug tracker
<ul>
<li>For reference, the thumbnail for PDFs like in <a href="https://hdl.handle.net/10568/116598">10568/116598</a> looks like this:</li>
</ul>
</li>
</ul>
<p><img src="/cgspace-notes/2022/10/gs-10568-116598.pdf.jpg" alt="gs thumbnail"></p>
<ul>
<li>In other news, I see <code>pdftocairo</code> from the poppler package produces a similar, though slightly prettier version of the thumbnail of that PDF:</li>
</ul>
<p><img src="/cgspace-notes/2022/10/pdftocairo-10568-116598.pdf.jpg" alt="pdftocairo thumbnail"></p>
<ul>
<li>I used the command:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ pdftocairo -jpeg -singlefile -f <span style="color:#ae81ff">1</span> -l <span style="color:#ae81ff">1</span> -scale-to-x <span style="color:#ae81ff">640</span> -scale-to-y -1 10568-116598.pdf thumb
</span></span></code></pre></div><ul>
<li>The Ghostscript developers responded in a few minutes (!) and explained that PDFs can contain many different &ldquo;boxes&rdquo;:</li>
</ul>
<blockquote>
<p>PDF files can have multiple different &lsquo;Box&rsquo; values; ArtBox, BleedBox, CropBox, MediaBox and TrimBox. The MediaBox is required the other boxes are optional, a given PDF page description must contain the MediaBox and may contain any or all of the others.</p>
<p>By default Ghostscript uses the MediaBox to determine the size of the media. Other PDF consumers may exhibit other behaviours.</p>
<p>The pages in your PDF file contain all of the Boxes. In the majority of cases the Boxes all contain the same values (which makes their inclusion pointless of course). But for page 1 they differ:</p>
<p>/CropBox[594.375 0.0 1190.55 839.176]
/MediaBox[0.0 0.0 1190.55 841.89]</p>
<p>You can tell Ghostscript to use a different Box value for the media by using one of -dUseArtBox, -dUseBleedBox, -dUseCropBox, -dUseTrimBox. If I specify -dUseCropBox then the file is rendered as you expect.</p>
</blockquote>
<ul>
<li>I confirm that adding <code>-define pdf:use-cropbox=true</code> to the ImageMagick command produces a better thumbnail in this case
<ul>
<li>We can check the boxes in a PDF using <code>pdfinfo</code> from the poppler package:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ pdfinfo -box data/10568-116598.pdf
</span></span><span style="display:flex;"><span>Creator: Adobe InDesign 17.0 (Macintosh)
</span></span><span style="display:flex;"><span>Producer: Adobe PDF Library 16.0.3
</span></span><span style="display:flex;"><span>CreationDate: Tue Dec 7 12:44:46 2021 EAT
</span></span><span style="display:flex;"><span>ModDate: Tue Dec 7 15:37:58 2021 EAT
</span></span><span style="display:flex;"><span>Custom Metadata: no
</span></span><span style="display:flex;"><span>Metadata Stream: yes
</span></span><span style="display:flex;"><span>Tagged: no
</span></span><span style="display:flex;"><span>UserProperties: no
</span></span><span style="display:flex;"><span>Suspects: no
</span></span><span style="display:flex;"><span>Form: none
</span></span><span style="display:flex;"><span>JavaScript: no
</span></span><span style="display:flex;"><span>Pages: 17
</span></span><span style="display:flex;"><span>Encrypted: no
</span></span><span style="display:flex;"><span>Page size: 596.175 x 839.176 pts
</span></span><span style="display:flex;"><span>Page rot: 0
</span></span><span style="display:flex;"><span>MediaBox: 0.00 0.00 1190.55 841.89
</span></span><span style="display:flex;"><span>CropBox: 594.38 0.00 1190.55 839.18
</span></span><span style="display:flex;"><span>BleedBox: 594.38 0.00 1190.55 839.18
</span></span><span style="display:flex;"><span>TrimBox: 594.38 0.00 1190.55 839.18
</span></span><span style="display:flex;"><span>ArtBox: 594.38 0.00 1190.55 839.18
</span></span><span style="display:flex;"><span>File size: 572600 bytes
</span></span><span style="display:flex;"><span>Optimized: no
</span></span><span style="display:flex;"><span>PDF version: 1.6
</span></span></code></pre></div><ul>
<li>In this case the MediaBox is a strange size, and we should use the CropBox
<ul>
<li>I wonder if we can check that from DSpace&hellip;</li>
</ul>
</li>
<li>Apply some corrections from Peter on CGSpace</li>
<li>Meeting with Leroy, Daniel, Francesca, and Maria from Alliance to review their TIP tool and talk about next steps
<ul>
<li>We asked them to do some real submissions (as opposed to &ldquo;I like coffee&rdquo; etc) to test the full breadth of the metadata and controlled vocabularies</li>
</ul>
</li>
<li>Minor work on the CG Core Types spreadsheet to clear up some of the actions and incorporate some of Peter&rsquo;s feedback</li>
<li>After looking at the request patterns in nginx on CGSpace for the past few weeks I see nothing that would explain the high loads we see several times per week (especially Sundays!)
<ul>
<li>So I suspect there is a noisy neighbor, and actually I do see some non-trivial amount of CPU steal in my Munin graphs and <code>iostat</code></li>
<li>I asked Linode to move the instance elsewhere</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->

Binary file not shown (size: 24 KiB).

Binary file not shown (size: 23 KiB).

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-17T15:58:02+03:00" />
<meta property="og:updated_time" content="2022-10-18T22:12:42+03:00" />

View File

@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2022-10-17T15:58:02+03:00</lastmod>
<lastmod>2022-10-18T22:12:42+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2022-10-17T15:58:02+03:00</lastmod>
<lastmod>2022-10-18T22:12:42+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2022-10-17T15:58:02+03:00</lastmod>
<lastmod>2022-10-18T22:12:42+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-10/</loc>
<lastmod>2022-10-17T15:58:02+03:00</lastmod>
<lastmod>2022-10-18T22:12:42+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2022-10-17T15:58:02+03:00</lastmod>
<lastmod>2022-10-18T22:12:42+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-09/</loc>
<lastmod>2022-09-30T17:29:50+03:00</lastmod>

Binary file not shown (size: 24 KiB).

Binary file not shown (size: 23 KiB).