Add notes for 2022-10-28

Alan Orth 2022-10-28 13:17:35 +03:00
parent 189f33e1ce
commit 3633377854
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
29 changed files with 205 additions and 34 deletions

@ -672,4 +672,89 @@ $ pngquant /tmp/10568-125167.pdf.png
- Spent some time looking at the MediaBox / CropBox thing in DSpace's `ImageMagickThumbnailFilter.java`
- We need to put `-define pdf:use-cropbox=true` before the input file on the command line, or else it has no effect
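A minimal sketch of the argument ordering described above. The helper name `thumbnail_cmd`, the `[0]` first-page selector, the `600x600` size, and the file names are assumptions for illustration, not what DSpace actually runs; the only point is that the define comes before the input file:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an ImageMagick command line with the
# CropBox define placed BEFORE the input PDF, since settings given
# after the input are too late to affect how the PDF is read.
thumbnail_cmd() {
  local input="$1" output="$2"
  # The size and the [0] page selector are illustrative assumptions.
  printf "magick convert -define pdf:use-cropbox=true '%s[0]' -thumbnail 600x600 '%s'\n" "$input" "$output"
}

thumbnail_cmd in.pdf out.jpg
```

Printing the command instead of running it makes it easy to check the ordering before trying it on a real PDF.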
## 2022-10-27
- I found out that we can use [pdfcpu to remove the CropBox from a PDF](https://pdfcpu.io/boxes/boxes_remove.html#examples) for testing:
```console
$ pdfcpu box rem -- "crop" in.pdf out.pdf
```
- I filed [an issue on DSpace](https://github.com/DSpace/DSpace/issues/8549) for the ImageMagick `CropBox` problem
- I decided that this is a bug that should be fixed separately from the "improving thumbnail quality" issue
- I made [a pull request](https://github.com/DSpace/DSpace/pull/8550) to fix the `CropBox` issue
- I did more work on my [improved-dspace-thumbnails](https://github.com/alanorth/improved-dspace-thumbnails/) microsite to complement the DSpace thumbnail pull requests
- I am updating it to recommend using the PDF CropBox and "supersampling" at a density higher than 72
- I measured the execution time of ImageMagick with `time` and found that the higher-density mode takes about five times longer on average
- I measured the [maximum heap memory of ImageMagick with Valgrind and Massif](https://stackoverflow.com/a/131346):
```console
$ valgrind --tool=massif magick convert ...
```
- Then I checked the results for each set of default DSpace thumbnail runs and "improved" thumbnail runs using `ms_print` (hacky way to get the max heap, I know):
```console
$ for file in memory-dspace/massif.out.49*; do ms_print "$file" | grep -A1 " MB" | tail -n1 | sed 's/\^.*//'; done
15.87
16.06
21.26
15.88
20.01
15.85
20.06
16.04
15.87
15.87
20.02
15.87
15.86
19.92
10.89
$ for file in memory-improved/massif.out.5*; do ms_print "$file" | grep -A1 " MB" | tail -n1 | sed 's/\^.*//'; done
245.3
245.5
298.6
245.3
306.8
245.2
306.9
245.5
245.2
245.3
306.8
245.3
244.9
306.3
165.6
```
- Ouch, this shows that the "4x" density of 288 takes about *fifteen times* more memory than the default DSpace settings!
- It seems more reasonable to use a "2x" density of 144:
```console
$ for file in memory-improved-144/*; do ms_print "$file" | grep -A1 " MB" | tail -n1 | sed 's/\^.*//'; done
61.80
62.00
76.76
61.82
77.43
61.77
77.48
61.98
61.76
61.81
77.44
61.81
61.69
77.16
41.84
```
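To put a number on the comparison, the per-run peaks listed above can be averaged and divided. This is only a sketch: the helper names `mean` and `ratio` and the file names in the usage comment are hypothetical, and it assumes each list has been saved to a text file with one value per line:

```bash
#!/usr/bin/env bash
# Mean of newline-separated numbers read from stdin, two decimal places.
mean() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Ratio of the first argument to the second, one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Usage (hypothetical file names, one peak value in MB per line):
#   ratio "$(mean < improved-288.txt)" "$(mean < dspace-default.txt)"
#   ratio "$(mean < improved-144.txt)" "$(mean < dspace-default.txt)"
```

With the values listed above, this works out to roughly 15x for the density of 288 and just under 4x for the density of 144.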
- There's a really cool visualizer called massif-visualizer, but its output isn't easy to parse
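A possibly less hacky alternative to scraping the `ms_print` graph is to read the snapshot values straight from the `massif.out.*` files, which record a `mem_heap_B=` line for each snapshot. The sketch below (the helper name `peak_heap_bytes` is hypothetical) takes the largest of those values. It may differ slightly from the peak that `ms_print` shows, because the graph plots the total including heap overhead (`mem_heap_extra_B`):

```bash
#!/usr/bin/env bash
# Print the largest mem_heap_B snapshot value (in bytes) from a Massif
# output file. The anchored pattern does not match mem_heap_extra_B.
peak_heap_bytes() {
  grep '^mem_heap_B=' "$1" | cut -d= -f2 | sort -n | tail -n1
}

# Usage, following the directory layout in the loops above:
#   for file in memory-dspace/massif.out.*; do peak_heap_bytes "$file"; done
```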
## 2022-10-28
- I finalized the code for the ImageMagick density change and made a [pull request](https://github.com/DSpace/DSpace/pull/8553) against DSpace 7.x
<!-- vim: set sw=2 ts=2: -->

@ -20,7 +20,7 @@ I filed an issue to ask about Java 11&#43; support
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-10/" />
<meta property="article:published_time" content="2022-10-01T19:45:36+03:00" />
<meta property="article:modified_time" content="2022-10-26T09:15:29+03:00" />
<meta property="article:modified_time" content="2022-10-26T17:50:40+03:00" />
@ -46,9 +46,9 @@ I filed an issue to ask about Java 11&#43; support
"@type": "BlogPosting",
"headline": "October, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-10/",
"wordCount": "3320",
"wordCount": "3650",
"datePublished": "2022-10-01T19:45:36+03:00",
"dateModified": "2022-10-26T09:15:29+03:00",
"dateModified": "2022-10-26T17:50:40+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -849,6 +849,92 @@ I filed an issue to ask about Java 11&#43; support
</ul>
</li>
</ul>
<h2 id="2022-10-27">2022-10-27</h2>
<ul>
<li>I found out that we can use <a href="https://pdfcpu.io/boxes/boxes_remove.html#examples">pdfcpu to remove the CropBox from a PDF</a> for testing:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ pdfcpu box rem -- <span style="color:#e6db74">&#34;crop&#34;</span> in.pdf out.pdf
</span></span></code></pre></div><ul>
<li>I filed <a href="https://github.com/DSpace/DSpace/issues/8549">an issue on DSpace</a> for the ImageMagick <code>CropBox</code> problem
<ul>
<li>I decided that this is a bug that should be fixed separately from the &ldquo;improving thumbnail quality&rdquo; issue</li>
<li>I made <a href="https://github.com/DSpace/DSpace/pull/8550">a pull request</a> to fix the <code>CropBox</code> issue</li>
</ul>
</li>
<li>I did more work on my <a href="https://github.com/alanorth/improved-dspace-thumbnails/">improved-dspace-thumbnails</a> microsite to complement the DSpace thumbnail pull requests
<ul>
<li>I am updating it to recommend using the PDF CropBox and &ldquo;supersampling&rdquo; at a density higher than 72</li>
<li>I measured the execution time of ImageMagick with <code>time</code> and found that the higher-density mode takes about five times longer on average</li>
<li>I measured the <a href="https://stackoverflow.com/a/131346">maximum heap memory of ImageMagick with Valgrind and Massif</a>:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ valgrind --tool<span style="color:#f92672">=</span>massif magick convert ...
</span></span></code></pre></div><ul>
<li>Then I checked the results for each set of default DSpace thumbnail runs and &ldquo;improved&rdquo; thumbnail runs using <code>ms_print</code> (hacky way to get the max heap, I know):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ <span style="color:#66d9ef">for</span> file in memory-dspace/massif.out.49*; <span style="color:#66d9ef">do</span> ms_print <span style="color:#e6db74">&#34;</span>$file<span style="color:#e6db74">&#34;</span> | grep -A1 <span style="color:#e6db74">&#34; MB&#34;</span> | tail -n1 | sed <span style="color:#e6db74">&#39;s/\^.*//&#39;</span>; <span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>15.87
</span></span><span style="display:flex;"><span>16.06
</span></span><span style="display:flex;"><span>21.26
</span></span><span style="display:flex;"><span>15.88
</span></span><span style="display:flex;"><span>20.01
</span></span><span style="display:flex;"><span>15.85
</span></span><span style="display:flex;"><span>20.06
</span></span><span style="display:flex;"><span>16.04
</span></span><span style="display:flex;"><span>15.87
</span></span><span style="display:flex;"><span>15.87
</span></span><span style="display:flex;"><span>20.02
</span></span><span style="display:flex;"><span>15.87
</span></span><span style="display:flex;"><span>15.86
</span></span><span style="display:flex;"><span>19.92
</span></span><span style="display:flex;"><span>10.89
</span></span><span style="display:flex;"><span>$ <span style="color:#66d9ef">for</span> file in memory-improved/massif.out.5*; <span style="color:#66d9ef">do</span> ms_print <span style="color:#e6db74">&#34;</span>$file<span style="color:#e6db74">&#34;</span> | grep -A1 <span style="color:#e6db74">&#34; MB&#34;</span> | tail -n1 | sed <span style="color:#e6db74">&#39;s/\^.*//&#39;</span>; <span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>245.3
</span></span><span style="display:flex;"><span>245.5
</span></span><span style="display:flex;"><span>298.6
</span></span><span style="display:flex;"><span>245.3
</span></span><span style="display:flex;"><span>306.8
</span></span><span style="display:flex;"><span>245.2
</span></span><span style="display:flex;"><span>306.9
</span></span><span style="display:flex;"><span>245.5
</span></span><span style="display:flex;"><span>245.2
</span></span><span style="display:flex;"><span>245.3
</span></span><span style="display:flex;"><span>306.8
</span></span><span style="display:flex;"><span>245.3
</span></span><span style="display:flex;"><span>244.9
</span></span><span style="display:flex;"><span>306.3
</span></span><span style="display:flex;"><span>165.6
</span></span></code></pre></div><ul>
<li>Ouch, this shows that the &ldquo;4x&rdquo; density of 288 takes about <em>fifteen times</em> more memory than the default DSpace settings!
<ul>
<li>It seems more reasonable to use a &ldquo;2x&rdquo; density of 144:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ <span style="color:#66d9ef">for</span> file in memory-improved-144/*; <span style="color:#66d9ef">do</span> ms_print <span style="color:#e6db74">&#34;</span>$file<span style="color:#e6db74">&#34;</span> | grep -A1 <span style="color:#e6db74">&#34; MB&#34;</span> | tail -n1 | sed <span style="color:#e6db74">&#39;s/\^.*//&#39;</span>; <span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>61.80
</span></span><span style="display:flex;"><span>62.00
</span></span><span style="display:flex;"><span>76.76
</span></span><span style="display:flex;"><span>61.82
</span></span><span style="display:flex;"><span>77.43
</span></span><span style="display:flex;"><span>61.77
</span></span><span style="display:flex;"><span>77.48
</span></span><span style="display:flex;"><span>61.98
</span></span><span style="display:flex;"><span>61.76
</span></span><span style="display:flex;"><span>61.81
</span></span><span style="display:flex;"><span>77.44
</span></span><span style="display:flex;"><span>61.81
</span></span><span style="display:flex;"><span>61.69
</span></span><span style="display:flex;"><span>77.16
</span></span><span style="display:flex;"><span>41.84
</span></span></code></pre></div><ul>
<li>There&rsquo;s a really cool visualizer called massif-visualizer, but its output isn&rsquo;t easy to parse</li>
</ul>
<h2 id="2022-10-28">2022-10-28</h2>
<ul>
<li>I finalized the code for the ImageMagick density change and made a <a href="https://github.com/DSpace/DSpace/pull/8553">pull request</a> against DSpace 7.x</li>
</ul>
<!-- raw HTML omitted -->

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-10-26T09:15:29+03:00" />
<meta property="og:updated_time" content="2022-10-26T17:50:40+03:00" />

@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2022-10-26T09:15:29+03:00</lastmod>
<lastmod>2022-10-26T17:50:40+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2022-10-26T09:15:29+03:00</lastmod>
<lastmod>2022-10-26T17:50:40+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2022-10-26T09:15:29+03:00</lastmod>
<lastmod>2022-10-26T17:50:40+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-10/</loc>
<lastmod>2022-10-26T09:15:29+03:00</lastmod>
<lastmod>2022-10-26T17:50:40+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2022-10-26T09:15:29+03:00</lastmod>
<lastmod>2022-10-26T17:50:40+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-09/</loc>
<lastmod>2022-09-30T17:29:50+03:00</lastmod>