Add notes for 2023-03-28

Alan Orth 2023-03-28 17:04:54 +03:00
parent 37bdf2645f
commit 5cd298a37a
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
31 changed files with 114 additions and 36 deletions

@@ -517,4 +517,35 @@ colly | awk '{print $1}' | sort | uniq -c | sort -h
- I exported CGSpace to check for missing Initiative collection mappings
- Start a harvest on AReS
## 2023-03-27
- The harvest on AReS was incredibly slow and I stopped it about halfway through, twelve hours later
- Then I relied on the plugins to get missing items, which caused a high load on the server but actually worked fine
- Continue working on thumbnails on DSpace
## 2023-03-28
- Regarding ImageMagick there are a few things I've learned
- The `-quality` setting does different things for different output formats, see: https://imagemagick.org/script/command-line-options.php#quality
- The `-compress` setting controls the compression algorithm for image data, and is unrelated to lossless/lossy
- On that note, `-compress lossless` for JPEGs refers to Lossless JPEG, which is not well defined or supported and should be avoided
- See: https://imagemagick.org/script/command-line-options.php#compress
- DSpace currently does its supersampling by exporting to a JPEG and then making a thumbnail of that JPEG, which is a double lossy operation
- We should be exporting to something lossless like PNG, PPM, or MIFF, then making a thumbnail from that
- The PNG format is always lossless so the `-quality` setting controls compression and filtering, but has no effect on the appearance or signature of PNG images
- You can use `-quality n` with WebP's `-define webp:lossless=true`, but I'm not sure about the interaction between ImageMagick quality and WebP lossless...
- Also, if converting from a lossless format to WebP lossless in the same command, ImageMagick will ignore quality settings
- The MIFF format is useful for piping between ImageMagick commands, but it is also lossless and the quality setting is ignored
- You can use a format specifier when piping between ImageMagick commands without writing a file
- For example, I want to create a lossless PNG from a distorted JPEG for comparison:
```console
$ magick convert reference.jpg -quality 85 jpg:- | convert - distorted-lossless.png
```
- If I convert the JPEG to PNG directly it will ignore the quality setting, so I set the quality and the output format, then pipe it to ImageMagick again to convert to lossless PNG
- In an attempt to quantify the generation loss from DSpace's "JPG JPG" method of creating thumbnails I wrote a script called `generation-loss.sh` to test against a new "PNG JPG" method
- With my sample set of seventeen PDFs from CGSpace I found that _the "JPG JPG" method of thumbnailing results in scores that are, on average, 1.6% lower than with the "PNG JPG" method_.
- The average file size with _the "PNG JPG" method was only 200 bytes larger_.
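
The two approaches compared above can be sketched as a pair of shell functions. This is only an illustration and not the actual `generation-loss.sh`: the function names, the 600x600 bounding box, and the quality values are assumptions, and it presumes ImageMagick 7's `magick` command is on the `PATH`.

```shell
#!/bin/sh
# Sketch only: two ways to build a JPEG thumbnail with ImageMagick 7.
# The 600x600 box and the quality values are illustrative assumptions.

# "JPG JPG": the intermediate is a JPEG, so the image is lossy-encoded twice.
thumbnail_via_jpg() {
  magick "$1" -quality 92 jpg:- | magick - -thumbnail 600x600 -quality 85 "$2"
}

# "PNG JPG": the intermediate is a lossless PNG piped between the two
# commands, so the only lossy encode is the final JPEG.
thumbnail_via_png() {
  magick "$1" png:- | magick - -thumbnail 600x600 -quality 85 "$2"
}
```

For example, `thumbnail_via_png reference.png thumbnail.jpg` should produce a thumbnail that has only been through one lossy encode, which can then be scored against one made with `thumbnail_via_jpg`.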
<!-- vim: set sw=2 ts=2: -->

@@ -16,7 +16,7 @@ I finally got through with porting the input form from DSpace 6 to DSpace 7
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-03/" />
<meta property="article:published_time" content="2023-03-01T07:58:36+03:00" />
<meta property="article:modified_time" content="2023-03-24T13:19:13+03:00" />
<meta property="article:modified_time" content="2023-03-27T10:03:45+03:00" />
@@ -38,9 +38,9 @@ I finally got through with porting the input form from DSpace 6 to DSpace 7
"@type": "BlogPosting",
"headline": "March, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-03/",
"wordCount": "3600",
"wordCount": "3988",
"datePublished": "2023-03-01T07:58:36+03:00",
"dateModified": "2023-03-24T13:19:13+03:00",
"dateModified": "2023-03-27T10:03:45+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -687,6 +687,53 @@ RL: performed 0 reads and 16 write i/o operations
<li>I exported CGSpace to check for missing Initiative collection mappings</li>
<li>Start a harvest on AReS</li>
</ul>
<h2 id="2023-03-27">2023-03-27</h2>
<ul>
<li>The harvest on AReS was incredibly slow and I stopped it about halfway through, twelve hours later
<ul>
<li>Then I relied on the plugins to get missing items, which caused a high load on the server but actually worked fine</li>
</ul>
</li>
<li>Continue working on thumbnails on DSpace</li>
</ul>
<h2 id="2023-03-28">2023-03-28</h2>
<ul>
<li>Regarding ImageMagick there are a few things I&rsquo;ve learned
<ul>
<li>The <code>-quality</code> setting does different things for different output formats, see: <a href="https://imagemagick.org/script/command-line-options.php#quality">https://imagemagick.org/script/command-line-options.php#quality</a></li>
<li>The <code>-compress</code> setting controls the compression algorithm for image data, and is unrelated to lossless/lossy
<ul>
<li>On that note, <code>-compress lossless</code> for JPEGs refers to Lossless JPEG, which is not well defined or supported and should be avoided</li>
<li>See: <a href="https://imagemagick.org/script/command-line-options.php#compress">https://imagemagick.org/script/command-line-options.php#compress</a></li>
</ul>
</li>
<li>DSpace currently does its supersampling by exporting to a JPEG and then making a thumbnail of that JPEG, which is a double lossy operation
<ul>
<li>We should be exporting to something lossless like PNG, PPM, or MIFF, then making a thumbnail from that</li>
</ul>
</li>
<li>The PNG format is always lossless so the <code>-quality</code> setting controls compression and filtering, but has no effect on the appearance or signature of PNG images</li>
<li>You can use <code>-quality n</code> with WebP&rsquo;s <code>-define webp:lossless=true</code>, but I&rsquo;m not sure about the interaction between ImageMagick quality and WebP lossless&hellip;
<ul>
<li>Also, if converting from a lossless format to WebP lossless in the same command, ImageMagick will ignore quality settings</li>
</ul>
</li>
<li>The MIFF format is useful for piping between ImageMagick commands, but it is also lossless and the quality setting is ignored</li>
<li>You can use a format specifier when piping between ImageMagick commands without writing a file</li>
<li>For example, I want to create a lossless PNG from a distorted JPEG for comparison:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ magick convert reference.jpg -quality <span style="color:#ae81ff">85</span> jpg:- | convert - distorted-lossless.png
</span></span></code></pre></div><ul>
<li>If I convert the JPEG to PNG directly it will ignore the quality setting, so I set the quality and the output format, then pipe it to ImageMagick again to convert to lossless PNG</li>
<li>In an attempt to quantify the generation loss from DSpace&rsquo;s &ldquo;JPG JPG&rdquo; method of creating thumbnails I wrote a script called <code>generation-loss.sh</code> to test against a new &ldquo;PNG JPG&rdquo; method
<ul>
<li>With my sample set of seventeen PDFs from CGSpace I found that <em>the &ldquo;JPG JPG&rdquo; method of thumbnailing results in scores that are, on average, 1.6% lower than with the &ldquo;PNG JPG&rdquo; method</em>.</li>
<li>The average file size with <em>the &ldquo;PNG JPG&rdquo; method was only 200 bytes larger</em>.</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2023-03-24T13:19:13+03:00" />
<meta property="og:updated_time" content="2023-03-27T10:03:45+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-03-24T13:19:13+03:00" />
<meta property="og:updated_time" content="2023-03-27T10:03:45+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-03-24T13:19:13+03:00" />
<meta property="og:updated_time" content="2023-03-27T10:03:45+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-03-24T13:19:13+03:00" />
<meta property="og:updated_time" content="2023-03-27T10:03:45+03:00" />

@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2023-03-24T13:19:13+03:00</lastmod>
<lastmod>2023-03-27T10:03:45+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2023-03-24T13:19:13+03:00</lastmod>
<lastmod>2023-03-27T10:03:45+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-03/</loc>
<lastmod>2023-03-24T13:19:13+03:00</lastmod>
<lastmod>2023-03-27T10:03:45+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2023-03-24T13:19:13+03:00</lastmod>
<lastmod>2023-03-27T10:03:45+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2023-03-24T13:19:13+03:00</lastmod>
<lastmod>2023-03-27T10:03:45+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-02/</loc>
<lastmod>2023-03-01T08:30:25+03:00</lastmod>