Update notes for 2020-01-23

Alan Orth 2020-01-23 15:56:46 +02:00
parent 832b60c906
commit 9abe34ec6f
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 44 additions and 11 deletions


@ -243,6 +243,25 @@ $ convert -density 288 -filter lagrange -thumbnail 25% -background white -alpha
``` ```
- Here I'm also explicitly setting the background to white and removing any alpha layers, but I could probably also just keep using `-flatten` like DSpace already does - Here I'm also explicitly setting the background to white and removing any alpha layers, but I could probably also just keep using `-flatten` like DSpace already does
- I wonder if I could hack this into DSpace code to get better thumbnails... - I did some tests with a modified version of the above that uses `-flatten` and drops the sampling-factor and colorspace options, but bumps the thumbnail height up to 600px (the default on CGSpace is currently 300):
```
$ convert -density 288 -filter lagrange -resize 25% -flatten 10568-97925.pdf\[0\] 10568-97925-d288-lagrange.pdf.jpg
$ convert -flatten 10568-97925.pdf\[0\] 10568-97925.pdf.jpg
$ convert -thumbnail x600 10568-97925-d288-lagrange.pdf.jpg 10568-97925-d288-lagrange-thumbnail.pdf.jpg
$ convert -thumbnail x600 10568-97925.pdf.jpg 10568-97925-thumbnail.pdf.jpg
```
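
The first and third commands above form a two-step pipeline: render the first PDF page to an intermediate JPEG, then shrink that image to the thumbnail height. A minimal sketch of the same sequence as a Python helper follows; the function name `build_thumbnail_commands` and its `height` parameter are hypothetical, and the ImageMagick options are copied from the commands above rather than from DSpace's code:

```python
from __future__ import annotations

from pathlib import Path


def build_thumbnail_commands(pdf: str, height: int = 600) -> list[list[str]]:
    """Build the two ImageMagick invocations as argv lists (nothing is executed here)."""
    stem = Path(pdf).stem  # "10568-97925.pdf" -> "10568-97925"
    intermediate = f"{stem}-d288-lagrange.pdf.jpg"
    thumbnail = f"{stem}-d288-lagrange-thumbnail.pdf.jpg"

    # Step 1: render page 0 of the PDF at high density, downscale, and flatten.
    # Passing an argv list means the "[0]" page selector needs no shell escaping.
    render = [
        "convert", "-density", "288", "-filter", "lagrange",
        "-resize", "25%", "-flatten", f"{pdf}[0]", intermediate,
    ]
    # Step 2: create the final thumbnail at the requested height.
    shrink = ["convert", "-thumbnail", f"x{height}", intermediate, thumbnail]
    return [render, shrink]
```

Each returned list could be passed to `subprocess.run(argv, check=True)`, assuming ImageMagick's `convert` is on the `PATH`.
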
- This emulates DSpace's method of generating a high-quality image from the PDF and then creating a thumbnail from it
- I put together a proof of concept of this by adding the extra options to dspace-api's `ImageMagickThumbnailFilter.java` and it works
- I need to run tests on a handful of PDFs to see if there are any side effects
- The new files are about double the size of the old ones, but the quality is very good and they are still nowhere near the size of ilri.org's 400KiB PNG!
- Peter sent me the corrections and deletions for affiliations last night so I imported them into OpenRefine to work around the normal UTF-8 issue, ran them through csv-metadata-quality to make sure all Unicode values were normalized (NFC), then applied them on DSpace Test and CGSpace:
```
$ csv-metadata-quality -i ~/Downloads/2020-01-22-fix-1113-affiliations.csv -o /tmp/2020-01-22-fix-1113-affiliations.csv -u --exclude-fields 'dc.date.issued,dc.date.issued[],cg.contributor.affiliation'
$ ./fix-metadata-values.py -i /tmp/2020-01-22-fix-1113-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 -t correct
$ ./delete-metadata-values.py -i /tmp/2020-01-22-delete-36-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
```
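
For reference, NFC normalization composes a base letter and its combining marks into a single code point wherever one exists. The sketch below only illustrates that idea with Python's standard `unicodedata` module; it is not csv-metadata-quality's implementation, and the affiliation string and helper names are made up for the example:

```python
import unicodedata


def normalize_nfc(value: str) -> str:
    """Return the value in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", value)


def needs_normalization(value: str) -> bool:
    """True if the value would change when normalized to NFC."""
    return value != unicodedata.normalize("NFC", value)


# Hypothetical affiliation written with "e" + U+0301 (combining acute accent)
decomposed = "Universite\u0301 de Lome\u0301"
composed = normalize_nfc(decomposed)
```

The two strings render identically but compare as different, which is why unnormalized values can show up as duplicate affiliations.
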
<!-- vim: set sw=2 ts=2: --> <!-- vim: set sw=2 ts=2: -->


@ -29,7 +29,7 @@ I tweeted the CGSpace repository link
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-01/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-01/" />
<meta property="article:published_time" content="2020-01-06T10:48:30+02:00" /> <meta property="article:published_time" content="2020-01-06T10:48:30+02:00" />
<meta property="article:modified_time" content="2020-01-22T14:16:08+02:00" /> <meta property="article:modified_time" content="2020-01-23T12:46:39+02:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="January, 2020"/> <meta name="twitter:title" content="January, 2020"/>
@ -63,9 +63,9 @@ I tweeted the CGSpace repository link
"@type": "BlogPosting", "@type": "BlogPosting",
"headline": "January, 2020", "headline": "January, 2020",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2020-01\/", "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2020-01\/",
"wordCount": "1905", "wordCount": "2117",
"datePublished": "2020-01-06T10:48:30+02:00", "datePublished": "2020-01-06T10:48:30+02:00",
"dateModified": "2020-01-22T14:16:08+02:00", "dateModified": "2020-01-23T12:46:39+02:00",
"author": { "author": {
"@type": "Person", "@type": "Person",
"name": "Alan Orth" "name": "Alan Orth"
@ -383,9 +383,23 @@ $ wc -l hung-nguyen-a*handles.txt
<pre><code>$ convert -density 288 -filter lagrange -thumbnail 25% -background white -alpha remove -sampling-factor 1:1 -colorspace sRGB 10568-97925.pdf\[0\] 10568-97925.jpg <pre><code>$ convert -density 288 -filter lagrange -thumbnail 25% -background white -alpha remove -sampling-factor 1:1 -colorspace sRGB 10568-97925.pdf\[0\] 10568-97925.jpg
</code></pre><ul> </code></pre><ul>
<li>Here I'm also explicitly setting the background to white and removing any alpha layers, but I could probably also just keep using <code>-flatten</code> like DSpace already does</li> <li>Here I'm also explicitly setting the background to white and removing any alpha layers, but I could probably also just keep using <code>-flatten</code> like DSpace already does</li>
<li>I wonder if I could hack this into DSpace code to get better thumbnails&hellip;</li> <li>I did some tests with a modified version of the above that uses <code>-flatten</code> and drops the sampling-factor and colorspace options, but bumps the thumbnail height up to 600px (the default on CGSpace is currently 300):</li>
</ul> </ul>
<!-- raw HTML omitted --> <pre><code>$ convert -density 288 -filter lagrange -resize 25% -flatten 10568-97925.pdf\[0\] 10568-97925-d288-lagrange.pdf.jpg
$ convert -flatten 10568-97925.pdf\[0\] 10568-97925.pdf.jpg
$ convert -thumbnail x600 10568-97925-d288-lagrange.pdf.jpg 10568-97925-d288-lagrange-thumbnail.pdf.jpg
$ convert -thumbnail x600 10568-97925.pdf.jpg 10568-97925-thumbnail.pdf.jpg
</code></pre><ul>
<li>This emulates DSpace's method of generating a high-quality image from the PDF and then creating a thumbnail from it</li>
<li>I put together a proof of concept of this by adding the extra options to dspace-api's <code>ImageMagickThumbnailFilter.java</code> and it works</li>
<li>I need to run tests on a handful of PDFs to see if there are any side effects</li>
<li>The new files are about double the size of the old ones, but the quality is very good and they are still nowhere near the size of ilri.org's 400KiB PNG!</li>
<li>Peter sent me the corrections and deletions for affiliations last night so I imported them into OpenRefine to work around the normal UTF-8 issue, ran them through csv-metadata-quality to make sure all Unicode values were normalized (NFC), then applied them on DSpace Test and CGSpace:</li>
</ul>
<pre><code>$ csv-metadata-quality -i ~/Downloads/2020-01-22-fix-1113-affiliations.csv -o /tmp/2020-01-22-fix-1113-affiliations.csv -u --exclude-fields 'dc.date.issued,dc.date.issued[],cg.contributor.affiliation'
$ ./fix-metadata-values.py -i /tmp/2020-01-22-fix-1113-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 -t correct
$ ./delete-metadata-values.py -i /tmp/2020-01-22-delete-36-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
</code></pre><!-- raw HTML omitted -->


@ -4,27 +4,27 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc> <loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2020-01-22T14:16:08+02:00</lastmod> <lastmod>2020-01-23T12:46:39+02:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2020-01-22T14:16:08+02:00</lastmod> <lastmod>2020-01-23T12:46:39+02:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/2020-01/</loc> <loc>https://alanorth.github.io/cgspace-notes/2020-01/</loc>
<lastmod>2020-01-22T14:16:08+02:00</lastmod> <lastmod>2020-01-23T12:46:39+02:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2020-01-22T14:16:08+02:00</lastmod> <lastmod>2020-01-23T12:46:39+02:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc> <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2020-01-22T14:16:08+02:00</lastmod> <lastmod>2020-01-23T12:46:39+02:00</lastmod>
</url> </url>
<url> <url>