mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-22 14:45:03 +01:00
Update notes for 2020-01-23
This commit is contained in:
parent
832b60c906
commit
9abe34ec6f
@ -243,6 +243,25 @@ $ convert -density 288 -filter lagrange -thumbnail 25% -background white -alpha
|
|||||||
```
|
```
|
||||||
|
|
||||||
- Here I'm also explicitly setting the background to white and removing any alpha layers, but I could probably also just keep using `-flatten` like DSpace already does
|
- Here I'm also explicitly setting the background to white and removing any alpha layers, but I could probably also just keep using `-flatten` like DSpace already does
|
||||||
- I wonder if I could hack this into DSpace code to get better thumbnails...
|
- I did some tests with a modified version of the above that uses `-flatten` and drops the sampling-factor and colorspace, but bumps up the image size to 600px (default on CGSpace is currently 300):
|
||||||
|
|
||||||
|
```
|
||||||
|
$ convert -density 288 -filter lagrange -resize 25% -flatten 10568-97925.pdf\[0\] 10568-97925-d288-lagrange.pdf.jpg
|
||||||
|
$ convert -flatten 10568-97925.pdf\[0\] 10568-97925.pdf.jpg
|
||||||
|
$ convert -thumbnail x600 10568-97925-d288-lagrange.pdf.jpg 10568-97925-d288-lagrange-thumbnail.pdf.jpg
|
||||||
|
$ convert -thumbnail x600 10568-97925.pdf.jpg 10568-97925-thumbnail.pdf.jpg
|
||||||
|
```
|
||||||
|
|
||||||
|
- This emulates DSpace's method of generating a high-quality image from the PDF and then creating a thumbnail
|
||||||
|
- I put together a proof of concept of this by adding the extra options to dspace-api's `ImageMagickThumbnailFilter.java` and it works
|
||||||
|
- I need to run tests on a handful of PDFs to see if there are any side effects
|
||||||
|
- The file size is about double that of the old thumbnails, but the quality is very good and it is still nowhere near ilri.org's 400KiB PNG!
|
||||||
|
- Peter sent me the corrections and deletions for affiliations last night so I imported them into OpenRefine to work around the normal UTF-8 issue, ran them through csv-metadata-quality to make sure all Unicode values were normalized (NFC), then applied them on DSpace Test and CGSpace:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ csv-metadata-quality -i ~/Downloads/2020-01-22-fix-1113-affiliations.csv -o /tmp/2020-01-22-fix-1113-affiliations.csv -u --exclude-fields 'dc.date.issued,dc.date.issued[],cg.contributor.affiliation'
|
||||||
|
$ ./fix-metadata-values.py -i /tmp/2020-01-22-fix-1113-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 -t correct
|
||||||
|
$ ./delete-metadata-values.py -i /tmp/2020-01-22-delete-36-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
|
||||||
|
```
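- For reference, a minimal sketch of what NFC normalization does to a value, using only Python's standard `unicodedata` module; this is not csv-metadata-quality's actual code, and the affiliation string and the `normalize_nfc` helper are made up for illustration:

```python
import unicodedata


def normalize_nfc(value: str) -> str:
    """Return the NFC (canonically composed) form of a metadata value."""
    return unicodedata.normalize("NFC", value)


# Hypothetical affiliation written with decomposed accents:
# a plain "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "Universite\u0301 de Lome\u0301"
composed = normalize_nfc(decomposed)

# Both strings render identically, but they compare as different values
# until they are normalized to the same form.
print(decomposed == composed)
print(len(decomposed), len(composed))
print(unicodedata.is_normalized("NFC", composed))
```

- Values that look identical but differ in normalization form would otherwise be treated as distinct affiliations, which is presumably why it is worth normalizing before applying the corrections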
|
||||||
|
|
||||||
<!-- vim: set sw=2 ts=2: -->
|
<!-- vim: set sw=2 ts=2: -->
|
||||||
|
@ -29,7 +29,7 @@ I tweeted the CGSpace repository link
|
|||||||
<meta property="og:type" content="article" />
|
<meta property="og:type" content="article" />
|
||||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-01/" />
|
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-01/" />
|
||||||
<meta property="article:published_time" content="2020-01-06T10:48:30+02:00" />
|
<meta property="article:published_time" content="2020-01-06T10:48:30+02:00" />
|
||||||
<meta property="article:modified_time" content="2020-01-22T14:16:08+02:00" />
|
<meta property="article:modified_time" content="2020-01-23T12:46:39+02:00" />
|
||||||
|
|
||||||
<meta name="twitter:card" content="summary"/>
|
<meta name="twitter:card" content="summary"/>
|
||||||
<meta name="twitter:title" content="January, 2020"/>
|
<meta name="twitter:title" content="January, 2020"/>
|
||||||
@ -63,9 +63,9 @@ I tweeted the CGSpace repository link
|
|||||||
"@type": "BlogPosting",
|
"@type": "BlogPosting",
|
||||||
"headline": "January, 2020",
|
"headline": "January, 2020",
|
||||||
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2020-01\/",
|
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2020-01\/",
|
||||||
"wordCount": "1905",
|
"wordCount": "2117",
|
||||||
"datePublished": "2020-01-06T10:48:30+02:00",
|
"datePublished": "2020-01-06T10:48:30+02:00",
|
||||||
"dateModified": "2020-01-22T14:16:08+02:00",
|
"dateModified": "2020-01-23T12:46:39+02:00",
|
||||||
"author": {
|
"author": {
|
||||||
"@type": "Person",
|
"@type": "Person",
|
||||||
"name": "Alan Orth"
|
"name": "Alan Orth"
|
||||||
@ -383,9 +383,23 @@ $ wc -l hung-nguyen-a*handles.txt
|
|||||||
<pre><code>$ convert -density 288 -filter lagrange -thumbnail 25% -background white -alpha remove -sampling-factor 1:1 -colorspace sRGB 10568-97925.pdf\[0\] 10568-97925.jpg
|
<pre><code>$ convert -density 288 -filter lagrange -thumbnail 25% -background white -alpha remove -sampling-factor 1:1 -colorspace sRGB 10568-97925.pdf\[0\] 10568-97925.jpg
|
||||||
</code></pre><ul>
|
</code></pre><ul>
|
||||||
<li>Here I'm also explicitly setting the background to white and removing any alpha layers, but I could probably also just keep using <code>-flatten</code> like DSpace already does</li>
|
<li>Here I'm also explicitly setting the background to white and removing any alpha layers, but I could probably also just keep using <code>-flatten</code> like DSpace already does</li>
|
||||||
<li>I wonder if I could hack this into DSpace code to get better thumbnails…</li>
|
<li>I did some tests with a modified version of the above that uses <code>-flatten</code> and drops the sampling-factor and colorspace, but bumps up the image size to 600px (default on CGSpace is currently 300):</li>
|
||||||
</ul>
|
</ul>
|
||||||
<!-- raw HTML omitted -->
|
<pre><code>$ convert -density 288 -filter lagrange -resize 25% -flatten 10568-97925.pdf\[0\] 10568-97925-d288-lagrange.pdf.jpg
|
||||||
|
$ convert -flatten 10568-97925.pdf\[0\] 10568-97925.pdf.jpg
|
||||||
|
$ convert -thumbnail x600 10568-97925-d288-lagrange.pdf.jpg 10568-97925-d288-lagrange-thumbnail.pdf.jpg
|
||||||
|
$ convert -thumbnail x600 10568-97925.pdf.jpg 10568-97925-thumbnail.pdf.jpg
|
||||||
|
</code></pre><ul>
|
||||||
|
<li>This emulates DSpace's method of generating a high-quality image from the PDF and then creating a thumbnail</li>
|
||||||
|
<li>I put together a proof of concept of this by adding the extra options to dspace-api's <code>ImageMagickThumbnailFilter.java</code> and it works</li>
|
||||||
|
<li>I need to run tests on a handful of PDFs to see if there are any side effects</li>
|
||||||
|
<li>The file size is about double that of the old thumbnails, but the quality is very good and it is still nowhere near ilri.org's 400KiB PNG!</li>
|
||||||
|
<li>Peter sent me the corrections and deletions for affiliations last night so I imported them into OpenRefine to work around the normal UTF-8 issue, ran them through csv-metadata-quality to make sure all Unicode values were normalized (NFC), then applied them on DSpace Test and CGSpace:</li>
|
||||||
|
</ul>
|
||||||
|
<pre><code>$ csv-metadata-quality -i ~/Downloads/2020-01-22-fix-1113-affiliations.csv -o /tmp/2020-01-22-fix-1113-affiliations.csv -u --exclude-fields 'dc.date.issued,dc.date.issued[],cg.contributor.affiliation'
|
||||||
|
$ ./fix-metadata-values.py -i /tmp/2020-01-22-fix-1113-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 -t correct
|
||||||
|
$ ./delete-metadata-values.py -i /tmp/2020-01-22-delete-36-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
|
||||||
|
</code></pre><!-- raw HTML omitted -->
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@ -4,27 +4,27 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
|
||||||
<lastmod>2020-01-22T14:16:08+02:00</lastmod>
|
<lastmod>2020-01-23T12:46:39+02:00</lastmod>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
||||||
<lastmod>2020-01-22T14:16:08+02:00</lastmod>
|
<lastmod>2020-01-23T12:46:39+02:00</lastmod>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/2020-01/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/2020-01/</loc>
|
||||||
<lastmod>2020-01-22T14:16:08+02:00</lastmod>
|
<lastmod>2020-01-23T12:46:39+02:00</lastmod>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
|
||||||
<lastmod>2020-01-22T14:16:08+02:00</lastmod>
|
<lastmod>2020-01-23T12:46:39+02:00</lastmod>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
||||||
<lastmod>2020-01-22T14:16:08+02:00</lastmod>
|
<lastmod>2020-01-23T12:46:39+02:00</lastmod>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
|