From 9abe34ec6fe4df5e7a6f6708ea1cecdb983f7dd1 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Thu, 23 Jan 2020 15:56:46 +0200
Subject: [PATCH] Update notes for 2020-01-23

---
 content/posts/2020-01.md | 21 ++++++++++++++++++++-
 docs/2020-01/index.html  | 24 +++++++++++++++++++-----
 docs/sitemap.xml         | 10 +++++-----
 3 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/content/posts/2020-01.md b/content/posts/2020-01.md
index c50881974..a76efdd54 100644
--- a/content/posts/2020-01.md
+++ b/content/posts/2020-01.md
@@ -243,6 +243,25 @@ $ convert -density 288 -filter lagrange -thumbnail 25% -background white -alpha
 ```
 
 - Here I'm also explicitly setting the background to white and removing any alpha layers, but I could probably also just keep using `-flatten` like DSpace already does
-- I wonder if I could hack this into DSpace code to get better thumbnails...
+- I did some tests with a modified version of the above that uses `-flatten` and drops the sampling-factor and colorspace, but bumps up the image size to 600px (default on CGSpace is currently 300):
+
+```
+$ convert -density 288 -filter lagrange -resize 25% -flatten 10568-97925.pdf\[0\] 10568-97925-d288-lagrange.pdf.jpg
+$ convert -flatten 10568-97925.pdf\[0\] 10568-97925.pdf.jpg
+$ convert -thumbnail x600 10568-97925-d288-lagrange.pdf.jpg 10568-97925-d288-lagrange-thumbnail.pdf.jpg
+$ convert -thumbnail x600 10568-97925.pdf.jpg 10568-97925-thumbnail.pdf.jpg
+```
+
+- This emulates DSpace's method of generating a high-quality image from the PDF and then creating a thumbnail
+- I put together a proof of concept of this by adding the extra options to dspace-api's `ImageMagickThumbnailFilter.java` and it works
+- I need to run tests on a handful of PDFs to see if there are any side effects
+- The file size is about double the old ones, but the quality is very good and the file size is nowhere near ilri.org's 400KiB PNG!
+- Peter sent me the corrections and deletions for affiliations last night so I imported them into OpenRefine to work around the normal UTF-8 issue, ran them through csv-metadata-quality to make sure all Unicode values were normalized (NFC), then applied them on DSpace Test and CGSpace:
+
+```
+$ csv-metadata-quality -i ~/Downloads/2020-01-22-fix-1113-affiliations.csv -o /tmp/2020-01-22-fix-1113-affiliations.csv -u --exclude-fields 'dc.date.issued,dc.date.issued[],cg.contributor.affiliation'
+$ ./fix-metadata-values.py -i /tmp/2020-01-22-fix-1113-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 -t correct
+$ ./delete-metadata-values.py -i /tmp/2020-01-22-delete-36-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
+```
diff --git a/docs/2020-01/index.html b/docs/2020-01/index.html
index cd9269f86..271eb7d6c 100644
--- a/docs/2020-01/index.html
+++ b/docs/2020-01/index.html
@@ -29,7 +29,7 @@ I tweeted the CGSpace repository link
-
+
@@ -63,9 +63,9 @@ I tweeted the CGSpace repository link
 "@type": "BlogPosting",
 "headline": "January, 2020",
 "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2020-01\/",
- "wordCount": "1905",
+ "wordCount": "2117",
 "datePublished": "2020-01-06T10:48:30+02:00",
- "dateModified": "2020-01-22T14:16:08+02:00",
+ "dateModified": "2020-01-23T12:46:39+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -383,9 +383,23 @@ $ wc -l hung-nguyen-a*handles.txt
$ convert -density 288 -filter lagrange -thumbnail 25% -background white -alpha remove -sampling-factor 1:1 -colorspace sRGB 10568-97925.pdf\[0\] 10568-97925.jpg
 
- +
$ convert -density 288 -filter lagrange -resize 25% -flatten 10568-97925.pdf\[0\] 10568-97925-d288-lagrange.pdf.jpg
+$ convert -flatten 10568-97925.pdf\[0\] 10568-97925.pdf.jpg
+$ convert -thumbnail x600 10568-97925-d288-lagrange.pdf.jpg 10568-97925-d288-lagrange-thumbnail.pdf.jpg
+$ convert -thumbnail x600 10568-97925.pdf.jpg 10568-97925-thumbnail.pdf.jpg
+
+
$ csv-metadata-quality -i ~/Downloads/2020-01-22-fix-1113-affiliations.csv -o /tmp/2020-01-22-fix-1113-affiliations.csv -u --exclude-fields 'dc.date.issued,dc.date.issued[],cg.contributor.affiliation'
+$ ./fix-metadata-values.py -i /tmp/2020-01-22-fix-1113-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 -t correct
+$ ./delete-metadata-values.py -i /tmp/2020-01-22-delete-36-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 0e7b23b10..068c6922f 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
 https://alanorth.github.io/cgspace-notes/categories/
- 2020-01-22T14:16:08+02:00
+ 2020-01-23T12:46:39+02:00
 https://alanorth.github.io/cgspace-notes/
- 2020-01-22T14:16:08+02:00
+ 2020-01-23T12:46:39+02:00
 https://alanorth.github.io/cgspace-notes/2020-01/
- 2020-01-22T14:16:08+02:00
+ 2020-01-23T12:46:39+02:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2020-01-22T14:16:08+02:00
+ 2020-01-23T12:46:39+02:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2020-01-22T14:16:08+02:00
+ 2020-01-23T12:46:39+02:00
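
The two-step thumbnail test recorded in the notes for `content/posts/2020-01.md` (render a high-quality image from the first PDF page, then shrink it) could be wrapped in a small helper for trying it on more PDFs. This is only a sketch: the function name `make_thumbnail` and the `RUN` variable are hypothetical and not part of DSpace or ImageMagick, and the `convert` options are copied from the commands in the notes, not from `ImageMagickThumbnailFilter.java`.

```shell
#!/usr/bin/env bash
# Sketch of the two-step thumbnail test from the notes.
# make_thumbnail and RUN are hypothetical names used only for illustration.

# Set RUN=echo to print the commands instead of executing them (dry run).
RUN="${RUN:-}"

make_thumbnail() {
    local pdf="$1"            # input PDF, for example 10568-97925.pdf
    local height="${2:-600}"  # thumbnail height in pixels (the notes test 600)
    local base="${pdf%.pdf}"  # file name without the .pdf suffix
    local intermediate="${base}-d288-lagrange.pdf.jpg"
    local thumbnail="${base}-d288-lagrange-thumbnail.pdf.jpg"

    # Step 1: render the first page at 288 DPI, scale it to 25% with the
    # Lagrange filter, and flatten it into a single layer.
    $RUN convert -density 288 -filter lagrange -resize 25% -flatten "${pdf}[0]" "$intermediate"

    # Step 2: create the thumbnail from the intermediate image.
    $RUN convert -thumbnail "x${height}" "$intermediate" "$thumbnail"
}
```

Running `RUN=echo` followed by `make_thumbnail 10568-97925.pdf` prints the two `convert` commands without touching any files, which makes it easy to check the options before running them on a handful of PDFs.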