diff --git a/content/posts/2023-04.md b/content/posts/2023-04.md
index 9a209752e..ec6fe8c5c 100644
--- a/content/posts/2023-04.md
+++ b/content/posts/2023-04.md
@@ -438,9 +438,11 @@ $ psql < locks-age.sql | grep -E "[[:digit:]] days" | awk -F\| '{print $10}' | s
 - I ended up with a long list of UUIDs to fix before the script would complete:

 ```console
-$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762')"
+$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762', '51115f07-0a60-4988-8536-b9ebd2a5e15e', '0fc5021d-3264-413a-b2e2-74bda38a394e', '4704fa62-b8ab-4dfe-b7aa-0e4905f8412a')"
 ```

+- This process ended up taking a few days because each iteration ran for over four hours before failing on the next UUID, sighhhhh
+
 ## 2023-04-18

 - Regarding the item Abenet noticed yesterday that has a blank page and a nullPointerException
@@ -448,4 +450,37 @@ $ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_b
 - And according to the REST API on CGSpace the item was modified on 2023-04-11, so last week...
 - According to the DSpace logs it was Francesca who edited the item last week, so I asked her for more information before I troubleshoot more

+## 2023-04-19
+
+- I fixed the Bioversity item by deleting the `9781138781276.jpg` bitstream via the REST API
+  - I *think* Francesca might have changed the "format" of it?
+  - Anyway, this item has a PDF so we have a proper thumbnail and don't need that other journal cover one
+- I noticed a URL for this [Bioversity item](https://hdl.handle.net/10568/89049) redirects incorrectly
+  - I had mentioned this to Maria and Francesca a few months ago but it seems to never have been resolved
+- The `dspace cleanup -v` finally finished after a few days of running and stopping...
+- I decided to update the thumbnails in the Bioversity books collection because I saw a few old ones suffering from the CropBox issue
+- Also, all day there's been a high load on CGSpace, with lots of locks in PostgreSQL
+  - I had been waiting until the bitstream cleanup finished... now I might need to restart PostgreSQL to kill some old locks as something needs to give
+  - I restarted PostgreSQL, but DSpace was still hanging on simple XMLUI options so I ended up restarting Tomcat
+- Tag 544 ORCID identifiers with my script
+- I updated my `generation-loss.sh` and `improved-dspace-thumbnails` scripts to include thirty-five PDFs from CGSpace (up from twenty-four) to get a larger sample
+  - Now starting to get some numbers comparing JPEG, WebP, and AVIF
+  - First, out of curiosity, I checked the average ssimulacra2 scores at Q75, Q80, and Q92 for each format:
+
+|      | Q75 | Q80 | Q92 |
+|------|-----|-----|-----|
+| JPEG | 70  | 73  | 88  |
+| WebP | 73  | 76  | 82  |
+| AVIF | 82  | 83  | 92  |
+
+- Then I checked the quality and file size (bytes) needed to hit an average ssimulacra2 score of 80 with each format:
+  - **JPEG**: Q89, 124596 bytes
+  - **WebP**: Q88, 84935 bytes (32% smaller than JPEG size)
+  - **AVIF**: Q62, 60347 bytes (52% smaller than JPEG size)
+- [Google's original WebP study](https://developers.google.com/speed/webp/docs/webp_study) uses this technique to compare WebP to JPEG too
+  - As the quality settings are not comparable between formats, we need to compare the formats at matching perceptual scores (ssimulacra2 in this case)
+  - I used a ssimulacra2 score of 80 because that's about the highest score I see with WebP using my samples, though JPEG and AVIF do go higher
+  - Also, according to current ssimulacra2 (v2.1), a score of 70 is "high quality" and a score of 90 is "very high quality", so 80 should be reasonably high enough...
+- Export CGSpace to check for missing Initiatives mappings
+
diff --git a/docs/2022-10/index.html b/docs/2022-10/index.html
index 7f169d4d3..9a0db791e 100644
--- a/docs/2022-10/index.html
+++ b/docs/2022-10/index.html
@@ -20,7 +20,7 @@ I filed an issue to ask about Java 11+ support
-
+
@@ -48,7 +48,7 @@ I filed an issue to ask about Java 11+ support
   "url": "https://alanorth.github.io/cgspace-notes/2022-10/",
   "wordCount": "3768",
   "datePublished": "2022-10-01T19:45:36+03:00",
-  "dateModified": "2022-10-31T16:59:47+03:00",
+  "dateModified": "2023-04-18T11:08:15-07:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
diff --git a/docs/2023-04/index.html b/docs/2023-04/index.html
index 5561312cc..c08f8bfac 100644
--- a/docs/2023-04/index.html
+++ b/docs/2023-04/index.html
@@ -20,7 +20,7 @@ Start a harvest on AReS
-
+
@@ -46,9 +46,9 @@ Start a harvest on AReS
   "@type": "BlogPosting",
   "headline": "April, 2023",
   "url": "https://alanorth.github.io/cgspace-notes/2023-04/",
-  "wordCount": "1556",
+  "wordCount": "1970",
   "datePublished": "2023-04-02T08:19:36+03:00",
-  "dateModified": "2023-04-06T16:13:30+03:00",
+  "dateModified": "2023-04-18T11:08:15-07:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -573,8 +573,11 @@ Start a harvest on AReS
-
$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762')"
-

2023-04-18

+
$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762', '51115f07-0a60-4988-8536-b9ebd2a5e15e', '0fc5021d-3264-413a-b2e2-74bda38a394e', '4704fa62-b8ab-4dfe-b7aa-0e4905f8412a')"
+
+

2023-04-18

+

2023-04-19

+|      | Q75 | Q80 | Q92 |
+|------|-----|-----|-----|
+| JPEG | 70  | 73  | 88  |
+| WebP | 73  | 76  | 82  |
+| AVIF | 82  | 83  | 92  |
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 455017e99..c2ea535dd 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 70bdadaf7..c196b97b7 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 63f0ad5de..9be28ec6a 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 3482d980e..57cf29998 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 57a82a49c..633fd3999 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 562631efc..f6dd8911f 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index abdcb2aed..b94b96287 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index 7b543a353..94f335655 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index b37c67bbb..4793c8800 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/10/index.html b/docs/page/10/index.html
index 82095e5cc..a1cb5b1ff 100644
--- a/docs/page/10/index.html
+++ b/docs/page/10/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 6a727495d..4e25562cb 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index e1774e315..9ef38a8d5 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index e1abc36ef..810ece543 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index e722a35ba..e5bfa3323 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index b0926b98b..f1585186b 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index a6983d222..1939151da 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index da2c3c382..c8828e061 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index dbb50a8e3..156e00296 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 6c737bfb6..31b85a715 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/10/index.html b/docs/posts/page/10/index.html
index 8e666a73f..3eae5d4c0 100644
--- a/docs/posts/page/10/index.html
+++ b/docs/posts/page/10/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 537c91d3d..d73995049 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 6d571ba73..d70bb6b6a 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 169ff1d0a..164593baf 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 75722c821..9cbb205e9 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index a057f5d79..e551df1ba 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 65d284350..81fd85853 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 3ef970059..e040082a3 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index ea0d0bb25..f79f7204f 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 8e2d734e5..dced67b39 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
   xmlns:xhtml="http://www.w3.org/1999/xhtml">
   https://alanorth.github.io/cgspace-notes/2023-04/
-  2023-04-06T16:13:30+03:00
+  2023-04-18T11:08:15-07:00
   https://alanorth.github.io/cgspace-notes/categories/
-  2023-04-06T16:13:30+03:00
+  2023-04-18T11:08:15-07:00
   https://alanorth.github.io/cgspace-notes/
-  2023-04-06T16:13:30+03:00
+  2023-04-18T11:08:15-07:00
   https://alanorth.github.io/cgspace-notes/categories/notes/
-  2023-04-06T16:13:30+03:00
+  2023-04-18T11:08:15-07:00
   https://alanorth.github.io/cgspace-notes/posts/
-  2023-04-06T16:13:30+03:00
+  2023-04-18T11:08:15-07:00
   https://alanorth.github.io/cgspace-notes/2023-03/
   2023-04-02T09:16:25+03:00
@@ -33,7 +33,7 @@
   2023-01-04T10:53:02+03:00
   https://alanorth.github.io/cgspace-notes/2022-10/
-  2022-10-31T16:59:47+03:00
+  2023-04-18T11:08:15-07:00
   https://alanorth.github.io/cgspace-notes/2022-09/
   2022-09-30T17:29:50+03:00
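
The 2023-04-19 notes in this diff compare JPEG, WebP, and AVIF at a matching perceptual score instead of a matching quality setting. As a rough check of that arithmetic, here is a minimal Python sketch. It is not the `generation-loss.sh` script from the notes, and the helper names `percent_smaller` and `lowest_quality_reaching` are hypothetical. The numbers are the averages quoted in the notes.

```python
from typing import Dict, Optional

# Average file sizes in bytes at an average ssimulacra2 score of 80,
# copied from the 2023-04-19 notes.
SIZES_AT_SCORE_80 = {"JPEG": 124596, "WebP": 84935, "AVIF": 60347}

# Average ssimulacra2 scores at Q75, Q80, and Q92, copied from the table
# in the notes (keys are quality settings).
SCORES = {
    "JPEG": {75: 70, 80: 73, 92: 88},
    "WebP": {75: 73, 80: 76, 92: 82},
    "AVIF": {75: 82, 80: 83, 92: 92},
}


def percent_smaller(size: int, baseline: int) -> float:
    """Return how much smaller `size` is than `baseline`, as a percentage."""
    return (1 - size / baseline) * 100


def lowest_quality_reaching(scores: Dict[int, float], target: float) -> Optional[int]:
    """Return the lowest measured quality whose score meets `target`.

    Hypothetical helper: it only considers the qualities present in
    `scores`, so a coarse grid gives a coarse answer.
    """
    candidates = [quality for quality, score in scores.items() if score >= target]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    baseline = SIZES_AT_SCORE_80["JPEG"]
    for fmt, size in SIZES_AT_SCORE_80.items():
        saving = percent_smaller(size, baseline)
        print(f"{fmt}: {size} bytes, {saving:.0f}% smaller than JPEG")
    for fmt, scores in SCORES.items():
        print(f"{fmt}: lowest measured quality reaching 80 is Q{lowest_quality_reaching(scores, 80)}")
```

With only three measured qualities per format, the second helper cannot reproduce the Q89, Q88, and Q62 settings from the notes. Those presumably came from scoring many more quality levels, which this sketch does not attempt.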