diff --git a/content/posts/2023-04.md b/content/posts/2023-04.md
index 9a209752e..ec6fe8c5c 100644
--- a/content/posts/2023-04.md
+++ b/content/posts/2023-04.md
@@ -438,9 +438,11 @@ $ psql < locks-age.sql | grep -E "[[:digit:]] days" | awk -F\| '{print $10}' | s
- I ended up with a long list of UUIDs to fix before the script would complete:
```console
-$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762')"
+$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762', '51115f07-0a60-4988-8536-b9ebd2a5e15e', '0fc5021d-3264-413a-b2e2-74bda38a394e', '4704fa62-b8ab-4dfe-b7aa-0e4905f8412a')"
```
+- This process ended up taking a few days because each iteration ran for over four hours before failing on the next UUID, sighhhhh
+
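Each failed `dspace cleanup` run only revealed one more UUID, so the list had to be pasted into the query by hand every time. A minimal sketch that keeps the UUIDs in a file (one per line) and generates the same `update bundle` statement from it. `build_bundle_fix_sql` and `uuids.txt` are hypothetical names, not part of DSpace:

```shell
# Hypothetical helper: read bitstream UUIDs (one per line) on stdin and print
# the "update bundle ..." statement used above, with all of them in the list.
build_bundle_fix_sql() {
    local uuids
    # Keep only well-formed lowercase UUIDs (this also drops blank lines),
    # wrap each one in single quotes, and join them with ", "
    uuids=$(grep -E '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$' \
        | sed -e "s/.*/'&'/" \
        | paste -sd, - \
        | sed -e 's/,/, /g')
    # Refuse to emit "in ()", which is not valid SQL
    if [ -z "$uuids" ]; then
        echo "no valid UUIDs on stdin" >&2
        return 1
    fi
    printf "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (%s);\n" "$uuids"
}
```

Usage would be something like `build_bundle_fix_sql < uuids.txt | psql -d dspace`. Assuming the DSpace 6 schema, where the `bitstream` table has `uuid` and `deleted` columns, a subquery such as `... where primary_bitstream_id in (select uuid from bitstream where deleted=true)` might catch all of them in a single pass; that is untested here, so it would be worth running it as a `select` first.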
## 2023-04-18
- Regarding the item Abenet noticed yesterday that has a blank page and a nullPointerException
@@ -448,4 +450,37 @@ $ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_b
- And according to the REST API on CGSpace the item was modified on 2023-04-11, so last week...
- According to the DSpace logs it was Francesca who edited the item last week, so I asked her for more information before I troubleshoot more
+## 2023-04-19
+
+- I fixed the Bioversity item by deleting the `9781138781276.jpg` bitstream via the REST API
+ - I *think* Francesca might have changed the "format" of it?
+ - Anyway, this item has a PDF so we have a proper thumbnail and don't need that other journal cover one
+- I noticed a URL for this [Bioversity item](https://hdl.handle.net/10568/89049) redirects incorrectly
+ - I had mentioned this to Maria and Francesca a few months ago, but it seems it was never resolved
+- The `dspace cleanup -v` finally finished after a few days of running and stopping...
+- I decided to update the thumbnails in the Bioversity books collection because I saw a few old ones suffering from the CropBox issue
+- Also, all day there's been a high load on CGSpace, with lots of locks in PostgreSQL
+ - I had been waiting until the bitstream cleanup finished... now I might need to restart PostgreSQL to kill some old locks as something needs to give
+ - I restarted PostgreSQL, but DSpace was still hanging on simple XMLUI options so I ended up restarting Tomcat
+- Tag 544 ORCID identifiers with my script
+- I updated my `generation-loss.sh` and `improved-dspace-thumbnails` scripts to include thirty-five PDFs from CGSpace (up from twenty-four) to get a larger sample
+ - Now starting to get some numbers comparing JPEG, WebP, and AVIF
+ - First, out of curiosity, I checked the average ssimulacra2 scores at Q75, Q80, and Q92 for each format:
+
+| | Q75 | Q80 | Q92 |
+|------|-----|-----|-----|
+| JPEG | 70 | 73 | 88 |
+| WebP | 73 | 76 | 82 |
+| AVIF | 82 | 83 | 92 |
+
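The per-format averages in the table above could be computed from per-image scores with a small awk filter. This sketch assumes a hypothetical input of one `format score` pair per line (for example, one line per PDF thumbnail); the function name is also hypothetical:

```shell
# Hypothetical helper: average ssimulacra2 scores per format.
# Input: lines of "format score" on stdin. Output: "format mean", sorted by format.
average_scores_by_format() {
    awk '
        NF == 2 { sum[$1] += $2; n[$1]++ }
        END { for (f in sum) printf "%s %.1f\n", f, sum[f] / n[f] }
    ' | sort
}
```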
+- Then I checked the quality and file size (bytes) needed to hit an average ssimulacra2 score of 80 with each format:
+ - **JPEG**: Q89, 124596 bytes
+ - **WebP**: Q88, 84935 bytes (32% smaller than JPEG size)
+ - **AVIF**: Q62, 60347 bytes (52% smaller than JPEG size)
+- [Google's original WebP study](https://developers.google.com/speed/webp/docs/webp_study) uses this technique to compare WebP to JPEG too
+ - As the quality settings are not comparable between formats, we need to compare the formats at matching perceptual scores (ssimulacra2 in this case)
+ - I used a ssimulacra2 score of 80 because that's about the highest score I see with WebP using my samples, though JPEG and AVIF do go higher
+ - Also, according to current ssimulacra2 (v2.1), a score of 70 is "high quality" and a score of 90 is "very high quality", so 80 should be high enough...
+- Export CGSpace to check for missing Initiatives mappings
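The percentages quoted for WebP and AVIF are relative to the JPEG file size at the same ssimulacra2 score. A quick way to double-check them from the byte counts (the function name is just for illustration):

```shell
# Print how much smaller a file is than a baseline size, in percent (one decimal).
percent_smaller() {
    local baseline=$1 size=$2
    awk -v b="$baseline" -v s="$size" 'BEGIN { printf "%.1f\n", (1 - s / b) * 100 }'
}

percent_smaller 124596 84935   # WebP vs JPEG
percent_smaller 124596 60347   # AVIF vs JPEG
```

These print 31.8 and 51.6, which round to the 32% and 52% above.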
+
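Finding the quality needed to hit a target score could be automated with a bisection over the quality setting. This is only a sketch: it assumes the average score rises monotonically with quality, and `score_cmd` stands for a hypothetical command that encodes all samples at the given quality and prints their average ssimulacra2 score (how `generation-loss.sh` actually does this is not shown here):

```shell
# Succeeds if score $1 is greater than or equal to target $2 (awk handles decimals)
meets_target() {
    awk -v s="$1" -v t="$2" 'BEGIN { exit !(s >= t) }'
}

# Print the lowest integer quality in [lo, hi] whose score reaches the target.
# $1: name of a (hypothetical) command that takes a quality and prints a score
# $2: target score; $3/$4: optional quality bounds (default 1..100)
find_quality_for_target() {
    local score_cmd=$1 target=$2 lo=${3:-1} hi=${4:-100} mid score
    # If even the highest quality misses the target, there is nothing to find
    score=$("$score_cmd" "$hi")
    meets_target "$score" "$target" || return 1
    # Invariant: the answer is always within [lo, hi]
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        score=$("$score_cmd" "$mid")
        if meets_target "$score" "$target"; then
            hi=$mid
        else
            lo=$(( mid + 1 ))
        fi
    done
    echo "$lo"
}
```

Bisection needs about seven encode-and-score rounds per format for qualities 1 to 100, instead of trying every setting.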
diff --git a/docs/2022-10/index.html b/docs/2022-10/index.html
index 7f169d4d3..9a0db791e 100644
--- a/docs/2022-10/index.html
+++ b/docs/2022-10/index.html
@@ -20,7 +20,7 @@ I filed an issue to ask about Java 11+ support
-
+
@@ -48,7 +48,7 @@ I filed an issue to ask about Java 11+ support
"url": "https://alanorth.github.io/cgspace-notes/2022-10/",
"wordCount": "3768",
"datePublished": "2022-10-01T19:45:36+03:00",
- "dateModified": "2022-10-31T16:59:47+03:00",
+ "dateModified": "2023-04-18T11:08:15-07:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
diff --git a/docs/2023-04/index.html b/docs/2023-04/index.html
index 5561312cc..c08f8bfac 100644
--- a/docs/2023-04/index.html
+++ b/docs/2023-04/index.html
@@ -20,7 +20,7 @@ Start a harvest on AReS
-
+
@@ -46,9 +46,9 @@ Start a harvest on AReS
"@type": "BlogPosting",
"headline": "April, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-04/",
- "wordCount": "1556",
+ "wordCount": "1970",
"datePublished": "2023-04-02T08:19:36+03:00",
- "dateModified": "2023-04-06T16:13:30+03:00",
+ "dateModified": "2023-04-18T11:08:15-07:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -573,8 +573,11 @@ Start a harvest on AReS
-
$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762')"
-
2023-04-18
+$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762', '51115f07-0a60-4988-8536-b9ebd2a5e15e', '0fc5021d-3264-413a-b2e2-74bda38a394e', '4704fa62-b8ab-4dfe-b7aa-0e4905f8412a')"
+
+- This process ended up taking a few days because each iteration ran for over four hours before failing on the next UUID, sighhhhh
+
+2023-04-18
- Regarding the item Abenet noticed yesterday that has a blank page and a nullPointerException
@@ -584,6 +587,82 @@ Start a harvest on AReS
+2023-04-19
+
+- I fixed the Bioversity item by deleting the 9781138781276.jpg bitstream via the REST API
+
+- I think Francesca might have changed the “format” of it?
+- Anyway, this item has a PDF so we have a proper thumbnail and don’t need that other journal cover one
+
+
+- I noticed a URL for this Bioversity item redirects incorrectly
+
+- I had mentioned this to Maria and Francesca a few months ago, but it seems it was never resolved
+
+
+- The dspace cleanup -v finally finished after a few days of running and stopping…
+- I decided to update the thumbnails in the Bioversity books collection because I saw a few old ones suffering from the CropBox issue
+- Also, all day there’s been a high load on CGSpace, with lots of locks in PostgreSQL
+
+- I had been waiting until the bitstream cleanup finished… now I might need to restart PostgreSQL to kill some old locks as something needs to give
+- I restarted PostgreSQL, but DSpace was still hanging on simple XMLUI options so I ended up restarting Tomcat
+
+
+- Tag 544 ORCID identifiers with my script
+- I updated my generation-loss.sh and improved-dspace-thumbnails scripts to include thirty-five PDFs from CGSpace (up from twenty-four) to get a larger sample
+
+- Now starting to get some numbers comparing JPEG, WebP, and AVIF
+- First, out of curiosity, I checked the average ssimulacra2 scores at Q75, Q80, and Q92 for each format:
+
+|      | Q75 | Q80 | Q92 |
+|------|-----|-----|-----|
+| JPEG | 70  | 73  | 88  |
+| WebP | 73  | 76  | 82  |
+| AVIF | 82  | 83  | 92  |
+
+- Then I checked the quality and file size (bytes) needed to hit an average ssimulacra2 score of 80 with each format:
+
+- JPEG: Q89, 124596 bytes
+- WebP: Q88, 84935 bytes (32% smaller than JPEG size)
+- AVIF: Q62, 60347 bytes (52% smaller than JPEG size)
+
+
+- Google’s original WebP study uses this technique to compare WebP to JPEG too
+
+- As the quality settings are not comparable between formats, we need to compare the formats at matching perceptual scores (ssimulacra2 in this case)
+- I used a ssimulacra2 score of 80 because that’s about the highest score I see with WebP using my samples, though JPEG and AVIF do go higher
+- Also, according to current ssimulacra2 (v2.1), a score of 70 is “high quality” and a score of 90 is “very high quality”, so 80 should be high enough…
+
+
+- Export CGSpace to check for missing Initiatives mappings
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 455017e99..c2ea535dd 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 70bdadaf7..c196b97b7 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 63f0ad5de..9be28ec6a 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 3482d980e..57cf29998 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 57a82a49c..633fd3999 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 562631efc..f6dd8911f 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index abdcb2aed..b94b96287 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index 7b543a353..94f335655 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index b37c67bbb..4793c8800 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/10/index.html b/docs/page/10/index.html
index 82095e5cc..a1cb5b1ff 100644
--- a/docs/page/10/index.html
+++ b/docs/page/10/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 6a727495d..4e25562cb 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index e1774e315..9ef38a8d5 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index e1abc36ef..810ece543 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index e722a35ba..e5bfa3323 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index b0926b98b..f1585186b 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index a6983d222..1939151da 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index da2c3c382..c8828e061 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index dbb50a8e3..156e00296 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 6c737bfb6..31b85a715 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/10/index.html b/docs/posts/page/10/index.html
index 8e666a73f..3eae5d4c0 100644
--- a/docs/posts/page/10/index.html
+++ b/docs/posts/page/10/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 537c91d3d..d73995049 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 6d571ba73..d70bb6b6a 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 169ff1d0a..164593baf 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 75722c821..9cbb205e9 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index a057f5d79..e551df1ba 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 65d284350..81fd85853 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 3ef970059..e040082a3 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index ea0d0bb25..f79f7204f 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 8e2d734e5..dced67b39 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/2023-04/
- 2023-04-06T16:13:30+03:00
+ 2023-04-18T11:08:15-07:00
https://alanorth.github.io/cgspace-notes/categories/
- 2023-04-06T16:13:30+03:00
+ 2023-04-18T11:08:15-07:00
https://alanorth.github.io/cgspace-notes/
- 2023-04-06T16:13:30+03:00
+ 2023-04-18T11:08:15-07:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2023-04-06T16:13:30+03:00
+ 2023-04-18T11:08:15-07:00
https://alanorth.github.io/cgspace-notes/posts/
- 2023-04-06T16:13:30+03:00
+ 2023-04-18T11:08:15-07:00
https://alanorth.github.io/cgspace-notes/2023-03/
2023-04-02T09:16:25+03:00
@@ -33,7 +33,7 @@
2023-01-04T10:53:02+03:00
https://alanorth.github.io/cgspace-notes/2022-10/
- 2022-10-31T16:59:47+03:00
+ 2023-04-18T11:08:15-07:00
https://alanorth.github.io/cgspace-notes/2022-09/
2022-09-30T17:29:50+03:00