Add notes for 2023-04-20

2023-04-20 22:44:18 -07:00
parent b024eb1f94
commit c20f1e1f89
32 changed files with 156 additions and 42 deletions

@ -438,9 +438,11 @@ $ psql < locks-age.sql | grep -E "[[:digit:]] days" | awk -F\| '{print $10}' | s
- I ended up with a long list of UUIDs to fix before the script would complete:
```console
$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762')"
$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762', '51115f07-0a60-4988-8536-b9ebd2a5e15e', '0fc5021d-3264-413a-b2e2-74bda38a394e', '4704fa62-b8ab-4dfe-b7aa-0e4905f8412a')"
```
- This process ended up taking a few days because each iteration ran for over four hours before failing on the next UUID, sighhhhh
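
Since the `IN (...)` list above grew by hand with each failed run, here is a minimal sketch of building the statement from a file of UUIDs instead. The function name `build_update_sql` and the file `uuids.txt` are hypothetical, and how the failing UUIDs get collected from the cleanup output is left as an assumption; only the `update bundle ...` statement itself comes from the notes.

```shell
# Hypothetical helper (not part of DSpace): read one UUID per line on stdin
# and print a single UPDATE statement covering all of them.
build_update_sql() {
    awk '
        # Only accept 36-character lines made of hex digits and dashes,
        # so stray text never ends up inside the SQL.
        length($0) == 36 && /^[0-9a-fA-F-]+$/ {
            # \047 is a single quote; separate entries with ", "
            list = list (n++ ? ", " : "") "\047" $0 "\047"
        }
        END {
            # Print nothing at all if no valid UUID was seen
            if (n > 0)
                printf "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (%s)\n", list
        }
    '
}
```

Assuming the UUIDs are kept in `uuids.txt`, the result could be passed to `psql` the same way as above, for example `psql -d dspace -c "$(build_update_sql < uuids.txt)"`.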
## 2023-04-18
- Regarding the item Abenet noticed yesterday that has a blank page and a `NullPointerException`
@ -448,4 +450,37 @@ $ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_b
- And according to the REST API on CGSpace the item was modified on 2023-04-11, so last week...
- According to the DSpace logs it was Francesca who edited the item last week, so I asked her for more information before I troubleshoot more
## 2023-04-19
- I fixed the Bioversity item by deleting the `9781138781276.jpg` bitstream via the REST API
- I *think* Francesca might have changed the "format" of it?
- Anyway, this item has a PDF so we have a proper thumbnail and don't need that other journal cover one
- I noticed a URL for this [Bioversity item](https://hdl.handle.net/10568/89049) redirects incorrectly
- I had mentioned this to Maria and Francesca a few months ago but it seems to never have been resolved
- The `dspace cleanup -v` finally finished after a few days of running and stopping...
- I decided to update the thumbnails in the Bioversity books collection because I saw a few old ones suffering from the CropBox issue
- Also, all day there's been a high load on CGSpace, with lots of locks in PostgreSQL
- I had been waiting until the bitstream cleanup finished... now I might need to restart PostgreSQL to kill some old locks as something needs to give
- I restarted PostgreSQL, but DSpace was still hanging on simple XMLUI options so I ended up restarting Tomcat
- Tag 544 ORCID identifiers with my script
- I updated my `generation-loss.sh` and `improved-dspace-thumbnails` scripts to include thirty-five PDFs from CGSpace (up from twenty-four) to get a larger sample
- Now starting to get some numbers comparing JPEG, WebP, and AVIF
- First, out of curiosity, I checked the average ssimulacra2 scores at Q75, Q80, and Q92 for each format:
| Format | Q75 | Q80 | Q92 |
|------|-----|-----|-----|
| JPEG | 70 | 73 | 88 |
| WebP | 73 | 76 | 82 |
| AVIF | 82 | 83 | 92 |
- Then I checked the quality and file size (bytes) needed to hit an average ssimulacra2 score of 80 with each format:
- **JPEG**: Q89, 124596 bytes
- **WebP**: Q88, 84935 bytes (32% smaller than JPEG size)
- **AVIF**: Q62, 60347 bytes (52% smaller than JPEG size)
- [Google's original WebP study](https://developers.google.com/speed/webp/docs/webp_study) uses this technique to compare WebP to JPEG too
- As the quality settings are not comparable between formats, we need to compare the formats at matching perceptual scores (ssimulacra2 in this case)
- I used a ssimulacra2 score of 80 because that's about the highest score I see with WebP using my samples, though JPEG and AVIF do go higher
- Also, according to current ssimulacra2 (v2.1), a score of 70 is "high quality" and a score of 90 is "very high quality", so 80 should be reasonably high enough...
- Export CGSpace to check for missing Initiatives mappings
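
The "smaller than JPEG" percentages quoted for WebP and AVIF above can be reproduced from the byte counts in the notes. This is only a sketch of the arithmetic: the function name `percent_smaller` is made up for illustration, and rounding to the nearest whole percent is an assumption about how the figures were derived.

```shell
# Percentage by which a candidate file is smaller than a baseline file,
# rounded to the nearest whole percent. Both arguments are sizes in bytes.
percent_smaller() {
    awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.0f\n", (1 - cand / base) * 100 }'
}

percent_smaller 124596 84935   # WebP vs JPEG at ssimulacra2 ~80, prints 32
percent_smaller 124596 60347   # AVIF vs JPEG at ssimulacra2 ~80, prints 52
```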
<!-- vim: set sw=2 ts=2: -->