Add notes for 2022-10-28

2022-10-28 13:17:35 +03:00
parent 189f33e1ce
commit 3633377854
29 changed files with 205 additions and 34 deletions


@@ -672,4 +672,89 @@ $ pngquant /tmp/10568-125167.pdf.png
- Spent some time looking at the MediaBox / CropBox handling in DSpace's `ImageMagickThumbnailFilter.java`
- We need to put `-define pdf:use-cropbox=true` before the input file on the command line, or else it will not have any effect
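
A minimal sketch of the ordering described above. The helper name `thumbnail_cmd` and the file names are hypothetical, and the function only prints the command instead of running it, so it does not reflect what `ImageMagickThumbnailFilter.java` actually executes:

```shell
# Hypothetical helper: print an ImageMagick command that renders the first
# page of a PDF using its CropBox. $1 = input PDF, $2 = output image.
thumbnail_cmd() {
  # The -define setting only affects images that are read after it,
  # so it has to appear before the input file.
  printf 'magick convert -define pdf:use-cropbox=true %s[0] %s\n' "$1" "$2"
}

thumbnail_cmd in.pdf out.jpg
```

Moving the `-define` after `in.pdf[0]` still gives a valid command line, but by then the PDF has already been read, so the setting would come too late to affect it.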
## 2022-10-27
- I found out that we can use [pdfcpu to remove the CropBox from a PDF](https://pdfcpu.io/boxes/boxes_remove.html#examples) for testing:
```console
$ pdfcpu box rem -- "crop" in.pdf out.pdf
```
- I filed [an issue on DSpace](https://github.com/DSpace/DSpace/issues/8549) for the ImageMagick `CropBox` problem
- I decided that this is a bug that should be fixed separately from the "improving thumbnail quality" issue
- I made [a pull request](https://github.com/DSpace/DSpace/pull/8550) to fix the `CropBox` issue
- I did more work on my [improved-dspace-thumbnails](https://github.com/alanorth/improved-dspace-thumbnails/) microsite to complement the DSpace thumbnail pull requests
- I am updating it to recommend using the PDF CropBox and "supersampling" with a density higher than 72
- I measured the execution time of ImageMagick with `time` and found that the higher-density mode takes about five times longer on average
- I measured the [maximum heap memory of ImageMagick with Valgrind and Massif](https://stackoverflow.com/a/131346):
```console
$ valgrind --tool=massif magick convert ...
```
- Then I checked the results for each set of default DSpace thumbnail runs and "improved" thumbnail runs using `ms_print` (hacky way to get the max heap, I know):
```console
$ for file in memory-dspace/massif.out.49*; do ms_print "$file" | grep -A1 " MB" | tail -n1 | sed 's/\^.*//'; done
15.87
16.06
21.26
15.88
20.01
15.85
20.06
16.04
15.87
15.87
20.02
15.87
15.86
19.92
10.89
$ for file in memory-improved/massif.out.5*; do ms_print "$file" | grep -A1 " MB" | tail -n1 | sed 's/\^.*//'; done
245.3
245.5
298.6
245.3
306.8
245.2
306.9
245.5
245.2
245.3
306.8
245.3
244.9
306.3
165.6
```
- Ouch, this shows that it takes about *fifteen times* more memory to render at the "4x" density of 288!
- It seems more reasonable to use a "2x" density of 144, which needs roughly four times the memory of the default density of 72:
```console
$ for file in memory-improved-144/*; do ms_print "$file" | grep -A1 " MB" | tail -n1 | sed 's/\^.*//'; done
61.80
62.00
76.76
61.82
77.43
61.77
77.48
61.98
61.76
61.81
77.44
61.81
61.69
77.16
41.84
```
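
The memory ratios above can be sanity-checked with a little arithmetic. This is a rough sketch that copies the peak values (in MB) from the three runs above into variables. The names `mean` and `ratio` are just illustrative helpers:

```shell
# Peak heap values in MB, copied from the ms_print output above.
default_72="15.87 16.06 21.26 15.88 20.01 15.85 20.06 16.04 15.87 15.87 20.02 15.87 15.86 19.92 10.89"
improved_288="245.3 245.5 298.6 245.3 306.8 245.2 306.9 245.5 245.2 245.3 306.8 245.3 244.9 306.3 165.6"
improved_144="61.80 62.00 76.76 61.82 77.43 61.77 77.48 61.98 61.76 61.81 77.44 61.81 61.69 77.16 41.84"

# mean VALUE...: print the arithmetic mean of the arguments (two decimals).
mean() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# ratio A B: print A divided by B (one decimal).
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# The lists are deliberately unquoted so that each value becomes one argument.
m72=$(mean $default_72)
m144=$(mean $improved_144)
m288=$(mean $improved_288)

echo "density  72: $m72 MB"
echo "density 144: $m144 MB ($(ratio "$m144" "$m72")x the default)"
echo "density 288: $m288 MB ($(ratio "$m288" "$m72")x the default)"
```

On these numbers the means are about 17 MB, 66 MB, and 260 MB, so doubling the density costs roughly four times the memory and quadrupling it roughly fifteen times.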
- There's a really cool visualizer called massif-visualizer, but it isn't easy to parse
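
A less hacky alternative to grepping the `ms_print` graph might be to read the snapshot records in the `massif.out.*` files directly. This sketch assumes each snapshot lists `mem_heap_B=` followed by `mem_heap_extra_B=`, and that stack profiling is off (the Massif default). The function name `massif_peak_mb` is made up for illustration:

```shell
# massif_peak_mb FILE: print the largest (heap + heap overhead) value seen in
# any snapshot of a Massif output file, converted to MiB (1024 * 1024 bytes).
massif_peak_mb() {
  awk -F= '
    /^mem_heap_B=/       { heap = $2 }
    /^mem_heap_extra_B=/ { total = heap + $2; if (total > peak) peak = total }
    END { printf "%.2f\n", peak / (1024 * 1024) }
  ' "$1"
}

# Example usage (paths as in the loops above):
#   for file in memory-dspace/massif.out.49*; do massif_peak_mb "$file"; done
```

The result may differ slightly from the value on the `ms_print` graph, depending on how `ms_print` scales and rounds its axis label, so it is worth comparing both on one file before relying on it.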
## 2022-10-28
- I finalized the code for the ImageMagick density change and made a [pull request](https://github.com/DSpace/DSpace/pull/8553) against DSpace 7.x
<!-- vim: set sw=2 ts=2: -->