Add notes for 2023-11-11

2023-11-13 16:54:36 +03:00
parent 01fb17950b
commit d14dd7114a
137 changed files with 259 additions and 172 deletions

@ -90,5 +90,45 @@ $ dspace index-discovery
```
- I don't know what is happening... I will increase the heap size from 6144m to 7168m again...
- I did some work on the value mappings in AReS
- I wanted to test the import/export feature, and found that I could get a JSON and convert it to CSV for manipulation in OpenRefine
- Importing duplicates the records, so I deleted and re-created the index in Elasticsearch first
- Then I started a new harvest on AReS to make sure the mappings are applied
## 2023-11-09
- Ryan asked me for help uploading a large PDF to CGSpace
- I tried my usual GhostScript prepress invocation and found that the file size decreased significantly, but some minor artifacts appeared in the images
- Interestingly, the [GhostScript docs](https://ghostscript.com/docs/9.54.0/VectorDevices.htm) mention that `prepress` doesn't give the best results:
> Please be aware that the /prepress setting does not indicate the highest quality conversion. Using any of these presets will involve altering the input, and as such may result in a PDF of poorer quality (compared to the input) than simply using the defaults. The 'best' quality (where best means closest to the original input) is obtained by not setting this parameter at all (or by using /default).
- Also, I found [a question on StackOverflow discussing some further techniques for PDFs with images](https://stackoverflow.com/questions/40849325/ghostscript-pdfwrite-specify-jpeg-quality):
```console
$ gs -sOutputFile=137166-default-dct.pdf -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -dPDFSETTINGS=/default -c "<< /ColorACSImageDict << /VSamples [ 1 1 1 1 ] /HSamples [ 1 1 1 1 ] /QFactor 0.08 /Blend 1 >> /ColorImageDownsampleType /Bicubic /ColorConversionStrategy /LeaveColorUnchanged >> setdistillerparams" -f 137166.pdf
```
- This looks much better, and is still much smaller than the original
- Also, I used `pdfimages` to extract all the images from the original and the one above and found:
```console
$ du -sh images-*
886M images-default-dct
1012M images-original
```
- And from [WeCompress's analysis](https://www.wecompress.com/en/analyze) I see that the images are 85% of the size of the PDF
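- As a rough check on the `du` figures above, the relative saving in the extracted images can be computed with a small helper. This is only a sketch: `pct_smaller` is a hypothetical name, and both arguments are assumed to be in the same unit. The extracted sizes also depend on the image format `pdfimages` was asked to write, which these notes do not record.

```shell
#!/bin/sh
# Percentage by which the new size is smaller than the original.
# Both arguments must be in the same unit (here: the "M" figures printed by du -sh).
pct_smaller() {
  awk -v orig="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (orig - new) / orig * 100 }'
}

# Figures from the du output above: 1012M original, 886M re-encoded.
pct_smaller 1012 886
```

- With those figures the extracted images come out about 12.5% smaller than the ones from the original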
## 2023-11-10
- I finished checking the IFPRI Slideshare records and added some tagging of countries, regions, and CRPs and then uploaded them to CGSpace
## 2023-11-11
- Salem fixed a bug on OpenRXV that was splitting country values by "," before matching them with ISO countries
- I exported CGSpace to check for missing Initiative collection mappings
- I started a fresh harvest on AReS
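- To illustrate why splitting country values on "," before matching is a problem, here is a minimal sketch. It is not OpenRXV's actual code: the `countries` list, `is_country`, and `match_after_split` are all hypothetical, and the list holds only a few sample names, one of which contains a comma.

```shell
#!/bin/sh
# Hypothetical lookup list; a real one would hold every ISO country name.
countries='Kenya
Tanzania, United Republic of
Viet Nam'

# Succeeds only if the value equals one full line of the list exactly.
is_country() {
  printf '%s\n' "$countries" | grep -Fxq -- "$1"
}

# Split a value on "," and report whether each trimmed fragment matches.
match_after_split() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/^ *//; s/ *$//' |
  while IFS= read -r part; do
    if is_country "$part"; then
      echo "match: $part"
    else
      echo "no match: $part"
    fi
  done
}

value='Tanzania, United Republic of'
match_after_split "$value"                    # neither fragment is in the list
is_country "$value" && echo "match: $value"   # the unsplit value matches
```

- In this sketch neither fragment matches after splitting, while the unsplit value does, which is the kind of mismatch that matching on the whole value avoids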
<!-- vim: set sw=2 ts=2: -->