cgspace-notes/content/posts/2018-12.md

84 lines
5.6 KiB
Markdown
Raw Normal View History

2018-12-02 09:47:41 +01:00
---
title: "December, 2018"
date: 2018-12-02T02:09:30+02:00
author: "Alan Orth"
tags: ["Notes"]
---
## 2018-12-01
- Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK
- I manually installed OpenJDK, then removed Oracle JDK, then re-ran the [Ansible playbook](http://github.com/ilri/rmg-ansible-public) to update all configuration files, etc
- Then I ran all system updates and restarted the server
## 2018-12-02
- I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another [Ghostscript vulnerability last week](https://usn.ubuntu.com/3831-1/)
<!--more-->
- The error when I try to manually run the media filter for one item from the command line:
```
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d" "-f/tmp/magick-129895Bmp44lvUfxo" "-f/tmp/magick-12989C0QFG51fktLF"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d" "-f/tmp/magick-129895Bmp44lvUfxo" "-f/tmp/magick-12989C0QFG51fktLF"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
at org.im4java.core.Info.getBaseInfo(Info.java:360)
at org.im4java.core.Info.<init>(Info.java:151)
at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142)
at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24)
at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
```
- A comment on [StackOverflow question](https://stackoverflow.com/questions/53560755/ghostscript-9-26-update-breaks-imagick-readimage-for-multipage-pdf) from yesterday suggests it might be a bug with the `pngalpha` device in Ghostscript and [links to an upstream bug](https://bugs.ghostscript.com/show_bug.cgi?id=699815)
- I think we need to wait for a fix from Ubuntu
2018-12-02 09:57:41 +01:00
- For what it's worth, I get the same error on my local Arch Linux environment with Ghostscript 9.26:
```
$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match
zsh: segmentation fault (core dumped) gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000
```
- When I replace the `pngalpha` device with `png16m` as suggested in the StackOverflow comments it works:
```
$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
DEBUG: FC_WEIGHT didn't match
```
2018-12-02 09:47:41 +01:00
2018-12-02 16:55:32 +01:00
- Start proofing the latest round of 226 IITA archive records that Bosede sent last week and Sisay uploaded to DSpace Test this weekend ([IITA_Dec_1_1997 aka Daniel1807](https://dspacetest.cgiar.org/handle/10568/108298))
- One item missing the authorship type
- Some invalid countries (smart quotes, mispellings)
- Added countries to some items that mentioned research in particular countries in their abstracts
- One item had "MADAGASCAR" for ISI Journal
- Minor corrections in IITA subject (LIVELIHOOD→LIVELIHOODS)
- Trim whitespace in abstract field
- Fix some sponsors (though some with "Governments of Canada" etc I'm not sure why those are plural)
- Eighteen items had `en||fr` for the language, but the content was only in French so changed them to just `fr`
- Six items had encoding errors in French text so I will ask Bosede to re-do them carefully
- Correct and normalize a few AGROVOC subjects
- Expand my "encoding error" detection GREL to include `~` as I saw a lot of that in some copy pasted French text recently:
```
or(
isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)),
isNotNull(value.match(/.*\u2019.*/)),
isNotNull(value.match(/.*\u00b4.*/)),
isNotNull(value.match(/.*\u007e.*/))
)
```
2018-12-02 09:47:41 +01:00
<!-- vim: set sw=2 ts=2: -->