Add notes for 2022-11-26

This commit is contained in:
2022-11-26 17:38:27 +03:00
parent b5b28f2d78
commit 59cd155eb3
29 changed files with 241 additions and 34 deletions

View File

@ -247,4 +247,110 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
- I did some more work on my `post-ciat-pdfs.py` script and tested uploading the items to my local DSpace and DSpace Test
- Then I ran the script on CGSpace, uploading ~1,500 PDFs to to existing items
## 2022-11-25
- Tony Murray, who is working on IFPRI's CGSpace integration, emailed me to ask some questions about the REST API
- Oh no, I realized there is a logic issue with the PDFbox cropbox code I added a few weeks ago:
```console
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -v -f -i 10568/77010
The following MediaFilters are enabled:
Full Filter Name: org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter
org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter
Loading @mire database changes for module MQM
Changes have been processed
IM Thumbnail tropentag2016_marshall.pdf is replacable.
File: tropentag2016_marshall.pdf.jpg
ERROR filtering, skipping bitstream:
Item Handle: 10568/77010
Bundle Name: ORIGINAL
File Size: 1486580
Checksum: 1ad66d918a56a5e84667386e1a32e352 (MD5)
Asset Store: 0
java.lang.IndexOutOfBoundsException: 1-based index out of bounds: 2
java.lang.IndexOutOfBoundsException: 1-based index out of bounds: 2
at org.apache.pdfbox.pdmodel.PDPageTree.get(PDPageTree.java:325)
at org.apache.pdfbox.pdmodel.PDPageTree.get(PDPageTree.java:248)
at org.apache.pdfbox.pdmodel.PDDocument.getPage(PDDocument.java:1543)
at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:167)
at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:27)
at com.atmire.dspace.app.mediafilter.AtmireMediaFilter.processBitstream(AtmireMediaFilter.java:103)
at com.atmire.dspace.app.mediafilter.AtmireMediaFilterServiceImpl.filterBitstream(AtmireMediaFilterServiceImpl.java:61)
at org.dspace.app.mediafilter.MediaFilterServiceImpl.filterItem(MediaFilterServiceImpl.java:181)
at org.dspace.app.mediafilter.MediaFilterServiceImpl.applyFiltersItem(MediaFilterServiceImpl.java:159)
at org.dspace.app.mediafilter.MediaFilterCLITool.main(MediaFilterCLITool.java:232)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
```
- Salem gave me a list of CGSpace collections that have double spaces in the names
- Normally this would only be a minor annoyance, but he discovered that the REST API seems to trim the spaces, which causes an issue when trying to reference them!
- He sent me a list of about ten collection UUIDs so I fixed them
- I found a bunch of LIVES presentations on CGSpace that have presentations on SlideShare with incorrect licenses... I updated about fifty of them
## 2022-11-26
- Sync DSpace Test with CGSpace
- I increased the session timeout in Tomcat from thirty minutes to sixty, as requested by Maria a few weeks ago
- See: https://gitlab.inf.unibz.it/commul/docker/clarin-dspace/-/issues/44
- I re-built DSpace on CGSpace, ran all updates, and rebooted the machine
- Then after coming back up the handle server won't start
- The `handle-server.log` file shows:
```console
Shutting down...
"2022/11/26 02:12:17 CET" 25 Rotating log files
Error: null
(see the error log for details.)
```
- In the `error.log` file I see:
```console
"2022/11/26 02:12:18 CET" 25 Started new run.
java.lang.UnsupportedOperationException
at java.lang.Runtime.runFinalizersOnExit(Runtime.java:287)
at java.lang.System.runFinalizersOnExit(System.java:1059)
at net.handle.server.Main.initialize(Main.java:124)
at net.handle.server.Main.main(Main.java:75)
Shutting down...
```
- Ah, it seems to be due to an [issue in OpenJDK 1.8.0_352](https://groups.google.com/g/dspace-tech/c/PqjfA5mqG4w/m/FhxI5oXhFwAJ?pli=1)
- I see the server upgraded to the new JDK version on 2022-11-10:
```console
Upgrade: openjdk-8-jdk-headless:amd64 (8u342-b07-0ubuntu1~20.04, 8u352-ga-1~20.04), openjdk-8-jre-headless:amd64 (8u342-b07-0ubuntu1~20.04, 8u352-ga-1~20.04)
End-Date: 2022-11-10 04:10:45
```
- As highlighted in the dspace-tech mailing list thread above, [this OpenJDK release deprecated `Runtime.runFinalizersOnExit`](https://mail.openjdk.org/pipermail/jdk8u-dev/2022-October/015706.html):
```console
- JDK-8287132: Retire Runtime.runFinalizersOnExit so that it always throws UOE
```
- I downloaded the previous versions of the packages from Launchpad:
```console
# wget https://launchpad.net/~openjdk-security/+archive/ubuntu/ppa/+build/24195357/+files/openjdk-8-jdk-headless_8u342-b07-0ubuntu1~20.04_amd64.deb
# wget https://launchpad.net/~openjdk-security/+archive/ubuntu/ppa/+build/24195357/+files/openjdk-8-jre-headless_8u342-b07-0ubuntu1~20.04_amd64.deb
# dpkg -i openjdk-8-j*8u342-b07*.deb
```
- Then the handle-server process starts up fine, so I held these OpenJDK versions for now:
```console
# apt-mark hold openjdk-8-jdk-headless:amd64 apt-mark hold openjdk-8-jre-headless:amd64
openjdk-8-jdk-headless set on hold.
openjdk-8-jre-headless set on hold.
```
- Start a harvest on AReS
<!-- vim: set sw=2 ts=2: -->