CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

June, 2020

2020-06-01

  • I tried running the AtomicStatisticsUpdateCLI CUA migration script on DSpace Test (linode26) again, and it is still very slow and throws many of the same errors I noticed yesterday
    • I sent Atmire today’s dspace.log and asked them to log into the server to debug the process
  • In other news, I checked the statistics API on DSpace 6 and it’s working
  • I tried to build the OAI registry on the freshly migrated DSpace 6 instance on DSpace Test, but I get an error:
$ dspace oai import -c
OAI 2.0 manager action started
Loading @mire database changes for module MQM
Changes have been processed
Clearing index
Index cleared
Using full import.
Full import
java.lang.NullPointerException
        at org.dspace.xoai.app.XOAI.willChangeStatus(XOAI.java:438)
        at org.dspace.xoai.app.XOAI.index(XOAI.java:368)
        at org.dspace.xoai.app.XOAI.index(XOAI.java:280)
        at org.dspace.xoai.app.XOAI.indexAll(XOAI.java:227)
        at org.dspace.xoai.app.XOAI.index(XOAI.java:134)
        at org.dspace.xoai.app.XOAI.main(XOAI.java:560)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
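The -c run above clears the OAI core ("Clearing index", "Index cleared") before attempting the full import. To separate that reset from the import itself, the same clear can be sent straight to Solr's update handler as an XML delete-by-query followed by a commit. Below is a minimal Python sketch using only the standard library; it assumes the core is at http://localhost:8080/solr/oai (the URL used on DSpace Test in these notes), and build_requests and clear_core are hypothetical helper names, not part of DSpace.

```python
"""Sketch: clear a Solr core with a delete-by-query followed by a commit.

Assumption: the OAI core lives at http://localhost:8080/solr/oai, as on
DSpace Test. build_requests() and clear_core() are hypothetical helpers.
"""
import urllib.request
from typing import List

SOLR_CORE = "http://localhost:8080/solr/oai"

# Solr XML update messages: delete every document, then commit the change
DELETE_ALL = b"<delete><query>*:*</query></delete>"
COMMIT = b"<commit />"


def build_requests(core_url: str = SOLR_CORE) -> List[urllib.request.Request]:
    """Build (but do not send) the two POST requests for the core's /update handler."""
    url = core_url.rstrip("/") + "/update"
    headers = {"Content-Type": "text/xml"}
    return [
        urllib.request.Request(url, data=body, headers=headers, method="POST")
        for body in (DELETE_ALL, COMMIT)
    ]


def clear_core(core_url: str = SOLR_CORE) -> None:
    """Send the delete and then the commit; urlopen raises on HTTP errors."""
    for request in build_requests(core_url):
        with urllib.request.urlopen(request) as response:
            response.read()  # Solr answers with a small XML status document
```

Running clear_core() on the server should be equivalent to posting the two XML messages with curl. Building the requests separately from sending them keeps the payloads easy to inspect before touching a live core.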

2020-06-02

  • I noticed that I was able to do a partial OAI import (i.e., without -c)
    • Then I tried clearing the OAI Solr core manually and importing again, but I get the same error:
$ curl http://localhost:8080/solr/oai/update -H "Content-type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
$ curl http://localhost:8080/solr/oai/update -H "Content-type: text/xml" --data-binary '<commit />'
$ ~/dspace63/bin/dspace oai import
OAI 2.0 manager action started
...
There are no indexed documents, using full import.
Full import
java.lang.NullPointerException
        at org.dspace.xoai.app.XOAI.willChangeStatus(XOAI.java:438)
        at org.dspace.xoai.app.XOAI.index(XOAI.java:368)
        at org.dspace.xoai.app.XOAI.index(XOAI.java:280)
        at org.dspace.xoai.app.XOAI.indexAll(XOAI.java:227)
        at org.dspace.xoai.app.XOAI.index(XOAI.java:143)
        at org.dspace.xoai.app.XOAI.main(XOAI.java:560)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
  • I found a bug report on the DSpace Jira describing the same issue for someone else running DSpace 6.3
    • They suspect it has to do with the item having some missing group names in its authorization policies
    • I added some debugging to dspace-oai/src/main/java/org/dspace/xoai/app/XOAI.java to print the Handle of the item that causes the crash, then looked at its authorization policies
    • Indeed there are some blank group names:

Missing group names in DSpace 6.3 item authorization policy

  • The same item on CGSpace (DSpace 5.8) also has groups with no name:

Missing group names in DSpace 5.8 item authorization policy

  • I added more debugging and found exactly where the exception is thrown
    • It turns out we can just add a null check for the group policy there, which allows the OAI import to proceed
    • Aaaaand as it turns out, this was fixed in dspace-6_x in 2018 after DSpace 6.3 was released (see DS-4019), so that was a waste of three hours.
    • I cherry-picked commit 150e83558103ed7f50e8f323b6407b9cbdf33717 into our current 6_x-dev-atmire-modules branch
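The fix described above amounts to a null check before dereferencing a policy's group. The real change is the cherry-picked Java commit for DS-4019; the Python below is only a hypothetical model of the idea, with made-up Group and ResourcePolicy classes and a group_names helper that do not correspond to DSpace's API.

```python
"""Hypothetical model of a null guard for authorization policies.

Group and ResourcePolicy are illustrative stand-ins, not DSpace classes.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    name: Optional[str]  # a group may exist but have a blank name


@dataclass
class ResourcePolicy:
    group: Optional[Group]  # None models a policy with no group attached


def group_names(policies: List[ResourcePolicy]) -> List[str]:
    """Collect the group names of an item's policies without crashing.

    Without the `is None` check, `policy.group.name` raises AttributeError,
    the Python counterpart of the Java NullPointerException in the traces.
    """
    names = []
    for policy in policies:
        if policy.group is None:
            continue  # the guard: skip this policy instead of failing the whole import
        names.append(policy.group.name or "")  # normalize blank names to ""
    return names
```

Skipping the offending policy, rather than aborting, is what lets a full import run over every item even when a few have inconsistent policies.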

2020-06-04

  • Maria was asking about some items they are trying to map from the CGIAR Big Data collection into their Alliance of Bioversity and CIAT journal articles collection, but for some reason the items don’t show up in the item mapper
    • The items don’t even show up in the XMLUI Discover advanced search, and I don’t see any recent items in the collection’s recently submitted list either (though the item pages themselves exist, of course)
    • Perhaps I need to try a full Discovery re-index:
$ time chrt -i 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b

real    125m37.423s
user    11m20.312s
sys     3m19.965s
  • I still don’t see the item in XMLUI search or in the item mapper (and I made sure to clear the Cocoon cache)
    • I’m starting to think it’s something related to the database transaction issue…
    • I removed our custom JDBC driver from /usr/local/apache-tomcat... so that DSpace will use its own much older one, version 9.1-901-1.jdbc4
    • I ran all system updates on the server (linode18) and rebooted it
    • After it came back up I had to restart Tomcat five times before all Solr statistics cores came up properly
    • Unfortunately this means that the Tomcat JDBC pooling via JNDI doesn’t work, so we’re using only the 30 connections reserved for the DSpace CLI from DSpace’s own internal pool
    • Perhaps the database pool issues we had a few years ago will be less severe now that we block and rate limit bots much more aggressively in nginx
  • I will also import a fresh database snapshot from CGSpace and check if I can map the item in my local environment
    • After importing and forcing a full reindex locally I can see the item in search and in the item mapper
  • Abenet sent another message about two users who are having issues with submission, and I see that the number of locks in PostgreSQL has skyrocketed again as of a few days ago:

PostgreSQL locks week

  • As far as I can tell, this first started happening in April, for both connections and locks:

PostgreSQL connections year
PostgreSQL locks year
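To get a point-in-time count to set beside these graphs, one option is to group PostgreSQL's pg_locks view by lock mode and by whether the lock was granted; ungranted locks are sessions waiting on each other. A minimal Python sketch, assuming the psycopg2 driver is installed where it runs and that the DSN points at the DSpace database (the "dbname=dspace" default is a placeholder); fetch_lock_counts and summarize are hypothetical helper names.

```python
"""Sketch: count PostgreSQL locks by mode and granted status.

Assumptions: psycopg2 is installed for fetch_lock_counts(), and the DSN
default below is a placeholder. Helper names are hypothetical.
"""
from typing import Dict, Iterable, List, Tuple

LOCKS_BY_MODE_SQL = """
SELECT mode, granted, count(*)
FROM pg_locks
GROUP BY mode, granted
ORDER BY count(*) DESC
"""


def fetch_lock_counts(dsn: str = "dbname=dspace") -> List[Tuple[str, bool, int]]:
    """Run the query and return (mode, granted, count) rows."""
    import psycopg2  # imported lazily so summarize() works without the driver

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(LOCKS_BY_MODE_SQL)
            return cur.fetchall()
    finally:
        conn.close()  # always release the connection, even on errors


def summarize(rows: Iterable[Tuple[str, bool, int]]) -> Dict[str, int]:
    """Return the total number of locks and how many are still waiting."""
    total = waiting = 0
    for _mode, granted, count in rows:
        total += count
        if not granted:
            waiting += count
    return {"total": total, "waiting": waiting}
```

Calling summarize(fetch_lock_counts()) periodically gives numbers to compare with the weekly and yearly lock graphs; a rising "waiting" count is the more worrying signal than the raw total.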

  • I think I’ll just leave things as they are with the DSpace default JDBC driver for now, though I could also downgrade Tomcat (I deployed Tomcat 7.0.103 in March, so that may be relevant)
  • I’ll also start another full Discovery reindex to see if the item mapping issue is somehow resolved now that the database connections are working better
    • Perhaps related: this one finished about 24 minutes faster:
$ time chrt -i 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b

real    101m41.195s
user    10m9.569s
sys     3m13.929s
  • Unfortunately the item is still not showing up in the item mapper…
  • Something happened to AReS Explorer (linode20) so I ran all system updates and rebooted it
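The two full Discovery reindex runs from today can be compared directly from the real values printed by time. A small Python sketch that parses the XmYs format and computes the difference; parse_duration is a hypothetical helper name.

```python
import re

# Matches the shell `time` format used above, e.g. "125m37.423s"
_DURATION = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def parse_duration(text: str) -> float:
    """Convert a `time` value such as '125m37.423s' to seconds."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


first = parse_duration("125m37.423s")   # first full reindex (real)
second = parse_duration("101m41.195s")  # second full reindex (real)
saved = first - second
print(f"saved {saved:.1f} s ({saved / 60:.1f} min), {saved / first:.1%} less wall-clock time")
```

This prints a saving of about 1436 s (roughly 24 minutes), or about 19% less wall-clock time. The user and sys values barely changed, so most of the difference is time spent waiting rather than CPU work.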