CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

October, 2020

2020-10-06

  • Add tests for the new /items POST handlers to the DSpace 6.x branch of my dspace-statistics-api
  • Trying to test the changes Atmire sent last week but I had to re-create my local database from a recent CGSpace dump
    • During the FlywayDB migration I got an error:
2020-10-06 21:36:04,138 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ Batch entry 0 update public.bitstreamformatregistry set description='Electronic publishing', internal='FALSE', mimetype='application/epub+zip', short_description='EPUB', support_level=1 where bitstream_format_id=78 was aborted: ERROR: duplicate key value violates unique constraint "bitstreamformatregistry_short_description_key"
  Detail: Key (short_description)=(EPUB) already exists.  Call getNextException to see other errors in the batch.
2020-10-06 21:36:04,138 WARN  org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ SQL Error: 0, SQLState: 23505
2020-10-06 21:36:04,138 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ ERROR: duplicate key value violates unique constraint "bitstreamformatregistry_short_description_key"
  Detail: Key (short_description)=(EPUB) already exists.
2020-10-06 21:36:04,142 ERROR org.hibernate.engine.jdbc.batch.internal.BatchingBatch @ HHH000315: Exception executing batch [could not execute batch]
2020-10-06 21:36:04,143 ERROR org.dspace.storage.rdbms.DatabaseRegistryUpdater @ Error attempting to update Bitstream Format and/or Metadata Registries
org.hibernate.exception.ConstraintViolationException: could not execute batch
	at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:129)
	at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:124)
	at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.performExecution(BatchingBatch.java:122)
	at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.doExecuteBatch(BatchingBatch.java:101)
	at org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl.execute(AbstractBatchImpl.java:161)
	at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.executeBatch(JdbcCoordinatorImpl.java:207)
	at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:390)
	at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:304)
	at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:349)
	at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:56)
	at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.hibernate.context.internal.ThreadLocalSessionContext$TransactionProtectionWrapper.invoke(ThreadLocalSessionContext.java:352)
	at com.sun.proxy.$Proxy162.flush(Unknown Source)
	at org.dspace.core.HibernateDBConnection.commit(HibernateDBConnection.java:83)
	at org.dspace.core.Context.commit(Context.java:435)
	at org.dspace.core.Context.complete(Context.java:380)
	at org.dspace.administer.MetadataImporter.loadRegistry(MetadataImporter.java:164)
	at org.dspace.storage.rdbms.DatabaseRegistryUpdater.updateRegistries(DatabaseRegistryUpdater.java:72)
	at org.dspace.storage.rdbms.DatabaseRegistryUpdater.afterMigrate(DatabaseRegistryUpdater.java:121)
	at org.flywaydb.core.internal.command.DbMigrate$3.doInTransaction(DbMigrate.java:250)
	at org.flywaydb.core.internal.util.jdbc.TransactionTemplate.execute(TransactionTemplate.java:72)
	at org.flywaydb.core.internal.command.DbMigrate.migrate(DbMigrate.java:246)
	at org.flywaydb.core.Flyway$1.execute(Flyway.java:959)
	at org.flywaydb.core.Flyway$1.execute(Flyway.java:917)
	at org.flywaydb.core.Flyway.execute(Flyway.java:1373)
	at org.flywaydb.core.Flyway.migrate(Flyway.java:917)
	at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:663)
	at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:575)
	at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:551)
	at org.dspace.core.Context.<clinit>(Context.java:103)
	at org.dspace.app.util.AbstractDSpaceWebapp.register(AbstractDSpaceWebapp.java:74)
	at org.dspace.app.util.DSpaceWebappListener.contextInitialized(DSpaceWebappListener.java:31)
	at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:5197)
	at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5720)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
	at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:1016)
	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:992)
  • I checked the database migrations with dspace database info and they were all OK
    • Then I restarted the Tomcat again and it started up OK…
  • There were two issues I had reported to Atmire last month:
    • Importing items from the command line throws a NullPointerException from com.atmire.dspace.cua.CUASolrLoggerServiceImpl for every item, but the item still gets imported
    • No results for author name in Listing and Reports, despite there being hits in Discovery search
  • To test the first one I imported a very simple CSV file with one item with minimal data
    • There is a new error now (but the item does get imported):
$ dspace metadata-import -f /tmp/2020-10-06-import-test.csv -e aorth@mjanja.ch
Loading @mire database changes for module MQM
Changes have been processed
-----------------------------------------------------------
New item:
 + New owning collection (10568/3): ILRI articles in journals
 + Add    (dc.contributor.author): Orth, Alan
 + Add    (dc.date.issued): 2020-09-01
 + Add    (dc.title): Testing CUA import NPE

1 item(s) will be changed

Do you want to make these changes? [y/n] y
-----------------------------------------------------------
New item: aff5e78d-87c9-438d-94f8-1050b649961c (10568/108548)
 + New owning collection  (10568/3): ILRI articles in journals
 + Added   (dc.contributor.author): Orth, Alan
 + Added   (dc.date.issued): 2020-09-01
 + Added   (dc.title): Testing CUA import NPE
Tue Oct 06 22:06:14 CEST 2020 | Query:containerItem:aff5e78d-87c9-438d-94f8-1050b649961c
Error while updating
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected mime type application/octet-stream but got text/html. <!doctype html><html lang="en"><head><title>HTTP Status 404 – Not Found</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 404 – Not Found</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> The requested resource [/solr/update] is not available</p><p><b>Description</b> The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.</p><hr class="line" /><h3>Apache Tomcat/7.0.104</h3></body></html>
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:512)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl$5.visit(SourceFile:1131)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.visitEachStatisticShard(SourceFile:212)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1104)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1093)
        at org.dspace.statistics.StatisticsLoggingConsumer.consume(SourceFile:104)
        at org.dspace.event.BasicDispatcher.consume(BasicDispatcher.java:177)
        at org.dspace.event.BasicDispatcher.dispatch(BasicDispatcher.java:123)
        at org.dspace.core.Context.dispatchEvents(Context.java:455)
        at org.dspace.core.Context.commit(Context.java:424)
        at org.dspace.core.Context.complete(Context.java:380)
        at org.dspace.app.bulkedit.MetadataImport.main(MetadataImport.java:1399)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
  • Also, I tested Listings and Reports and there are still no hits for “Orth, Alan” as a contributor, despite there being dozens of items in the repository and the Solr query generated by Listings and Reports actually returning hits:
2020-10-06 22:23:44,116 INFO org.apache.solr.core.SolrCore @ [search] webapp=/solr path=/select params={q=*:*&fl=handle,search.resourcetype,search.resourceid,search.uniqueid&start=0&fq=NOT(withdrawn:true)&fq=NOT(discoverable:false)&fq=search.resourcetype:2&fq=author_keyword:Orth,\+A.+OR+author_keyword:Orth,\+Alan&fq=dateIssued.year:[2013+TO+2021]&rows=500&wt=javabin&version=2} hits=18 status=0 QTime=10 
  • Solr returns hits=18 for the L&R query, but there are no result shown in the L&R UI
  • I sent all this feedback to Atmire…

2020-10-07

  • Udana from IWMI had asked about stats discrepencies from reports they had generated in previous months or years
    • I told him that we very often purge bots and the number of stats can change drastically
    • Also, I told him that it is not possible to compare stats from previous exports and that the stats should be taking with a grain of salt
  • Testing POSTing items to the DSpace 6 REST API
    • We need to authenticate to get a JSESSIONID cookie first:
$ http -f POST https://dspacetest.cgiar.org/rest/login email=aorth@fuuu.com 'password=fuuuu'
$ http https://dspacetest.cgiar.org/rest/status Cookie:JSESSIONID=EABAC9EFF942028AA52DFDA16DBCAFDE
  • Then we post an item in JSON format to /rest/collections/{uuid}/items:
$ http POST https://dspacetest.cgiar.org/rest/collections/f10ad667-2746-4705-8b16-4439abe61d22/items Cookie:JSESSIONID=EABAC9EFF942028AA52DFDA16DBCAFDE < item-object.json
  • Format of JSON is:
{ "metadata": [
    {
      "key": "dc.title",
      "value": "Testing REST API post",
      "language": "en_US"
    },
    {
      "key": "dc.contributor.author",
      "value": "Orth, Alan",
      "language": "en_US"
    },
    {
      "key": "dc.date.issued",
      "value": "2020-09-01",
      "language": "en_US"
    }
  ],
  "archived":"false",
  "withdrawn":"false"
}
  • What is unclear to me is the archived parameter, it seems to do nothing… perhaps it is only used for the /items endpoint when printing information about an item
    • If I submit to a collection that has a workflow, even as a super admin and with “archived=false” in the JSON, the item enters the workflow (“Awaiting editor’s attention”)
    • If I submit to a new collection without a workflow the item gets archived immediately
    • I created some notes to share with Salem and Leroy for future reference when we start discussion POSTing items to the REST API
  • I created an account for Salem on DSpace Test and added it to the submitters group of an ICARDA collection with no other workflow steps so we can see what happens
    • We are curious to see if he gets a UUID when posting from MEL
  • I did some tests by adding his account to certain workflow steps and trying to POST the item
  • Member of collection “Submitters” step:
    • HTTP Status 401 – Unauthorized
    • The request has not been applied because it lacks valid authentication credentials for the target resource.
  • Member of collection “Accept/Reject” step:
    • Same error…
  • Member of collection “Accept/Reject/Edit Metadata” step:
    • Same error…
  • Member of collection Administrators with no other workflow steps…:
    • Posts straight to archive
  • Member of collection Administrators with empty “Accept/Reject/Edit Metadata” step:
    • Posts straight to archive
  • Member of collection Administrators with populated “Accept/Reject/Edit Metadata” step:
    • Does not post straight to archive, goes to workflow
  • Note that community administrators have no role in item submission other than being able to create/manage collection groups