CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

May, 2019

2019-05-01

  • Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace
  • A user on the dspace-tech mailing list offered some suggestions for troubleshooting why certain items can’t be deleted
    • Apparently if the item is in the workflowitem table it is submitted to a workflow
    • And if it is in the workspaceitem table it is in the pre-submitted state
  • The item seems to be in a pre-submitted state, so I tried to delete it from there:
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
  • But after this I tried to delete the item from the XMLUI and it is still present…
  • I managed to delete the problematic item from the database
    • First I deleted the item’s bitstream in XMLUI and then ran dspace cleanup -v to remove it from the assetstore
    • Then I ran the following SQL:
dspace=# DELETE FROM metadatavalue WHERE resource_id=74648;
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
dspace=# DELETE FROM item WHERE item_id=74648;
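For the other items that turn up below, the same three statements can be rendered for any item ID. This is a minimal sketch, not a tested cleanup tool: the function name is hypothetical, it only builds strings (it never connects to the database), and it simply keeps the order used above, with the item row deleted last.

```python
def item_delete_statements(item_id: int) -> list[str]:
    """Return the three DELETE statements from the notes above, in the same order."""
    # Coerce to int so that nothing but a number is interpolated into the SQL.
    item_id = int(item_id)
    return [
        f"DELETE FROM metadatavalue WHERE resource_id={item_id};",
        f"DELETE FROM workspaceitem WHERE item_id={item_id};",
        f"DELETE FROM item WHERE item_id={item_id};",
    ]
```

Calling item_delete_statements(74648) gives back the statements shown above, ready to paste into psql.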
  • Now the item is (hopefully) really gone and I can continue to troubleshoot the issue with the REST API’s /items/find-by-metadata-field endpoint
    • Of course I run into another HTTP 401 error when I continue trying the LandPortal search from last month:
$ curl -f -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": "en_US"}'
curl: (22) The requested URL returned error: 401 Unauthorized
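The same request can be put together with Python's standard library, which makes it easier to repeat with other fields. This is only a sketch: the function name and its parameters are made up for illustration, and it builds the request without sending it.

```python
import json
import urllib.request


def build_find_by_metadata_request(base_url, key, value, language=None):
    """Build (but do not send) the POST request used in the curl command above."""
    payload = {"key": key, "value": value}
    if language is not None:
        payload["language"] = language
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/items/find-by-metadata-field",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen() would raise urllib.error.HTTPError on a 401 response, which is roughly what curl -f reports above.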
  • The DSpace log shows the item ID (because I modified the error text):
2019-05-01 11:41:11,069 ERROR org.dspace.rest.ItemsResource @ User(anonymous) has not permission to read item(id=77708)!
  • If I delete that one I get another, making the list of item IDs so far:
    • 74648
    • 77708
    • 85079
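Collecting these IDs by hand gets tedious, so here is a rough sketch of pulling them out of the DSpace log. It assumes the modified error text shown in the log line above (stock DSpace does not print the item ID there), and the names are hypothetical.

```python
import re

# Matches the modified error text shown in the log line above.
DENIED_ITEM = re.compile(r"has not permission to read item\(id=(\d+)\)")


def denied_item_ids(log_lines):
    """Return the distinct item IDs from permission errors, in first-seen order."""
    seen = []
    for line in log_lines:
        match = DENIED_ITEM.search(line)
        if match and int(match.group(1)) not in seen:
            seen.append(int(match.group(1)))
    return seen
```

Feeding it the lines of the log file (for example from open(...)) gives the list of item IDs without reading the log by eye.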
  • Some are in the workspaceitem table (pre-submission), others are in the workflowitem table (submitted), and others are actually approved, but withdrawn…
    • This is actually a worthless exercise because the real issue is that the /items/find-by-metadata-field endpoint is simply badly designed: it shouldn’t fail with a fatal error when the search returns items the user doesn’t have permission to access
    • It would take way too much time to fix the messed up items that are in limbo by deleting them in SQL, and it wouldn’t actually fix the problem anyway, because some items are submitted but withdrawn, so they actually have handles and everything
    • I think the solution is to recommend that people don’t use the /items/find-by-metadata-field endpoint
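To illustrate the behavior I would have expected from the endpoint, here is a tiny sketch that drops unreadable items from the results instead of failing the whole request. This is not DSpace's code (which is Java); readable_items and can_read are hypothetical names used only to show the idea.

```python
def readable_items(items, can_read):
    """Return only the items the caller may read, skipping the rest."""
    # Skip unreadable items rather than raising, so that one item in limbo
    # doesn't turn the whole search into an HTTP 401.
    return [item for item in items if can_read(item)]
```

With a filter like this, workspace, workflow, and withdrawn items would simply be missing from an anonymous user's results.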
  • CIP is asking about embedding PDF thumbnail images in their RSS feeds again
    • They asked in 2018-09 as well and I told them it wasn’t possible
    • To make sure, I looked at the documentation for RSS media feeds and tried it, but couldn’t get it to work
    • It seems to be geared towards iTunes and Podcasts… I dunno
  • CIP also asked for a way to get an XML file of all their RTB journal articles on CGSpace
    • I told them to use the REST API like this (where 1179 is the ID of the RTB journal articles collection):
https://cgspace.cgiar.org/rest/collections/1179/items?limit=812&expand=metadata
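If one big request turns out to be too slow, the same listing could be fetched in pages. A minimal sketch follows, assuming the endpoint accepts an offset parameter alongside limit (the URL above doesn't show one); the function name and the page size are made up for illustration.

```python
from urllib.parse import urlencode


def collection_items_urls(base_url, collection_id, total, page_size=100):
    """Yield one REST URL per page of a collection's items, with metadata expanded."""
    for offset in range(0, total, page_size):
        query = urlencode({"limit": page_size, "offset": offset, "expand": "metadata"})
        yield f"{base_url.rstrip('/')}/collections/{collection_id}/items?{query}"
```

For example, collection_items_urls("https://cgspace.cgiar.org/rest", 1179, 812, page_size=500) yields two URLs, one starting at offset 0 and one at offset 500.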