CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

May, 2017

2017-05-01

2017-05-02

  • Atmire got back to us about the Workflow Statistics issue: apparently it’s a bug in the CUA module, so they will send us a pull request

2017-05-04

  • Sync DSpace Test with database and assetstore from CGSpace
  • Re-deploy DSpace Test with Atmire’s CUA patch for workflow statistics, run system updates, and restart the server
  • Now I can see the workflow statistics and am able to select users, but everything returns 0 items
  • Megan says some mapped items that were missing last week are still not appearing, so I forced a full Discovery reindex with index-discovery -b
  • Need to remember to check tomorrow whether the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test): https://cgspace.cgiar.org/handle/10568/80731
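To compare the collection’s item count on both servers without clicking through the web UI, something like the following could work. It is a minimal sketch, assuming both instances expose the DSpace 5 REST API under /rest, that the /handle/{prefix}/{suffix} endpoint returns the collection as JSON with a numberItems field, and that curl is installed. DSPACE_TEST_URL is a hypothetical placeholder for DSpace Test’s base URL.

```shell
#!/usr/bin/env bash
# Print the value of "numberItems" from collection JSON read on stdin.
# Uses sed instead of jq to avoid extra dependencies; it assumes the field
# appears once and holds a plain integer (works for compact or pretty JSON).
number_items() {
  sed -n 's/.*"numberItems"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Fetch a collection by handle and print its item count.
# ASSUMPTION: DSpace 5 REST API at <base>/rest with a /handle/{prefix}/{suffix}
# endpoint that returns JSON when asked via the Accept header.
collection_count() {
  local base="$1" handle="$2"
  curl -s -H 'Accept: application/json' "${base}/rest/handle/${handle}" | number_items
}

# Usage (not executed here):
#   prod=$(collection_count https://cgspace.cgiar.org 10568/80731)
#   test=$(collection_count "$DSPACE_TEST_URL" 10568/80731)   # hypothetical URL variable
#   echo "CGSpace: $prod, DSpace Test: $test"
```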

2017-05-05

  • Discovered that CGSpace has ~700 items that are missing the cg.identifier.status field
  • Perhaps I need to try the “required metadata” curation task to find the items that are missing these fields:
$ [dspace]/bin/dspace curate -t requiredmetadata -i 10568/1 -r - > /tmp/curation.out
  • It seems the curation task dies when it finds an item which has missing metadata
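Since the task seems to die on the first item with missing metadata, one workaround worth trying is to run it separately for each collection, so a crash only loses that collection’s report. This is a hypothetical helper, not an existing DSpace tool: the handles file is an assumed input, and the failure detection assumes the dspace launcher exits non-zero when the task dies.

```shell
#!/usr/bin/env bash
# Run the requiredmetadata curation task once per handle, appending every
# report to a single output file, and keep going if one handle fails.
#   $1: path to the dspace launcher, e.g. [dspace]/bin/dspace
#   $2: file with one community/collection handle per line (hypothetical input)
#   $3: output file for the combined reports
curate_each() {
  local dspace_bin="$1" handles_file="$2" out="$3" handle
  : > "$out"   # truncate the output file
  while IFS= read -r handle || [ -n "$handle" ]; do
    [ -z "$handle" ] && continue
    echo "== $handle" >> "$out"
    # </dev/null stops the Java process from swallowing the rest of the handle list
    if ! "$dspace_bin" curate -t requiredmetadata -i "$handle" -r - < /dev/null >> "$out" 2>&1; then
      echo "FAILED: $handle" >> "$out"
    fi
  done < "$handles_file"
}

# Usage (not executed here):
#   curate_each [dspace]/bin/dspace /tmp/handles.txt /tmp/curation.out
```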

2017-05-06

2017-05-07

  • Testing one replacement for CCAFS Flagships (cg.subject.ccafs), first in the submission forms and then in the database:
$ ./fix-metadata-values.py -i ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
  • Also, CCAFS wants to re-order their flagships to prioritize the Phase II ones
  • Waiting for feedback from CCAFS, then I can merge #320
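Before touching the database it can help to eyeball the equivalent SQL for a corrections CSV. This is a minimal dry-run sketch, not code taken from fix-metadata-values.py: it assumes a two-column CSV whose first column holds the current value and whose second holds the corrected value (like the columns named with -f and -t above), values without commas or newlines, and the DSpace 5 metadatavalue table, where resource_type_id 2 means items. It only prints statements.

```shell
#!/usr/bin/env bash
# Print one UPDATE per CSV row (current value, corrected value), skipping the
# header row and rows with an empty correction. Single quotes are doubled so
# the output is valid SQL.
#   $1: CSV file
#   $2: metadata_field_id (e.g. 210 for cg.subject.ccafs, as passed with -m)
csv_to_sql() {
  local csv="$1" field_id="$2"
  awk -F',' -v fid="$field_id" -v q="'" '
    { sub(/\r$/, "") }                 # tolerate CRLF line endings
    NR == 1 { next }                   # skip the header row
    NF < 2 || $2 == "" { next }        # nothing to correct
    {
      old = $1; new = $2
      gsub(q, q q, old)                # escape single quotes
      gsub(q, q q, new)
      printf "UPDATE metadatavalue SET text_value = %s WHERE resource_type_id = 2 AND metadata_field_id = %s AND text_value = %s;\n", q new q, fid, q old q
    }
  ' "$csv"
}

# Usage (not executed here):
#   csv_to_sql ccafs-flagships-may7.csv 210 | less
```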

2017-05-08

  • Start working on CGIAR Library migration
  • We decided to use AIP export to preserve the hierarchies and handles of communities and collections
  • When ingesting some collections I was getting java.lang.OutOfMemoryError: GC overhead limit exceeded, which can be solved by disabling the GC overhead limit check with -XX:-UseGCOverheadLimit
  • Other times I got an error about Java heap space, so each time it crashed I bumped the heap allocation by 512MB (up to 4096m!)
  • These crashes leave tens of thousands of abandoned files in the assetstore, which need to be cleaned up using dspace cleanup -v, or else you’ll run out of disk space
  • In the end I realized it’s better to use submission mode (-s) to ingest the community object as a single AIP without its children, followed by each of the collections and then each of the items:
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m -XX:-UseGCOverheadLimit"
$ [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10568/87775 /home/aorth/10947-1/10947-1.zip
$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 "$collection"; done
$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e aorth@mjanja.ch "$item"; done
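The manual “bump the heap by 512MB and try again” routine can be scripted. This is a minimal sketch, assuming (as the export above implies) that the dspace launcher picks up JAVA_OPTS from the environment; run_with_heap_bump is a hypothetical helper name. Each failed attempt can still leave abandoned files in the assetstore, so run [dspace]/bin/dspace cleanup -v afterwards.

```shell
#!/usr/bin/env bash
# Run a command and, if it fails, retry with a larger Java heap: start at
# 2048m and add 512m per attempt, giving up after the 4096m attempt.
# JAVA_OPTS is exported in the current shell, so it keeps the last value tried.
run_with_heap_bump() {
  local heap=2048 max=4096
  while [ "$heap" -le "$max" ]; do
    export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx${heap}m -XX:-UseGCOverheadLimit"
    if "$@"; then
      return 0
    fi
    echo "failed with -Xmx${heap}m" >&2
    heap=$((heap + 512))
  done
  return 1
}

# Usage (not executed here):
#   run_with_heap_bump [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP \
#     -e aorth@mjanja.ch -p 10947/1 "$collection"
```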