CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

March, 2019

2019-03-01

  • I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine, and everything looks good
  • I am now only waiting to hear from her about where the items should go, though I assume the Journal Articles will go to IITA’s Journal Articles collection, etc…
  • Looking at the other half of Udana’s WLE records from 2018-11
    • I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)
    • I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items
    • Most worryingly, there are encoding errors in the abstracts for eleven items, for example:
    • 68.15% � 9.45 instead of 68.15% ± 9.45
    • 2003�2013 instead of 2003–2013
  • I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
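One way to find which items have mangled abstracts is to grep an exported metadata CSV for the UTF-8 replacement character (U+FFFD, which renders as “�”). A minimal sketch, using an inline two-line sample in place of the real WLE metadata export:

```shell
# U+FFFD is 0xEF 0xBF 0xBD in UTF-8; grep -n prints the line number of
# each affected record (here a two-line sample stands in for the export)
printf 'id,abstract\n1,"68.15%% \xef\xbf\xbd 9.45"\n' | grep -n $'\xef\xbf\xbd'
```

Running `grep -n $'\xef\xbf\xbd' metadata.csv` against a real export prints every row that was mangled during copy/paste.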

2019-03-03

  • Trying to finally upload IITA’s 259 Feb 14 items to CGSpace, so I exported them from DSpace Test:
$ mkdir 2019-03-03-IITA-Feb14
$ dspace export -i 10568/108684 -t COLLECTION -m -n 0 -d 2019-03-03-IITA-Feb14
  • As I was inspecting the archive I noticed that there were some problems with the bitstreams:
    • First, Sisay didn’t include the bitstream descriptions
    • Second, only five items had bitstreams and I remember in the discussion with IITA that there should have been nine!
    • I had to refer to the original CSV from January to find the file names, then download and add them to the export contents manually!
  • After adding the missing bitstreams and descriptions manually I tested them again locally, then imported them to a temporary collection on CGSpace:
$ dspace import -a -c 10568/99832 -e aorth@stfu.com -m 2019-03-03-IITA-Feb14.map -s /tmp/2019-03-03-IITA-Feb14
  • DSpace’s export function doesn’t include the owning collections for some reason, so you need to import the items somewhere first, then export the collection metadata and re-map the items to their proper owning collections based on their types, using OpenRefine or something
  • After re-importing to CGSpace to apply the mappings, I deleted the collection on DSpace Test and ran the dspace cleanup script
  • Merge the IITA research theme changes from last month into the 5_x-prod branch (#413)
    • I will deploy to CGSpace soon and then think about how to batch tag all IITA’s existing items with this metadata
  • Deploy Tomcat 7.0.93 on CGSpace (linode18) after having tested it on DSpace Test (linode19) for a week
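The missing bitstreams could have been spotted earlier by scanning the Simple Archive Format export, since each item directory contains a `contents` file listing its bitstreams. A rough sketch of such a check (my own, not a DSpace tool; it assumes the export layout produced by the `dspace export` command above, and that deposit licenses are tagged `bundle:LICENSE`):

```shell
# Count the real bitstreams per item in a Simple Archive Format export,
# ignoring deposit license entries (lines tagged bundle:LICENSE)
for item in 2019-03-03-IITA-Feb14/*/; do
    [ -f "$item/contents" ] || continue
    # grep -cv prints the number of non-matching lines; "|| true" keeps
    # the loop going for items with zero bitstreams (grep exits non-zero)
    count=$(grep -cv 'bundle:LICENSE' "$item/contents" || true)
    echo "$item $count bitstream(s)"
done
```

Items reporting 0 bitstreams are the ones whose files need to be fetched and added manually before import.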

2019-03-06

  • Abenet was having problems with a CIP user account; I think that the user could not register
  • I suspect it’s related to the email issue that ICT hasn’t responded about since last week
  • As I thought, I still cannot send emails from CGSpace:
$ dspace test-email

About to send test email:
 - To: blah@stfu.com
 - Subject: DSpace test email
 - Server: smtp.office365.com

Error sending email:
 - Error: javax.mail.AuthenticationFailedException
  • I will send a follow-up to ICT to ask them to reset the password
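The SMTP credentials that dspace test-email uses come from dspace.cfg, so an AuthenticationFailedException generally means the configured username/password pair was rejected by the mail server. The relevant keys look roughly like this (the server name is from the test output above; the other values are placeholders, not CGSpace’s real settings):

```
mail.server = smtp.office365.com
mail.server.port = 587
mail.server.username = cgspace@example.org
mail.server.password = CHANGEME
mail.from.address = cgspace@example.org
```

After ICT resets the password, updating mail.server.password and re-running dspace test-email should confirm the fix.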

2019-03-07

  • ICT reset the email password and I confirmed that it is working now
  • Generate a controlled vocabulary of 1187 AGROVOC subjects from the top 1500 that I checked last month, dumping the terms themselves using csvcut, applying the XML controlled vocabulary format in vim, and then checking with tidy for good measure:
$ csvcut -c name 2019-02-22-subjects.csv > dspace/config/controlled-vocabularies/dc-subject.xml
$ # apply formatting in XML file
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.xml
  • I tested the AGROVOC controlled vocabulary locally and will deploy it on DSpace Test soon so people can see it
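For reference, DSpace controlled vocabulary files use a nested node / isComposedBy XML structure, so the formatting step above wraps each term from the CSV in a node element — roughly like this (the terms shown are illustrative, not necessarily from the final list):

```
<node id="rootnode" label="AGROVOC">
  <isComposedBy>
    <node id="maize" label="MAIZE"/>
    <node id="soil fertility" label="SOIL FERTILITY"/>
  </isComposedBy>
</node>
```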
  • Atmire noticed my message about the “solr_update_time_stamp” error on the dspace-tech mailing list and created an issue on their tracker to discuss it with me
    • They say the error is harmless, but has nevertheless been fixed in their newer module versions