CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

July, 2017

2017-07-01

  • Run system updates and reboot DSpace Test

2017-07-04

  • Merge changes for WLE Phase II theme rename (#329)
  • Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
  • We can use psql’s expanded output format (-x) plus sed to format the output into quasi-XML:

$ psql dspacenew -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=5 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*):  <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::'
  • The sed script is from a post on the PostgreSQL mailing list
  • Abenet says the ILRI board wants to be able to have “lead author” for every item, so I’ve whipped up a WIP test in the 5_x-lead-author branch
  • It works but is still very rough and we haven’t thought out the whole lifecycle yet

Testing lead author in submission form

  • I assume that “lead author” would actually be the first question on the item submission form
  • We also need to check to see which ORCID authority core this uses, because it seems to be using an entirely new one rather than the one for dc.contributor.author (which makes sense of course, but fuck, all the author problems aren’t bad enough?!)
  • We would also need to edit the XMLUI item displays to incorporate this into the authors list
  • And fuck, then anyone consuming our data via REST / OAI will not notice that we have an author outside of dc.contributor.author… ugh
  • What if we modify the item submission form to use type-bind fields to show/hide certain fields depending on the type?

2017-07-05

  • Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback (#330)
  • Generate a list of fields in the current CGSpace cg schema so we can record them properly in the metadata registry:
$ psql dspace -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=2 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*):  <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::' > cg-types.xml
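The sed output is a series of dc-type fragments with no root element, so it has to be wrapped before an XML parser will accept it. This is a minimal Python sketch that lists the resulting field names. It assumes the fragments are otherwise well-formed, which may not hold because the sed script does not escape & or < in scope notes. The wrapper name dspace-dc-types is borrowed from DSpace's registry files but does not matter to the parser. The function name registry_field_names is hypothetical.

```python
import xml.etree.ElementTree as ET


def registry_field_names(fragments: str) -> list[str]:
    """List schema.element[.qualifier] names from <dc-type> fragments."""
    # The sed output has no single root element, so wrap it in one;
    # the wrapper name is arbitrary but mirrors DSpace's registry files
    root = ET.fromstring(f"<dspace-dc-types>{fragments}</dspace-dc-types>")
    names = []
    for dc_type in root.iter("dc-type"):
        parts = [dc_type.findtext("schema", ""), dc_type.findtext("element", "")]
        qualifier = dc_type.findtext("qualifier", "")
        if qualifier:  # unqualified fields come out as an empty <qualifier>
            parts.append(qualifier)
        names.append(".".join(parts))
    return names
```

Running it over the contents of cg-types.xml should yield names such as cg.element.qualifier, one per registry entry.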
  • CGSpace was unavailable briefly, and I saw this error in the DSpace log file:
2017-07-05 13:05:36,452 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections
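  • Looking at the pg_stat_activity table, I saw there were indeed 98 active connections to PostgreSQL. The limit is currently 100, and PostgreSQL holds some of those slots back for superusers (superuser_reserved_connections, three by default), so that makes sense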
  • Tsega restarted Tomcat and it’s working now
  • Abenet said she was generating a report with Atmire’s CUA module, so it could be due to that?
  • Looking in the logs I see this random error again that I should report to DSpace:
2017-07-05 13:50:07,196 ERROR org.dspace.statistics.SolrLogger @ COUNTRY ERROR: EU
  • Seems to come from dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java
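The arithmetic behind the 98-of-100 connections observation can be shown in a small Python sketch. PostgreSQL refuses new non-superuser connections once the number of active connections reaches max_connections minus superuser_reserved_connections. The sketch assumes CGSpace was running with the default of three reserved slots, which these notes do not confirm. Running SHOW superuser_reserved_connections; in psql would confirm it. The function names are hypothetical.

```python
DEFAULT_SUPERUSER_RESERVED = 3  # PostgreSQL's default superuser_reserved_connections


def non_superuser_slots(max_connections: int,
                        superuser_reserved: int = DEFAULT_SUPERUSER_RESERVED) -> int:
    """Number of connection slots that ordinary (non-superuser) roles can use."""
    return max_connections - superuser_reserved


def accepts_new_connection(active: int, max_connections: int,
                           superuser_reserved: int = DEFAULT_SUPERUSER_RESERVED) -> bool:
    """Would PostgreSQL accept one more non-superuser connection?"""
    # Ordinary connections are refused once active >= max_connections - reserved
    return active < non_superuser_slots(max_connections, superuser_reserved)


# Numbers from the outage: 98 active connections, max_connections = 100
print(non_superuser_slots(100))         # 97
print(accepts_new_connection(98, 100))  # False
```

With 98 active connections, ordinary roles were already at or past the 97 usable slots. That matches the FATAL message in the log.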

2017-07-06

  • Sisay tried to help by making a pull request for the RTB flagships but there are formatting errors, unrelated changes, and the flagship names are not in the style I requested
  • Abenet talked to CIP and they said they are actually ok with using collection names rather than adding a new metadata field

2017-07-13

  • Remove UKaid from the controlled vocabulary for dc.description.sponsorship, as Department for International Development, United Kingdom is the correct form and it is already present (#334)

2017-07-14

  • Sisay sent me a patch to add “Photo Report” to dc.type so I’ve added it to the 5_x-prod branch

2017-07-17

  • Linode shut down our seventeen (17) VMs due to nonpayment of the July 1st invoice
  • It took me a few hours to find the ICT/Finance contacts to pay the bill and boot all the servers back up
  • Since the server was down anyway, I decided to run all system updates and re-deploy CGSpace so that the latest changes to input-forms.xml and the sponsors controlled vocabulary would be live

2017-07-20

  • Skype chat with Addis team about the status of the CGIAR Library migration
  • Need to add the CGIAR System Organization subjects to Discovery Facets (test first)
  • Tentative list of dates for the migration:
    • August 4: aim to finish data cleanup and then give Peter a list of authors
    • August 18: ready to show System Office
    • September 4: all feedback and decisions (including workflows) from System Office
    • September 10-11: go live?
  • Talk to Tsega and Danny about exporting/ingesting the blog posts from Drupal into DSpace?
  • Follow-up meeting on August 8-9?
  • Sent Abenet the 2415 records from CGIAR Library’s Historical Archive (109471) after cleaning up the author authorities and HTML entities in dc.contributor.author and dc.description.abstract using OpenRefine:
    • Authors: value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")
    • Abstracts: replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')
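The two OpenRefine transforms can also be reproduced outside OpenRefine, for example to re-check an exported CSV. This is a minimal Python sketch using the same regular expressions. Like the GREL replace calls above, re.sub replaces every match.

The author pattern only matches the ::UUID::600 suffix, where 600 is DSpace's "accepted" authority confidence. Values with any other confidence are left untouched. The abstract pattern removes tags but does not decode entities such as &amp;. The function names clean_author and clean_abstract are hypothetical.

```python
import re

# Authority-controlled values carry a "::<authority UUID>::<confidence>" suffix
AUTHORITY_SUFFIX = re.compile(r"::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600")

# Same HTML-tag expression used in the OpenRefine abstract transform
HTML_TAG = re.compile(
    r"""<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>"""
)


def clean_author(value: str) -> str:
    """Drop the authority UUID and confidence (600 only) from an author value."""
    return AUTHORITY_SUFFIX.sub("", value)


def clean_abstract(value: str) -> str:
    """Strip HTML tags from an abstract; entities such as &amp; are left as-is."""
    return HTML_TAG.sub("", value)


print(clean_author("Doe, Jane::0c9f2d4e-1234-4abc-9def-0123456789ab::600"))  # Doe, Jane
print(clean_abstract('<p>See <a href="x.html">this</a> report.</p>'))  # See this report.
```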