CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

July, 2017

2017-07-01

  • Run system updates and reboot DSpace Test

2017-07-04

  • Merge changes for WLE Phase II theme rename (#329)
  • Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
  • We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
$ psql dspacenew -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=5 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*):  <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::'
  • The sed script is from a post on the PostgreSQL mailing list
  • Abenet says the ILRI board wants to be able to have “lead author” for every item, so I’ve whipped up a WIP test in the 5_x-lead-author branch
  • It works but is still very rough and we haven’t thought out the whole lifecycle yet

Testing lead author in submission form

  • I assume that “lead author” would actually be the first question on the item submission form
  • We also need to check to see which ORCID authority core this uses, because it seems to be using an entirely new one rather than the one for dc.contributor.author (which makes sense of course, but fuck, all the author problems aren’t bad enough?!)
  • Also would need to edit XMLUI item displays to incorporate this into authors list
  • And fuck, then anyone consuming our data via REST / OAI will not notice that we have an author outside of dc.contributor.authors… ugh
  • What if we modify the item submission form to use type-bind fields to show/hide certain fields depending on the type?

2017-07-05

  • Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback (#330)
  • Generate list of fields in the current CGSpace cg scheme so we can record them properly in the metadata registry:
$ psql dspace -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=2 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*):  <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::' > cg-types.xml
  • CGSpace was unavailable briefly, and I saw this error in the DSpace log file:
2017-07-05 13:05:36,452 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections
  • Looking at the pg_stat_activity table I saw there were indeed 98 active connections to PostgreSQL, and at this time the limit is 100, so that makes sense
  • Tsega restarted Tomcat and it’s working now
  • Abenet said she was generating a report with Atmire’s CUA module, so it could be due to that?
  • Looking in the logs I see this random error again that I should report to DSpace:
2017-07-05 13:50:07,196 ERROR org.dspace.statistics.SolrLogger @ COUNTRY ERROR: EU
  • Seems to come from dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java

2017-07-06

  • Sisay tried to help by making a pull request for the RTB flagships but there are formatting errors, unrelated changes, and the flagship names are not in the style I requested
  • Abenet talked to CIP and they said they are actually ok with using collection names rather than adding a new metadata field

2017-07-13

  • Remove UKaid from the controlled vocabulary for dc.description.sponsorship, as Department for International Development, United Kingdom is the correct form and it is already present (#334)

2017-07-14

  • Sisay sent me a patch to add “Photo Report” to dc.type so I’ve added it to the 5_x-prod branch

2017-07-17

  • Linode shut down our seventeen (17) VMs due to nonpayment of the July 1st invoice
  • It took me a few hours to find the ICT/Finance contacts to pay the bill and boot all the servers back up
  • Since the server was down anyways, I decided to run all system updates and re-deploy CGSpace so that the latest changes to input-forms.xml and the sponsors controlled vocabulary

2017-07-20

  • Skype chat with Addis team about the status of the CGIAR Library migration
  • Need to add the CGIAR System Organization subjects to Discovery Facets (test first)
  • Tentative list of dates for the migration:
    • August 4: aim to finish data cleanup and then give Peter a list of authors
    • August 18: ready to show System Office
    • September 4: all feedback and decisions (including workflows) from System Office
    • September 10/11: go live?
  • Talk to Tsega and Danny about exporting/injesting the blog posts from Drupal into DSpace?
  • Followup meeting on August 8/9?
  • Sent Abenet the 2415 records from CGIAR Library’s Historical Archive (10947/1) after cleaning up the author authorities and HTML entities in dc.contributor.author and dc.description.abstract using OpenRefine:
    • Authors: value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")
    • Abstracts: replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')

2017-07-24

  • Move two top-level communities to be sub-communities of ILRI Projects
$ for community in 10568/2347 10568/25209; do /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child="$community"; done
  • Discuss CGIAR Library data cleanup with Sisay and Abenet

2017-07-27

  • Help Sisay with some transforms to add descriptions to the filename column of some CIAT Presentations he’s working on in OpenRefine
  • Marianne emailed a few days ago to ask why “Integrating Ecosystem Solutions” was not in the list of WLE Phase I Research Themes on the input form
  • I told her that I only added the themes that I saw in the WLE Phase I Research Themes community
  • Then Mia from WLE also emailed to ask where some WLE focal regions went, and I said I didn’t understand what she was talking about, as all we did in our previous work was rename the old “Research Themes” subcommunity to “WLE Phase I Research Themes” and add a new subcommunity for “WLE Phase II Research Themes”.
  • Discuss some modifications to the CCAFS project tags in CGSpace submission form and in the database

2017-07-28

  • Discuss updates to the Phase II CCAFS project tags with Andrea from Macaroni Bros
  • I will do the renaming and untagging of items in CGSpace database, and he will update his webservice with the latest project tags and I will get the XML from here for our input-forms.xml: https://ccafs.cgiar.org/export/ccafsproject

2017-07-29

  • Move some WLE items into appropriate Phase I Research Themes communities and delete some empty collections in WLE Regions community

2017-07-30

  • Start working on CCAFS project tag cleanup
  • More questions about inconsistencies and spelling mistakes in their tags, so I’ve sent some questions for followup

2017-07-31

  • Looks like the final list of metadata corrections for CCAFS project tags will be:
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
  • Now just waiting to run them on CGSpace, and then apply the modified input forms after Macaroni Bros give me an updated list
  • Temporarily increase the nginx upload limit to 200MB for Sisay to upload the CIAT presentations
  • Looking at CGSpace activity page, there are 52 Baidu bots concurrently crawling our website (I copied the activity page to a text file and grep it)!
$ grep 180.76. /tmp/status | awk '{print $5}' | sort | uniq | wc -l
52
  • From looking at the dspace.log I see they are all using the same session, which means our Crawler Session Manager Valve is working