CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

March, 2021

2021-03-01

  • Discuss some OpenRXV issues with Abdullah from CodeObia
    • He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API
    • Also, we found some issues building and running OpenRXV due to an ecosystem shift in the Node.js dependencies

2021-03-02

2021-03-03

2021-03-04

  • Peter has been having issues with the workflow since yesterday
    • I looked at the Munin stats and saw a high number of database locks since yesterday

[Figures: PostgreSQL locks (week); PostgreSQL connections (week)]

  • I looked at the number of locks in PostgreSQL and it’s definitely high again:
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
1020
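The `wc -l` figure is a quick approximation, since psql's default aligned output also prints a header line, a separator line, a row-count footer and a trailing blank line. A minimal sketch, assuming the psql output has been captured as a string, that reads the row count from the footer instead (the function name and the sample output are made up for illustration):

```python
import re

# psql's default aligned format ends with a footer such as "(2 rows)" or "(1 row)"
FOOTER = re.compile(r"^\((\d+) rows?\)$", re.MULTILINE)


def rows_from_psql_output(output):
    """Return the row count reported in the psql footer, or None if absent."""
    match = FOOTER.search(output)
    return int(match.group(1)) if match else None


# Hypothetical sample output, not real data from CGSpace
sample = """ pid | mode
-----+------
 101 | AccessShareLock
 102 | ExclusiveLock
(2 rows)

"""

print(rows_from_psql_output(sample))  # rows reported by psql
print(len(sample.splitlines()))       # what a line count would report
```

For trend watching the difference is negligible, but the footer is the safer number if any column values span several lines.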
  • I reported it to Atmire to take a look, on the same issue where we had been tracking this before
  • Abenet asked me to add a new ORCID for ILRI staff member Zoe Campbell
  • I added it to the controlled vocabulary and then tagged her existing items on CGSpace using my add-orcid-identifiers-csv.py script:
$ cat 2021-03-04-add-zoe-campbell-orcid.csv 
dc.contributor.author,cg.creator.identifier
"Campbell, Zoë","Zoe Campbell: 0000-0002-4759-9976"
"Campbell, Zoe A.","Zoe Campbell: 0000-0002-4759-9976"
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-03-04-add-zoe-campbell-orcid.csv -db dspace -u dspace -p 'fuuu'
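Before running a CSV like this through the script, it can help to sanity-check the identifiers. ORCID iDs end in an ISO 7064 MOD 11-2 check digit, so a typo is usually detectable. The following is a minimal sketch of such a pre-flight check; the helper names are hypothetical and it is not part of add-orcid-identifiers-csv.py:

```python
import re

# Four groups of four characters; the last character may be the check digit "X"
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_checksum_ok(orcid):
    """Validate the ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def extract_orcid(value):
    """Pull a valid ORCID iD out of a 'Name: 0000-...' value, else None."""
    match = ORCID_PATTERN.search(value)
    if match and orcid_checksum_ok(match.group(0)):
        return match.group(0)
    return None


print(extract_orcid("Zoe Campbell: 0000-0002-4759-9976"))
```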
  • I still need to do cleanup on the journal articles metadata
    • Peter sent me some cleanups but I can’t use them in the search/replace format he gave
    • I think it’s better to export the metadata values with IDs and import cleaned up ones as CSV
localhost/dspace63= > \COPY (SELECT dspace_object_id AS id, text_value as "cg.journal" FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=251) to /tmp/2021-02-24-journals.csv WITH CSV HEADER;
COPY 32087
  • I used OpenRefine to remove all journal values that didn’t contain at least one of these characters: ; ( )
    • Then I cloned the cg.journal field to cg.volume and cg.issue
    • I used some GREL expressions like these to extract the journal name, volume, and issue:
value.partition(';')[0].trim() # to get journal names
value.partition(/[0-9]+\([0-9]+\)/)[1].replace(/^(\d+)\(\d+\)/,"$1") # to get journal volumes
value.partition(/[0-9]+\([0-9]+\)/)[1].replace(/^\d+\((\d+)\)/,"$1") # to get journal issues
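For checking the results outside OpenRefine, the same extraction can be approximated in Python. This is a rough sketch that mirrors the GREL expressions above rather than the exact workflow used; the function names and the example value are made up for illustration:

```python
import re

# Same pattern as the GREL partition: volume digits followed by (issue digits)
VOLUME_ISSUE = re.compile(r"([0-9]+)\(([0-9]+)\)")


def needs_splitting(value):
    """True if the value contains at least one of the characters ; ( )"""
    return any(ch in value for ch in ";()")


def split_journal(value):
    """Return (journal name, volume, issue); volume and issue are '' if not found."""
    name = value.partition(";")[0].strip()
    match = VOLUME_ISSUE.search(value)
    if match:
        return name, match.group(1), match.group(2)
    return name, "", ""


# Hypothetical value in the "name; volume(issue)" shape
print(split_journal("Example Journal; 12(3)"))
```

As with the GREL version, values without a volume(issue) pattern yield empty volume and issue fields, so they are easy to spot and review by hand.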
  • Then I uploaded the changes to CGSpace using dspace metadata-import
  • Margarita from CCAFS was asking about an error deleting some items that were showing up in Google and should have been private
  • Yesterday Abenet added me to a WLE collection’s approver/editor steps so we can try to figure out why Niroshini is having issues adding metadata to Udana’s submissions
    • I edited Udana’s submission to CGSpace:
      • corrected the title
      • added language English
      • changed the link to the external item page instead of PDF
      • added SDGs from the external item page
      • added AGROVOC subjects from the external item page
      • added pagination (extent)
      • changed the license to “other” because CC-BY-NC-ND is not printed anywhere in the PDF or external item page