CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

October, 2019

2019-10-01

  • Udana from IWMI asked me for a CSV export of their community on CGSpace

    • I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data
    • I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script’s “unneccesary Unicode” fix:

      $ csvcut -c 'id,dc.title[en_US],cg.coverage.region[en_US],cg.coverage.subregion[en_US],cg.river.basin[en_US]' ~/Downloads/10568-16814.csv > /tmp/iwmi-title-region-subregion-river.csv
      
    • Then I replace them in vim with :% s/\%u00a0/ /g because I can’t figure out the correct sed syntax to do it directly from the pipe above

    • I uploaded those to CGSpace and then re-exported the metadata

    • Now that I think about it, I shouldn’t be removing non-breaking spaces (U+00A0), I should be replacing them with normal spaces!

    • I modified the script so it replaces the non-breaking spaces instead of removing them

    • Then I ran the csv-metadata-quality script to do some general cleanups (though I temporarily commented out the whitespace fixes because it was too many thousands of rows):

      $ csv-metadata-quality -i ~/Downloads/10568-16814.csv -o /tmp/iwmi.csv -x 'dc.date.issued,dc.date.issued[],dc.date.issued[en_US]' -u
      
    • That fixed 153 items (unnecessary Unicode, duplicates, comma–space fixes, etc)

  • Release version 0.3.1 of the csv-metadata-quality script with the non-breaking spaces change

2019-10-03

  • Upload the 117 IITA records that we had been working on last month (aka 20196th.xls aka Sept 6) to CGSpace

2019-10-04

  • Create an account for Bioversity’s ICT consultant Francesco on DSpace Test:

    $ dspace user -a -m blah@mail.it -g Francesco -s Vernocchi -p 'fffff'
    
  • Email Francesca and Carol to ask for follow up about the test upload I did on 2019-09-21

    • I suggested that if they still want to do value addition of those records (like adding countries, regions, etc) that they could maybe do it after we migrate the records to CGSpace
    • Carol responded to tell me where to map the items with type Brochure, Journal Item, and Thesis, so I applied them to the collection on DSpace Test