CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

February, 2021


  • Check the results of the AReS harvesting from last night:
$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
{
  "count" : 100875,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
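The same check can be scripted so that a failed shard is not overlooked. This is only a minimal sketch in Python: the function names `parse_count` and `fetch_count` are made up for illustration, and the URL and index name are simply the ones from the curl command above.

```python
import json
from urllib.request import urlopen


def parse_count(body: str) -> int:
    """Return the document count from an Elasticsearch _count response.

    Raises RuntimeError if the response reports any failed shards,
    because the count cannot be trusted in that case.
    """
    data = json.loads(body)
    failed = data["_shards"]["failed"]
    if failed != 0:
        raise RuntimeError(f"{failed} shard(s) failed")
    return data["count"]


def fetch_count(base_url: str = "http://localhost:9200",
                index: str = "openrxv-items-temp") -> int:
    """Query the _count endpoint of an index and return the count."""
    url = f"{base_url}/{index}/_count?q=*"
    with urlopen(url) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling `fetch_count()` needs a running Elasticsearch on localhost, while `parse_count` can be tried on any saved response.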
January, 2021


  • Peter notified me that some filters on AReS were broken again
    • It’s the same problem of field names getting .keyword appended to the end, which I already filed an issue about on OpenRXV last month
    • I fixed the broken filters (careful to not edit any others, lest they break too!)
  • Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
    • The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
    • I adjusted it to default to 0 and added a note to the admin screen
    • I realized that this issue was actually causing the first page of 100 statistics to be missing…
    • For example, this item has 51 views on CGSpace, but 0 on AReS
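To illustrate why a start page of 1 loses data against a zero-based API, here is a small sketch in Python. It is not OpenRXV's actual code; the function name `page_offsets` and the page size of 100 are assumptions taken from the description above.

```python
from __future__ import annotations


def page_offsets(total_items: int, limit: int = 100,
                 start_page: int = 0) -> list[int]:
    """Return the offsets requested when paging through a zero-based API.

    Page n covers items [n * limit, (n + 1) * limit), so page 0 is the
    first page. Starting at page 1 silently skips the first `limit` items.
    """
    if total_items < 0 or limit <= 0 or start_page < 0:
        raise ValueError("total_items and start_page must be >= 0, limit > 0")
    num_pages = -(-total_items // limit)  # ceiling division
    return [page * limit for page in range(start_page, num_pages)]


# With 250 items, starting at page 0 covers everything:
print(page_offsets(250))                # [0, 100, 200]
# Starting at page 1 never requests offset 0, so items 0-99 are missed:
print(page_offsets(250, start_page=1))  # [100, 200]
```

An item whose statistics sit in that first skipped page would show views on CGSpace but none on AReS, which matches the symptom described above.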
December, 2020


  • Atmire responded about the issue with duplicate data in our Solr statistics
    • They noticed that some records in the statistics-2015 core haven’t been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I haven’t migrated any of the records yet
    • That’s strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, according to the cua_version field
    • I started processing those (about 411,000 records):
November, 2020


  • Continue with processing the statistics-2019 Solr core with the AtomicStatisticsUpdateCLI tool on DSpace Test
    • So far we’ve spent at least fifty hours processing the statistics and statistics-2019 cores… wow.
October, 2020


  • Add tests for the new /items POST handlers to the DSpace 6.x branch of my dspace-statistics-api
  • Trying to test the changes Atmire sent last week, but I had to re-create my local database from a recent CGSpace dump
    • During the FlywayDB migration I got an error:
September, 2020


  • Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS
  • The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it
    • I restarted it again now and told Moayad that the automatic indexing isn’t working
  • Add Alliance of Bioversity International and CIAT to affiliations on CGSpace
  • Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button
  • I filed an issue on OpenRXV to make some minor edits to the admin UI:
August, 2020


  • I spent a few days working on a Java-based curation task to tag items with ISO 3166-1 Alpha2 country codes based on their text values
    • It looks up the names in ISO 3166-1 first, and then in our CGSpace countries mapping (which has five or so of Peter’s preferred “display” country names)
    • It implements a “force” mode too that will clear existing country codes and re-tag everything
    • It is class based so I can easily add support for other vocabularies, and the technique could even be used for organizations with mappings to ROR and Clarisa…
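The lookup order and the “force” mode can be sketched roughly as follows. This is a Python illustration of the idea, not the Java curation task itself. The class name `VocabularyTagger`, the item layout, and the entries in `DISPLAY_NAMES` are hypothetical, and the behavior without “force” (leaving already-tagged items alone) is an assumption.

```python
from __future__ import annotations

# A tiny excerpt of ISO 3166-1 short names and their Alpha-2 codes
ISO_3166_1 = {
    "Kenya": "KE",
    "Viet Nam": "VN",
    "Tanzania, United Republic of": "TZ",
}

# Hypothetical "display" names; NOT the actual CGSpace countries mapping
DISPLAY_NAMES = {
    "Vietnam": "VN",
    "Tanzania": "TZ",
}


class VocabularyTagger:
    """Tag items with codes from a primary vocabulary and a fallback mapping."""

    def __init__(self, primary: dict[str, str], fallback: dict[str, str]):
        self.primary = primary
        self.fallback = fallback

    def lookup(self, name: str) -> str | None:
        # Try the primary vocabulary first, then the fallback mapping
        if name in self.primary:
            return self.primary[name]
        return self.fallback.get(name)

    def tag(self, item: dict, force: bool = False) -> dict:
        # Assumption: without force, items that already have codes are skipped
        if item.get("codes") and not force:
            return item
        codes: list[str] = []  # force mode discards any existing codes
        for name in item.get("countries", []):
            code = self.lookup(name)
            if code is not None and code not in codes:
                codes.append(code)
        item["codes"] = codes
        return item


tagger = VocabularyTagger(ISO_3166_1, DISPLAY_NAMES)
print(tagger.tag({"countries": ["Kenya", "Vietnam"], "codes": []}))
```

Because the vocabularies are passed in as plain mappings, supporting another vocabulary would only mean constructing the tagger with different dictionaries.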
July, 2020


  • A few users noticed that CGSpace wasn’t loading items today; item pages seemed blank
    • I looked at the PostgreSQL locks but they don’t seem unusual
    • I guess this is the same “blank item page” issue that we had a few times in 2019 that we never solved
    • I restarted Tomcat and PostgreSQL and the issue was gone
  • Since I was restarting Tomcat anyway, I decided to redeploy the latest changes from the 5_x-prod branch, and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request
June, 2020


  • I tried to run the AtomicStatisticsUpdateCLI CUA migration script on DSpace Test (linode26) again, and it is still going very slowly and producing tons of errors, as I noticed yesterday
    • I sent Atmire the dspace.log from today and told them to log into the server to debug the process
  • In other news, I checked the statistics API on DSpace 6 and it’s working
  • I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:
