CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

June, 2021

2021-06-01

  • IWMI notified me that AReS was down with an HTTP 502 error
    • Looking at UptimeRobot I see it has been down for 33 hours, but I never got a notification
    • I don’t see anything in the Elasticsearch container logs, or the systemd journal on the host, but I notice that the angular_nginx container isn’t running
    • I simply started it and AReS was running again:

July, 2021

2021-07-01

  • Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:
localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
COPY 20994
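The query above lowercases each subject before grouping, so values that differ only in case are counted as one subject. A minimal Python sketch of the same aggregation follows; the sample values and the `count_subjects` helper are made up for illustration and are not taken from CGSpace:

```python
from collections import Counter


def count_subjects(values):
    """Count subjects case-insensitively, most frequent first."""
    counts = Counter(value.lower() for value in values)
    return counts.most_common()


# Hypothetical sample values, not real CGSpace metadata
sample = ["MAIZE", "maize", "Gender", "GENDER", "gender", "livestock"]
subject_counts = count_subjects(sample)
```

As in the SQL, the result is a list of (subject, count) pairs ordered by descending count.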

May, 2021

2021-05-01

  • I looked at the top user agents and IPs in the Solr statistics for last month and I see these user agents:
    • “RI/1.0”, 1337
    • “Microsoft Office Word 2014”, 941
  • I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…
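A rough sketch of how such an agent pattern would separate the two user agents above, assuming the patterns are treated as regular expressions matched anywhere in the user agent string. The `matches_agent_pattern` helper and the exact pattern spelling are hypothetical, not the contents of our actual agents file:

```python
import re

# Hypothetical pattern list; the dot is escaped so it only matches a literal "."
AGENT_PATTERNS = [r"RI/1\.0"]


def matches_agent_pattern(user_agent, patterns=AGENT_PATTERNS):
    """Return True if any pattern matches somewhere in the user agent."""
    return any(re.search(pattern, user_agent) for pattern in patterns)


bot_hit = matches_agent_pattern("RI/1.0")
word_hit = matches_agent_pattern("Microsoft Office Word 2014")
```

With only the RI/1.0 pattern in the list, hits from the Microsoft Word agent would be left alone.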

April, 2021

2021-04-01

  • I wrote a script to query Sherpa’s API for our ISSNs: sherpa-issn-lookup.py
    • I’m curious to see how the results compare with the results from Crossref yesterday
  • AReS Explorer had been down since this morning, and I didn’t see anything in the systemd journal
    • I simply took everything down with docker-compose and then brought it back up, and then it was OK
    • Perhaps one of the containers crashed; I should have looked closer, but I was in a hurry

March, 2021

2021-03-01

  • Discuss some OpenRXV issues with Abdullah from CodeObia
    • He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API
    • Also, we found some issues building and running OpenRXV, currently due to an ecosystem shift in the Node.js dependencies

February, 2021

2021-02-01

  • Abenet said that CIP found more duplicate records in their export from AReS
  • I had a call with CodeObia to discuss the work on OpenRXV
  • Check the results of the AReS harvesting from last night:
$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
{
  "count" : 100875,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
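When checking a harvest this way, it helps to look at the shard summary as well as the count, since a failed shard would make the count incomplete. A small Python sketch that parses the response shown above; the `summarize_count` helper is hypothetical and the JSON is copied from the output rather than fetched live:

```python
import json


def summarize_count(response_text):
    """Return (count, all_shards_ok) from an Elasticsearch _count response."""
    body = json.loads(response_text)
    shards = body["_shards"]
    all_shards_ok = shards["failed"] == 0 and shards["successful"] == shards["total"]
    return body["count"], all_shards_ok


# Response body copied from the curl output above
response_text = """
{
  "count" : 100875,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
"""
count, all_shards_ok = summarize_count(response_text)
```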

January, 2021

2021-01-03

  • Peter notified me that some filters on AReS were broken again
    • It’s the same problem I filed an issue about on OpenRXV last month: the field names get .keyword appended to the end
    • I fixed the broken filters (careful to not edit any others, lest they break too!)
  • Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
    • The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
    • I adjusted it to default to 0 and added a note to the admin screen
    • I realized that this issue was actually causing the first page of 100 statistics to be missing…
    • For example, this item has 51 views on CGSpace, but 0 on AReS
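To illustrate why a start page of 1 loses the first page against a zero-based API, here is a small Python simulation. The `fetch_all` function and the in-memory record list are stand-ins for illustration only, not OpenRXV's actual harvesting code:

```python
def fetch_all(records, page_size, start_page):
    """Simulate harvesting from an API whose page numbers are zero-based."""
    harvested = []
    page = start_page
    while True:
        offset = page * page_size  # page 0 begins at offset 0
        batch = records[offset:offset + page_size]
        if not batch:
            break
        harvested.extend(batch)
        page += 1
    return harvested


# 250 hypothetical statistics records, harvested 100 per page
records = list(range(250))
from_zero = fetch_all(records, page_size=100, start_page=0)
from_one = fetch_all(records, page_size=100, start_page=1)
```

Starting at page 1 skips records 0 through 99, so any item whose statistics fall on the first page shows up with no views.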

December, 2020

2020-12-01

  • Atmire responded about the issue with duplicate data in our Solr statistics
    • They noticed that some records in the statistics-2015 core haven’t been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I haven’t migrated any of the records yet
    • That’s strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, according to the cua_version field
    • I started processing those (about 411,000 records):