CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

September, 2017

2017-09-06

  • Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours

2017-09-07

  • Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group

2017-09-10

  • Delete 58 blank metadata values from the CGSpace database:
dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 58
  • I also ran it on DSpace Test because we’ll be migrating the CGIAR Library soon and it would be good to catch these before we migrate
  • Run system updates and restart DSpace Test
  • We only have 7.7GB of free space on DSpace Test so I need to copy some data off of it before doing the CGIAR Library migration (requires lots of exporting and creating temp files)
  • I still have the original data from the CGIAR Library so I’ve zipped it up and sent it off to linode18 for now
  • sha256sum of original-cgiar-library-6.6GB.tar.gz is: bcfabb52f51cbdf164b61b7e9b3a0e498479e4c1ed1d547d32d11f44c0d5eb8a
  • Start doing a test run of the CGIAR Library migration locally
  • Notes and todo checklist here for now: https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c
  • Create pull request for Phase I and II changes to CCAFS Project Tags: #336
  • We’ve been discussing with Macaroni Bros and CCAFS for the past month or so and the list of tags was recently finalized
  • There will need to be some metadata updates — though if I recall correctly it is only about seven records — for that as well, I had made some notes about it in 2017-07, but I’ve asked for more clarification from Lili just in case
  • Looking at the DSpace logs to see if we’ve had a change in the “Cannot get a connection” errors since last month when we adjusted the db.maxconnections parameter on CGSpace:
# grep -c "Cannot get a connection, pool error Timeout waiting for idle object" dspace.log.2017-09-*
dspace.log.2017-09-01:0
dspace.log.2017-09-02:0
dspace.log.2017-09-03:9
dspace.log.2017-09-04:17
dspace.log.2017-09-05:752
dspace.log.2017-09-06:0
dspace.log.2017-09-07:0
dspace.log.2017-09-08:10
dspace.log.2017-09-09:0
dspace.log.2017-09-10:0
  • Also, since last month (2017-08) Macaroni Bros no longer runs their REST API scraper every hour, so I’m sure that helped
  • There are still some errors, though, so maybe I should bump the connection limit up a bit
  • I remember seeing that Munin shows that the average number of connections is 50 (which is probably mostly from the XMLUI) and we’re currently allowing 40 connections per app, so maybe it would be good to bump that value up to 50 or 60 along with the system’s PostgreSQL max_connections (formula should be: webapps * 60 + 3, or 3 * 60 + 3 = 183 in our case)
  • I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit)
  • I’m expecting to see 0 connection errors for the next few months