CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

August, 2023

2023-08-03

  • I finally got around to working on Peter’s cleanups for affiliations, authors, and donors from last week
    • I did some minor cleanups myself and applied them to CGSpace
  • Start working on some batch uploads for IFPRI

2023-08-04

2023-08-05

  • Export CGSpace to check for missing Initiative collection mappings
  • Start a harvest on AReS

2023-08-07

  • I’m checking the PostgreSQL logs now that statement logging has been enabled for a few days on DSpace Test
    • I see the logs are about 7 or 8 GB, which is larger than expected—and this is the test server!
    • I will now play with pgBadger to see if it gives any useful insights
    • Hmm, it seems the log_statement advice was old, as pgBadger itself says:

Do not enable log_statement as its log format will not be parsed by pgBadger.

… and:

Warning: Do not enable both log_min_duration_statement, log_duration and log_statement all together, this will result in wrong counter values. Note that this will also increase drastically the size of your log. log_min_duration_statement should always be preferred.

  • So we need to follow pgBadger’s instructions instead to get a suitable log file
    • After enabling the new settings I see that our log file is going to be reaallllly big… hmmmm will check tomorrow morning
  • More work on the IFPRI batch uploads
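
For reference, the pgBadger-oriented logging settings mentioned above could be written to a drop-in file along these lines. This is only a sketch: the settings follow the recommendations in pgBadger's README, while the helper name `write_pgbadger_conf`, the file name `pgbadger.conf`, and the Debian-style `conf.d` directory are assumptions for illustration, not what was actually applied on DSpace Test.

```shell
#!/bin/sh
# Sketch: write pgBadger-oriented logging settings to a drop-in file.
# The function name, file name, and target directory are hypothetical.

write_pgbadger_conf() {
    conf_dir="$1"   # e.g. /etc/postgresql/14/main/conf.d on Debian/Ubuntu (assumed)
    cat > "$conf_dir/pgbadger.conf" <<'EOF'
# Log every statement with its duration; raise this (in ms) to shrink the log
log_min_duration_statement = 0
# Leave log_statement off so pgBadger can parse the log
log_statement = 'none'
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_autovacuum_min_duration = 0
log_error_verbosity = default
EOF
}

# Usage (as root), followed by a configuration reload:
#   write_pgbadger_conf /etc/postgresql/14/main/conf.d
#   systemctl reload postgresql
```

With `log_min_duration_statement = 0` every statement is logged, which is why the log grows so quickly; a higher threshold in milliseconds would trade completeness for a smaller file.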

2023-08-08

  • Apply more corrections to authors from Peter on CGSpace
  • I finally figured out a log_line_prefix for PostgreSQL that works for pgBadger:
log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
  • Now I can generate reports:
# /usr/bin/pgbadger -I -q /var/log/postgresql/postgresql-14-main.log -O /srv/www/pgbadger
  • Ideally we would run this incremental report every day on postgresql-14-main.log.1, aka yesterday’s log file, after it is rotated
    • Now I have to see how large the file will be…
  • I did some final updates to the ninety IFPRI records and uploaded them to DSpace Test first, then to CGSpace
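
The daily incremental pgBadger run described above could be wrapped in a small script like the following. This is a minimal sketch, assuming the log is rotated once a day and that the rotated `.1` file is left uncompressed; the function name `run_report` and the idea of installing it under `/etc/cron.daily` are illustrative assumptions. The pgBadger flags are the same ones used in the manual run.

```shell
#!/bin/sh
# Sketch of a daily wrapper, e.g. installed as /etc/cron.daily/pgbadger
# (the script name and location are assumptions, not from the notes).

run_report() {
    log="$1"      # rotated log from the previous day
    outdir="$2"   # directory for the generated HTML reports

    # Skip quietly if the rotated log is not in place yet
    if [ ! -r "$log" ]; then
        echo "skip: $log not readable" >&2
        return 0
    fi

    # Same flags as the manual run: incremental, quiet, fixed output directory
    /usr/bin/pgbadger -I -q "$log" -O "$outdir"
}

run_report /var/log/postgresql/postgresql-14-main.log.1 /srv/www/pgbadger
```

Checking that the file is readable first means the job exits cleanly on days when rotation has not happened, rather than having pgBadger fail on a missing file.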