---
title: "September, 2024"
date: 2024-09-01T21:16:00-07:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2024-09-01

- Upgrade CGSpace to DSpace 7.6.2

## 2024-09-05

- Finalize work on migrating DSpace Angular from Yarn to NPM

## 2024-09-06

- This morning Tomcat crashed due to an OOM kill:

```console
Sep 06 00:00:24 server systemd[1]: tomcat9.service: A process of this unit has been killed by the OOM killer.
Sep 06 00:00:25 server systemd[1]: tomcat9.service: Main process exited, code=killed, status=9/KILL
Sep 06 00:00:25 server systemd[1]: tomcat9.service: Failed with result 'oom-kill'.
```
- According to the system journal, it was a Node.js dspace-angular process that tried to allocate memory and failed, thus invoking the OOM killer
- Currently I see high memory usage in those processes:

```console
$ pm2 status
┌────┬──────────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name         │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├────┼──────────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 0  │ dspace-ui    │ default     │ 7.6.3-… │ cluster │ 994      │ 4D     │ 0    │ online    │ 0%       │ 3.4gb    │ dspace   │ disabled │
│ 1  │ dspace-ui    │ default     │ 7.6.3-… │ cluster │ 1015     │ 4D     │ 0    │ online    │ 0%       │ 3.4gb    │ dspace   │ disabled │
│ 2  │ dspace-ui    │ default     │ 7.6.3-… │ cluster │ 1029     │ 4D     │ 0    │ online    │ 0%       │ 3.4gb    │ dspace   │ disabled │
│ 3  │ dspace-ui    │ default     │ 7.6.3-… │ cluster │ 1042     │ 4D     │ 0    │ online    │ 0%       │ 3.4gb    │ dspace   │ disabled │
└────┴──────────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
```
- I bet if I looked in the logs I'd find some kind of heavy traffic on the frontend, causing high caching for Angular SSR

## 2024-09-08

- Analyzing memory use on our DSpace hosts, which have 32GB of memory
  - PostgreSQL's effective cache size is estimated at 11GB, which seems way too high since the database is only 2GB
  - Realistically we should adjust so that PostgreSQL uses ~8GB (or less) and each dspace-angular process is pinned at 2GB:

```
Total - Solr       - Tomcat - Postgres - Nginx - Angular
31366 - (1024×4.4) - 7168   - (8×1024) - 512   - (4×2048) = 2796.4 left...
```
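
As a sanity check, the arithmetic above can be reproduced in a few lines (all values in MiB; the Solr multiplier and per-service sizes are the estimates from the calculation, not measured values):

```python
# Rough memory budget for a 32GB DSpace host, mirroring the note above.
# All figures are in MiB and are planning estimates, not measurements.
total = 31366            # usable RAM reported on the host
solr = 1024 * 4.4        # Solr JVM plus overhead
tomcat = 7168            # Tomcat JVM
postgres = 8 * 1024      # PostgreSQL, reduced from the 11GB estimate
nginx = 512              # nginx workers
angular = 4 * 2048       # four dspace-angular processes pinned at 2GB each

left = total - solr - tomcat - postgres - nginx - angular
print(f"{left:.1f} MiB left")  # → 2796.4 MiB left
```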

- I put some of these changes in place on DSpace Test and will monitor this week
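
Pinning the Angular processes at 2GB would be done through PM2's `node_args`; a sketch of the relevant part of an ecosystem file (the paths, script name, and instance count here are assumptions, not the actual production config):

```javascript
// ecosystem.config.js — hypothetical sketch, not the real deployment config.
// node_args caps V8's old-space heap at 2048 MiB per dspace-ui worker, so
// four workers stay inside the ~8GB Angular budget from the calculation above.
module.exports = {
  apps: [
    {
      name: 'dspace-ui',
      cwd: '/home/dspace/dspace-angular',      // assumed install path
      script: 'dist/server/main.js',
      instances: 4,
      exec_mode: 'cluster',
      node_args: '--max_old_space_size=2048',  // 2GB V8 heap cap per process
      env: { NODE_ENV: 'production' },
    },
  ],
};
```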

## 2024-09-10

- Some bot in South Africa made a ton of requests to the API and drove the load through the roof:

```console
# grep -E '10/Sep/2024:[10-11]' /var/log/nginx/api-access.log | awk '{print $1}' | sort | uniq -c | sort -h
...
 149720 102.182.38.90
```
- They are using several user agents, so they are obviously a bot:

```
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0
Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0
```
- I added them to the list of bot networks in nginx and the load went down
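
This kind of check can be scripted: group requests by client IP and flag addresses that rotate through several distinct user agents (the log format and threshold here are illustrative assumptions):

```python
import re
from collections import defaultdict

# Assumes nginx's default "combined" log format, where the user agent
# is the last double-quoted field on the line.
LINE = re.compile(r'^(\S+) .* "([^"]*)"$')

def suspicious_ips(lines, min_agents=3):
    """Return IPs that sent at least min_agents distinct user agents."""
    agents = defaultdict(set)
    for line in lines:
        m = LINE.match(line)
        if m:
            ip, ua = m.groups()
            agents[ip].add(ua)
    # One IP rotating through several browser user agents is likely a bot.
    return {ip: uas for ip, uas in agents.items() if len(uas) >= min_agents}

# Minimal fabricated sample, mimicking the pattern seen in the access log:
log = [
    '102.182.38.90 - - [10/Sep/2024:10:00:01 +0200] "GET /server/api HTTP/1.1" 200 12 "-" "Mozilla/5.0 ... Firefox/130.0"',
    '102.182.38.90 - - [10/Sep/2024:10:00:02 +0200] "GET /server/api HTTP/1.1" 200 12 "-" "Mozilla/5.0 ... Firefox/111.0"',
    '102.182.38.90 - - [10/Sep/2024:10:00:03 +0200] "GET /server/api HTTP/1.1" 200 12 "-" "Mozilla/5.0 ... Firefox/11.0"',
    '196.0.0.1 - - [10/Sep/2024:10:00:04 +0200] "GET / HTTP/1.1" 200 12 "-" "Mozilla/5.0 ... Firefox/130.0"',
]
print(suspicious_ips(log))  # flags only 102.182.38.90
```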

## 2024-09-11

- Upgrade DSpace 7 Test to Ubuntu 24.04
- I did some minor maintenance to test dspace-statistics-api with Python 3.12
  - I tagged version 1.4.4 and released it on GitHub

## 2024-09-14

- Noticed persistently higher than usual load on CGSpace and checked the server logs
  - Found some new data center subnets to block because they were making thousands of requests with normal user agents
- I enabled HTTP/3 in nginx
- I enabled the SSR patch in Angular: https://github.com/DSpace/dspace-angular/issues/3110
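
For reference, enabling HTTP/3 in nginx (1.25.1 or later) amounts to a few directives; this is a hypothetical sketch of the server block, not the actual CGSpace config:

```nginx
server {
    listen 443 ssl;
    listen 443 quic reuseport;   # HTTP/3 runs over QUIC on UDP 443
    http2 on;
    http3 on;
    # Advertise HTTP/3 to clients that first connect over TCP
    add_header Alt-Svc 'h3=":443"; ma=86400' always;

    # ... existing ssl_certificate, server_name, etc. unchanged
}
```

Note that UDP 443 must also be open in the firewall for clients to actually upgrade.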