mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-18 19:22:18 +01:00
148 lines
6.8 KiB
Markdown
---
title: "September, 2024"
date: 2024-09-01T21:16:00-07:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2024-09-01

- Upgrade CGSpace to DSpace 7.6.2

<!--more-->

## 2024-09-05

- Finalize work on migrating DSpace Angular from Yarn to NPM

## 2024-09-06

- This morning Tomcat crashed due to an OOM kill:

```
Sep 06 00:00:24 server systemd[1]: tomcat9.service: A process of this unit has been killed by the OOM killer.
Sep 06 00:00:25 server systemd[1]: tomcat9.service: Main process exited, code=killed, status=9/KILL
Sep 06 00:00:25 server systemd[1]: tomcat9.service: Failed with result 'oom-kill'.
```

- According to the system journal, it was a Node.js dspace-angular process that tried to allocate memory and failed, thus invoking the OOM killer
- Currently I see high memory usage in those processes:

```console
$ pm2 status
┌────┬──────────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name         │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├────┼──────────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 0  │ dspace-ui    │ default     │ 7.6.3-… │ cluster │ 994      │ 4D     │ 0    │ online    │ 0%       │ 3.4gb    │ dspace   │ disabled │
│ 1  │ dspace-ui    │ default     │ 7.6.3-… │ cluster │ 1015     │ 4D     │ 0    │ online    │ 0%       │ 3.4gb    │ dspace   │ disabled │
│ 2  │ dspace-ui    │ default     │ 7.6.3-… │ cluster │ 1029     │ 4D     │ 0    │ online    │ 0%       │ 3.4gb    │ dspace   │ disabled │
│ 3  │ dspace-ui    │ default     │ 7.6.3-… │ cluster │ 1042     │ 4D     │ 0    │ online    │ 0%       │ 3.4gb    │ dspace   │ disabled │
└────┴──────────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
```

- I bet if I look in the logs I'd find some kind of heavy traffic on the frontend, causing high caching for Angular SSR
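
To put a number on what the four dspace-ui workers hold together, here is a minimal sketch. It assumes the JSON printed by `pm2 jlist`, where each process entry carries a `name` and a `monit.memory` value in bytes; the helper name `total_memory_gib` and the sample data are hypothetical, not taken from the server.

```python
import json


def total_memory_gib(jlist_text: str, name: str) -> float:
    """Sum the memory (in GiB) of all pm2 processes with the given name."""
    processes = json.loads(jlist_text)
    total_bytes = sum(
        p.get("monit", {}).get("memory", 0)
        for p in processes
        if p.get("name") == name
    )
    return total_bytes / 1024**3


# Hypothetical sample mimicking four dspace-ui workers at roughly 3.4 GiB each
sample = json.dumps([
    {"name": "dspace-ui", "pid": pid, "monit": {"memory": int(3.4 * 1024**3), "cpu": 0}}
    for pid in (994, 1015, 1029, 1042)
])

print(f"{total_memory_gib(sample, 'dspace-ui'):.1f} GiB")
```

Four workers at about 3.4 GB each add up to roughly 13.6 GB, which is a large share of a host that also runs Tomcat, Solr and PostgreSQL.
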

## 2024-09-08

- Analyzing memory use in our DSpace hosts, which have 32GB of memory
- Effective cache of PostgreSQL is estimated at 11GB, which seems way too high since the database is only 2GB
- Realistically, this is how we should adjust, with PostgreSQL using ~8GB (or less) and each dspace-angular process pinned at 2GB...

> Total − Solr − Tomcat − Postgres − Nginx − Angular
> 31366 − (1024×4.4) − 7168 − (8×1024) − 512 − (4×2048) = 2796.4 left...

- I put some of these changes in on DSpace Test and will monitor this week
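
The budget above can be re-checked with a few lines of Python, which also makes it easy to try other allocations. This is only a sketch: the numbers are the ones from the notes, assumed here to be in MiB, and the names `allocations_mib` and `remaining_mib` are made up for illustration.

```python
# All values assumed to be in MiB, copied from the budget in the notes
TOTAL_MIB = 31366

allocations_mib = {
    "solr": 4.4 * 1024,
    "tomcat": 7168,
    "postgresql": 8 * 1024,
    "nginx": 512,
    "angular": 4 * 2048,  # four dspace-angular processes at 2 GiB each
}


def remaining_mib(total, allocations):
    """Return what is left after subtracting every allocation from the total."""
    return total - sum(allocations.values())


left = remaining_mib(TOTAL_MIB, allocations_mib)
print(f"{left:.1f} MiB left")
```

Changing a single entry, for example giving PostgreSQL less than 8 GiB, shows right away how much headroom that frees up.
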

## 2024-09-10

- A bot in South Africa made a ton of requests to the API and sent the load through the roof:

```
# grep -E '10/Sep/2024:[10-11]' /var/log/nginx/api-access.log | awk '{print $1}' | sort | uniq -c | sort -h
...
149720 102.182.38.90
```

- They are using several user agents, so they are obviously a bot:

```
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0
Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0
```

- I added them to the list of bot networks in nginx and the load went down
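
Both checks, requests per IP and user agents per IP, can be done in one pass. The sketch below assumes nginx's "combined" log format (client IP first, timestamp in square brackets, user agent as the last quoted field); the function names, sample lines and IP addresses are hypothetical. One detail worth noting: in a regular expression `[10-11]` is a bracket expression that matches a single character, `0` or `1`, so the grep above selects hours 00 through 19, while `1[01]` would select only hours 10 and 11. The sketch compares the hour explicitly instead.

```python
import re
from collections import Counter, defaultdict

# Assumes nginx's "combined" log format: client IP first, timestamp in
# square brackets, user agent as the last double-quoted field.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize(lines, day, hours):
    """Count requests and collect user agents per IP for one day and a set of hours."""
    counts = Counter()
    agents = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["day"] != day or m["hour"] not in hours:
            continue
        counts[m["ip"]] += 1
        agents[m["ip"]].add(m["agent"])
    return counts, agents


def make_line(ip, time, agent):
    """Build a hypothetical access log line for the example below."""
    return f'{ip} - - [10/Sep/2024:{time} +0200] "GET /server/api HTTP/1.1" 200 512 "-" "{agent}"'


# Hypothetical sample using documentation-only IP addresses
sample = [
    make_line("192.0.2.10", "09:59:59", "Firefox/130.0"),  # outside the window
    make_line("192.0.2.10", "10:00:01", "Firefox/130.0"),
    make_line("192.0.2.10", "11:30:00", "Firefox/11.0"),
    make_line("198.51.100.7", "10:05:00", "Firefox/130.0"),
]

counts, agents = summarize(sample, "10/Sep/2024", {"10", "11"})
for ip, n in counts.most_common():
    print(n, ip, len(agents[ip]), "user agent(s)")
```

An IP with a high request count and several distinct user agents is a reasonable candidate for the bot list.
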

## 2024-09-11

- Upgrade DSpace 7 Test to Ubuntu 24.04
- I did some minor maintenance to test dspace-statistics-api with Python 3.12
- I tagged version 1.4.4 and released it on GitHub

## 2024-09-14

- Noticed a persistently higher-than-usual load on CGSpace and checked the server logs
- Found some new data center subnets to block because they were making thousands of requests with normal user agents
- I enabled HTTP/3 in nginx
- I enabled the SSR patch in Angular: https://github.com/DSpace/dspace-angular/issues/3110

## 2024-09-16

- Experiment with the [dspace-statistics-api-js](https://github.com/codeobia/dspace-statistics-api-js) on DSpace 7 Test
- In the past it always caused Solr to run out of memory, but I increased Solr's heap from 2g to 3g and it runs without crashing
- I attached VisualVM to Solr with a 3g and 4g heap and iterated over 1260 pages of results in the dspace-statistics-api-js:

![Solr with 3g heap](/cgspace-notes/2024/09/2024-09-16-Solr-3g-heap.png)

![Solr with 4g heap](/cgspace-notes/2024/09/2024-09-16-Solr-4g-heap.png)

## 2024-09-23

- Upgrade PostgreSQL from version 14 to 15 on DSpace Test the same way I did last year:

```console
# apt update
# apt install postgresql-15
# Update configs with Ansible
# systemctl stop tomcat9
# pg_ctlcluster 14 main stop
# tar -cvzpf var-lib-postgresql-14.tar.gz /var/lib/postgresql/14
# tar -cvzpf etc-postgresql-14.tar.gz /etc/postgresql/14
# pg_ctlcluster 15 main stop
# pg_dropcluster 15 main
# pg_upgradecluster 14 main
# pg_ctlcluster 15 main start
...

ERROR: could not find function "xml_is_well_formed" in file "/usr/lib/postgresql/15/lib/pgxml.so"
ERROR: function public.xml_is_well_formed(text) does not exist
ERROR: could not find function "xml_is_well_formed" in file "/usr/lib/postgresql/15/lib/pgxml.so"
ERROR: function public.xml_valid(text) does not exist
```

- After that I [reindexed all tables](https://adamj.eu/tech/2021/04/13/reindexing-all-tables-after-upgrading-to-postgresql-13/) using a generated query:

```console
$ su - postgres
$ cat /tmp/generate-reindex.sql
SELECT 'REINDEX TABLE CONCURRENTLY ' || quote_ident(relname) || ' /*' || pg_size_pretty(pg_total_relation_size(C.oid)) || '*/;'
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname = 'public'
AND C.relkind = 'r'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) ASC;
$ psql dspace < /tmp/generate-reindex.sql > /tmp/reindex.sql
$ <trim the extra stuff from /tmp/reindex.sql>
$ psql dspace < /tmp/reindex.sql
```

- The database shrank by 186MB!
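
The manual trimming step presumably removes the column header, the separator line and the row-count footer that psql prints around query results. A minimal sketch of that filter follows, assuming psql's default aligned output; `keep_reindex_statements` is a hypothetical helper and the sample table names and sizes are made up. Running psql with `-At` (unaligned, tuples only) would likely avoid the extra lines in the first place.

```python
def keep_reindex_statements(psql_output: str) -> str:
    """Keep only the generated REINDEX statements from psql's output."""
    statements = [
        line.strip()
        for line in psql_output.splitlines()
        if line.strip().startswith("REINDEX TABLE CONCURRENTLY ")
    ]
    return "".join(statement + "\n" for statement in statements)


# Hypothetical psql output in the default aligned format
sample = """\
                      ?column?
-----------------------------------------------------
 REINDEX TABLE CONCURRENTLY item /*12 MB*/;
 REINDEX TABLE CONCURRENTLY metadatavalue /*1 GB*/;
(2 rows)

"""

print(keep_reindex_statements(sample), end="")
```

Filtering on the statement prefix rather than on line positions keeps the helper independent of how wide the header and separator happen to be.
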

## 2024-09-29

- I upgraded the database on CGSpace to PostgreSQL 15

<!-- vim: set sw=2 ts=2: -->