Update notes for 2018-02-07

This commit is contained in:
2018-02-07 18:33:26 +02:00
parent 0dafafb517
commit 389ab31e63
3 changed files with 121 additions and 8 deletions

View File

@ -139,3 +139,56 @@ Tue Feb 6 09:30:32 UTC 2018
- According to [the announcement](https://groups.google.com/forum/#!topic/orcid-api-users/qfg-HwAB1bk) the v1 API was moved from `http://pub.orcid.org/` to `https://pub.orcid.org/v1.2` until March 1st when it will be discontinued for good
- But the old URL is hard coded in DSpace and it doesn't work anyways, because it currently redirects you to `https://pub.orcid.org/v2.0/v1.2`
- So I guess we have to disable that shit once and for all and switch to a controlled vocabulary
- CGSpace crashed again, this time around `Wed Feb 7 11:20:28 UTC 2018`
- I took a few snapshots of the PostgreSQL activity at the time and as the minutes went on and the connections were very high at first but reduced on their own:
```
$ psql -c 'select * from pg_stat_activity' > /tmp/pg_stat_activity.txt
$ grep -c 'PostgreSQL JDBC' /tmp/pg_stat_activity*
/tmp/pg_stat_activity1.txt:300
/tmp/pg_stat_activity2.txt:272
/tmp/pg_stat_activity3.txt:168
/tmp/pg_stat_activity4.txt:5
/tmp/pg_stat_activity5.txt:6
```
- Interestingly, all of those 751 connections were idle!
```
$ grep "PostgreSQL JDBC" /tmp/pg_stat_activity* | grep -c idle
751
```
- Since I was restarting Tomcat anyways, I decided to deploy the changes to create two different pools for web and API apps
- Looking the Munin graphs, I can see that there were almost double the normal number of DSpace sessions at the time of the crash (and also yesterday!):
![DSpace Sessions](/cgspace-notes/2018/02/jmx_dspace-sessions-day.png)
- Indeed it seems like there were over 1800 sessions today around the hours of 10 and 11 AM:
```
$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
1828
```
- CGSpace went down again a few hours later, and now the connections to the dspaceWeb pool are maxed at 250 (the new limit I imposed with the new separate pool scheme)
- What's interesting is that the DSpace log says the connections are all busy:
```
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-328] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:250; idle:0; lastwait:5000].
```
- ... but in PostgreSQL I see them `idle` or `idle in transaction`:
```
$ psql -c 'select * from pg_stat_activity' | grep -c dspaceWeb
250
$ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c idle
250
$ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c "idle in transaction"
187
```
- What the fuck, does DSpace think all connections are busy?
- I suspect these are issues with abandoned connections or maybe a leak, so I'm going to try adding the `removeAbandoned='true'` parameter which is apparently off by default
- I will try `testOnReturn='true'` too, just to add more validation, because I'm fucking grasping at straws