- I filed a ticket with Atmire: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566
## 2018-01-31
- UptimeRobot says CGSpace went down at 7:57 AM, and indeed I see a lot of HTTP 499 codes in the nginx logs
- PostgreSQL activity shows 222 database connections (one way to count these is sketched below)
- Now PostgreSQL activity shows 265 database connections!
- I don't see any errors anywhere...
- Now PostgreSQL activity shows 308 connections!
- Well, this is interesting: all 400 Tomcat threads are busy:
```
# munin-run tomcat_threads
busy.value 400
idle.value 0
max.value 400
```
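- For reference, one way to get connection counts like those above is PostgreSQL's pg_stat_activity view; a minimal sketch, assuming the DSpace database user is literally called dspace and psql can connect locally:

```
$ psql -c 'SELECT * FROM pg_stat_activity;' | grep -c dspace
```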
- And wow, we finally exhausted the database connections, from dspace.log:
```
2018-01-31 08:05:28,964 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-451] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:300; busy:300; idle:0; lastwait:5000].
```
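- To get a rough idea of how many times the pool was exhausted, one could count these errors in today's DSpace log (run from the DSpace log directory, wherever that lives on this server):

```
$ grep -c 'PoolExhaustedException' dspace.log.2018-01-31
```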
- Now even the nightly Atmire background thing is getting an HTTP 500 error:
```
Jan 31, 2018 8:16:05 AM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
```
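- A quick way to count how many of these the REST API has thrown would be to grep Tomcat's catalina.out (path assumed for a packaged Tomcat 7 install):

```
# grep -c 'Mapped exception to response: 500' /var/log/tomcat7/catalina.out
```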
- For now I will restart Tomcat to clear this shit and bring the site back up
- The top IPs from this morning, between 7 and 8 AM, in XMLUI and REST/OAI:
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E "31/Jan/2018:(07|08)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
67 66.249.66.70
70 207.46.13.12
71 197.210.168.174
83 207.46.13.13
85 157.55.39.79
89 207.46.13.14
123 68.180.228.157
198 66.249.66.90
219 41.204.190.40
255 2405:204:a208:1e12:132:2a8e:ad28:46c0
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "31/Jan/2018:(07|08)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
2 65.55.210.187
2 66.249.66.90
3 157.55.39.79
4 197.232.39.92
4 34.216.252.127
6 104.196.152.243
6 213.55.85.89
15 122.52.115.13
16 213.55.107.186
596 45.5.184.196
```
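- It might also be worth checking what user agent the top REST API client (45.5.184.196) is sending; assuming nginx's default combined log format, something like:

```
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep 45.5.184.196 | grep -E "31/Jan/2018:(07|08)" | awk -F'"' '{print $6}' | sort | uniq -c
```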
- This looks reasonable to me, so I have no idea why we ran out of Tomcat threads

- We need to start graphing the Tomcat sessions as well, though that requires JMX (a sketch of enabling it is at the end of this entry)
- Also, I wonder if I could disable the nightly Atmire thing
- God, I don't know where this load is coming from
- Since I bumped up the Tomcat threads from 200 to 400, the load on the server has been sustained at about 200%:

- I should make separate database pools for the web applications and the API applications like REST and OAI
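- As for graphing sessions: a minimal sketch for enabling JMX on a packaged Tomcat 7 would be appending something like this to /etc/default/tomcat7 (the port is arbitrary, and with no authentication or SSL the port must not be reachable from outside the host):

```
# Expose Tomcat's MBeans (threads, sessions, datasources) over JMX for munin/jconsole
# No auth/SSL, so keep this port firewalled and local-only
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9010 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```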