Add notes for 2018-01-31

commit 0e9a9d06a4 (parent b051fb4bf6)
@@ -1288,3 +1288,75 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name="jdbc/dspace" maxActiv
```

- I filed a ticket with Atmire: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566

## 2018-01-31

- UptimeRobot says CGSpace went down at 7:57 AM, and indeed I see a lot of HTTP 499 codes in nginx logs
- PostgreSQL activity shows 222 database connections (a query sketch for checking this is below)
- Now PostgreSQL activity shows 265 database connections!
- I don't see any errors anywhere...
- Now PostgreSQL activity shows 308 connections!
- Well this is interesting, there are 400 Tomcat threads busy:

```
# munin-run tomcat_threads
busy.value 400
idle.value 0
max.value 400
```
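
The connection counts above presumably come from PostgreSQL's pg_stat_activity view; a minimal sketch of checking it from the shell (the host, user, and database here are assumptions, not the live connection settings):

```
$ psql -h localhost -U dspace -c 'SELECT count(*) FROM pg_stat_activity;'
$ psql -h localhost -U dspace -c 'SELECT state, count(*) FROM pg_stat_activity GROUP BY state;'
```

The second query splits the total by state, which helps distinguish connections sitting idle in the pool from ones actually stuck in a query.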

- And wow, we finally exhausted the database connections, from dspace.log:

```
2018-01-31 08:05:28,964 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-451] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:300; busy:300; idle:0; lastwait:5000].
```
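
To gauge how widespread the exhaustion was, these errors can simply be counted in the day's log; a sketch, with the log path as a placeholder rather than the real location:

```
$ grep -c 'SQL connection Error' /path/to/dspace/log/dspace.log.2018-01-31
$ grep -c 'PoolExhaustedException' /path/to/dspace/log/dspace.log.2018-01-31
```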

- Now even the nightly Atmire background thing is getting an HTTP 500 error:

```
Jan 31, 2018 8:16:05 AM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
```

- For now I will restart Tomcat to clear this shit and bring the site back up
- The top IPs from this morning, during the 7 and 8 AM hours, in XMLUI and REST/OAI:

```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E "31/Jan/2018:(07|08)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
67 66.249.66.70
70 207.46.13.12
71 197.210.168.174
83 207.46.13.13
85 157.55.39.79
89 207.46.13.14
123 68.180.228.157
198 66.249.66.90
219 41.204.190.40
255 2405:204:a208:1e12:132:2a8e:ad28:46c0
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "31/Jan/2018:(07|08)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
2 65.55.210.187
2 66.249.66.90
3 157.55.39.79
4 197.232.39.92
4 34.216.252.127
6 104.196.152.243
6 213.55.85.89
15 122.52.115.13
16 213.55.107.186
596 45.5.184.196
```

- This looks reasonable to me, so I have no idea why we ran out of Tomcat threads

![Tomcat threads](/cgspace-notes/2018/01/tomcat-threads-day.png)

- We need to start graphing the Tomcat sessions as well, though that requires JMX (a setenv.sh sketch is below)
- Also, I wonder if I could disable the nightly Atmire thing
- God, I don't know where this load is coming from
- Since I bumped up the Tomcat threads from 200 to 400 the load on the server has been sustained at about 200%:

![CPU usage week](/cgspace-notes/2018/01/cpu-week.png)
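
Exposing Tomcat's MBeans over JMX for a local poller like munin or jconsole could be as simple as a few JVM flags; a minimal sketch for Tomcat's bin/setenv.sh (the port is arbitrary, and with authentication and SSL disabled like this the port must stay firewalled from the outside):

```
# bin/setenv.sh: expose JMX so a local poller can read Tomcat thread and session MBeans
CATALINA_OPTS="$CATALINA_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9010 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```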

- I should make separate database pools for the web applications and the API applications like REST and OAI
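
Splitting the pool would presumably mean defining one JNDI DataSource per group of webapps instead of the single shared jdbc/dspace resource; a rough sketch of what that could look like among Tomcat's GlobalNamingResources (the resource names, pool sizes, and credentials here are made up for illustration, not taken from the live config):

```
<!-- server.xml: one pool for the UI webapps, a smaller one for REST/OAI -->
<Resource name="jdbc/dspaceWeb" auth="Container" type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://localhost:5432/dspace"
          username="dspace" password="CHANGEME"
          maxActive="250" maxIdle="20" validationQuery="SELECT 1" />
<Resource name="jdbc/dspaceApi" auth="Container" type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://localhost:5432/dspace"
          username="dspace" password="CHANGEME"
          maxActive="50" maxIdle="10" validationQuery="SELECT 1" />
```

Each webapp's context would then point at the appropriate resource with a ResourceLink, so a burst of REST or OAI traffic could exhaust its own pool without starving XMLUI.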
@@ -92,7 +92,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
 <meta property="article:published_time" content="2018-01-02T08:35:54-08:00"/>
-<meta property="article:modified_time" content="2018-01-29T09:47:55+02:00"/>
+<meta property="article:modified_time" content="2018-01-29T12:25:30+02:00"/>
@@ -194,9 +194,9 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
 "@type": "BlogPosting",
 "headline": "January, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-01/",
-"wordCount": "7537",
+"wordCount": "7885",
 "datePublished": "2018-01-02T08:35:54-08:00",
-"dateModified": "2018-01-29T09:47:55+02:00",
+"dateModified": "2018-01-29T12:25:30+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
BIN  public/2018/01/cpu-week.png (new file, 12 KiB)
BIN  public/2018/01/tomcat-threads-day.png (new file, 7.1 KiB)
@@ -4,7 +4,7 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2018-01/</loc>
-<lastmod>2018-01-29T09:47:55+02:00</lastmod>
+<lastmod>2018-01-29T12:25:30+02:00</lastmod>
 </url>

 <url>
@@ -144,7 +144,7 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2018-01-29T09:47:55+02:00</lastmod>
+<lastmod>2018-01-29T12:25:30+02:00</lastmod>
 <priority>0</priority>
 </url>
@@ -155,7 +155,7 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2018-01-29T09:47:55+02:00</lastmod>
+<lastmod>2018-01-29T12:25:30+02:00</lastmod>
 <priority>0</priority>
 </url>
@@ -167,13 +167,13 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/post/</loc>
-<lastmod>2018-01-29T09:47:55+02:00</lastmod>
+<lastmod>2018-01-29T12:25:30+02:00</lastmod>
 <priority>0</priority>
 </url>

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2018-01-29T09:47:55+02:00</lastmod>
+<lastmod>2018-01-29T12:25:30+02:00</lastmod>
 <priority>0</priority>
 </url>
BIN  static/2018/01/cpu-week.png (new file, 12 KiB)
BIN  static/2018/01/tomcat-threads-day.png (new file, 7.1 KiB)