Add notes for 2018-01-31

Alan Orth 2018-01-31 12:01:34 +02:00
parent b051fb4bf6
commit 0e9a9d06a4
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
7 changed files with 162 additions and 8 deletions

@ -1288,3 +1288,75 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name="jdbc/dspace" maxActiv
```
- I filed a ticket with Atmire: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566
## 2018-01-31
- UptimeRobot says CGSpace went down at 7:57 AM, and indeed I see a lot of HTTP 499 codes (nginx's status for a client closing the connection before the response was sent) in the nginx logs
- PostgreSQL activity shows 222 database connections
- Now PostgreSQL activity shows 265 database connections!
- I don't see any errors anywhere...
- Now PostgreSQL activity shows 308 connections!
- Well this is interesting, there are 400 Tomcat threads busy:
```
# munin-run tomcat_threads
busy.value 400
idle.value 0
max.value 400
```
- And wow, we finally exhausted the database connections, from dspace.log:
```
2018-01-31 08:05:28,964 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-451] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:300; busy:300; idle:0; lastwait:5000].
```
- Now even the nightly Atmire background thing is getting HTTP 500 errors:
```
Jan 31, 2018 8:16:05 AM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
```
- For now I will restart Tomcat to clear this shit and bring the site back up
- The top IPs from this morning, during the 7 AM and 8 AM hours, in XMLUI and REST/OAI:
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E "31/Jan/2018:(07|08)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
67 66.249.66.70
70 207.46.13.12
71 197.210.168.174
83 207.46.13.13
85 157.55.39.79
89 207.46.13.14
123 68.180.228.157
198 66.249.66.90
219 41.204.190.40
255 2405:204:a208:1e12:132:2a8e:ad28:46c0
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "31/Jan/2018:(07|08)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
2 65.55.210.187
2 66.249.66.90
3 157.55.39.79
4 197.232.39.92
4 34.216.252.127
6 104.196.152.243
6 213.55.85.89
15 122.52.115.13
16 213.55.107.186
596 45.5.184.196
```
- This looks reasonable to me, so I have no idea why we ran out of Tomcat threads
![Tomcat threads](/cgspace-notes/2018/01/tomcat-threads-day.png)
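To repeat the per-address count above outside the shell, here is a minimal Python sketch. It assumes the logs use nginx's default "combined" format (client address first, then a bracketed timestamp such as `[31/Jan/2018:07:15:02 +0200]`). The function name `top_clients` and the sample lines are hypothetical and only illustrate the shape of the input, they are not taken from the server.

```python
import re
from collections import Counter

# Start of an access-log line in the assumed "combined" format:
# client address, two more fields, then [dd/Mon/yyyy:HH:...
ACCESS_LINE = re.compile(r"^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):(\d{2}):")


def top_clients(lines, day, hours, n=10):
    """Count requests per client address for the given day and hours."""
    counts = Counter()
    for line in lines:
        match = ACCESS_LINE.match(line)
        if match and match.group(2) == day and match.group(3) in hours:
            counts[match.group(1)] += 1
    # Most frequent first (the shell pipeline lists them last instead).
    return counts.most_common(n)


# Hypothetical sample lines, written for this sketch.
sample = [
    '45.5.184.196 - - [31/Jan/2018:07:15:02 +0200] "GET /rest/items HTTP/1.1" 200 512 "-" "curl"',
    '45.5.184.196 - - [31/Jan/2018:08:01:44 +0200] "GET /rest/items HTTP/1.1" 200 512 "-" "curl"',
    '66.249.66.90 - - [31/Jan/2018:08:30:10 +0200] "GET /oai/request HTTP/1.1" 200 128 "-" "bot"',
    '45.5.184.196 - - [31/Jan/2018:09:00:00 +0200] "GET /rest/items HTTP/1.1" 200 512 "-" "curl"',
    'this line is not an access-log entry',
]

for address, hits in top_clients(sample, "31/Jan/2018", {"07", "08"}):
    print(hits, address)
```

Lines that do not match the assumed format are skipped rather than counted, and the hours are passed as two-digit strings to mirror the `(07|08)` alternation in the `grep` pattern.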
- We need to start graphing the Tomcat sessions as well, though that requires JMX
- Also, I wonder if I could disable the nightly Atmire thing
- God, I don't know where this load is coming from
- Since I bumped up the Tomcat threads from 200 to 400, the load on the server has been sustained at about 200%:
![CPU usage week](/cgspace-notes/2018/01/cpu-week.png)
- I should make separate database pools for the web applications and the API applications like REST and OAI
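The reasoning behind separate pools can be illustrated with a toy model: if the API applications draw from their own fixed-size pool, a burst of API requests can exhaust only that pool and the web applications keep their connections. The sketch below is not DSpace or Tomcat configuration. The `BoundedPool` class is hypothetical, and the pool sizes (3 and 2) are made up to keep the demonstration small.

```python
import threading


class BoundedPool:
    """Toy stand-in for a fixed-size connection pool (hypothetical)."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self._slots = threading.BoundedSemaphore(size)

    def try_acquire(self):
        # Non-blocking: True if a slot was free, False if the pool is
        # exhausted (the toy analogue of PoolExhaustedException).
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


# Made-up sizes, chosen only to keep the demo small.
web_pool = BoundedPool("web", 3)
api_pool = BoundedPool("api", 2)

# A burst of five API requests: only as many as the API pool holds
# get a connection, the rest are refused.
api_results = [api_pool.try_acquire() for _ in range(5)]

# The web pool is untouched by the API burst and can still serve a request.
web_ok = web_pool.try_acquire()

print(api_results, web_ok)
```

With a single shared pool, the same burst would have consumed connections that web requests also depend on, which is the failure mode the `PoolExhaustedException` in the log describes.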

@ -92,7 +92,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
<meta property="article:published_time" content="2018-01-02T08:35:54-08:00"/>
<meta property="article:modified_time" content="2018-01-29T09:47:55&#43;02:00"/>
<meta property="article:modified_time" content="2018-01-29T12:25:30&#43;02:00"/>
@ -194,9 +194,9 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
"@type": "BlogPosting",
"headline": "January, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-01/",
"wordCount": "7537",
"wordCount": "7885",
"datePublished": "2018-01-02T08:35:54-08:00",
"dateModified": "2018-01-29T09:47:55&#43;02:00",
"dateModified": "2018-01-29T12:25:30&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -1688,6 +1688,88 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name=&quot;jdbc/dspace&quot;
<li>I filed a ticket with Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566</a></li>
</ul>
<h2 id="2018-01-31">2018-01-31</h2>
<ul>
<li>UptimeRobot says CGSpace went down at 7:57 AM, and indeed I see a lot of HTTP 499 codes (nginx&rsquo;s status for a client closing the connection before the response was sent) in the nginx logs</li>
<li>PostgreSQL activity shows 222 database connections</li>
<li>Now PostgreSQL activity shows 265 database connections!</li>
<li>I don&rsquo;t see any errors anywhere&hellip;</li>
<li>Now PostgreSQL activity shows 308 connections!</li>
<li>Well this is interesting, there are 400 Tomcat threads busy:</li>
</ul>
<pre><code># munin-run tomcat_threads
busy.value 400
idle.value 0
max.value 400
</code></pre>
<ul>
<li>And wow, we finally exhausted the database connections, from dspace.log:</li>
</ul>
<pre><code>2018-01-31 08:05:28,964 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-451] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:300; busy:300; idle:0; lastwait:5000].
</code></pre>
<ul>
<li>Now even the nightly Atmire background thing is getting HTTP 500 errors:</li>
</ul>
<pre><code>Jan 31, 2018 8:16:05 AM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
</code></pre>
<ul>
<li>For now I will restart Tomcat to clear this shit and bring the site back up</li>
<li>The top IPs from this morning, during the 7 AM and 8 AM hours, in XMLUI and REST/OAI:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 /var/log/nginx/error.log /var/log/nginx/error.log.1 | grep -E &quot;31/Jan/2018:(07|08)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
67 66.249.66.70
70 207.46.13.12
71 197.210.168.174
83 207.46.13.13
85 157.55.39.79
89 207.46.13.14
123 68.180.228.157
198 66.249.66.90
219 41.204.190.40
255 2405:204:a208:1e12:132:2a8e:ad28:46c0
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;31/Jan/2018:(07|08)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
2 65.55.210.187
2 66.249.66.90
3 157.55.39.79
4 197.232.39.92
4 34.216.252.127
6 104.196.152.243
6 213.55.85.89
15 122.52.115.13
16 213.55.107.186
596 45.5.184.196
</code></pre>
<ul>
<li>This looks reasonable to me, so I have no idea why we ran out of Tomcat threads</li>
</ul>
<p><img src="/cgspace-notes/2018/01/tomcat-threads-day.png" alt="Tomcat threads" /></p>
<ul>
<li>We need to start graphing the Tomcat sessions as well, though that requires JMX</li>
<li>Also, I wonder if I could disable the nightly Atmire thing</li>
<li>God, I don&rsquo;t know where this load is coming from</li>
<li>Since I bumped up the Tomcat threads from 200 to 400, the load on the server has been sustained at about 200%:</li>
</ul>
<p><img src="/cgspace-notes/2018/01/cpu-week.png" alt="CPU usage week" /></p>
<ul>
<li>I should make separate database pools for the web applications and the API applications like REST and OAI</li>
</ul>

BIN
public/2018/01/cpu-week.png Normal file

Binary file not shown. Size: 12 KiB

Binary file not shown. Size: 7.1 KiB

@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-01/</loc>
<lastmod>2018-01-29T09:47:55+02:00</lastmod>
<lastmod>2018-01-29T12:25:30+02:00</lastmod>
</url>
<url>
@ -144,7 +144,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-01-29T09:47:55+02:00</lastmod>
<lastmod>2018-01-29T12:25:30+02:00</lastmod>
<priority>0</priority>
</url>
@ -155,7 +155,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-01-29T09:47:55+02:00</lastmod>
<lastmod>2018-01-29T12:25:30+02:00</lastmod>
<priority>0</priority>
</url>
@ -167,13 +167,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2018-01-29T09:47:55+02:00</lastmod>
<lastmod>2018-01-29T12:25:30+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-01-29T09:47:55+02:00</lastmod>
<lastmod>2018-01-29T12:25:30+02:00</lastmod>
<priority>0</priority>
</url>

BIN
static/2018/01/cpu-week.png Normal file

Binary file not shown. Size: 12 KiB

Binary file not shown. Size: 7.1 KiB