Update notes for 2018-02-07

This commit is contained in:
Alan Orth 2018-02-08 00:55:28 +02:00
parent 69cbd6169b
commit 5e4dab0002
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 223 additions and 8 deletions

View File

@ -198,3 +198,105 @@ $ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c "idle in t
Wed Feb 07 15:01:54 UTC 2018 | Query:containerItem:91917 AND type:2 Wed Feb 07 15:01:54 UTC 2018 | Query:containerItem:91917 AND type:2
Exception in thread "http-bio-127.0.0.1-8081-exec-58" java.lang.OutOfMemoryError: Java heap space Exception in thread "http-bio-127.0.0.1-8081-exec-58" java.lang.OutOfMemoryError: Java heap space
``` ```
- I'm trying to find a way to determine what was using all those Tomcat sessions, but parsing the DSpace log is hard because some IPs are IPv6, which contain colons!
- Looking at the first crash this morning around 11, I see these IPv4 addresses making requests around 10 and 11AM:
```
$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'ip_addr=[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort -n | uniq -c | sort -n | tail -n 20
34 ip_addr=46.229.168.67
34 ip_addr=46.229.168.73
37 ip_addr=46.229.168.76
40 ip_addr=34.232.65.41
41 ip_addr=46.229.168.71
44 ip_addr=197.210.168.174
55 ip_addr=181.137.2.214
55 ip_addr=213.55.99.121
58 ip_addr=46.229.168.65
64 ip_addr=66.249.66.91
67 ip_addr=66.249.66.90
71 ip_addr=207.46.13.54
78 ip_addr=130.82.1.40
104 ip_addr=40.77.167.36
151 ip_addr=68.180.228.157
174 ip_addr=207.46.13.135
194 ip_addr=54.83.138.123
198 ip_addr=40.77.167.62
210 ip_addr=207.46.13.71
214 ip_addr=104.196.152.243
```
- These IPs made thousands of sessions today:
```
$ grep 104.196.152.243 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
530
$ grep 207.46.13.71 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
859
$ grep 40.77.167.62 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
610
$ grep 54.83.138.123 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
8
$ grep 207.46.13.135 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
826
$ grep 68.180.228.157 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
727
$ grep 40.77.167.36 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
181
$ grep 130.82.1.40 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
24
$ grep 207.46.13.54 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
166
$ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
992
```
- Let's investigate who these IPs belong to:
- 104.196.152.243 is CIAT, which is already marked as a bot via nginx!
- 207.46.13.71 is Bing, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
- 40.77.167.62 is Bing, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
- 207.46.13.135 is Bing, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
- 68.180.228.157 is Yahoo, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
- 40.77.167.36 is Bing, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
- 207.46.13.54 is Bing, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
- 46.229.168.x is Semrush, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
- Nice, so these are all known bots that are already crammed into one session by Tomcat's Crawler Session Manager Valve.
- What in the actual fuck, why is our load doing this? It's gotta be something fucked up with the database pool being "busy" but everything is fucking idle
- One that I should probably add in nginx is 54.83.138.123, which is apparently the following user agent:
```
BUbiNG (+http://law.di.unimi.it/BUbiNG.html)
```
- This one makes two thousand requests per day or so recently:
```
# grep -c BUbiNG /var/log/nginx/access.log /var/log/nginx/access.log.1
/var/log/nginx/access.log:1925
/var/log/nginx/access.log.1:2029
```
- And they have 30 IPs, so fuck that shit I'm going to add them to the Tomcat Crawler Session Manager Valve nowwww
- Lots of discussions on the dspace-tech mailing list over the last few years about leaky transactions being a known problem with DSpace
- Helix84 recommends restarting PostgreSQL instead of Tomcat because it restarts quicker
- This is how the connections looked when it crashed this afternoon:
```
$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
5 dspaceApi
290 dspaceWeb
```
- This is how it is right now:
```
$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
5 dspaceApi
5 dspaceWeb
```
- So is this just some fucked up XMLUI database leaking?
- I notice there is an issue (that I've probably noticed before) on the Jira tracker about this that was fixed in DSpace 5.7: https://jira.duraspace.org/browse/DS-3551
- I seriously doubt this leaking shit is fixed for sure, but I'm gonna cherry-pick all those commits and try them on DSpace Test and probably even CGSpace because I'm fed up with this shit
- If I do, I need to remember to remove them when I rebase `5_x-prod` on top of the latest upstream 5.x branch later

View File

@ -23,7 +23,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-pl
<meta property="article:published_time" content="2018-02-01T16:28:54&#43;02:00"/> <meta property="article:published_time" content="2018-02-01T16:28:54&#43;02:00"/>
<meta property="article:modified_time" content="2018-02-07T18:33:26&#43;02:00"/> <meta property="article:modified_time" content="2018-02-07T18:34:39&#43;02:00"/>
@ -57,9 +57,9 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu&rsquo;s munin-pl
"@type": "BlogPosting", "@type": "BlogPosting",
"headline": "February, 2018", "headline": "February, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-02/", "url": "https://alanorth.github.io/cgspace-notes/2018-02/",
"wordCount": "1363", "wordCount": "2087",
"datePublished": "2018-02-01T16:28:54&#43;02:00", "datePublished": "2018-02-01T16:28:54&#43;02:00",
"dateModified": "2018-02-07T18:33:26&#43;02:00", "dateModified": "2018-02-07T18:34:39&#43;02:00",
"author": { "author": {
"@type": "Person", "@type": "Person",
"name": "Alan Orth" "name": "Alan Orth"
@ -341,6 +341,119 @@ $ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c &quot;idle
Exception in thread &quot;http-bio-127.0.0.1-8081-exec-58&quot; java.lang.OutOfMemoryError: Java heap space Exception in thread &quot;http-bio-127.0.0.1-8081-exec-58&quot; java.lang.OutOfMemoryError: Java heap space
</code></pre> </code></pre>
<ul>
<li>I&rsquo;m trying to find a way to determine what was using all those Tomcat sessions, but parsing the DSpace log is hard because some IPs are IPv6, which contain colons!</li>
<li>Looking at the first crash this morning around 11, I see these IPv4 addresses making requests around 10 and 11AM:</li>
</ul>
<pre><code>$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'ip_addr=[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort -n | uniq -c | sort -n | tail -n 20
34 ip_addr=46.229.168.67
34 ip_addr=46.229.168.73
37 ip_addr=46.229.168.76
40 ip_addr=34.232.65.41
41 ip_addr=46.229.168.71
44 ip_addr=197.210.168.174
55 ip_addr=181.137.2.214
55 ip_addr=213.55.99.121
58 ip_addr=46.229.168.65
64 ip_addr=66.249.66.91
67 ip_addr=66.249.66.90
71 ip_addr=207.46.13.54
78 ip_addr=130.82.1.40
104 ip_addr=40.77.167.36
151 ip_addr=68.180.228.157
174 ip_addr=207.46.13.135
194 ip_addr=54.83.138.123
198 ip_addr=40.77.167.62
210 ip_addr=207.46.13.71
214 ip_addr=104.196.152.243
</code></pre>
<ul>
<li>These IPs made thousands of sessions today:</li>
</ul>
<pre><code>$ grep 104.196.152.243 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
530
$ grep 207.46.13.71 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
859
$ grep 40.77.167.62 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
610
$ grep 54.83.138.123 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
8
$ grep 207.46.13.135 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
826
$ grep 68.180.228.157 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
727
$ grep 40.77.167.36 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
181
$ grep 130.82.1.40 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
24
$ grep 207.46.13.54 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
166
$ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
992
</code></pre>
<ul>
<li>Let&rsquo;s investigate who these IPs belong to:
<ul>
<li>104.196.152.243 is CIAT, which is already marked as a bot via nginx!</li>
<li>207.46.13.71 is Bing, which is already marked as a bot in Tomcat&rsquo;s Crawler Session Manager Valve!</li>
<li>40.77.167.62 is Bing, which is already marked as a bot in Tomcat&rsquo;s Crawler Session Manager Valve!</li>
<li>207.46.13.135 is Bing, which is already marked as a bot in Tomcat&rsquo;s Crawler Session Manager Valve!</li>
<li>68.180.228.157 is Yahoo, which is already marked as a bot in Tomcat&rsquo;s Crawler Session Manager Valve!</li>
<li>40.77.167.36 is Bing, which is already marked as a bot in Tomcat&rsquo;s Crawler Session Manager Valve!</li>
<li>207.46.13.54 is Bing, which is already marked as a bot in Tomcat&rsquo;s Crawler Session Manager Valve!</li>
<li>46.229.168.x is Semrush, which is already marked as a bot in Tomcat&rsquo;s Crawler Session Manager Valve!</li>
</ul></li>
<li>Nice, so these are all known bots that are already crammed into one session by Tomcat&rsquo;s Crawler Session Manager Valve.</li>
<li>What in the actual fuck, why is our load doing this? It&rsquo;s gotta be something fucked up with the database pool being &ldquo;busy&rdquo; but everything is fucking idle</li>
<li>One that I should probably add in nginx is 54.83.138.123, which is apparently the following user agent:</li>
</ul>
<pre><code>BUbiNG (+http://law.di.unimi.it/BUbiNG.html)
</code></pre>
<ul>
<li>This one makes two thousand requests per day or so recently:</li>
</ul>
<pre><code># grep -c BUbiNG /var/log/nginx/access.log /var/log/nginx/access.log.1
/var/log/nginx/access.log:1925
/var/log/nginx/access.log.1:2029
</code></pre>
<ul>
<li>And they have 30 IPs, so fuck that shit I&rsquo;m going to add them to the Tomcat Crawler Session Manager Valve nowwww</li>
<li>Lots of discussions on the dspace-tech mailing list over the last few years about leaky transactions being a known problem with DSpace</li>
<li>Helix84 recommends restarting PostgreSQL instead of Tomcat because it restarts quicker</li>
<li>This is how the connections looked when it crashed this afternoon:</li>
</ul>
<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
5 dspaceApi
290 dspaceWeb
</code></pre>
<ul>
<li>This is how it is right now:</li>
</ul>
<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
5 dspaceApi
5 dspaceWeb
</code></pre>
<ul>
<li>So is this just some fucked up XMLUI database leaking?</li>
<li>I notice there is an issue (that I&rsquo;ve probably noticed before) on the Jira tracker about this that was fixed in DSpace 5.7: <a href="https://jira.duraspace.org/browse/DS-3551">https://jira.duraspace.org/browse/DS-3551</a></li>
<li>I seriously doubt this leaking shit is fixed for sure, but I&rsquo;m gonna cherry-pick all those commits and try them on DSpace Test and probably even CGSpace because I&rsquo;m fed up with this shit</li>
<li>If I do, I need to remember to remove them when I rebase <code>5_x-prod</code> on top of the latest upstream 5.x branch later</li>
</ul>

View File

@ -4,7 +4,7 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/2018-02/</loc> <loc>https://alanorth.github.io/cgspace-notes/2018-02/</loc>
<lastmod>2018-02-07T18:33:26+02:00</lastmod> <lastmod>2018-02-07T18:34:39+02:00</lastmod>
</url> </url>
<url> <url>
@ -149,7 +149,7 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-02-07T18:33:26+02:00</lastmod> <lastmod>2018-02-07T18:34:39+02:00</lastmod>
<priority>0</priority> <priority>0</priority>
</url> </url>
@ -160,7 +160,7 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-02-07T18:33:26+02:00</lastmod> <lastmod>2018-02-07T18:34:39+02:00</lastmod>
<priority>0</priority> <priority>0</priority>
</url> </url>
@ -172,13 +172,13 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc> <loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2018-02-07T18:33:26+02:00</lastmod> <lastmod>2018-02-07T18:34:39+02:00</lastmod>
<priority>0</priority> <priority>0</priority>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc> <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-02-07T18:33:26+02:00</lastmod> <lastmod>2018-02-07T18:34:39+02:00</lastmod>
<priority>0</priority> <priority>0</priority>
</url> </url>