diff --git a/content/post/2018-02.md b/content/post/2018-02.md
index db651c2db..b57a25eb9 100644
--- a/content/post/2018-02.md
+++ b/content/post/2018-02.md
@@ -198,3 +198,105 @@ $ psql -c 'select * from pg_stat_activity' | grep dspaceWeb | grep -c "idle in t
 Wed Feb 07 15:01:54 UTC 2018 | Query:containerItem:91917 AND type:2
 Exception in thread "http-bio-127.0.0.1-8081-exec-58" java.lang.OutOfMemoryError: Java heap space
 ```
+
+- I'm trying to find a way to determine what was using all those Tomcat sessions, but parsing the DSpace log is hard because some of the IPs are IPv6 addresses, which contain colons! (a rough workaround is sketched at the end of these notes)
+- Looking at the first crash this morning around 11AM, I see these IPv4 addresses making requests during the 10 and 11AM hours:
+
+```
+$ grep -E '^2018-02-07 (10|11)' dspace.log.2018-02-07 | grep -o -E 'ip_addr=[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort -n | uniq -c | sort -n | tail -n 20
+     34 ip_addr=46.229.168.67
+     34 ip_addr=46.229.168.73
+     37 ip_addr=46.229.168.76
+     40 ip_addr=34.232.65.41
+     41 ip_addr=46.229.168.71
+     44 ip_addr=197.210.168.174
+     55 ip_addr=181.137.2.214
+     55 ip_addr=213.55.99.121
+     58 ip_addr=46.229.168.65
+     64 ip_addr=66.249.66.91
+     67 ip_addr=66.249.66.90
+     71 ip_addr=207.46.13.54
+     78 ip_addr=130.82.1.40
+    104 ip_addr=40.77.167.36
+    151 ip_addr=68.180.228.157
+    174 ip_addr=207.46.13.135
+    194 ip_addr=54.83.138.123
+    198 ip_addr=40.77.167.62
+    210 ip_addr=207.46.13.71
+    214 ip_addr=104.196.152.243
+```
+
+- These IPs made thousands of sessions today:
+
+```
+$ grep 104.196.152.243 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+530
+$ grep 207.46.13.71 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+859
+$ grep 40.77.167.62 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+610
+$ grep 54.83.138.123 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+8
+$ grep 207.46.13.135 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+826
+$ grep 68.180.228.157 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+727
+$ grep 40.77.167.36 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+181
+$ grep 130.82.1.40 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+24
+$ grep 207.46.13.54 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+166
+$ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+992
+```
+
+- Let's investigate who these IPs belong to:
+  - 104.196.152.243 is CIAT, which is already marked as a bot via nginx!
+  - 207.46.13.71 is Bing, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
+  - 40.77.167.62 is Bing, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
+  - 207.46.13.135 is Bing, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
+  - 68.180.228.157 is Yahoo, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
+  - 40.77.167.36 is Bing, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
+  - 207.46.13.54 is Bing, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
+  - 46.229.168.x is Semrush, which is already marked as a bot in Tomcat's Crawler Session Manager Valve!
+- Nice, so these are all known bots that are already crammed into one session by Tomcat's Crawler Session Manager Valve.
+- What in the actual fuck, why is our load doing this? It's gotta be something fucked up with the database pool being "busy" while everything is actually idle (I sketched a query below to break the connections down by state)
+- One IP that I should probably also mark as a bot in nginx is 54.83.138.123, which apparently uses the following user agent:
+
+```
+BUbiNG (+http://law.di.unimi.it/BUbiNG.html)
+```
+
+- This one has been making about two thousand requests per day recently:
+
+```
+# grep -c BUbiNG /var/log/nginx/access.log /var/log/nginx/access.log.1
+/var/log/nginx/access.log:1925
+/var/log/nginx/access.log.1:2029
+```
+
+- And they have 30 IPs, so fuck that shit, I'm going to add them to the Tomcat Crawler Session Manager Valve nowwww (sketch of the config below)
+- There have been lots of discussions on the dspace-tech mailing list over the last few years about leaky transactions being a known problem with DSpace
+- Helix84 recommends restarting PostgreSQL instead of Tomcat because PostgreSQL restarts quicker
+- This is how the connections looked when it crashed this afternoon:
+
+```
+$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
+      5 dspaceApi
+    290 dspaceWeb
+```
+
+- This is how it looks right now:
+
+```
+$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
+      5 dspaceApi
+      5 dspaceWeb
+```
+
+- So is this just some fucked up XMLUI database connection leak?
+- I notice there is an issue on the Jira tracker about this (that I've probably seen before) that was fixed in DSpace 5.7: https://jira.duraspace.org/browse/DS-3551
+- I seriously doubt those commits fix the leak for good, but I'm gonna cherry-pick them all and try them on DSpace Test, and probably even CGSpace, because I'm fed up with this shit
+- If I do, I need to remember to remove them when I rebase `5_x-prod` on top of the latest upstream 5.x branch later
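+- Roughly, the workflow I have in mind for those cherry-picks (the hashes below are placeholders, not the real DS-3551 commits):
+
+```
+$ git checkout 5_x-prod
+$ git cherry-pick -x <ds-3551-commit-1> <ds-3551-commit-2> <ds-3551-commit-3>
+```
+
+- The `-x` appends a "(cherry picked from commit ...)" line to each commit message, which should make these picks easy to spot and drop during an interactive rebase (`git rebase -i`) when I eventually rebase `5_x-prod` onto the new upstream 5.x branch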
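+- For future reference, the Crawler Session Manager Valve is just a `<Valve>` element in Tomcat's `server.xml` that matches user agents against the `crawlerUserAgents` regex, so adding BUbiNG should only mean extending that regex. A sketch based on the Tomcat docs (not our exact CGSpace config):
+
+```
+<!-- User agents matching the regex share a single session instead of creating a new one per request -->
+<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
+       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*|.*BUbiNG.*"
+       sessionInactiveInterval="60" />
+```
+
+- `sessionInactiveInterval` is the number of seconds of inactivity after which the shared crawler session expires (60 is the Tomcat default)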
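+- Also worth writing down: instead of grepping the raw `pg_stat_activity` output I can group by `application_name` and `state` to see whether those dspaceWeb connections are idle, idle in transaction, or actually doing something (this assumes the pool names really are set as `application_name`, which is what the greps above appear to be matching):
+
+```
+$ psql -c "SELECT application_name, state, count(*) FROM pg_stat_activity GROUP BY application_name, state ORDER BY count(*) DESC;"
+```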
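+- Lastly, the rough IPv6-friendly log parsing I mentioned at the top, for counting requests per IP. This is an untested sketch: the bracket expression is a guess at what can appear in `ip_addr=`, and the `sed` just trims the field-delimiting colon that the match drags along, so treat the counts as approximate:
+
+```
+$ grep -o -E 'ip_addr=[0-9a-f.:]+' dspace.log.2018-02-07 | sed -e 's/:$//' | sort | uniq -c | sort -n | tail -n 20
+```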