From a421ac82279cc5bdfadf14853f03b1ad95792051 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Wed, 10 Jan 2018 13:05:02 +0200
Subject: [PATCH] Update notes

---
 content/post/2018-01.md   | 78 +++++++++++++++++++++++++++++++++
 public/2018-01/index.html | 92 +++++++++++++++++++++++++++++++++++++--
 public/sitemap.xml        | 10 ++---
 3 files changed, 172 insertions(+), 8 deletions(-)

diff --git a/content/post/2018-01.md b/content/post/2018-01.md
index 2cee8bf46..fd88e9ae1 100644
--- a/content/post/2018-01.md
+++ b/content/post/2018-01.md
@@ -370,6 +370,84 @@ $ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&wt=json&i
 "response":{"numFound":34879872,"start":0,"docs":[
 ```
+
+- I tested the `dspace stats-util -s` process on my local machine and it failed the same way
+- It doesn't seem to be helpful, but the DSpace log shows this:
+
+```
+2018-01-10 10:51:19,301 INFO  org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
+2018-01-10 10:51:19,301 INFO  org.dspace.statistics.SolrLogger @ Moving: 3821 records into core statistics-2016
+```
+
+- Terry Brady has written some notes on the DSpace Wiki about Solr sharding issues: https://wiki.duraspace.org/display/%7Eterrywbrady/Statistics+Import+Export+Issues
+- Uptime Robot said that CGSpace went down at around 9:43 AM
+- I looked at PostgreSQL's `pg_stat_activity` table and saw 161 active connections, but no pool errors in the DSpace logs:
+
+```
+$ grep -c "Timeout: Pool empty." dspace.log.2018-01-10
+0
+```
+
+- The XMLUI logs show quite a bit of activity today:
+
+```
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+    951 207.46.13.159
+    954 157.55.39.123
+   1217 95.108.181.88
+   1503 104.196.152.243
+   6455 70.36.107.50
+  11412 70.36.107.190
+  16730 70.36.107.49
+  17386 2607:fa98:40:9:26b6:fdff:feff:1c96
+  21566 2607:fa98:40:9:26b6:fdff:feff:195d
+  45384 2607:fa98:40:9:26b6:fdff:feff:1888
+```
+
+- The user agent for the top six or so IPs is the same:
+
+```
+"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36"
+```
+
+- `whois` says they come from [Perfect IP](http://www.perfectip.net/)
+- I've never seen those top IPs before, but they have created nearly 50,000 Tomcat sessions today:
+
+```
+$ grep -E '(2607:fa98:40:9:26b6:fdff:feff:1888|2607:fa98:40:9:26b6:fdff:feff:195d|2607:fa98:40:9:26b6:fdff:feff:1c96|70.36.107.49|70.36.107.190|70.36.107.50)' /home/cgspace.cgiar.org/log/dspace.log.2018-01-10 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+49096
+```
+
+- Rather than blocking their IPs, I think I might just add their user agent to the "badbots" zone with Baidu, because they seem to be the only ones using that user agent (see the sketch after the log output below):
+
+```
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+   6796 70.36.107.50
+  11870 70.36.107.190
+  17323 70.36.107.49
+  19204 2607:fa98:40:9:26b6:fdff:feff:1c96
+  23401 2607:fa98:40:9:26b6:fdff:feff:195d
+  47875 2607:fa98:40:9:26b6:fdff:feff:1888
+```
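+
+- For reference, a rough sketch of how a user agent can be fed into an nginx `limit_req` zone via a `map`; the `$limit_bots` variable name, the `10m` zone size, the `12r/m` rate, and the burst value are illustrative guesses, not necessarily our actual configuration:
+
+```
+# Map known bad user agents to a non-empty value; requests whose user agent
+# maps to the empty string are not counted against the rate limit at all.
+map $http_user_agent $limit_bots {
+    default        '';
+    ~Baiduspider   $http_user_agent;
+    'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36' $http_user_agent;
+}
+
+# The zone is then referenced with "limit_req zone=badbots burst=5;" in the
+# server or location block.
+limit_req_zone $limit_bots zone=badbots:10m rate=12r/m;
+```
+
+- Using the full user agent string as a literal map key like that makes the key longer than 64 bytes, which presumably explains the map_hash error I hit when testing the config (below)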
+
+- I added the user agent to nginx's badbots `limit_req` zone, but upon testing the config I got an error:
+
+```
+# nginx -t
+nginx: [emerg] could not build map_hash, you should increase map_hash_bucket_size: 64
+nginx: configuration file /etc/nginx/nginx.conf test failed
+```
+
+- According to the nginx docs, the [bucket size should be a multiple of the CPU's cache alignment](https://nginx.org/en/docs/hash.html):
+
+```
+# cat /proc/cpuinfo | grep cache_alignment | head -n1
+cache_alignment : 64
+```
+
+- On our servers that is 64, so I increased the parameter to 128 and deployed the changes to nginx
+- Almost immediately the PostgreSQL connections dropped back down to 40 or so, and UptimeRobot said the site was back up
+- So it's interesting that we're not out of PostgreSQL connections (the current pool maxActive is 300!), yet the system is "down" to UptimeRobot and very slow to use
 
 - Linode continues to test mitigations for Meltdown and Spectre: https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/
 - I rebooted DSpace Test to see if the kernel will be updated (currently Linux 4.14.12-x86_64-linode92)... nope.
 - It looks like Linode will reboot the KVM hosts later this week, though
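+
+- As a note for the next time the connections spike: grouping `pg_stat_activity` by database and state shows at a glance whether the connections are active or idle; this is a generic sketch assuming PostgreSQL 9.2+ (where the `state` column exists), not a command I ran today:
+
+```
+$ psql -c 'SELECT datname, state, COUNT(*) FROM pg_stat_activity GROUP BY datname, state ORDER BY COUNT(*) DESC;'
+```
+
+- If most of them turned out to be "idle in transaction" that would point more at the application or connection pool than at raw load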