diff --git a/content/posts/2018-10.md b/content/posts/2018-10.md
index ad8a30719..d2dd41c0c 100644
--- a/content/posts/2018-10.md
+++ b/content/posts/2018-10.md
@@ -53,5 +53,47 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
 - It appears to be Jim Lorenzen... I need to check that later!
 - I merged the changes to the `5_x-prod` branch ([#390](https://github.com/ilri/DSpace/pull/390))
+- Linode sent another alert about CPU usage on CGSpace (linode18) this evening
+- It seems that Moayad is making quite a lot of requests today:
+
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+   1594 157.55.39.160
+   1627 157.55.39.173
+   1774 136.243.6.84
+   4228 35.237.175.180
+   4497 70.32.83.92
+   4856 66.249.64.59
+   7120 50.116.102.77
+  12518 138.201.49.199
+  87646 34.218.226.147
+ 111729 213.139.53.62
+```
+
+- But in super positive news, he says they are using my new [dspace-statistics-api](https://github.com/alanorth/dspace-statistics-api) and it's MUCH faster than using Atmire CUA's internal "restlet" API
+- I don't recognize the `138.201.49.199` IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:
+
+```
+# grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c
+   8324 GET /bitstream
+   4193 GET /handle
+```
+
+- Suspiciously, it's only grabbing the CGIAR System Office community (handle prefix 10947):
+
+```
+# grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c
+      7 GET /handle/10568
+   4186 GET /handle/10947
+```
+
+- The user agent is suspicious too:
+
+```
+Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
+```
+
+- It's clearly a bot and it's not re-using its Tomcat session, so I will add its IP to the nginx bad bot list
+- I looked in Solr's statistics core and these hits were actually all counted as `isBot:false` (of course)... hmmm
diff --git a/docs/2018-10/index.html b/docs/2018-10/index.html
index 550095370..0eaf95c9d 100644
--- a/docs/2018-10/index.html
+++ b/docs/2018-10/index.html
@@ -9,7 +9,7 @@
-
+
@@ -24,9 +24,9 @@
 "@type": "BlogPosting",
 "headline": "October, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-10/",
- "wordCount": "231",
+ "wordCount": "460",
 "datePublished": "2018-10-01T22:31:54+03:00",
- "dateModified": "2018-10-03T11:52:48+03:00",
+ "dateModified": "2018-10-03T17:54:58+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -149,6 +149,52 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
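
Reviewer note: the two shell pipelines added in `content/posts/2018-10.md` (requests per client IP for one day, then request types for a single IP) can also be sketched in Python, which may be handy if the same check is repeated often. This is only a rough equivalent, not what was run on the server. It assumes the logs use nginx's default "combined" format, and the names `LOG_RE`, `top_clients` and `path_prefixes` are hypothetical helpers made up for this illustration. Unlike `grep -E "03/Oct/2018"`, which matches the date anywhere in the line, the sketch only compares the date against the timestamp field.

```python
import re
from collections import Counter

# Start of an nginx "combined" log line: client IP, ident, user, [timestamp], "METHOD path ...
# Assumption: default combined log format; adjust the pattern for custom formats.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')


def top_clients(lines, day, n=10):
    """Return the n busiest client IPs for one day, e.g. day='03/Oct/2018'."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        # Only count lines whose timestamp starts with the requested day
        if m and m.group(2).startswith(day):
            counts[m.group(1)] += 1
    return counts.most_common(n)


def path_prefixes(lines, ip):
    """Count 'GET /first-segment' style prefixes requested by a single IP."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group(1) == ip and m.group(3) == "GET":
            # Same idea as grep -o -E 'GET /[a-z]+'
            segment = re.match(r"/[a-z]+", m.group(4))
            if segment:
                counts["GET " + segment.group(0)] += 1
    return counts
```

Both functions take any iterable of log lines, so they can be fed from `open()` for plain logs or `gzip.open(path, "rt")` for rotated ones.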