diff --git a/content/posts/2018-10.md b/content/posts/2018-10.md
index ad8a30719..d2dd41c0c 100644
--- a/content/posts/2018-10.md
+++ b/content/posts/2018-10.md
@@ -53,5 +53,47 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
 - It appears to be Jim Lorenzen... I need to check that later!
 - I merged the changes to the `5_x-prod` branch ([#390](https://github.com/ilri/DSpace/pull/390))
+- Linode sent another alert about CPU usage on CGSpace (linode18) this evening
+- It seems that Moayad is making quite a lot of requests today:
+
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+   1594 157.55.39.160
+   1627 157.55.39.173
+   1774 136.243.6.84
+   4228 35.237.175.180
+   4497 70.32.83.92
+   4856 66.249.64.59
+   7120 50.116.102.77
+  12518 138.201.49.199
+  87646 34.218.226.147
+ 111729 213.139.53.62
+```
+
+- But in super positive news, he says they are using my new [dspace-statistics-api](https://github.com/alanorth/dspace-statistics-api) and it's MUCH faster than using Atmire CUA's internal "restlet" API
+- I don't recognize the `138.201.49.199` IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:
+
+```
+# grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c
+   8324 GET /bitstream
+   4193 GET /handle
+```
+
+- Suspiciously, it's only grabbing the CGIAR System Office community (handle prefix 10947):
+
+```
+# grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c
+      7 GET /handle/10568
+   4186 GET /handle/10947
+```
+
+- The user agent is suspicious too:
+
+```
+Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
+```
+
+- It's clearly a bot and it's not re-using its Tomcat session, so I will add its IP to the nginx bad bot list
+- I looked in Solr's statistics core and these hits were actually all counted as `isBot:false` (of course)... hmmm
diff --git a/docs/2018-10/index.html b/docs/2018-10/index.html
index 550095370..0eaf95c9d 100644
--- a/docs/2018-10/index.html
+++ b/docs/2018-10/index.html
@@ -9,7 +9,7 @@
-
+
@@ -24,9 +24,9 @@
 "@type": "BlogPosting",
 "headline": "October, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-10/",
-"wordCount": "231",
+"wordCount": "460",
 "datePublished": "2018-10-01T22:31:54+03:00",
-"dateModified": "2018-10-03T11:52:48+03:00",
+"dateModified": "2018-10-03T17:54:58+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -149,6 +149,52 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
+
+
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+   1594 157.55.39.160
+   1627 157.55.39.173
+   1774 136.243.6.84
+   4228 35.237.175.180
+   4497 70.32.83.92
+   4856 66.249.64.59
+   7120 50.116.102.77
+  12518 138.201.49.199
+  87646 34.218.226.147
+ 111729 213.139.53.62
+
+ + + +
# grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c
+   8324 GET /bitstream
+   4193 GET /handle
+
+ + + +
# grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c
+      7 GET /handle/10568
+   4186 GET /handle/10947
+
+ + + +
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
+
+
+
diff --git a/docs/robots.txt b/docs/robots.txt
index c81a84496..80af82d91 100644
--- a/docs/robots.txt
+++ b/docs/robots.txt
@@ -40,7 +40,7 @@ Disallow: /cgspace-notes/2015-12/
 Disallow: /cgspace-notes/2015-11/
 Disallow: /cgspace-notes/
 Disallow: /cgspace-notes/categories/
-Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/categories/notes/
+Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/posts/
 Disallow: /cgspace-notes/tags/
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 0998785d7..e9869ef00 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2018-10/
- 2018-10-03T11:52:48+03:00
+ 2018-10-03T17:54:58+03:00
@@ -189,7 +189,7 @@
 https://alanorth.github.io/cgspace-notes/
- 2018-10-03T11:52:48+03:00
+ 2018-10-03T17:54:58+03:00
 0
@@ -198,27 +198,27 @@
 0
-
- https://alanorth.github.io/cgspace-notes/tags/notes/
- 2018-10-03T11:52:48+03:00
- 0
-
-
 https://alanorth.github.io/cgspace-notes/categories/notes/
 2018-03-09T22:10:33+02:00
 0
+
+ https://alanorth.github.io/cgspace-notes/tags/notes/
+ 2018-10-03T17:54:58+03:00
+ 0
+
+
 https://alanorth.github.io/cgspace-notes/posts/
- 2018-10-03T11:52:48+03:00
+ 2018-10-03T17:54:58+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
- 2018-10-03T11:52:48+03:00
+ 2018-10-03T17:54:58+03:00
 0
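
The nginx log triage recorded in the 2018-10 notes in this commit (top client IPs for a day, then the path prefixes one IP requested) could also be sketched in Python. This is only an illustration, not part of the commit: it assumes nginx's default `combined` access log format, and the names `LINE_RE`, `top_clients`, and `path_prefixes` are hypothetical. Unlike the `grep` pipelines in the notes, it matches the date against the timestamp field and the IP against the client field only, so counts may differ slightly from the shell output.

```python
import re
from collections import Counter

# Start of a "combined" format access log line (an assumption about the log format):
# client IP, ident, user, [timestamp], "METHOD path ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')


def top_clients(lines, day, n=10):
    """Return the n busiest client IPs on `day`, e.g. "03/Oct/2018"."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        # Only count lines whose timestamp falls on the requested day.
        if m and m.group(2).startswith(day):
            counts[m.group(1)] += 1
    return counts.most_common(n)


def path_prefixes(lines, ip):
    """Count the first path segment (e.g. /bitstream, /handle) of GET requests from `ip`."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(1) == ip and m.group(3) == "GET":
            # Same idea as grep -o -E 'GET /[a-z]+' in the notes.
            segment = re.match(r"/[a-z]+", m.group(4))
            if segment:
                counts[segment.group(0)] += 1
    return counts
```

Both functions accept any iterable of log lines, so an open file handle (or `gzip.open(path, "rt")` for rotated logs) can be passed directly.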