diff --git a/content/posts/2018-12.md b/content/posts/2018-12.md
index 46767b1ff..2f5046f30 100644
--- a/content/posts/2018-12.md
+++ b/content/posts/2018-12.md
@@ -369,4 +369,32 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
 - Also, I notice they ended up registering a Handle (they had been considering taking KnowledgeArc's advice to *not* use Handles!)
 - Did some coordination work on the hotel bookings for the January AReS workshop in Amman
+## 2018-12-17
+
+- Linode alerted me twice today that the load on CGSpace (linode18) was very high
+- Looking at the nginx logs I see a few new IPs in the top 10:
+
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "17/Dec/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+ 927 157.55.39.81
+ 975 54.70.40.11
+ 2090 50.116.102.77
+ 2121 66.249.66.219
+ 3811 35.237.175.180
+ 4590 205.186.128.185
+ 4590 70.32.83.92
+ 5436 2a01:4f8:173:1e85::2
+ 5438 143.233.227.216
+ 6706 94.71.244.172
+```
+
+- `94.71.244.172` and `143.233.227.216` are both in Greece and use the following user agent:
+
+```
+Mozilla/3.0 (compatible; Indy Library)
+```
+
+- I see that I added this bot to the Tomcat Crawler Session Manager valve in 2017-12 so its XMLUI sessions are getting re-used
+- `2a01:4f8:173:1e85::2` is some new bot called `BLEXBot/1.0` which should be matching the existing "bot" pattern in the Tomcat Crawler Session Manager regex
+
diff --git a/docs/2018-12/index.html b/docs/2018-12/index.html
index fda6a495e..36c327303 100644
--- a/docs/2018-12/index.html
+++ b/docs/2018-12/index.html
@@ -21,7 +21,7 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
 " />
-
+
@@ -48,9 +48,9 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
 "@type": "BlogPosting",
 "headline": "December, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-12/",
- "wordCount": "2311",
+ "wordCount": "2448",
 "datePublished": "2018-12-02T02:09:30+02:00",
- "dateModified": "2018-12-11T12:27:53+03:00",
+ "dateModified": "2018-12-13T22:50:17+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -535,6 +535,38 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "17/Dec/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+ 927 157.55.39.81
+ 975 54.70.40.11
+ 2090 50.116.102.77
+ 2121 66.249.66.219
+ 3811 35.237.175.180
+ 4590 205.186.128.185
+ 4590 70.32.83.92
+ 5436 2a01:4f8:173:1e85::2
+ 5438 143.233.227.216
+ 6706 94.71.244.172
+
+
+94.71.244.172 and 143.233.227.216 are both in Greece and use the following user agent:
+Mozilla/3.0 (compatible; Indy Library)
+
+
+2a01:4f8:173:1e85::2 is some new bot called BLEXBot/1.0 which should be matching the existing “bot” pattern in the Tomcat Crawler Session Manager regex
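
A minimal sketch for checking whether a user agent would match the valve's regex. It assumes the pattern is Tomcat's documented default for `crawlerUserAgents`; the regex actually configured on CGSpace is not shown in these notes, and `is_crawler` is a hypothetical helper name. Tomcat tests the whole `User-Agent` header against the pattern, so the sketch uses `fullmatch` rather than `search`:

```python
import re

# ASSUMPTION: this is Tomcat's documented default crawlerUserAgents value,
# not necessarily the regex deployed on CGSpace.
CRAWLER_PATTERN = re.compile(
    r".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*"
)


def is_crawler(user_agent: str) -> bool:
    """Return True if the entire user agent string matches the pattern."""
    return CRAWLER_PATTERN.fullmatch(user_agent) is not None


if __name__ == "__main__":
    # User agent strings taken from the notes above.
    for ua in ("BLEXBot/1.0", "Mozilla/3.0 (compatible; Indy Library)"):
        print(f"{ua!r}: {is_crawler(ua)}")
```

Under that assumed default, `BLEXBot/1.0` matches through the `[bB]ot` alternative, while the Indy Library user agent does not, which is consistent with it having needed an explicit entry in the valve.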