diff --git a/content/posts/2018-07.md b/content/posts/2018-07.md
index b2c4d062f..58fc80914 100644
--- a/content/posts/2018-07.md
+++ b/content/posts/2018-07.md
@@ -179,5 +179,40 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
 ```
 
 - But not sure what caused that...
+- I got a message from Linode tonight that CPU usage was high on CGSpace for the past few hours around 8PM GMT
+- Looking in the nginx logs I see the top ten IP addresses active today:
+
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "09/Jul/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+   1691 40.77.167.84
+   1701 40.77.167.69
+   1718 50.116.102.77
+   1872 137.108.70.6
+   2172 157.55.39.234
+   2190 207.46.13.47
+   2848 178.154.200.38
+   4367 35.227.26.162
+   4387 70.32.83.92
+   4738 95.108.181.88
+```
+
+- Of those, *all* except `70.32.83.92` and `50.116.102.77` are *NOT* re-using their Tomcat sessions, for example from the XMLUI logs:
+
+```
+$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-07-09
+4435
+```
+
+- `95.108.181.88` appears to be Yandex, so I dunno why it's creating so many sessions, as its user agent should match Tomcat's Crawler Session Manager Valve
+- `70.32.83.92` is on MediaTemple but I'm not sure who it is. They are mostly hitting REST so I guess that's fine
+- `35.227.26.162` doesn't declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx
+- `178.154.200.38` is Yandex again
+- `207.46.13.47` is Bing
+- `157.55.39.234` is Bing
+- `137.108.70.6` is our old friend CORE bot
+- `50.116.102.77` doesn't declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that's fine
+- `40.77.167.84` is Bing again
+- Interestingly, the first time that I see `35.227.26.162` was on 2018-06-08
+- I've added `35.227.26.162` to the bot tagging logic in the nginx vhost
diff --git a/docs/2018-07/index.html b/docs/2018-07/index.html
index 9d635387c..44a2e0888 100644
--- a/docs/2018-07/index.html
+++ b/docs/2018-07/index.html
@@ -30,7 +30,7 @@ There is insufficient memory for the Java Runtime Environment to continue.
-
+
@@ -71,9 +71,9 @@ There is insufficient memory for the Java Runtime Environment to continue.
 "@type": "BlogPosting",
 "headline": "July, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-07/",
- "wordCount": "1213",
+ "wordCount": "1454",
 "datePublished": "2018-07-01T12:56:54+03:00",
- "dateModified": "2018-07-09T07:51:04+03:00",
+ "dateModified": "2018-07-09T16:45:50+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -342,6 +342,43 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "09/Jul/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+ 1691 40.77.167.84
+ 1701 40.77.167.69
+ 1718 50.116.102.77
+ 1872 137.108.70.6
+ 2172 157.55.39.234
+ 2190 207.46.13.47
+ 2848 178.154.200.38
+ 4367 35.227.26.162
+ 4387 70.32.83.92
+ 4738 95.108.181.88
+
+Of those, all except 70.32.83.92 and 50.116.102.77 are NOT re-using their Tomcat sessions, for example from the XMLUI logs:
+
+$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-07-09
+4435
+
+95.108.181.88 appears to be Yandex, so I dunno why it’s creating so many sessions, as its user agent should match Tomcat’s Crawler Session Manager Valve
+70.32.83.92 is on MediaTemple but I’m not sure who it is. They are mostly hitting REST so I guess that’s fine
+35.227.26.162 doesn’t declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx
+178.154.200.38 is Yandex again
+207.46.13.47 is Bing
+157.55.39.234 is Bing
+137.108.70.6 is our old friend CORE bot
+50.116.102.77 doesn’t declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that’s fine
+40.77.167.84 is Bing again
+Interestingly, the first time that I see 35.227.26.162 was on 2018-06-08
+I’ve added 35.227.26.162 to the bot tagging logic in the nginx vhost
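
The `grep -c` check in the notes counts matching log lines, not distinct sessions, so a rough sketch that counts unique session IDs per IP might make the "not re-using sessions" comparison more direct. The function name `count_sessions` is hypothetical, and the sketch assumes the `session_id=...:ip_addr=...` layout shown in the grep pattern above:

```shell
# Hypothetical helper: count distinct Tomcat session IDs for one client IP
# in a DSpace log. Assumes the session_id=<32 chars>:ip_addr=<ip> layout
# seen in the grep pattern from the notes.
count_sessions() {
    ip="$1"
    logfile="$2"
    # Escape the dots so they match literally rather than as regex wildcards.
    ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
    # Keep only lines for this exact IP (the next character must not be a
    # digit), pull out the session ID, then count the unique values.
    grep -E "session_id=[A-Z0-9]{32}:ip_addr=${ip_re}([^0-9]|\$)" "$logfile" \
        | grep -o -E 'session_id=[A-Z0-9]{32}' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}
```

Running `count_sessions 95.108.181.88 dspace.log.2018-07-09` and comparing the result with the request count from nginx would show whether the number of sessions is close to the number of requests.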