diff --git a/content/posts/2018-11.md b/content/posts/2018-11.md
index c7d273bd3..ec2878a3f 100644
--- a/content/posts/2018-11.md
+++ b/content/posts/2018-11.md
@@ -15,6 +15,9 @@ tags: ["Notes"]
 - Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
 - Today these are the top 10 IPs:
+
+
+
 ```
 # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
 1300 66.249.64.63
@@ -61,7 +64,67 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11
 - Ah, we've apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day...
 - I wonder if it's worth adding them to the list of bots in the nginx config?
+- Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth
+- Looking at the nginx logs again I see the following top ten IPs:
-
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+ 1979 50.116.102.77
+ 1980 35.237.175.180
+ 2186 207.46.13.156
+ 2208 40.77.167.175
+ 2843 66.249.64.63
+ 4220 84.38.130.177
+ 4537 70.32.83.92
+ 5593 66.249.64.61
+ 12557 78.46.89.18
+ 32152 66.249.64.59
+```
+
+- `78.46.89.18` is new since I last checked a few hours ago, and it's from Hetzner with the following user agent:
+
+```
+Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
+```
+
+- It's making lots of requests and using quite a number of Tomcat sessions:
+
+```
+$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' /home/cgspace.cgiar.org/log/dspace.log.2018-11-03 | sort | uniq
+8449
+```
+
+- I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing
+- Perhaps I should think about adding rate limits to dynamic pages like `/discover` and `/browse`
+- I think it's reasonable for a human to click one of those links five or ten times a minute...
+- To contrast, `78.46.89.18` made about 300 requests per minute for a few hours today:
+
+```
+# grep 78.46.89.18 /var/log/nginx/access.log | grep -o -E '03/Nov/2018:[0-9][0-9]:[0-9][0-9]' | sort | uniq -c | sort -n | tail -n 20
+ 286 03/Nov/2018:18:02
+ 287 03/Nov/2018:18:21
+ 289 03/Nov/2018:18:23
+ 291 03/Nov/2018:18:27
+ 293 03/Nov/2018:18:34
+ 300 03/Nov/2018:17:58
+ 300 03/Nov/2018:18:22
+ 300 03/Nov/2018:18:32
+ 304 03/Nov/2018:18:12
+ 305 03/Nov/2018:18:13
+ 305 03/Nov/2018:18:24
+ 312 03/Nov/2018:18:39
+ 322 03/Nov/2018:18:17
+ 326 03/Nov/2018:18:38
+ 327 03/Nov/2018:18:16
+ 330 03/Nov/2018:17:57
+ 332 03/Nov/2018:18:19
+ 336 03/Nov/2018:17:56
+ 340 03/Nov/2018:18:14
+ 341 03/Nov/2018:18:18
+```
+
+- If they want to download all our metadata and PDFs they should use an API rather than scraping the XMLUI
+- I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later
+- Also, this is the third (?) time a mysterious IP in Hetzner has done this... who is this?
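Note on the Tomcat session counts in the notes above: `grep -c` prints the number of matching lines, so the trailing `sort | uniq` has nothing left to deduplicate. If the intent is to count distinct sessions, something like the following sketch may be closer. The function name `count_sessions` is hypothetical, and the sketch assumes the log lines contain `session_id=<32 characters>:ip_addr=<ip>` exactly as the grep pattern in the notes implies.

```shell
#!/bin/sh
# count_sessions IP LOGFILE
# Hypothetical helper (not part of the notes): prints the number of DISTINCT
# Tomcat session IDs that appear together with the given client IP.
# Assumes lines contain "session_id=<32 chars>:ip_addr=<ip>".
count_sessions() {
    ip="$1"
    log="$2"
    # Escape the dots so they match literally rather than "any character".
    ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
    # 1) keep lines for exactly this IP (no longer address with the same prefix),
    # 2) extract only the session ID, 3) deduplicate, 4) count.
    grep -E "session_id=[A-Z0-9]{32}:ip_addr=${ip_re}([^0-9]|\$)" "$log" \
        | grep -o -E 'session_id=[A-Z0-9]{32}' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}
```

For example, `count_sessions 78.46.89.18 /home/cgspace.cgiar.org/log/dspace.log.2018-11-03` would print distinct sessions, whereas the `grep -c` form prints matching log lines.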
diff --git a/docs/2018-11/index.html b/docs/2018-11/index.html
index b628d3387..2c3bc422b 100644
--- a/docs/2018-11/index.html
+++ b/docs/2018-11/index.html
@@ -20,62 +20,10 @@ Linode has been sending mails a few times a day recently that CGSpace (linode18)
 Today these are the top 10 IPs:
-# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 1300 66.249.64.63
- 1384 35.237.175.180
- 1430 138.201.52.218
- 1455 207.46.13.156
- 1500 40.77.167.175
- 1979 50.116.102.77
- 2790 66.249.64.61
- 3367 84.38.130.177
- 4537 70.32.83.92
- 22508 66.249.64.59
-
-
-
-The 66.249.64.x are definitely Google
-70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
-84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
-
-
-Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
-
-
-
-They at least seem to be re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
-342
-
-
-
-50.116.102.77 is also a regular REST API user
-40.77.167.175 and 207.46.13.156 seem to be Bing
-138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
-
-
-Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
-
-
-
-And it doesn’t seem they are re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
-1243
-
-
-
-Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
-I wonder if it’s worth adding them to the list of bots in the nginx config?
-
-
 " />
-
+
@@ -93,58 +41,6 @@ Linode has been sending mails a few times a day recently that CGSpace (linode18)
 Today these are the top 10 IPs:
-# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 1300 66.249.64.63
- 1384 35.237.175.180
- 1430 138.201.52.218
- 1455 207.46.13.156
- 1500 40.77.167.175
- 1979 50.116.102.77
- 2790 66.249.64.61
- 3367 84.38.130.177
- 4537 70.32.83.92
- 22508 66.249.64.59
-
-
-
-The 66.249.64.x are definitely Google
-70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
-84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
-
-
-Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
-
-
-
-They at least seem to be re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
-342
-
-
-
-50.116.102.77 is also a regular REST API user
-40.77.167.175 and 207.46.13.156 seem to be Bing
-138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
-
-
-Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
-
-
-
-And it doesn’t seem they are re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
-1243
-
-
-
-Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
-I wonder if it’s worth adding them to the list of bots in the nginx config?
-
-
 "/>
@@ -156,9 +52,9 @@ I wonder if it’s worth adding them to the list of bots in the nginx config
 "@type": "BlogPosting",
 "headline": "November, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-11/",
- "wordCount": "260",
+ "wordCount": "586",
 "datePublished": "2018-11-01T16:41:30+02:00",
- "dateModified": "2018-11-01T16:43:37+02:00",
+ "dateModified": "2018-11-03T18:13:49+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -238,6 +134,8 @@ I wonder if it’s worth adding them to the list of bots in the nginx config
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
@@ -288,9 +186,73 @@ I wonder if it’s worth adding them to the list of bots in the nginx config
- Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
- I wonder if it’s worth adding them to the list of bots in the nginx config?
+- Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth
+- Looking at the nginx logs again I see the following top ten IPs:
-
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+ 1979 50.116.102.77
+ 1980 35.237.175.180
+ 2186 207.46.13.156
+ 2208 40.77.167.175
+ 2843 66.249.64.63
+ 4220 84.38.130.177
+ 4537 70.32.83.92
+ 5593 66.249.64.61
+ 12557 78.46.89.18
+ 32152 66.249.64.59
+
+
+
+- 78.46.89.18 is new since I last checked a few hours ago, and it’s from Hetzner with the following user agent:
+
+
+Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
+
+
+
+- It’s making lots of requests and using quite a number of Tomcat sessions:
+
+
+$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' /home/cgspace.cgiar.org/log/dspace.log.2018-11-03 | sort | uniq
+8449
+
+
+
+- I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing
+- Perhaps I should think about adding rate limits to dynamic pages like /discover and /browse
+- I think it’s reasonable for a human to click one of those links five or ten times a minute…
+- To contrast, 78.46.89.18 made about 300 requests per minute for a few hours today:
+
+
+# grep 78.46.89.18 /var/log/nginx/access.log | grep -o -E '03/Nov/2018:[0-9][0-9]:[0-9][0-9]' | sort | uniq -c | sort -n | tail -n 20
+ 286 03/Nov/2018:18:02
+ 287 03/Nov/2018:18:21
+ 289 03/Nov/2018:18:23
+ 291 03/Nov/2018:18:27
+ 293 03/Nov/2018:18:34
+ 300 03/Nov/2018:17:58
+ 300 03/Nov/2018:18:22
+ 300 03/Nov/2018:18:32
+ 304 03/Nov/2018:18:12
+ 305 03/Nov/2018:18:13
+ 305 03/Nov/2018:18:24
+ 312 03/Nov/2018:18:39
+ 322 03/Nov/2018:18:17
+ 326 03/Nov/2018:18:38
+ 327 03/Nov/2018:18:16
+ 330 03/Nov/2018:17:57
+ 332 03/Nov/2018:18:19
+ 336 03/Nov/2018:17:56
+ 340 03/Nov/2018:18:14
+ 341 03/Nov/2018:18:18
+
+
+
+- If they want to download all our metadata and PDFs they should use an API rather than scraping the XMLUI
+- I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later
+- Also, this is the third (?) time a mysterious IP in Hetzner has done this… who is this?
+
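The per-minute check in the notes above looks at a single IP that was already known. A rough sketch of the same idea applied to every client at once follows: it lists each IP and minute pair whose request count exceeds a threshold, such as the "five or ten times a minute" the notes consider reasonable for a human. The function name `busy_minutes` is hypothetical, and the sketch assumes the default nginx combined log format, with the client IP in field 1 and the timestamp (with a leading `[`) in field 4.

```shell
#!/bin/sh
# busy_minutes THRESHOLD LOGFILE
# Hypothetical helper (not part of the notes): prints "count ip minute" for
# every client that made more than THRESHOLD requests within one minute.
# Assumes nginx combined log format: $1 = client IP, $4 = "[03/Nov/2018:18:02:11".
busy_minutes() {
    threshold="$1"
    log="$2"
    awk -v t="$threshold" '
        {
            # Drop the leading "[" and the seconds: keep "03/Nov/2018:18:02".
            minute = substr($4, 2, 17)
            hits[$1 " " minute]++
        }
        END {
            for (k in hits)
                if (hits[k] > t)
                    print hits[k], k
        }
    ' "$log" | sort -n
}
```

For example, `busy_minutes 10 /var/log/nginx/access.log` would surface a scraper like the one above without knowing its IP in advance. The threshold is a judgment call, and crawlers such as Google and Bing would likely show up too.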
diff --git a/docs/categories/index.html b/docs/categories/index.html
index cadaf6a54..f0adf6a89 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -110,58 +110,6 @@
Today these are the top 10 IPs:
-# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 1300 66.249.64.63
- 1384 35.237.175.180
- 1430 138.201.52.218
- 1455 207.46.13.156
- 1500 40.77.167.175
- 1979 50.116.102.77
- 2790 66.249.64.61
- 3367 84.38.130.177
- 4537 70.32.83.92
- 22508 66.249.64.59
-
-
-
-- The 66.249.64.x are definitely Google
-- 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
-- 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
-
-
-Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
-
-
-
-- They at least seem to be re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
-342
-
-
-
-- 50.116.102.77 is also a regular REST API user
-- 40.77.167.175 and 207.46.13.156 seem to be Bing
-- 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
-
-
-Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
-
-
-
-- And it doesn’t seem they are re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
-1243
-
-
-
-- Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
-- I wonder if it’s worth adding them to the list of bots in the nginx config?
-
-
Read more →
diff --git a/docs/index.html b/docs/index.html
index 013095086..23bec0f5d 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -112,58 +112,6 @@
Today these are the top 10 IPs:
-# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 1300 66.249.64.63
- 1384 35.237.175.180
- 1430 138.201.52.218
- 1455 207.46.13.156
- 1500 40.77.167.175
- 1979 50.116.102.77
- 2790 66.249.64.61
- 3367 84.38.130.177
- 4537 70.32.83.92
- 22508 66.249.64.59
-
-
-
-- The 66.249.64.x are definitely Google
-- 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
-- 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
-
-
-Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
-
-
-
-- They at least seem to be re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
-342
-
-
-
-- 50.116.102.77 is also a regular REST API user
-- 40.77.167.175 and 207.46.13.156 seem to be Bing
-- 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
-
-
-Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
-
-
-
-- And it doesn’t seem they are re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
-1243
-
-
-
-- Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
-- I wonder if it’s worth adding them to the list of bots in the nginx config?
-
-
Read more →
diff --git a/docs/index.xml b/docs/index.xml
index 90c7c10ee..6ca38da91 100644
--- a/docs/index.xml
+++ b/docs/index.xml
@@ -31,58 +31,6 @@
<li>Today these are the top 10 IPs:</li>
</ul>
-<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 1300 66.249.64.63
- 1384 35.237.175.180
- 1430 138.201.52.218
- 1455 207.46.13.156
- 1500 40.77.167.175
- 1979 50.116.102.77
- 2790 66.249.64.61
- 3367 84.38.130.177
- 4537 70.32.83.92
- 22508 66.249.64.59
-</code></pre>
-
-<ul>
-<li>The <code>66.249.64.x</code> are definitely Google</li>
-<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API</li>
-<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
-</ul>
-
-<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
-</code></pre>
-
-<ul>
-<li>They at least seem to be re-using their Tomcat sessions:</li>
-</ul>
-
-<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
-342
-</code></pre>
-
-<ul>
-<li><code>50.116.102.77</code> is also a regular REST API user</li>
-<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
-<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
-</ul>
-
-<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
-</code></pre>
-
-<ul>
-<li>And it doesn’t seem they are re-using their Tomcat sessions:</li>
-</ul>
-
-<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
-1243
-</code></pre>
-
-<ul>
-<li>Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…</li>
-<li>I wonder if it’s worth adding them to the list of bots in the nginx config?</li>
-</ul>
-
<p></p>
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 2ace50760..3cef573e3 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -112,58 +112,6 @@
Today these are the top 10 IPs:
-# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 1300 66.249.64.63
- 1384 35.237.175.180
- 1430 138.201.52.218
- 1455 207.46.13.156
- 1500 40.77.167.175
- 1979 50.116.102.77
- 2790 66.249.64.61
- 3367 84.38.130.177
- 4537 70.32.83.92
- 22508 66.249.64.59
-
-
-
-- The 66.249.64.x are definitely Google
-- 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
-- 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
-
-
-Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
-
-
-
-- They at least seem to be re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
-342
-
-
-
-- 50.116.102.77 is also a regular REST API user
-- 40.77.167.175 and 207.46.13.156 seem to be Bing
-- 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
-
-
-Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
-
-
-
-- And it doesn’t seem they are re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
-1243
-
-
-
-- Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
-- I wonder if it’s worth adding them to the list of bots in the nginx config?
-
-
Read more →
diff --git a/docs/posts/index.xml b/docs/posts/index.xml
index 7900abe7d..f84ea936a 100644
--- a/docs/posts/index.xml
+++ b/docs/posts/index.xml
@@ -31,58 +31,6 @@
<li>Today these are the top 10 IPs:</li>
</ul>
-<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 1300 66.249.64.63
- 1384 35.237.175.180
- 1430 138.201.52.218
- 1455 207.46.13.156
- 1500 40.77.167.175
- 1979 50.116.102.77
- 2790 66.249.64.61
- 3367 84.38.130.177
- 4537 70.32.83.92
- 22508 66.249.64.59
-</code></pre>
-
-<ul>
-<li>The <code>66.249.64.x</code> are definitely Google</li>
-<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API</li>
-<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
-</ul>
-
-<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
-</code></pre>
-
-<ul>
-<li>They at least seem to be re-using their Tomcat sessions:</li>
-</ul>
-
-<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
-342
-</code></pre>
-
-<ul>
-<li><code>50.116.102.77</code> is also a regular REST API user</li>
-<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
-<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
-</ul>
-
-<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
-</code></pre>
-
-<ul>
-<li>And it doesn’t seem they are re-using their Tomcat sessions:</li>
-</ul>
-
-<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
-1243
-</code></pre>
-
-<ul>
-<li>Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…</li>
-<li>I wonder if it’s worth adding them to the list of bots in the nginx config?</li>
-</ul>
-
<p></p>
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index cdf9f2d3a..e3b842c53 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
https://alanorth.github.io/cgspace-notes/2018-11/
- 2018-11-01T16:43:37+02:00
+ 2018-11-03T18:13:49+02:00
@@ -194,7 +194,7 @@
https://alanorth.github.io/cgspace-notes/
- 2018-11-01T16:43:37+02:00
+ 2018-11-03T18:13:49+02:00
0
@@ -205,7 +205,7 @@
https://alanorth.github.io/cgspace-notes/tags/notes/
- 2018-11-01T16:43:37+02:00
+ 2018-11-03T18:13:49+02:00
0
@@ -217,13 +217,13 @@
https://alanorth.github.io/cgspace-notes/posts/
- 2018-11-01T16:43:37+02:00
+ 2018-11-03T18:13:49+02:00
0
https://alanorth.github.io/cgspace-notes/tags/
- 2018-11-01T16:43:37+02:00
+ 2018-11-03T18:13:49+02:00
0
diff --git a/docs/tags/index.html b/docs/tags/index.html
index 2fb2db9d4..a4d0067b8 100644
--- a/docs/tags/index.html
+++ b/docs/tags/index.html
@@ -112,58 +112,6 @@
Today these are the top 10 IPs:
-# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 1300 66.249.64.63
- 1384 35.237.175.180
- 1430 138.201.52.218
- 1455 207.46.13.156
- 1500 40.77.167.175
- 1979 50.116.102.77
- 2790 66.249.64.61
- 3367 84.38.130.177
- 4537 70.32.83.92
- 22508 66.249.64.59
-
-
-
-- The 66.249.64.x are definitely Google
-- 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
-- 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
-
-
-Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
-
-
-
-- They at least seem to be re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
-342
-
-
-
-- 50.116.102.77 is also a regular REST API user
-- 40.77.167.175 and 207.46.13.156 seem to be Bing
-- 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
-
-
-Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
-
-
-
-- And it doesn’t seem they are re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
-1243
-
-
-
-- Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
-- I wonder if it’s worth adding them to the list of bots in the nginx config?
-
-
Read more →
diff --git a/docs/tags/notes/index.html b/docs/tags/notes/index.html
index 4082624e0..6854409e6 100644
--- a/docs/tags/notes/index.html
+++ b/docs/tags/notes/index.html
@@ -97,58 +97,6 @@
Today these are the top 10 IPs:
-# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 1300 66.249.64.63
- 1384 35.237.175.180
- 1430 138.201.52.218
- 1455 207.46.13.156
- 1500 40.77.167.175
- 1979 50.116.102.77
- 2790 66.249.64.61
- 3367 84.38.130.177
- 4537 70.32.83.92
- 22508 66.249.64.59
-
-
-
-- The 66.249.64.x are definitely Google
-- 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
-- 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
-
-
-Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
-
-
-
-- They at least seem to be re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
-342
-
-
-
-- 50.116.102.77 is also a regular REST API user
-- 40.77.167.175 and 207.46.13.156 seem to be Bing
-- 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
-
-
-Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
-
-
-
-- And it doesn’t seem they are re-using their Tomcat sessions:
-
-
-$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
-1243
-
-
-
-- Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
-- I wonder if it’s worth adding them to the list of bots in the nginx config?
-
-
Read more →
diff --git a/docs/tags/notes/index.xml b/docs/tags/notes/index.xml
index ed12c0f79..b6f0a76d6 100644
--- a/docs/tags/notes/index.xml
+++ b/docs/tags/notes/index.xml
@@ -31,58 +31,6 @@
<li>Today these are the top 10 IPs:</li>
</ul>
-<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
- 1300 66.249.64.63
- 1384 35.237.175.180
- 1430 138.201.52.218
- 1455 207.46.13.156
- 1500 40.77.167.175
- 1979 50.116.102.77
- 2790 66.249.64.61
- 3367 84.38.130.177
- 4537 70.32.83.92
- 22508 66.249.64.59
-</code></pre>
-
-<ul>
-<li>The <code>66.249.64.x</code> are definitely Google</li>
-<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API</li>
-<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
-</ul>
-
-<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
-</code></pre>
-
-<ul>
-<li>They at least seem to be re-using their Tomcat sessions:</li>
-</ul>
-
-<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
-342
-</code></pre>
-
-<ul>
-<li><code>50.116.102.77</code> is also a regular REST API user</li>
-<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
-<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
-</ul>
-
-<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
-</code></pre>
-
-<ul>
-<li>And it doesn’t seem they are re-using their Tomcat sessions:</li>
-</ul>
-
-<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
-1243
-</code></pre>
-
-<ul>
-<li>Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…</li>
-<li>I wonder if it’s worth adding them to the list of bots in the nginx config?</li>
-</ul>
-
<p></p>