From 6bf2bdae093672ed0690eec650700b0a60f47306 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Sun, 4 Nov 2018 01:02:29 +0200
Subject: [PATCH] Update notes for 2018-11-03

---
 content/posts/2018-11.md   |  65 +++++++++++++-
 docs/2018-11/index.html    | 178 +++++++++++++++----------------------
 docs/categories/index.html |  52 -----------
 docs/index.html            |  52 -----------
 docs/index.xml             |  52 -----------
 docs/posts/index.html      |  52 -----------
 docs/posts/index.xml       |  52 -----------
 docs/sitemap.xml           |  10 +-
 docs/tags/index.html       |  52 -----------
 docs/tags/notes/index.html |  52 -----------
 docs/tags/notes/index.xml  |  52 -----------
 11 files changed, 139 insertions(+), 530 deletions(-)

diff --git a/content/posts/2018-11.md b/content/posts/2018-11.md
index c7d273bd3..ec2878a3f 100644
--- a/content/posts/2018-11.md
+++ b/content/posts/2018-11.md
@@ -15,6 +15,9 @@ tags: ["Notes"]
 - Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
 - Today these are the top 10 IPs:
+
+
+
 ```
 # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    1300 66.249.64.63
@@ -61,7 +64,67 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11
 - Ah, we've apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day...
 - I wonder if it's worth adding them to the list of bots in the nginx config?
+- Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth
+- Looking at the nginx logs again, I see the following top ten IPs:
-
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+   1979 50.116.102.77
+   1980 35.237.175.180
+   2186 207.46.13.156
+   2208 40.77.167.175
+   2843 66.249.64.63
+   4220 84.38.130.177
+   4537 70.32.83.92
+   5593 66.249.64.61
+  12557 78.46.89.18
+  32152 66.249.64.59
+```
+
+- `78.46.89.18` is new since I last checked a few hours ago, and it's from Hetzner with the following user agent:
+
+```
+Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
+```
+
+- It's making lots of requests and seems to be using quite a number of Tomcat sessions (this counts matching DSpace log lines, not unique sessions):
+
+```
+$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' /home/cgspace.cgiar.org/log/dspace.log.2018-11-03
+8449
+```
+
+- I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing
+- Perhaps I should think about adding rate limits to dynamic pages like `/discover` and `/browse`
+- I think it's reasonable for a human to click one of those links five or ten times a minute...
+- By contrast, `78.46.89.18` made about 300 requests per minute for a few hours today:
+
+```
+# grep 78.46.89.18 /var/log/nginx/access.log | grep -o -E '03/Nov/2018:[0-9][0-9]:[0-9][0-9]' | sort | uniq -c | sort -n | tail -n 20
+    286 03/Nov/2018:18:02
+    287 03/Nov/2018:18:21
+    289 03/Nov/2018:18:23
+    291 03/Nov/2018:18:27
+    293 03/Nov/2018:18:34
+    300 03/Nov/2018:17:58
+    300 03/Nov/2018:18:22
+    300 03/Nov/2018:18:32
+    304 03/Nov/2018:18:12
+    305 03/Nov/2018:18:13
+    305 03/Nov/2018:18:24
+    312 03/Nov/2018:18:39
+    322 03/Nov/2018:18:17
+    326 03/Nov/2018:18:38
+    327 03/Nov/2018:18:16
+    330 03/Nov/2018:17:57
+    332 03/Nov/2018:18:19
+    336 03/Nov/2018:17:56
+    340 03/Nov/2018:18:14
+    341 03/Nov/2018:18:18
+```
+
+- If they want to download all our metadata and PDFs, they should use an API rather than scraping the XMLUI
+- I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later
+- Also, this is the third (?) time a mysterious IP in Hetzner has done this... who is this?
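`grep -c` counts matching lines, so the `8449` above is the number of DSpace log lines tied to `78.46.89.18`, not the number of distinct Tomcat sessions, and piping a single count through `sort | uniq` changes nothing. The unescaped, unanchored IP in the pattern would also match neighbours such as `78.46.89.180`. Below is a minimal sketch of counting distinct sessions instead. It assumes IPv4 clients and the `session_id=<32 chars>:ip_addr=<ip>` token used in the grep patterns above. `count_sessions` is a hypothetical helper, and the demo log lines are synthetic:

```shell
#!/usr/bin/env bash
# Sketch: count distinct Tomcat sessions for one IP in a DSpace log,
# instead of counting matching lines as `grep -c` does.

# count_sessions IP LOGFILE  (hypothetical helper; assumes IPv4 addresses)
count_sessions() {
  local ip="$1" log="$2"
  # Extract only the "session_id=...:ip_addr=..." token from each line, keep
  # tokens whose IP equals $ip exactly (so 78.46.89.18 does not also match
  # 78.46.89.180), and count each session ID once.
  grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=[0-9.]+' "$log" \
    | awk -F'ip_addr=' -v ip="$ip" '$2 == ip && !seen[$1]++ { n++ } END { print n + 0 }'
}

# Demo on a tiny synthetic log (made-up lines, not real CGSpace data).
sid_a=$(printf 'A%.0s' {1..32})   # 32-character fake session IDs
sid_b=$(printf 'B%.0s' {1..32})
sid_c=$(printf 'C%.0s' {1..32})
demo_log=$(mktemp)
{
  echo "INFO anonymous:session_id=${sid_a}:ip_addr=78.46.89.18:view_item"
  echo "INFO anonymous:session_id=${sid_a}:ip_addr=78.46.89.18:search"
  echo "INFO anonymous:session_id=${sid_b}:ip_addr=78.46.89.18:browse"
  echo "INFO anonymous:session_id=${sid_c}:ip_addr=78.46.89.180:view_item"
} > "$demo_log"

grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' "$demo_log"  # prints 4 (includes .180)
count_sessions 78.46.89.18 "$demo_log"                                 # prints 2
rm -f "$demo_log"
```

In the demo, the line count and the session count diverge: the same session appears on several lines, and the neighbouring `.180` address is swept in by the loose pattern.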
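Before picking a limit for `/discover` and `/browse`, it may help to see which clients exceed a human-like rate of five or ten clicks a minute. In nginx, the enforcement itself would be done with the `limit_req_zone` and `limit_req` directives; the sketch below only reports offenders. It assumes nginx's default `combined` log format: the client IP is in field 1, the `[03/Nov/2018:18:02:11` timestamp is in field 4, and the request path is in field 7. It only matches top-level `/discover` and `/browse` paths. `heavy_clients` and its default threshold of 10 are hypothetical choices:

```shell
#!/usr/bin/env bash
# Sketch: report clients that hit /discover or /browse more than MAX times in
# any one minute. Assumes nginx's default "combined" log format.

# heavy_clients LOGFILE [MAX_PER_MINUTE]  (hypothetical helper; default 10)
# Prints "IP MINUTE COUNT" for each IP/minute pair over the threshold.
heavy_clients() {
  local log="$1" max="${2:-10}"
  awk -v max="$max" '
    # $7 is the request path; match top-level /discover or /browse,
    # with or without a query string or trailing path.
    $7 ~ /^\/(discover|browse)(\?|\/|$)/ {
      minute = substr($4, 2, 17)   # "[03/Nov/2018:18:02:11" -> "03/Nov/2018:18:02"
      hits[$1 " " minute]++
    }
    END {
      for (key in hits)
        if (hits[key] > max) print key, hits[key]
    }
  ' "$log" | sort
}

# Demo on a synthetic log: 12 /discover hits from one IP in one minute,
# 3 /browse hits from another (made-up lines, not real CGSpace data).
demo_log=$(mktemp)
for i in $(seq 1 12); do
  printf '78.46.89.18 - - [03/Nov/2018:18:02:%02d +0200] "GET /discover?page=%d HTTP/1.1" 200 512 "-" "Mozilla/5.0"\n' "$i" "$i"
done > "$demo_log"
for i in 1 2 3; do
  printf '203.0.113.5 - - [03/Nov/2018:18:02:%02d +0200] "GET /browse?type=author HTTP/1.1" 200 512 "-" "Mozilla/5.0"\n' "$i"
done >> "$demo_log"

heavy_clients "$demo_log" 10   # prints: 78.46.89.18 03/Nov/2018:18:02 12
rm -f "$demo_log"
```

On a live server the call would look like `heavy_clients /var/log/nginx/access.log 10`. That lists every client/minute pair over the threshold, which gives a rough idea of how many clients a given `limit_req` rate would affect.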
diff --git a/docs/2018-11/index.html b/docs/2018-11/index.html index b628d3387..2c3bc422b 100644 --- a/docs/2018-11/index.html +++ b/docs/2018-11/index.html @@ -20,62 +20,10 @@ Linode has been sending mails a few times a day recently that CGSpace (linode18) Today these are the top 10 IPs: -# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 - 1300 66.249.64.63 - 1384 35.237.175.180 - 1430 138.201.52.218 - 1455 207.46.13.156 - 1500 40.77.167.175 - 1979 50.116.102.77 - 2790 66.249.64.61 - 3367 84.38.130.177 - 4537 70.32.83.92 - 22508 66.249.64.59 - - - -The 66.249.64.x are definitely Google -70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API -84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent: - - -Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 - - - -They at least seem to be re-using their Tomcat sessions: - - -$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq -342 - - - -50.116.102.77 is also a regular REST API user -40.77.167.175 and 207.46.13.156 seem to be Bing -138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent: - - -Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 - - - -And it doesn’t seem they are re-using their Tomcat sessions: - - -$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq -1243 - - - -Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day… -I wonder if it’s worth adding them to the list of bots in the nginx config? 
- - " /> - + @@ -93,58 +41,6 @@ Linode has been sending mails a few times a day recently that CGSpace (linode18) Today these are the top 10 IPs: -# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 - 1300 66.249.64.63 - 1384 35.237.175.180 - 1430 138.201.52.218 - 1455 207.46.13.156 - 1500 40.77.167.175 - 1979 50.116.102.77 - 2790 66.249.64.61 - 3367 84.38.130.177 - 4537 70.32.83.92 - 22508 66.249.64.59 - - - -The 66.249.64.x are definitely Google -70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API -84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent: - - -Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 - - - -They at least seem to be re-using their Tomcat sessions: - - -$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq -342 - - - -50.116.102.77 is also a regular REST API user -40.77.167.175 and 207.46.13.156 seem to be Bing -138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent: - - -Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 - - - -And it doesn’t seem they are re-using their Tomcat sessions: - - -$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq -1243 - - - -Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day… -I wonder if it’s worth adding them to the list of bots in the nginx config? 
- - "/> @@ -156,9 +52,9 @@ I wonder if it’s worth adding them to the list of bots in the nginx config "@type": "BlogPosting", "headline": "November, 2018", "url": "https://alanorth.github.io/cgspace-notes/2018-11/", - "wordCount": "260", + "wordCount": "586", "datePublished": "2018-11-01T16:41:30+02:00", - "dateModified": "2018-11-01T16:43:37+02:00", + "dateModified": "2018-11-03T18:13:49+02:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -238,6 +134,8 @@ I wonder if it’s worth adding them to the list of bots in the nginx config
  • Today these are the top 10 IPs:
  • +

    +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
        1300 66.249.64.63
        1384 35.237.175.180
    @@ -288,9 +186,73 @@ I wonder if it’s worth adding them to the list of bots in the nginx config
     
    • Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
    • I wonder if it’s worth adding them to the list of bots in the nginx config?
    • +
    • Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth
    • +
    • Looking at the nginx logs again, I see the following top ten IPs:
    -

    +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +   1979 50.116.102.77
    +   1980 35.237.175.180
    +   2186 207.46.13.156
    +   2208 40.77.167.175
    +   2843 66.249.64.63
    +   4220 84.38.130.177
    +   4537 70.32.83.92
    +   5593 66.249.64.61
    +  12557 78.46.89.18
    +  32152 66.249.64.59
    +
    + +
      +
    • 78.46.89.18 is new since I last checked a few hours ago, and it’s from Hetzner with the following user agent:
    • +
    + +
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    +
    + +
      +
    • It’s making lots of requests and seems to be using quite a number of Tomcat sessions (this counts matching DSpace log lines, not unique sessions):
    • +
    + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' /home/cgspace.cgiar.org/log/dspace.log.2018-11-03
    +8449
    +
    + +
      +
    • I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing
    • +
    • Perhaps I should think about adding rate limits to dynamic pages like /discover and /browse
    • +
    • I think it’s reasonable for a human to click one of those links five or ten times a minute…
    • +
    • By contrast, 78.46.89.18 made about 300 requests per minute for a few hours today:
    • +
    + +
    # grep 78.46.89.18 /var/log/nginx/access.log | grep -o -E '03/Nov/2018:[0-9][0-9]:[0-9][0-9]' | sort | uniq -c | sort -n | tail -n 20
    +    286 03/Nov/2018:18:02
    +    287 03/Nov/2018:18:21
    +    289 03/Nov/2018:18:23
    +    291 03/Nov/2018:18:27
    +    293 03/Nov/2018:18:34
    +    300 03/Nov/2018:17:58
    +    300 03/Nov/2018:18:22
    +    300 03/Nov/2018:18:32
    +    304 03/Nov/2018:18:12
    +    305 03/Nov/2018:18:13
    +    305 03/Nov/2018:18:24
    +    312 03/Nov/2018:18:39
    +    322 03/Nov/2018:18:17
    +    326 03/Nov/2018:18:38
    +    327 03/Nov/2018:18:16
    +    330 03/Nov/2018:17:57
    +    332 03/Nov/2018:18:19
    +    336 03/Nov/2018:17:56
    +    340 03/Nov/2018:18:14
    +    341 03/Nov/2018:18:18
    +
    + +
      +
    • If they want to download all our metadata and PDFs, they should use an API rather than scraping the XMLUI
    • +
    • I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later
    • +
    • Also, this is the third (?) time a mysterious IP in Hetzner has done this… who is this?
    • +
    diff --git a/docs/categories/index.html b/docs/categories/index.html index cadaf6a54..f0adf6a89 100644 --- a/docs/categories/index.html +++ b/docs/categories/index.html @@ -110,58 +110,6 @@
  • Today these are the top 10 IPs:
  • -
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    -   1300 66.249.64.63
    -   1384 35.237.175.180
    -   1430 138.201.52.218
    -   1455 207.46.13.156
    -   1500 40.77.167.175
    -   1979 50.116.102.77
    -   2790 66.249.64.61
    -   3367 84.38.130.177
    -   4537 70.32.83.92
    -  22508 66.249.64.59
    -
    - -
      -
    • The 66.249.64.x are definitely Google
    • -
    • 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
    • -
    • 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
    • -
    - -
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    -
    - -
      -
    • They at least seem to be re-using their Tomcat sessions:
    • -
    - -
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    -342
    -
    - -
      -
    • 50.116.102.77 is also a regular REST API user
    • -
    • 40.77.167.175 and 207.46.13.156 seem to be Bing
    • -
    • 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
    • -
    - -
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    -
    - -
      -
    • And it doesn’t seem they are re-using their Tomcat sessions:
    • -
    - -
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    -1243
    -
    - -
      -
    • Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
    • -
    • I wonder if it’s worth adding them to the list of bots in the nginx config?
    • -
    -

    Read more → diff --git a/docs/index.html b/docs/index.html index 013095086..23bec0f5d 100644 --- a/docs/index.html +++ b/docs/index.html @@ -112,58 +112,6 @@
  • Today these are the top 10 IPs:
  • -
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    -   1300 66.249.64.63
    -   1384 35.237.175.180
    -   1430 138.201.52.218
    -   1455 207.46.13.156
    -   1500 40.77.167.175
    -   1979 50.116.102.77
    -   2790 66.249.64.61
    -   3367 84.38.130.177
    -   4537 70.32.83.92
    -  22508 66.249.64.59
    -
    - -
      -
    • The 66.249.64.x are definitely Google
    • -
    • 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
    • -
    • 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
    • -
    - -
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    -
    - -
      -
    • They at least seem to be re-using their Tomcat sessions:
    • -
    - -
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    -342
    -
    - -
      -
    • 50.116.102.77 is also a regular REST API user
    • -
    • 40.77.167.175 and 207.46.13.156 seem to be Bing
    • -
    • 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
    • -
    - -
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    -
    - -
      -
    • And it doesn’t seem they are re-using their Tomcat sessions:
    • -
    - -
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    -1243
    -
    - -
      -
    • Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
    • -
    • I wonder if it’s worth adding them to the list of bots in the nginx config?
    • -
    -

    Read more → diff --git a/docs/index.xml b/docs/index.xml index 90c7c10ee..6ca38da91 100644 --- a/docs/index.xml +++ b/docs/index.xml @@ -31,58 +31,6 @@ <li>Today these are the top 10 IPs:</li> </ul> -<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 - 1300 66.249.64.63 - 1384 35.237.175.180 - 1430 138.201.52.218 - 1455 207.46.13.156 - 1500 40.77.167.175 - 1979 50.116.102.77 - 2790 66.249.64.61 - 3367 84.38.130.177 - 4537 70.32.83.92 - 22508 66.249.64.59 -</code></pre> - -<ul> -<li>The <code>66.249.64.x</code> are definitely Google</li> -<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li> -<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li> -</ul> - -<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 -</code></pre> - -<ul> -<li>They at least seem to be re-using their Tomcat sessions:</li> -</ul> - -<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq -342 -</code></pre> - -<ul> -<li><code>50.116.102.77</code> is also a regular REST API user</li> -<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li> -<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li> -</ul> - -<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 -</code></pre> - -<ul> -<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li> -</ul> - -<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq -1243 -</code></pre> - -<ul> -<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in 
one day&hellip;</li> -<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li> -</ul> - <p></p> diff --git a/docs/posts/index.html b/docs/posts/index.html index 2ace50760..3cef573e3 100644 --- a/docs/posts/index.html +++ b/docs/posts/index.html @@ -112,58 +112,6 @@
  • Today these are the top 10 IPs:
  • -
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    -   1300 66.249.64.63
    -   1384 35.237.175.180
    -   1430 138.201.52.218
    -   1455 207.46.13.156
    -   1500 40.77.167.175
    -   1979 50.116.102.77
    -   2790 66.249.64.61
    -   3367 84.38.130.177
    -   4537 70.32.83.92
    -  22508 66.249.64.59
    -
    - -
      -
    • The 66.249.64.x are definitely Google
    • -
    • 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
    • -
    • 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
    • -
    - -
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    -
    - -
      -
    • They at least seem to be re-using their Tomcat sessions:
    • -
    - -
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    -342
    -
    - -
      -
    • 50.116.102.77 is also a regular REST API user
    • -
    • 40.77.167.175 and 207.46.13.156 seem to be Bing
    • -
    • 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
    • -
    - -
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    -
    - -
      -
    • And it doesn’t seem they are re-using their Tomcat sessions:
    • -
    - -
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    -1243
    -
    - -
      -
    • Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
    • -
    • I wonder if it’s worth adding them to the list of bots in the nginx config?
    • -
    -

    Read more → diff --git a/docs/posts/index.xml b/docs/posts/index.xml index 7900abe7d..f84ea936a 100644 --- a/docs/posts/index.xml +++ b/docs/posts/index.xml @@ -31,58 +31,6 @@ <li>Today these are the top 10 IPs:</li> </ul> -<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 - 1300 66.249.64.63 - 1384 35.237.175.180 - 1430 138.201.52.218 - 1455 207.46.13.156 - 1500 40.77.167.175 - 1979 50.116.102.77 - 2790 66.249.64.61 - 3367 84.38.130.177 - 4537 70.32.83.92 - 22508 66.249.64.59 -</code></pre> - -<ul> -<li>The <code>66.249.64.x</code> are definitely Google</li> -<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li> -<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li> -</ul> - -<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 -</code></pre> - -<ul> -<li>They at least seem to be re-using their Tomcat sessions:</li> -</ul> - -<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq -342 -</code></pre> - -<ul> -<li><code>50.116.102.77</code> is also a regular REST API user</li> -<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li> -<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li> -</ul> - -<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 -</code></pre> - -<ul> -<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li> -</ul> - -<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq -1243 -</code></pre> - -<ul> -<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, 
making 40,000 requests in one day&hellip;</li> -<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li> -</ul> - <p></p> diff --git a/docs/sitemap.xml b/docs/sitemap.xml index cdf9f2d3a..e3b842c53 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -4,7 +4,7 @@ https://alanorth.github.io/cgspace-notes/2018-11/ - 2018-11-01T16:43:37+02:00 + 2018-11-03T18:13:49+02:00 @@ -194,7 +194,7 @@ https://alanorth.github.io/cgspace-notes/ - 2018-11-01T16:43:37+02:00 + 2018-11-03T18:13:49+02:00 0 @@ -205,7 +205,7 @@ https://alanorth.github.io/cgspace-notes/tags/notes/ - 2018-11-01T16:43:37+02:00 + 2018-11-03T18:13:49+02:00 0 @@ -217,13 +217,13 @@ https://alanorth.github.io/cgspace-notes/posts/ - 2018-11-01T16:43:37+02:00 + 2018-11-03T18:13:49+02:00 0 https://alanorth.github.io/cgspace-notes/tags/ - 2018-11-01T16:43:37+02:00 + 2018-11-03T18:13:49+02:00 0 diff --git a/docs/tags/index.html b/docs/tags/index.html index 2fb2db9d4..a4d0067b8 100644 --- a/docs/tags/index.html +++ b/docs/tags/index.html @@ -112,58 +112,6 @@
  • Today these are the top 10 IPs:
  • -
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    -   1300 66.249.64.63
    -   1384 35.237.175.180
    -   1430 138.201.52.218
    -   1455 207.46.13.156
    -   1500 40.77.167.175
    -   1979 50.116.102.77
    -   2790 66.249.64.61
    -   3367 84.38.130.177
    -   4537 70.32.83.92
    -  22508 66.249.64.59
    -
    - -
      -
    • The 66.249.64.x are definitely Google
    • -
    • 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
    • -
    • 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
    • -
    - -
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    -
    - -
      -
    • They at least seem to be re-using their Tomcat sessions:
    • -
    - -
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    -342
    -
    - -
      -
    • 50.116.102.77 is also a regular REST API user
    • -
    • 40.77.167.175 and 207.46.13.156 seem to be Bing
    • -
    • 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
    • -
    - -
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    -
    - -
      -
    • And it doesn’t seem they are re-using their Tomcat sessions:
    • -
    - -
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    -1243
    -
    - -
      -
    • Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
    • -
    • I wonder if it’s worth adding them to the list of bots in the nginx config?
    • -
    -

    Read more → diff --git a/docs/tags/notes/index.html b/docs/tags/notes/index.html index 4082624e0..6854409e6 100644 --- a/docs/tags/notes/index.html +++ b/docs/tags/notes/index.html @@ -97,58 +97,6 @@
  • Today these are the top 10 IPs:
  • -
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    -   1300 66.249.64.63
    -   1384 35.237.175.180
    -   1430 138.201.52.218
    -   1455 207.46.13.156
    -   1500 40.77.167.175
    -   1979 50.116.102.77
    -   2790 66.249.64.61
    -   3367 84.38.130.177
    -   4537 70.32.83.92
    -  22508 66.249.64.59
    -
    - -
      -
    • The 66.249.64.x are definitely Google
    • -
    • 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API
    • -
    • 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
    • -
    - -
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    -
    - -
      -
    • They at least seem to be re-using their Tomcat sessions:
    • -
    - -
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    -342
    -
    - -
      -
    • 50.116.102.77 is also a regular REST API user
    • -
    • 40.77.167.175 and 207.46.13.156 seem to be Bing
    • -
    • 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
    • -
    - -
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    -
    - -
      -
    • And it doesn’t seem they are re-using their Tomcat sessions:
    • -
    - -
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    -1243
    -
    - -
      -
    • Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…
    • -
    • I wonder if it’s worth adding them to the list of bots in the nginx config?
    • -
    -

    Read more → diff --git a/docs/tags/notes/index.xml b/docs/tags/notes/index.xml index ed12c0f79..b6f0a76d6 100644 --- a/docs/tags/notes/index.xml +++ b/docs/tags/notes/index.xml @@ -31,58 +31,6 @@ <li>Today these are the top 10 IPs:</li> </ul> -<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 - 1300 66.249.64.63 - 1384 35.237.175.180 - 1430 138.201.52.218 - 1455 207.46.13.156 - 1500 40.77.167.175 - 1979 50.116.102.77 - 2790 66.249.64.61 - 3367 84.38.130.177 - 4537 70.32.83.92 - 22508 66.249.64.59 -</code></pre> - -<ul> -<li>The <code>66.249.64.x</code> are definitely Google</li> -<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li> -<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li> -</ul> - -<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 -</code></pre> - -<ul> -<li>They at least seem to be re-using their Tomcat sessions:</li> -</ul> - -<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq -342 -</code></pre> - -<ul> -<li><code>50.116.102.77</code> is also a regular REST API user</li> -<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li> -<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li> -</ul> - -<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 -</code></pre> - -<ul> -<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li> -</ul> - -<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq -1243 -</code></pre> - -<ul> -<li>Ah, we&rsquo;ve apparently seen this server exactly a 
year ago in 2017-11, making 40,000 requests in one day&hellip;</li> -<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li> -</ul> - <p></p>