From ee189c2ebf2f860f45dd6107af93e31e861799b8 Mon Sep 17 00:00:00 2001 From: Alan Orth Date: Sat, 3 Nov 2018 18:13:49 +0200 Subject: [PATCH] Add notes for 2018-11-03 --- content/posts/2018-11.md | 52 +++++++++++ docs/2018-11/index.html | 183 ++++++++++++++++++++++++++++++++++++- docs/categories/index.html | 59 ++++++++++++ docs/index.html | 59 ++++++++++++ docs/index.xml | 59 ++++++++++++ docs/posts/index.html | 59 ++++++++++++ docs/posts/index.xml | 59 ++++++++++++ docs/sitemap.xml | 10 +- docs/tags/index.html | 59 ++++++++++++ docs/tags/notes/index.html | 59 ++++++++++++ docs/tags/notes/index.xml | 59 ++++++++++++ 11 files changed, 709 insertions(+), 8 deletions(-) diff --git a/content/posts/2018-11.md b/content/posts/2018-11.md index 69b0ec2c0..c7d273bd3 100644 --- a/content/posts/2018-11.md +++ b/content/posts/2018-11.md @@ -10,6 +10,58 @@ tags: ["Notes"] - Finalize AReS Phase I and Phase II ToRs - Send a note about my [dspace-statistics-api](https://github.com/ilri/dspace-statistics-api) to the dspace-tech mailing list +## 2018-11-03 + +- Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage +- Today these are the top 10 IPs: + +``` +# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 + 1300 66.249.64.63 + 1384 35.237.175.180 + 1430 138.201.52.218 + 1455 207.46.13.156 + 1500 40.77.167.175 + 1979 50.116.102.77 + 2790 66.249.64.61 + 3367 84.38.130.177 + 4537 70.32.83.92 + 22508 66.249.64.59 +``` + +- The `66.249.64.x` are definitely Google +- `70.32.83.92` is well known, probably CCAFS or something, as it's only a few thousand requests and always to REST API +- `84.38.130.177` is some new IP in Latvia that is only hitting the XMLUI, using the following user agent: + +``` +Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 +``` + +- They at least seem to be re-using their Tomcat 
sessions: + +``` +$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq +342 +``` + +- `50.116.102.77` is also a regular REST API user +- `40.77.167.175` and `207.46.13.156` seem to be Bing +- `138.201.52.218` seems to be on Hetzner in Germany, but is using this user agent: + +``` +Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 +``` + +- And it doesn't seem they are re-using their Tomcat sessions: + +``` +$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq +1243 +``` + +- Ah, we've apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day... +- I wonder if it's worth adding them to the list of bots in the nginx config? + diff --git a/docs/2018-11/index.html b/docs/2018-11/index.html index 373bea5b4..b628d3387 100644 --- a/docs/2018-11/index.html +++ b/docs/2018-11/index.html @@ -13,10 +13,69 @@ Finalize AReS Phase I and Phase II ToRs Send a note about my dspace-statistics-api to the dspace-tech mailing list +2018-11-03 + + +Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage +Today these are the top 10 IPs: + + +# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 + 1300 66.249.64.63 + 1384 35.237.175.180 + 1430 138.201.52.218 + 1455 207.46.13.156 + 1500 40.77.167.175 + 1979 50.116.102.77 + 2790 66.249.64.61 + 3367 84.38.130.177 + 4537 70.32.83.92 + 22508 66.249.64.59 + + + +The 66.249.64.x are definitely Google +70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API +84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent: + + +Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 + + + +They at least seem to be re-using 
their Tomcat sessions: + + +$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq +342 + + + +50.116.102.77 is also a regular REST API user +40.77.167.175 and 207.46.13.156 seem to be Bing +138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent: + + +Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 + + + +And it doesn’t seem they are re-using their Tomcat sessions: + + +$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq +1243 + + + +Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day… +I wonder if it’s worth adding them to the list of bots in the nginx config? + + " /> - + @@ -27,6 +86,65 @@ Finalize AReS Phase I and Phase II ToRs Send a note about my dspace-statistics-api to the dspace-tech mailing list +2018-11-03 + + +Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage +Today these are the top 10 IPs: + + +# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 + 1300 66.249.64.63 + 1384 35.237.175.180 + 1430 138.201.52.218 + 1455 207.46.13.156 + 1500 40.77.167.175 + 1979 50.116.102.77 + 2790 66.249.64.61 + 3367 84.38.130.177 + 4537 70.32.83.92 + 22508 66.249.64.59 + + + +The 66.249.64.x are definitely Google +70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API +84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent: + + +Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 + + + +They at least seem to be re-using their Tomcat sessions: + + +$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq +342 + + + +50.116.102.77 is also a 
regular REST API user +40.77.167.175 and 207.46.13.156 seem to be Bing +138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent: + + +Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 + + + +And it doesn’t seem they are re-using their Tomcat sessions: + + +$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq +1243 + + + +Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day… +I wonder if it’s worth adding them to the list of bots in the nginx config? + + "/> @@ -38,9 +156,9 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list "@type": "BlogPosting", "headline": "November, 2018", "url": "https://alanorth.github.io/cgspace-notes/2018-11/", - "wordCount": "20", + "wordCount": "260", "datePublished": "2018-11-01T16:41:30+02:00", - "dateModified": "2018-11-01T16:41:30+02:00", + "dateModified": "2018-11-01T16:43:37+02:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -113,6 +231,65 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list
  • Send a note about my dspace-statistics-api to the dspace-tech mailing list
  • +

    2018-11-03

    + + + +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +   1300 66.249.64.63
    +   1384 35.237.175.180
    +   1430 138.201.52.218
    +   1455 207.46.13.156
    +   1500 40.77.167.175
    +   1979 50.116.102.77
    +   2790 66.249.64.61
    +   3367 84.38.130.177
    +   4537 70.32.83.92
    +  22508 66.249.64.59
    +
    + + + +
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    +342
    +
    + + + +
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    +1243
    +
    + + +

    diff --git a/docs/categories/index.html b/docs/categories/index.html index 8be24cf22..cadaf6a54 100644 --- a/docs/categories/index.html +++ b/docs/categories/index.html @@ -103,6 +103,65 @@
  • Send a note about my dspace-statistics-api to the dspace-tech mailing list
  • +

    2018-11-03

    + + + +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +   1300 66.249.64.63
    +   1384 35.237.175.180
    +   1430 138.201.52.218
    +   1455 207.46.13.156
    +   1500 40.77.167.175
    +   1979 50.116.102.77
    +   2790 66.249.64.61
    +   3367 84.38.130.177
    +   4537 70.32.83.92
    +  22508 66.249.64.59
    +
    + + + +
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    +342
    +
    + + + +
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    +1243
    +
    + + +

    Read more → diff --git a/docs/index.html b/docs/index.html index b0b14850f..013095086 100644 --- a/docs/index.html +++ b/docs/index.html @@ -105,6 +105,65 @@
  • Send a note about my dspace-statistics-api to the dspace-tech mailing list
  • +

    2018-11-03

    + + + +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +   1300 66.249.64.63
    +   1384 35.237.175.180
    +   1430 138.201.52.218
    +   1455 207.46.13.156
    +   1500 40.77.167.175
    +   1979 50.116.102.77
    +   2790 66.249.64.61
    +   3367 84.38.130.177
    +   4537 70.32.83.92
    +  22508 66.249.64.59
    +
    + + + +
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    +342
    +
    + + + +
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    +1243
    +
    + + +

    Read more → diff --git a/docs/index.xml b/docs/index.xml index ebd4c186a..90c7c10ee 100644 --- a/docs/index.xml +++ b/docs/index.xml @@ -24,6 +24,65 @@ <li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li> </ul> +<h2 id="2018-11-03">2018-11-03</h2> + +<ul> +<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> +<li>Today these are the top 10 IPs:</li> +</ul> + +<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 + 1300 66.249.64.63 + 1384 35.237.175.180 + 1430 138.201.52.218 + 1455 207.46.13.156 + 1500 40.77.167.175 + 1979 50.116.102.77 + 2790 66.249.64.61 + 3367 84.38.130.177 + 4537 70.32.83.92 + 22508 66.249.64.59 +</code></pre> + +<ul> +<li>The <code>66.249.64.x</code> are definitely Google</li> +<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li> +<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li> +</ul> + +<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 +</code></pre> + +<ul> +<li>They at least seem to be re-using their Tomcat sessions:</li> +</ul> + +<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq +342 +</code></pre> + +<ul> +<li><code>50.116.102.77</code> is also a regular REST API user</li> +<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li> +<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li> +</ul> + +<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 +</code></pre> + +<ul> +<li>And it 
doesn&rsquo;t seem they are re-using their Tomcat sessions:</li> +</ul> + +<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq +1243 +</code></pre> + +<ul> +<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li> +<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li> +</ul> + <p></p> diff --git a/docs/posts/index.html b/docs/posts/index.html index e7fb079cd..2ace50760 100644 --- a/docs/posts/index.html +++ b/docs/posts/index.html @@ -105,6 +105,65 @@
  • Send a note about my dspace-statistics-api to the dspace-tech mailing list
  • +

    2018-11-03

    + + + +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +   1300 66.249.64.63
    +   1384 35.237.175.180
    +   1430 138.201.52.218
    +   1455 207.46.13.156
    +   1500 40.77.167.175
    +   1979 50.116.102.77
    +   2790 66.249.64.61
    +   3367 84.38.130.177
    +   4537 70.32.83.92
    +  22508 66.249.64.59
    +
    + + + +
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    +342
    +
    + + + +
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    +1243
    +
    + + +

    Read more → diff --git a/docs/posts/index.xml b/docs/posts/index.xml index 2ce6545de..7900abe7d 100644 --- a/docs/posts/index.xml +++ b/docs/posts/index.xml @@ -24,6 +24,65 @@ <li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li> </ul> +<h2 id="2018-11-03">2018-11-03</h2> + +<ul> +<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> +<li>Today these are the top 10 IPs:</li> +</ul> + +<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 + 1300 66.249.64.63 + 1384 35.237.175.180 + 1430 138.201.52.218 + 1455 207.46.13.156 + 1500 40.77.167.175 + 1979 50.116.102.77 + 2790 66.249.64.61 + 3367 84.38.130.177 + 4537 70.32.83.92 + 22508 66.249.64.59 +</code></pre> + +<ul> +<li>The <code>66.249.64.x</code> are definitely Google</li> +<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li> +<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li> +</ul> + +<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 +</code></pre> + +<ul> +<li>They at least seem to be re-using their Tomcat sessions:</li> +</ul> + +<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq +342 +</code></pre> + +<ul> +<li><code>50.116.102.77</code> is also a regular REST API user</li> +<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li> +<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li> +</ul> + +<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 +</code></pre> + 
+<ul> +<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li> +</ul> + +<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq +1243 +</code></pre> + +<ul> +<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li> +<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li> +</ul> + <p></p> diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 673fe9dd4..cdf9f2d3a 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -4,7 +4,7 @@ https://alanorth.github.io/cgspace-notes/2018-11/ - 2018-11-01T16:41:30+02:00 + 2018-11-01T16:43:37+02:00 @@ -194,7 +194,7 @@ https://alanorth.github.io/cgspace-notes/ - 2018-11-01T16:41:30+02:00 + 2018-11-01T16:43:37+02:00 0 @@ -205,7 +205,7 @@ https://alanorth.github.io/cgspace-notes/tags/notes/ - 2018-11-01T16:41:30+02:00 + 2018-11-01T16:43:37+02:00 0 @@ -217,13 +217,13 @@ https://alanorth.github.io/cgspace-notes/posts/ - 2018-11-01T16:41:30+02:00 + 2018-11-01T16:43:37+02:00 0 https://alanorth.github.io/cgspace-notes/tags/ - 2018-11-01T16:41:30+02:00 + 2018-11-01T16:43:37+02:00 0 diff --git a/docs/tags/index.html b/docs/tags/index.html index 09454e08f..2fb2db9d4 100644 --- a/docs/tags/index.html +++ b/docs/tags/index.html @@ -105,6 +105,65 @@
  • Send a note about my dspace-statistics-api to the dspace-tech mailing list
  • +

    2018-11-03

    + + + +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +   1300 66.249.64.63
    +   1384 35.237.175.180
    +   1430 138.201.52.218
    +   1455 207.46.13.156
    +   1500 40.77.167.175
    +   1979 50.116.102.77
    +   2790 66.249.64.61
    +   3367 84.38.130.177
    +   4537 70.32.83.92
    +  22508 66.249.64.59
    +
    + + + +
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    +342
    +
    + + + +
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    +1243
    +
    + + +

    Read more → diff --git a/docs/tags/notes/index.html b/docs/tags/notes/index.html index 17540d6bb..4082624e0 100644 --- a/docs/tags/notes/index.html +++ b/docs/tags/notes/index.html @@ -90,6 +90,65 @@
  • Send a note about my dspace-statistics-api to the dspace-tech mailing list
  • +

    2018-11-03

    + + + +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +   1300 66.249.64.63
    +   1384 35.237.175.180
    +   1430 138.201.52.218
    +   1455 207.46.13.156
    +   1500 40.77.167.175
    +   1979 50.116.102.77
    +   2790 66.249.64.61
    +   3367 84.38.130.177
    +   4537 70.32.83.92
    +  22508 66.249.64.59
    +
    + + + +
    Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
    +342
    +
    + + + +
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
    +
    + + + +
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
    +1243
    +
    + + +

    Read more → diff --git a/docs/tags/notes/index.xml b/docs/tags/notes/index.xml index b9bcd8263..ed12c0f79 100644 --- a/docs/tags/notes/index.xml +++ b/docs/tags/notes/index.xml @@ -24,6 +24,65 @@ <li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li> </ul> +<h2 id="2018-11-03">2018-11-03</h2> + +<ul> +<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> +<li>Today these are the top 10 IPs:</li> +</ul> + +<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 + 1300 66.249.64.63 + 1384 35.237.175.180 + 1430 138.201.52.218 + 1455 207.46.13.156 + 1500 40.77.167.175 + 1979 50.116.102.77 + 2790 66.249.64.61 + 3367 84.38.130.177 + 4537 70.32.83.92 + 22508 66.249.64.59 +</code></pre> + +<ul> +<li>The <code>66.249.64.x</code> are definitely Google</li> +<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li> +<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li> +</ul> + +<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 +</code></pre> + +<ul> +<li>They at least seem to be re-using their Tomcat sessions:</li> +</ul> + +<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq +342 +</code></pre> + +<ul> +<li><code>50.116.102.77</code> is also a regular REST API user</li> +<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li> +<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li> +</ul> + +<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 
Firefox/62.0 +</code></pre> + +<ul> +<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li> +</ul> + +<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq +1243 +</code></pre> + +<ul> +<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li> +<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li> +</ul> + <p></p>
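A note on the two session checks in these notes: `grep -c` prints the number of matching log lines, so the trailing `sort | uniq` only ever receives a single number. The figures 342 and 1243 are therefore line counts for each IP, not counts of distinct Tomcat sessions. A minimal sketch of counting distinct session IDs instead, assuming the same `session_id=...:ip_addr=...` log format shown above; the function name `count_sessions` is hypothetical and not part of the original notes:

```shell
#!/bin/sh
# Hypothetical helper (not from the original notes): count the distinct
# Tomcat session IDs that a DSpace log records for one client IP.
# Usage: count_sessions IP LOGFILE
count_sessions() {
    # Escape the dots so they match literally instead of "any character".
    ip_pattern=$(printf '%s' "$1" | sed 's/\./\\./g')

    # 1. Keep only lines for this IP; the trailing group stops 1.2.3.4
    #    from also matching 1.2.3.40.
    # 2. Extract just the session ID from each matching line.
    # 3. De-duplicate and count what is left.
    grep -E "session_id=[A-Z0-9]{32}:ip_addr=${ip_pattern}([^0-9]|\$)" "$2" \
        | grep -o -E 'session_id=[A-Z0-9]{32}' \
        | sort -u \
        | wc -l
}
```

Called as `count_sessions 138.201.52.218 dspace.log.2018-11-03`, a result close to the line count would suggest a new session on nearly every request, while a small number would suggest sessions are being re-used.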