Update notes for 2018-11-03

Alan Orth 2018-11-04 01:02:29 +02:00
parent ee189c2ebf
commit 6bf2bdae09
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
11 changed files with 139 additions and 530 deletions

@@ -15,6 +15,9 @@ tags: ["Notes"]
- Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage - Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
- Today these are the top 10 IPs: - Today these are the top 10 IPs:
<!--more-->
``` ```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63 1300 66.249.64.63
@@ -61,7 +64,67 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11
- Ah, we've apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day... - Ah, we've apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day...
- I wonder if it's worth adding them to the list of bots in the nginx config? - I wonder if it's worth adding them to the list of bots in the nginx config?
- Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth
- Looking at the nginx logs again I see the following top ten IPs:
<!--more--> ```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1979 50.116.102.77
1980 35.237.175.180
2186 207.46.13.156
2208 40.77.167.175
2843 66.249.64.63
4220 84.38.130.177
4537 70.32.83.92
5593 66.249.64.61
12557 78.46.89.18
32152 66.249.64.59
```
- `78.46.89.18` is new since I last checked a few hours ago, and it's from Hetzner with the following user agent:
```
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
```
- It's making lots of requests and using quite a number of Tomcat sessions:
```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' /home/cgspace.cgiar.org/log/dspace.log.2018-11-03 | sort | uniq
8449
```
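- Note that `grep -c` prints the number of matching lines, so piping it through `sort | uniq` leaves that single number unchanged; the figure above is therefore probably log lines rather than distinct sessions
- A rough sketch for counting distinct session IDs instead, assuming the log lines carry `session_id=<32 characters>:ip_addr=<IP>` as in the pattern above (`count_unique_sessions` is a hypothetical helper name, not something already on the server):

```shell
#!/usr/bin/env bash
# Sketch: count distinct Tomcat session IDs for one client IP in a DSpace log.
# Assumption: lines contain "session_id=<32 chars>:ip_addr=<ip>" as in the
# grep pattern used in these notes. The function name is hypothetical.
count_unique_sessions() {
  local ip="$1" logfile="$2"
  local ip_re
  # Escape the dots so the IP is matched literally in the regex
  ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
  # First grep selects lines for exactly this IP (no longer prefix matches),
  # second grep extracts only the session ID, sort -u removes duplicates
  grep -E "session_id=[A-Z0-9]{32}:ip_addr=${ip_re}([^0-9]|\$)" "$logfile" \
    | grep -o -E 'session_id=[A-Z0-9]{32}' \
    | sort -u \
    | wc -l \
    | tr -d ' '
}
```

- It would be called like `count_unique_sessions 78.46.89.18 /home/cgspace.cgiar.org/log/dspace.log.2018-11-03`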
- I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing
- Perhaps I should think about adding rate limits to dynamic pages like `/discover` and `/browse`
- I think it's reasonable for a human to click one of those links five or ten times a minute...
- To contrast, `78.46.89.18` made about 300 requests per minute for a few hours today:
```
# grep 78.46.89.18 /var/log/nginx/access.log | grep -o -E '03/Nov/2018:[0-9][0-9]:[0-9][0-9]' | sort | uniq -c | sort -n | tail -n 20
286 03/Nov/2018:18:02
287 03/Nov/2018:18:21
289 03/Nov/2018:18:23
291 03/Nov/2018:18:27
293 03/Nov/2018:18:34
300 03/Nov/2018:17:58
300 03/Nov/2018:18:22
300 03/Nov/2018:18:32
304 03/Nov/2018:18:12
305 03/Nov/2018:18:13
305 03/Nov/2018:18:24
312 03/Nov/2018:18:39
322 03/Nov/2018:18:17
326 03/Nov/2018:18:38
327 03/Nov/2018:18:16
330 03/Nov/2018:17:57
332 03/Nov/2018:18:19
336 03/Nov/2018:17:56
340 03/Nov/2018:18:14
341 03/Nov/2018:18:18
```
- If they want to download all our metadata and PDFs they should use an API rather than scraping the XMLUI
- I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later
- Also, this is the third (?) time a mysterious IP in Hetzner has done this... who is this?
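- Before settling on a rate limit it might help to see which minutes would have exceeded it; a rough sketch, assuming nginx's default `combined` log format (client IP in field 1, `[day/month/year:HH:MM:SS` in field 4), with `minutes_over_limit` being a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: list the minutes in which one IP made more than LIMIT requests.
# Assumption: nginx "combined" log format, where $1 is the client IP and
# $4 looks like "[03/Nov/2018:18:14:07". The function name is hypothetical.
minutes_over_limit() {
  local ip="$1" limit="$2" logfile="$3"
  awk -v ip="$ip" -v limit="$limit" '
    $1 == ip {
      # Drop the leading "[" and the seconds, keeping "03/Nov/2018:18:14"
      minute = substr($4, 2, 17)
      count[minute]++
    }
    END {
      for (m in count)
        if (count[m] > limit)
          print count[m], m
    }
  ' "$logfile" | sort -n
}
```

- For example `minutes_over_limit 78.46.89.18 10 /var/log/nginx/access.log` would print every minute above the "ten clicks a minute" guess for a human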
<!-- vim: set sw=2 ts=2: --> <!-- vim: set sw=2 ts=2: -->

@@ -20,62 +20,10 @@ Linode has been sending mails a few times a day recently that CGSpace (linode18)
Today these are the top 10 IPs: Today these are the top 10 IPs:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
The 66.249.64.x are definitely Google
70.32.83.92 is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API
84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
They at least seem to be re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
50.116.102.77 is also a regular REST API user
40.77.167.175 and 207.46.13.156 seem to be Bing
138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
And it doesn&rsquo;t seem they are re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;
I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?
" /> " />
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-11/" /><meta property="article:published_time" content="2018-11-01T16:41:30&#43;02:00"/> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-11/" /><meta property="article:published_time" content="2018-11-01T16:41:30&#43;02:00"/>
<meta property="article:modified_time" content="2018-11-01T16:43:37&#43;02:00"/> <meta property="article:modified_time" content="2018-11-03T18:13:49&#43;02:00"/>
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="November, 2018"/> <meta name="twitter:title" content="November, 2018"/>
@@ -93,58 +41,6 @@ Linode has been sending mails a few times a day recently that CGSpace (linode18)
Today these are the top 10 IPs: Today these are the top 10 IPs:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
The 66.249.64.x are definitely Google
70.32.83.92 is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API
84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
They at least seem to be re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
50.116.102.77 is also a regular REST API user
40.77.167.175 and 207.46.13.156 seem to be Bing
138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
And it doesn&rsquo;t seem they are re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;
I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?
"/> "/>
<meta name="generator" content="Hugo 0.50" /> <meta name="generator" content="Hugo 0.50" />
@@ -156,9 +52,9 @@ I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config
"@type": "BlogPosting", "@type": "BlogPosting",
"headline": "November, 2018", "headline": "November, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-11/", "url": "https://alanorth.github.io/cgspace-notes/2018-11/",
"wordCount": "260", "wordCount": "586",
"datePublished": "2018-11-01T16:41:30&#43;02:00", "datePublished": "2018-11-01T16:41:30&#43;02:00",
"dateModified": "2018-11-01T16:43:37&#43;02:00", "dateModified": "2018-11-03T18:13:49&#43;02:00",
"author": { "author": {
"@type": "Person", "@type": "Person",
"name": "Alan Orth" "name": "Alan Orth"
@@ -238,6 +134,8 @@ I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
</ul> </ul>
<p></p>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63 1300 66.249.64.63
1384 35.237.175.180 1384 35.237.175.180
@@ -288,9 +186,73 @@ I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config
<ul> <ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li> <li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li> <li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
<li>Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth</li>
<li>Looking at the nginx logs again I see the following top ten IPs:</li>
</ul> </ul>
<p></p> <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1979 50.116.102.77
1980 35.237.175.180
2186 207.46.13.156
2208 40.77.167.175
2843 66.249.64.63
4220 84.38.130.177
4537 70.32.83.92
5593 66.249.64.61
12557 78.46.89.18
32152 66.249.64.59
</code></pre>
<ul>
<li><code>78.46.89.18</code> is new since I last checked a few hours ago, and it&rsquo;s from Hetzner with the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>It&rsquo;s making lots of requests and using quite a number of Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' /home/cgspace.cgiar.org/log/dspace.log.2018-11-03 | sort | uniq
8449
</code></pre>
<ul>
<li>I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing</li>
<li>Perhaps I should think about adding rate limits to dynamic pages like <code>/discover</code> and <code>/browse</code></li>
<li>I think it&rsquo;s reasonable for a human to click one of those links five or ten times a minute&hellip;</li>
<li>To contrast, <code>78.46.89.18</code> made about 300 requests per minute for a few hours today:</li>
</ul>
<pre><code># grep 78.46.89.18 /var/log/nginx/access.log | grep -o -E '03/Nov/2018:[0-9][0-9]:[0-9][0-9]' | sort | uniq -c | sort -n | tail -n 20
286 03/Nov/2018:18:02
287 03/Nov/2018:18:21
289 03/Nov/2018:18:23
291 03/Nov/2018:18:27
293 03/Nov/2018:18:34
300 03/Nov/2018:17:58
300 03/Nov/2018:18:22
300 03/Nov/2018:18:32
304 03/Nov/2018:18:12
305 03/Nov/2018:18:13
305 03/Nov/2018:18:24
312 03/Nov/2018:18:39
322 03/Nov/2018:18:17
326 03/Nov/2018:18:38
327 03/Nov/2018:18:16
330 03/Nov/2018:17:57
332 03/Nov/2018:18:19
336 03/Nov/2018:17:56
340 03/Nov/2018:18:14
341 03/Nov/2018:18:18
</code></pre>
<ul>
<li>If they want to download all our metadata and PDFs they should use an API rather than scraping the XMLUI</li>
<li>I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later</li>
<li>Also, this is the third (?) time a mysterious IP in Hetzner has done this&hellip; who is this?</li>
</ul>
<!-- vim: set sw=2 ts=2: --> <!-- vim: set sw=2 ts=2: -->

@@ -110,58 +110,6 @@
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p> <p></p>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article> </article>

@@ -112,58 +112,6 @@
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p> <p></p>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article> </article>

@@ -31,58 +31,6 @@
&lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt; &lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &amp;quot;03/Nov/2018&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;66.249.64.x&lt;/code&gt; are definitely Google&lt;/li&gt;
&lt;li&gt;&lt;code&gt;70.32.83.92&lt;/code&gt; is well known, probably CCAFS or something, as it&amp;rsquo;s only a few thousand requests and always to REST API&lt;/li&gt;
&lt;li&gt;&lt;code&gt;84.38.130.177&lt;/code&gt; is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;They at least seem to be re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;50.116.102.77&lt;/code&gt; is also a regular REST API user&lt;/li&gt;
&lt;li&gt;&lt;code&gt;40.77.167.175&lt;/code&gt; and &lt;code&gt;207.46.13.156&lt;/code&gt; seem to be Bing&lt;/li&gt;
&lt;li&gt;&lt;code&gt;138.201.52.218&lt;/code&gt; seems to be on Hetzner in Germany, but is using this user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;And it doesn&amp;rsquo;t seem they are re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Ah, we&amp;rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&amp;hellip;&lt;/li&gt;
&lt;li&gt;I wonder if it&amp;rsquo;s worth adding them to the list of bots in the nginx config?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;</description> &lt;p&gt;&lt;/p&gt;</description>
</item> </item>

@@ -112,58 +112,6 @@
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p> <p></p>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article> </article>

@@ -31,58 +31,6 @@
&lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt; &lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &amp;quot;03/Nov/2018&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;66.249.64.x&lt;/code&gt; are definitely Google&lt;/li&gt;
&lt;li&gt;&lt;code&gt;70.32.83.92&lt;/code&gt; is well known, probably CCAFS or something, as it&amp;rsquo;s only a few thousand requests and always to REST API&lt;/li&gt;
&lt;li&gt;&lt;code&gt;84.38.130.177&lt;/code&gt; is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;They at least seem to be re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;50.116.102.77&lt;/code&gt; is also a regular REST API user&lt;/li&gt;
&lt;li&gt;&lt;code&gt;40.77.167.175&lt;/code&gt; and &lt;code&gt;207.46.13.156&lt;/code&gt; seem to be Bing&lt;/li&gt;
&lt;li&gt;&lt;code&gt;138.201.52.218&lt;/code&gt; seems to be on Hetzner in Germany, but is using this user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;And it doesn&amp;rsquo;t seem they are re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Ah, we&amp;rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&amp;hellip;&lt;/li&gt;
&lt;li&gt;I wonder if it&amp;rsquo;s worth adding them to the list of bots in the nginx config?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;</description> &lt;p&gt;&lt;/p&gt;</description>
</item> </item>

@@ -4,7 +4,7 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/2018-11/</loc> <loc>https://alanorth.github.io/cgspace-notes/2018-11/</loc>
<lastmod>2018-11-01T16:43:37+02:00</lastmod> <lastmod>2018-11-03T18:13:49+02:00</lastmod>
</url> </url>
<url> <url>
@@ -194,7 +194,7 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-11-01T16:43:37+02:00</lastmod> <lastmod>2018-11-03T18:13:49+02:00</lastmod>
<priority>0</priority> <priority>0</priority>
</url> </url>
@@ -205,7 +205,7 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-11-01T16:43:37+02:00</lastmod> <lastmod>2018-11-03T18:13:49+02:00</lastmod>
<priority>0</priority> <priority>0</priority>
</url> </url>
@@ -217,13 +217,13 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc> <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-11-01T16:43:37+02:00</lastmod> <lastmod>2018-11-03T18:13:49+02:00</lastmod>
<priority>0</priority> <priority>0</priority>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc> <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-11-01T16:43:37+02:00</lastmod> <lastmod>2018-11-03T18:13:49+02:00</lastmod>
<priority>0</priority> <priority>0</priority>
</url> </url>

@@ -112,58 +112,6 @@
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p> <p></p>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article> </article>

@@ -97,58 +97,6 @@
<li>Today these are the top 10 IPs:</li> <li>Today these are the top 10 IPs:</li>
</ul> </ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p> <p></p>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a> <a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article> </article>

@@ -31,58 +31,6 @@
&lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt; &lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt;
&lt;/ul&gt; &lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &amp;quot;03/Nov/2018&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;66.249.64.x&lt;/code&gt; are definitely Google&lt;/li&gt;
&lt;li&gt;&lt;code&gt;70.32.83.92&lt;/code&gt; is well known, probably CCAFS or something, as it&amp;rsquo;s only a few thousand requests and always to REST API&lt;/li&gt;
&lt;li&gt;&lt;code&gt;84.38.130.177&lt;/code&gt; is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;They at least seem to be re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;50.116.102.77&lt;/code&gt; is also a regular REST API user&lt;/li&gt;
&lt;li&gt;&lt;code&gt;40.77.167.175&lt;/code&gt; and &lt;code&gt;207.46.13.156&lt;/code&gt; seem to be Bing&lt;/li&gt;
&lt;li&gt;&lt;code&gt;138.201.52.218&lt;/code&gt; seems to be on Hetzner in Germany, but is using this user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;And it doesn&amp;rsquo;t seem they are re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Ah, we&amp;rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&amp;hellip;&lt;/li&gt;
&lt;li&gt;I wonder if it&amp;rsquo;s worth adding them to the list of bots in the nginx config?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;</description> &lt;p&gt;&lt;/p&gt;</description>
</item> </item>