Add notes for 2018-11-03

Alan Orth 2018-11-03 18:13:49 +02:00
parent ab0b4a986f
commit ee189c2ebf
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
11 changed files with 709 additions and 8 deletions


@ -10,6 +10,58 @@ tags: ["Notes"]
- Finalize AReS Phase I and Phase II ToRs
- Send a note about my [dspace-statistics-api](https://github.com/ilri/dspace-statistics-api) to the dspace-tech mailing list
## 2018-11-03
- Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
- Today these are the top 10 IPs:
```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
```
- The `66.249.64.x` addresses are definitely Google
- `70.32.83.92` is well known, probably CCAFS or something, as it's only a few thousand requests and always to the REST API
- `84.38.130.177` is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
```
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
```
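- They at least seem to be re-using their Tomcat sessions (note that `grep -c` counts matching log lines, so the number below is an upper bound on distinct sessions, not an exact count):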
```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03
342
```
- `50.116.102.77` is also a regular REST API user
- `40.77.167.175` and `207.46.13.156` seem to be Bing
- `138.201.52.218` seems to be on Hetzner in Germany, but is using this user agent:
```
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
```
- And they don't seem to be re-using their Tomcat sessions (this counts matching log lines, so it is only a rough indication):
```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03
1243
```
- Ah, we apparently saw this server a year ago, in 2017-11, when it made 40,000 requests in one day...
- I wonder if it's worth adding them to the list of bots in the nginx config?
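
The session checks above count matching log lines rather than distinct sessions. A minimal sketch of counting distinct session IDs for one IP instead, assuming the DSpace log lines contain `session_id=<32 characters>:ip_addr=<IP>` as in the pattern used above. The function name `count_sessions` is hypothetical, not something from these notes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct Tomcat session IDs logged for one IP.
# Assumes log lines contain "session_id=<32 chars>:ip_addr=<IP>".
count_sessions() {
  local ip="$1" log="$2"
  local ip_re
  # Escape the dots so they match literally in the regular expression
  ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
  # -o prints only the matched text; the trailing group stops 1.2.3.4
  # from also matching 1.2.3.45
  grep -o -E "session_id=[A-Z0-9]{32}:ip_addr=${ip_re}([^0-9]|\$)" "$log" \
    | cut -d: -f1 \
    | sort -u \
    | wc -l \
    | tr -d ' '
}
```

It could be called as `count_sessions 84.38.130.177 dspace.log.2018-11-03`. A result close to the request count would suggest a new session per request, while a much smaller number would suggest the client is re-using sessions.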
<!--more-->
<!-- vim: set sw=2 ts=2: -->


@ -13,10 +13,69 @@ Finalize AReS Phase I and Phase II ToRs
Send a note about my dspace-statistics-api to the dspace-tech mailing list
2018-11-03
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
Today these are the top 10 IPs:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
The 66.249.64.x are definitely Google
70.32.83.92 is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API
84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
They at least seem to be re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
50.116.102.77 is also a regular REST API user
40.77.167.175 and 207.46.13.156 seem to be Bing
138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
And it doesn&rsquo;t seem they are re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;
I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-11/" /><meta property="article:published_time" content="2018-11-01T16:41:30&#43;02:00"/>
<meta property="article:modified_time" content="2018-11-01T16:41:30&#43;02:00"/>
<meta property="article:modified_time" content="2018-11-01T16:43:37&#43;02:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="November, 2018"/>
@ -27,6 +86,65 @@ Finalize AReS Phase I and Phase II ToRs
Send a note about my dspace-statistics-api to the dspace-tech mailing list
2018-11-03
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
Today these are the top 10 IPs:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
The 66.249.64.x are definitely Google
70.32.83.92 is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API
84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
They at least seem to be re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
50.116.102.77 is also a regular REST API user
40.77.167.175 and 207.46.13.156 seem to be Bing
138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
And it doesn&rsquo;t seem they are re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;
I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?
"/>
<meta name="generator" content="Hugo 0.50" />
@ -38,9 +156,9 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list
"@type": "BlogPosting",
"headline": "November, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-11/",
"wordCount": "20",
"wordCount": "260",
"datePublished": "2018-11-01T16:41:30&#43;02:00",
"dateModified": "2018-11-01T16:41:30&#43;02:00",
"dateModified": "2018-11-01T16:43:37&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -113,6 +231,65 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p>
<!-- vim: set sw=2 ts=2: -->


@ -103,6 +103,65 @@
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article>


@ -105,6 +105,65 @@
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article>


@ -24,6 +24,65 @@
&lt;li&gt;Send a note about my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api&#34;&gt;dspace-statistics-api&lt;/a&gt; to the dspace-tech mailing list&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2018-11-03&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage&lt;/li&gt;
&lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &amp;quot;03/Nov/2018&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;66.249.64.x&lt;/code&gt; are definitely Google&lt;/li&gt;
&lt;li&gt;&lt;code&gt;70.32.83.92&lt;/code&gt; is well known, probably CCAFS or something, as it&amp;rsquo;s only a few thousand requests and always to REST API&lt;/li&gt;
&lt;li&gt;&lt;code&gt;84.38.130.177&lt;/code&gt; is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;They at least seem to be re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;50.116.102.77&lt;/code&gt; is also a regular REST API user&lt;/li&gt;
&lt;li&gt;&lt;code&gt;40.77.167.175&lt;/code&gt; and &lt;code&gt;207.46.13.156&lt;/code&gt; seem to be Bing&lt;/li&gt;
&lt;li&gt;&lt;code&gt;138.201.52.218&lt;/code&gt; seems to be on Hetzner in Germany, but is using this user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;And it doesn&amp;rsquo;t seem they are re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Ah, we&amp;rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&amp;hellip;&lt;/li&gt;
&lt;li&gt;I wonder if it&amp;rsquo;s worth adding them to the list of bots in the nginx config?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;</description>
</item>


@ -105,6 +105,65 @@
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article>


@ -24,6 +24,65 @@
&lt;li&gt;Send a note about my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api&#34;&gt;dspace-statistics-api&lt;/a&gt; to the dspace-tech mailing list&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2018-11-03&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage&lt;/li&gt;
&lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &amp;quot;03/Nov/2018&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;66.249.64.x&lt;/code&gt; are definitely Google&lt;/li&gt;
&lt;li&gt;&lt;code&gt;70.32.83.92&lt;/code&gt; is well known, probably CCAFS or something, as it&amp;rsquo;s only a few thousand requests and always to REST API&lt;/li&gt;
&lt;li&gt;&lt;code&gt;84.38.130.177&lt;/code&gt; is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;They at least seem to be re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;50.116.102.77&lt;/code&gt; is also a regular REST API user&lt;/li&gt;
&lt;li&gt;&lt;code&gt;40.77.167.175&lt;/code&gt; and &lt;code&gt;207.46.13.156&lt;/code&gt; seem to be Bing&lt;/li&gt;
&lt;li&gt;&lt;code&gt;138.201.52.218&lt;/code&gt; seems to be on Hetzner in Germany, but is using this user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;And it doesn&amp;rsquo;t seem they are re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Ah, we&amp;rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&amp;hellip;&lt;/li&gt;
&lt;li&gt;I wonder if it&amp;rsquo;s worth adding them to the list of bots in the nginx config?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;</description>
</item>


@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-11/</loc>
<lastmod>2018-11-01T16:41:30+02:00</lastmod>
<lastmod>2018-11-01T16:43:37+02:00</lastmod>
</url>
<url>
@ -194,7 +194,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-11-01T16:41:30+02:00</lastmod>
<lastmod>2018-11-01T16:43:37+02:00</lastmod>
<priority>0</priority>
</url>
@ -205,7 +205,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-11-01T16:41:30+02:00</lastmod>
<lastmod>2018-11-01T16:43:37+02:00</lastmod>
<priority>0</priority>
</url>
@ -217,13 +217,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-11-01T16:41:30+02:00</lastmod>
<lastmod>2018-11-01T16:43:37+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-11-01T16:41:30+02:00</lastmod>
<lastmod>2018-11-01T16:43:37+02:00</lastmod>
<priority>0</priority>
</url>


@ -105,6 +105,65 @@
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article>


@ -90,6 +90,65 @@
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article>


@ -24,6 +24,65 @@
&lt;li&gt;Send a note about my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api&#34;&gt;dspace-statistics-api&lt;/a&gt; to the dspace-tech mailing list&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2018-11-03&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage&lt;/li&gt;
&lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &amp;quot;03/Nov/2018&amp;quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;66.249.64.x&lt;/code&gt; are definitely Google&lt;/li&gt;
&lt;li&gt;&lt;code&gt;70.32.83.92&lt;/code&gt; is well known, probably CCAFS or something, as it&amp;rsquo;s only a few thousand requests and always to REST API&lt;/li&gt;
&lt;li&gt;&lt;code&gt;84.38.130.177&lt;/code&gt; is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;They at least seem to be re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;50.116.102.77&lt;/code&gt; is also a regular REST API user&lt;/li&gt;
&lt;li&gt;&lt;code&gt;40.77.167.175&lt;/code&gt; and &lt;code&gt;207.46.13.156&lt;/code&gt; seem to be Bing&lt;/li&gt;
&lt;li&gt;&lt;code&gt;138.201.52.218&lt;/code&gt; seems to be on Hetzner in Germany, but is using this user agent:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;And it doesn&amp;rsquo;t seem they are re-using their Tomcat sessions:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Ah, we&amp;rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&amp;hellip;&lt;/li&gt;
&lt;li&gt;I wonder if it&amp;rsquo;s worth adding them to the list of bots in the nginx config?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;</description>
</item>