cgspace-notes/docs/2018-11/index.html

368 lines
9.8 KiB
HTML

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="November, 2018" />
<meta property="og:description" content="2018-11-01
Finalize AReS Phase I and Phase II ToRs
Send a note about my dspace-statistics-api to the dspace-tech mailing list
2018-11-03
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
Today these are the top 10 IPs:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
The 66.249.64.x are definitely Google
70.32.83.92 is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API
84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
They at least seem to be re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
50.116.102.77 is also a regular REST API user
40.77.167.175 and 207.46.13.156 seem to be Bing
138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
And it doesn&rsquo;t seem they are re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;
I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-11/" /><meta property="article:published_time" content="2018-11-01T16:41:30&#43;02:00"/>
<meta property="article:modified_time" content="2018-11-01T16:43:37&#43;02:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="November, 2018"/>
<meta name="twitter:description" content="2018-11-01
Finalize AReS Phase I and Phase II ToRs
Send a note about my dspace-statistics-api to the dspace-tech mailing list
2018-11-03
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
Today these are the top 10 IPs:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
The 66.249.64.x are definitely Google
70.32.83.92 is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API
84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
They at least seem to be re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177&#39; dspace.log.2018-11-03 | sort | uniq
342
50.116.102.77 is also a regular REST API user
40.77.167.175 and 207.46.13.156 seem to be Bing
138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
And it doesn&rsquo;t seem they are re-using their Tomcat sessions:
$ grep -c -E &#39;session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218&#39; dspace.log.2018-11-03 | sort | uniq
1243
Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;
I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?
"/>
<meta name="generator" content="Hugo 0.50" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "November, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-11/",
"wordCount": "260",
"datePublished": "2018-11-01T16:41:30&#43;02:00",
"dateModified": "2018-11-01T16:43:37&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2018-11/">
<title>November, 2018 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-Upm5uY/SXdvbjuIGH6fBjF5vOYUr9DguqBskM&#43;EQpLBzO9U&#43;9fMVmWEt&#43;TTlGrWQ" crossorigin="anonymous">
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-11/">November, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-11-01T16:41:30&#43;02:00">Thu Nov 01, 2018</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Nov/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre>
<ul>
<li>The <code>66.249.64.x</code> are definitely Google</li>
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it&rsquo;s only a few thousand requests and always to REST API</li>
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre>
<ul>
<li>They at least seem to be re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre>
<ul>
<li><code>50.116.102.77</code> is also a regular REST API user</li>
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
<ul>
<li>And it doesn&rsquo;t seem they are re-using their Tomcat sessions:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre>
<ul>
<li>Ah, we&rsquo;ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day&hellip;</li>
<li>I wonder if it&rsquo;s worth adding them to the list of bots in the nginx config?</li>
</ul>
<p></p>
<!-- vim: set sw=2 ts=2: -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2018-11/">November, 2018</a></li>
<li><a href="/cgspace-notes/2018-10/">October, 2018</a></li>
<li><a href="/cgspace-notes/2018-09/">September, 2018</a></li>
<li><a href="/cgspace-notes/2018-08/">August, 2018</a></li>
<li><a href="/cgspace-notes/2018-07/">July, 2018</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p>
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>