<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="November, 2018" /> <meta property="og:description" content="2018-11-01 Finalize AReS Phase I and Phase II ToRs Send a note about my dspace-statistics-api to the dspace-tech mailing list 2018-11-03 Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage Today these are the top 10 IPs: # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 1300 66.249.64.63 1384 35.237.175.180 1430 138.201.52.218 1455 207.46.13.156 1500 40.77.167.175 1979 50.116.102.77 2790 66.249.64.61 3367 84.38.130.177 4537 70.32.83.92 22508 66.249.64.59 The 66.249.64.x are definitely Google 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 They at least seem to be re-using their Tomcat sessions: $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq 342 50.116.102.77 is also a regular REST API user 40.77.167.175 and 207.46.13.156 seem to be Bing 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 And it doesn’t seem they are re-using their Tomcat sessions: $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq 1243 Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day… I wonder if it’s worth adding them to the list of bots in the nginx config? 
" /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-11/" /><meta property="article:published_time" content="2018-11-01T16:41:30+02:00"/> <meta property="article:modified_time" content="2018-11-01T16:43:37+02:00"/> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="November, 2018"/> <meta name="twitter:description" content="2018-11-01 Finalize AReS Phase I and Phase II ToRs Send a note about my dspace-statistics-api to the dspace-tech mailing list 2018-11-03 Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage Today these are the top 10 IPs: # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 1300 66.249.64.63 1384 35.237.175.180 1430 138.201.52.218 1455 207.46.13.156 1500 40.77.167.175 1979 50.116.102.77 2790 66.249.64.61 3367 84.38.130.177 4537 70.32.83.92 22508 66.249.64.59 The 66.249.64.x are definitely Google 70.32.83.92 is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API 84.38.130.177 is some new IP in Latvia that is only hitting the XMLUI, using the following user agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1 They at least seem to be re-using their Tomcat sessions: $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq 342 50.116.102.77 is also a regular REST API user 40.77.167.175 and 207.46.13.156 seem to be Bing 138.201.52.218 seems to be on Hetzner in Germany, but is using this user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0 And it doesn’t seem they are re-using their Tomcat sessions: $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq 1243 Ah, we’ve apparently seen 
this server exactly a year ago in 2017-11, making 40,000 requests in one day… I wonder if it’s worth adding them to the list of bots in the nginx config? "/> <meta name="generator" content="Hugo 0.50" /> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "November, 2018", "url": "https://alanorth.github.io/cgspace-notes/2018-11/", "wordCount": "260", "datePublished": "2018-11-01T16:41:30+02:00", "dateModified": "2018-11-01T16:43:37+02:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2018-11/"> <title>November, 2018 | CGSpace Notes</title> <!-- combined, minified CSS --> <link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-Upm5uY/SXdvbjuIGH6fBjF5vOYUr9DguqBskM+EQpLBzO9U+9fMVmWEt+TTlGrWQ" crossorigin="anonymous"> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-11/">November, 2018</a></h2> <p class="blog-post-meta"><time datetime="2018-11-01T16:41:30+02:00">Thu Nov 01, 2018</time> by Alan Orth in <i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a> </p> </header> <h2 id="2018-11-01">2018-11-01</h2> <ul> <li>Finalize AReS Phase I and Phase II ToRs</li> <li>Send a note 
about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li> </ul> <h2 id="2018-11-03">2018-11-03</h2> <ul> <li>Linode has recently been sending emails a few times a day warning that CGSpace (linode18) has had high CPU usage</li> <li>These are today’s top 10 IPs:</li> </ul> <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1300 66.249.64.63
1384 35.237.175.180
1430 138.201.52.218
1455 207.46.13.156
1500 40.77.167.175
1979 50.116.102.77
2790 66.249.64.61
3367 84.38.130.177
4537 70.32.83.92
22508 66.249.64.59
</code></pre> <ul> <li>The <code>66.249.64.x</code> addresses are definitely Google</li> <li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it’s only a few thousand requests and always to the REST API</li> <li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li> </ul> <pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
</code></pre> <ul> <li>They at least seem to be re-using their Tomcat sessions:</li> </ul> <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03 | sort | uniq
342
</code></pre> <ul> <li><code>50.116.102.77</code> is also a regular REST API user</li> <li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li> <li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li> </ul> <pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre> <ul> <li>And they don’t seem to be re-using their Tomcat sessions:</li> </ul> <pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03 | sort | uniq
1243
</code></pre> <ul> <li>Ah, we’ve apparently seen this server exactly a year ago in
2017-11, making 40,000 requests in one day…</li> <li>I wonder if it’s worth adding them to the list of bots in the nginx config?</li> </ul> <p></p> <!-- vim: set sw=2 ts=2: --> </article> </div> <!-- /.blog-main --> <aside class="col-sm-3 ml-auto blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2018-11/">November, 2018</a></li> <li><a href="/cgspace-notes/2018-10/">October, 2018</a></li> <li><a href="/cgspace-notes/2018-09/">September, 2018</a></li> <li><a href="/cgspace-notes/2018-08/">August, 2018</a></li> <li><a href="/cgspace-notes/2018-07/">July, 2018</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> <p> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>. </p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>