<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="December, 2017" /> <meta property="og:description" content="2017-12-01 Uptime Robot noticed that CGSpace went down The logs say “Timeout waiting for idle object” PostgreSQL activity says there are 115 connections currently The list of connections to XMLUI and REST API for today: " /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-12/" /> <meta property="article:published_time" content="2017-12-01T13:53:54+03:00"/> <meta property="article:modified_time" content="2017-12-04T15:37:58+03:00"/> <meta name="twitter:card" content="summary"/><meta name="twitter:title" content="December, 2017"/> <meta name="twitter:description" content="2017-12-01 Uptime Robot noticed that CGSpace went down The logs say “Timeout waiting for idle object” PostgreSQL activity says there are 115 connections currently The list of connections to XMLUI and REST API for today: "/> <meta name="generator" content="Hugo 0.31.1" /> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "December, 2017", "url": "https://alanorth.github.io/cgspace-notes/2017-12/", "wordCount": "418", "datePublished": "2017-12-01T13:53:54+03:00", "dateModified": "2017-12-04T15:37:58+03:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2017-12/"> <title>December, 2017 | CGSpace Notes</title> <!-- combined, minified CSS --> <link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-O8wjsnz02XiyrPxnhfF6AVOv6YLBaEGRCnVF+DL3gCPBy9cieyHcpixIrVyD2JS5" crossorigin="anonymous"> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> 
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2017-12/">December, 2017</a></h2> <p class="blog-post-meta"><time datetime="2017-12-01T13:53:54+03:00">Fri Dec 01, 2017</time> by Alan Orth in <i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a> </p> </header>
<h2 id="2017-12-01">2017-12-01</h2>
<ul> <li>Uptime Robot noticed that CGSpace went down</li> <li>The logs say “Timeout waiting for idle object”</li> <li>PostgreSQL activity says there are 115 connections currently</li> <li>The list of connections to XMLUI and REST API for today:</li> </ul>
<pre><code># cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "1/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
    763 2.86.122.76
    907 207.46.13.94
   1018 157.55.39.206
   1021 157.55.39.235
   1407 66.249.66.70
   1411 104.196.152.243
   1503 50.116.102.77
   1805 66.249.66.90
   4007 70.32.83.92
   6061 45.5.184.196
</code></pre>
<ul> <li>The number of DSpace sessions isn’t even that high:</li> </ul>
<pre><code>$ cat /home/cgspace.cgiar.org/log/dspace.log.2017-12-01 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
5815
</code></pre>
<ul> <li>Connections in the last two hours:</li> </ul>
<pre><code># cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "1/Dec/2017:(09|10)" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
     78 93.160.60.22
    101 40.77.167.122
    113 66.249.66.70
    129 157.55.39.206
    130 157.55.39.235
    135 40.77.167.58
    164 68.180.229.254
    177 87.100.118.220
    188 66.249.66.90
    314 2.86.122.76
</code></pre>
<ul> <li>What the fuck is going on?</li> <li>I’ve never seen 2.86.122.76 before, and it has created quite a few unique Tomcat sessions today:</li> </ul>
<pre><code>$ grep 2.86.122.76 /home/cgspace.cgiar.org/log/dspace.log.2017-12-01 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
822
</code></pre>
<ul> <li>Appears to be some new bot:</li> </ul>
<pre><code>2.86.122.76 - - [01/Dec/2017:09:02:53 +0000] "GET /handle/10568/78444?show=full HTTP/1.1" 200 29307 "-" "Mozilla/3.0 (compatible; Indy Library)"
</code></pre>
<ul> <li>I restarted Tomcat and everything came back up</li> <li>I can add Indy Library to the Tomcat crawler session manager valve, but it would be nice if I could simply remap the user agent in nginx</li> <li>I will also add ‘Drupal’ to the Tomcat crawler session manager valve because there are Drupal sites out there harvesting and they should be treated as bots</li> </ul>
<pre><code># cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "1/Dec/2017" | grep Drupal | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
      3 54.75.205.145
      6 70.32.83.92
     14 2a01:7e00::f03c:91ff:fe18:7396
     46 2001:4b99:1:1:216:3eff:fe2c:dc6c
    319 2001:4b99:1:1:216:3eff:fe76:205b
</code></pre>
<h2 id="2017-12-03">2017-12-03</h2>
<ul> <li>Linode alerted that CGSpace’s load was 327.5% from 6 to 8 AM again</li> </ul>
<h2 id="2017-12-04">2017-12-04</h2>
<ul> <li>Linode alerted that CGSpace’s load was 255.5% from 8 to 10 AM again</li> <li>I looked at the Munin stats again to see
how the PostgreSQL tweaks from a few weeks ago were holding up:</li> </ul> <p><img src="/cgspace-notes/2017/12/postgres-connections-month.png" alt="PostgreSQL connections month" /></p> <ul> <li>The results look fantastic! So the <code>random_page_cost</code> tweak is massively important for informing the PostgreSQL query planner that random page reads are nearly as cheap as sequential ones, as we’re on an SSD!</li> <li>I guess we could probably even reduce the PostgreSQL connections in DSpace / PostgreSQL now that this tweak is in place</li> </ul> </article> </div> <!-- /.blog-main --> <aside class="col-sm-3 ml-auto blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2017-12/">December, 2017</a></li> <li><a href="/cgspace-notes/2017-11/">November, 2017</a></li> <li><a href="/cgspace-notes/2017-10/">October, 2017</a></li> <li><a href="/cgspace-notes/cgiar-library-migration/">CGIAR Library Migration</a></li> <li><a href="/cgspace-notes/2017-09/">September, 2017</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> <p> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>. </p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>