<!DOCTYPE html> <html lang="en" > <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="December, 2015" /> <meta property="og:description" content="2015-12-02 Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space: # cd /home/dspacetest.cgiar.org/log # ls -lh dspace.log.2015-11-18* -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz " /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2015-12/" /> <meta property="article:published_time" content="2015-12-02T13:18:00+03:00" /> <meta property="article:modified_time" content="2018-03-09T22:10:33+02:00" /> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="December, 2015"/> <meta name="twitter:description" content="2015-12-02 Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space: # cd /home/dspacetest.cgiar.org/log # ls -lh dspace.log.2015-11-18* -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz "/> <meta name="generator" content="Hugo 0.119.0"> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "December, 2015", "url": "https://alanorth.github.io/cgspace-notes/2015-12/", "wordCount": "753", "datePublished": "2015-12-02T13:18:00+03:00", "dateModified": "2018-03-09T22:10:33+02:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2015-12/"> <title>December, 2015 | CGSpace Notes</title> <!-- combined, minified CSS --> <link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" integrity="sha256-xrqAvFBmlVdkWr4F+GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous"> <!-- minified Font Awesome for SVG icons --> <script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz+lcnA=" crossorigin="anonymous"></script> <!-- RSS 2.0 feed --> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2015-12/">December, 2015</a></h2> <p class="blog-post-meta"> <time datetime="2015-12-02T13:18:00+03:00">Wed Dec 02, 2015</time> in <span class="fas fa-tag" aria-hidden="true"></span> <a href="/tags/notes/" rel="tag">Notes</a> </p> </header> <h2 id="2015-12-02">2015-12-02</h2> <ul> <li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li> </ul> <pre tabindex="0"><code># cd /home/dspacetest.cgiar.org/log # ls -lh dspace.log.2015-11-18* -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz </code></pre><ul> <li>I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar wrapper</li> <li>Need to remember to go check if everything is ok in a few days and then change CGSpace</li> <li>CGSpace went down again (due to PostgreSQL idle connections of course)</li> <li>Current database settings for DSpace are <code>db.maxconnections = 30</code> and <code>db.maxidle = 8</code>, yet idle connections are exceeding this:</li> </ul> <pre tabindex="0"><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle 39 </code></pre><ul> <li>I restarted PostgreSQL and Tomcat and it’s back</li> <li>On a related note of why CGSpace is so slow, I decided to finally try the <code>pgtune</code> script to tune the postgres settings:</li> </ul> <pre tabindex="0"><code># apt-get install pgtune # pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune # mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig # mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf </code></pre><ul> <li>It introduced the following new settings:</li> </ul> <pre tabindex="0"><code>default_statistics_target = 50 maintenance_work_mem = 480MB constraint_exclusion = on checkpoint_completion_target = 0.9 effective_cache_size = 5632MB work_mem = 48MB wal_buffers = 8MB checkpoint_segments = 16 shared_buffers = 1920MB max_connections = 80 </code></pre><ul> <li>Now I need to go read PostgreSQL docs about these options, and watch memory settings in munin etc</li> <li>For what it’s worth, now the REST API should be faster (because of these PostgreSQL tweaks):</li> </ul> <pre tabindex="0"><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 1.474 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 2.141 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 1.685 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 1.995 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 1.786 </code></pre><ul> <li>Last week it was an average of 8 seconds… now this is 1/4 of that</li> <li>CCAFS noticed that one of their items displays only the Atmire statlets: <a href="https://cgspace.cgiar.org/handle/10568/42445">https://cgspace.cgiar.org/handle/10568/42445</a></li> </ul> <p><img src="/cgspace-notes/2015/12/ccafs-item-no-metadata.png" alt="CCAFS item"></p> <ul> <li>The authorizations for the item are all public READ, and I don’t see any errors in dspace.log when browsing that item</li> <li>I filed a ticket on Atmire’s issue tracker</li> <li>I also filed a ticket on Atmire’s issue tracker for the PostgreSQL stuff</li> </ul> <h2 id="2015-12-03">2015-12-03</h2> <ul> <li>CGSpace very slow, and monitoring emailing me to say its down, even though I can load the page (very slowly)</li> <li>Idle postgres connections look like this (with no change in DSpace db settings lately):</li> </ul> <pre tabindex="0"><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle 29 </code></pre><ul> <li>I restarted Tomcat and postgres…</li> <li>Atmire commented that we should raise the JVM heap size by ~500M, so it is now <code>-Xms3584m -Xmx3584m</code></li> <li>We weren’t out of heap yet, but it’s probably fair enough that the DSpace 5 upgrade (and new Atmire modules) requires more memory so it’s ok</li> <li>A possible side effect is that I see that the REST API is twice as fast for the request above now:</li> </ul> <pre tabindex="0"><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 1.368 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 0.968 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 1.006 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 0.849 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 0.806 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 0.854 </code></pre><h2 id="2015-12-05">2015-12-05</h2> <ul> <li>CGSpace has been up and down all day and REST API is completely unresponsive</li> <li>PostgreSQL idle connections are currently:</li> </ul> <pre tabindex="0"><code>postgres@linode01:~$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle 28 </code></pre><ul> <li>I have reverted all the pgtune tweaks from the other day, as they didn’t fix the stability issues, so I’d rather not have them introducing more variables into the equation</li> <li>The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid–late November</li> </ul> <p><img src="/cgspace-notes/2015/12/postgres_bgwriter-year.png" alt="PostgreSQL bgwriter (year)"> <img src="/cgspace-notes/2015/12/postgres_cache_cgspace-year.png" alt="PostgreSQL cache (year)"> <img src="/cgspace-notes/2015/12/postgres_locks_cgspace-year.png" alt="PostgreSQL locks (year)"> <img src="/cgspace-notes/2015/12/postgres_scans_cgspace-year.png" alt="PostgreSQL scans (year)"></p> <h2 id="2015-12-07">2015-12-07</h2> <ul> <li>Atmire sent <a href="https://github.com/ilri/DSpace/pull/161">some fixes</a> to DSpace’s REST API code that was leaving contexts open (causing the slow performance and database issues)</li> <li>After deploying the fix to CGSpace the REST API is consistently faster:</li> </ul> <pre tabindex="0"><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 0.675 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 0.599 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 0.588 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 0.566 $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all 0.497 </code></pre><h2 id="2015-12-08">2015-12-08</h2> <ul> <li>Switch CGSpace log compression cron jobs from using lzop to xz—the compression isn’t as good, but it’s much faster and causes less IO/CPU load</li> <li>Since we figured out (and fixed) the cause of the performance issue, I reverted Google Bot’s crawl rate to the “Let Google optimize” setting</li> </ul> </article> </div> <!-- /.blog-main --> <aside class="col-sm-3 ml-auto blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2023-10/">October, 2023</a></li> <li><a href="/cgspace-notes/2023-09/">September, 2023</a></li> <li><a href="/cgspace-notes/2023-08/">August, 2023</a></li> <li><a href="/cgspace-notes/2023-07/">July, 2023</a></li> <li><a href="/cgspace-notes/2023-06/">June, 2023</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> <p dir="auto"> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>. </p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>