<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="December, 2015" />
<meta property="og:description" content="2015-12-02
Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
# cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2015-12/" />
<meta property="article:published_time" content="2015-12-02T13:18:00+03:00" />
<meta property="article:modified_time" content="2018-03-09T22:10:33+02:00" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="December, 2015" />
<meta name="twitter:description" content="2015-12-02
Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
# cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
"/>
<meta name="generator" content="Hugo 0.37.1" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "December, 2015",
"url": "https://alanorth.github.io/cgspace-notes/2015-12/",
"wordCount": "753",
"datePublished": "2015-12-02T13:18:00+03:00",
"dateModified": "2018-03-09T22:10:33+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2015-12/">
<title>December, 2015 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-CoMzlF7G4xk3ftqRr7leobnWP85AuISUJljMFjtTG/UHyP/+bBwWAvBlXkB4VQQk" crossorigin="anonymous">
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2015-12/">December, 2015</a></h2>
<p class="blog-post-meta"><time datetime="2015-12-02T13:18:00+03:00">Wed Dec 02, 2015</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test, as xz produces noticeably smaller files:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
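<p>To put the listing above in numbers, here is a small Python sketch (illustrative only; it is not part of the cron job) that converts the <code>ls -lh</code> sizes to bytes and expresses each compressed file as a fraction of the original log. The helper name <code>parse_ls_size</code> is hypothetical, and because <code>ls -h</code> rounds its output, the results are approximate:</p>
<pre><code># Sizes as printed by ls -lh above. ls -h uses powers of 1024 and rounds
# its output, so the resulting ratios are approximate.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_ls_size(text):
    """Convert an ls -lh size such as "2.0M" or "387K" to bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)  # plain byte count without a suffix


original = parse_ls_size("2.0M")
for tool, size in [("lzop", "387K"), ("xz", "169K")]:
    print(f"{tool}: {parse_ls_size(size) / original:.1%} of the original log")
</code></pre>
<p>With these figures the lzop file is about 19% of the original and the xz file about 8%, so xz needs a bit under half the space of lzop for this log.</p>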
<ul>
<li>I had used lrzip once, but it needs more memory and is harder to use because it requires the lrztar wrapper</li>
<li>Need to remember to check in a few days that everything is OK, and then make the same change on CGSpace</li>
<li>CGSpace went down again (due to PostgreSQL idle connections, of course)</li>
<li>Current database settings for DSpace are <code>db.maxconnections = 30</code> and <code>db.maxidle = 8</code>, yet idle connections are exceeding this:</li>
</ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
39
</code></pre>
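<p>Note that <code>grep -c idle</code> counts every line that mentions “idle”, which includes sessions in the <code>idle in transaction</code> state as well as plain <code>idle</code> ones. Below is a minimal Python sketch of the same count, assuming psql’s default output where the state text appears verbatim on each row; it also splits the two states apart. The function names are hypothetical:</p>
<pre><code>def count_idle(psql_output, database="cgspace"):
    """Mimic grep cgspace | grep -c idle on pg_stat_activity output.

    Like grep, this counts any line containing both strings, so rows in
    the 'idle in transaction' state are included in the total.
    """
    return sum(
        1
        for line in psql_output.splitlines()
        if database in line and "idle" in line
    )


def idle_by_state(psql_output, database="cgspace"):
    """Break the matching lines down into plain idle and idle in
    transaction, assuming the state text appears verbatim in each row."""
    counts = {}
    for line in psql_output.splitlines():
        if database not in line:
            continue
        if "idle in transaction" in line:
            state = "idle in transaction"
        elif "idle" in line:
            state = "idle"
        else:
            continue  # active or other states
        counts[state] = counts.get(state, 0) + 1
    return counts
</code></pre>
<p>Sessions that are <code>idle in transaction</code> hold a transaction open, so a growing count there often points at application code that does not finish its database work. Filtering on the <code>datname</code> and <code>state</code> columns in SQL would also avoid false matches from other columns.</p>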
<ul>
<li>I restarted PostgreSQL and Tomcat and it’s back</li>
<li>On a related note, since CGSpace is so slow, I decided to finally try the <code>pgtune</code> script to tune the PostgreSQL settings:</li>
</ul>
<pre><code># apt-get install pgtune
# pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune
# mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig
# mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf
</code></pre>
<ul>
<li>It introduced the following new settings:</li>
</ul>
<pre><code>default_statistics_target = 50
maintenance_work_mem = 480MB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 5632MB
work_mem = 48MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 1920MB
max_connections = 80
</code></pre>
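<p>To see exactly which values pgtune changed, the original and tuned files can be compared. Here is a rough Python sketch, assuming simple <code>name = value</code> lines (it ignores the <code>name value</code> form without an equals sign and does not handle <code>#</code> inside quoted values). <code>parse_conf</code> and <code>diff_conf</code> are hypothetical helpers, not part of pgtune:</p>
<pre><code>def parse_conf(text):
    """Parse 'name = value' lines from a postgresql.conf-style file,
    skipping blank lines and '#' comments. Later lines override earlier
    ones, as PostgreSQL uses the last entry for a repeated parameter."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def diff_conf(before, after):
    """Return {name: (old, new)} for settings added or changed in after;
    old is None when the setting was unset or commented out before."""
    old, new = parse_conf(before), parse_conf(after)
    return {
        name: (old.get(name), value)
        for name, value in new.items()
        if old.get(name) != value
    }
</code></pre>
<p>Run against <code>postgresql.conf.orig</code> and the new <code>postgresql.conf</code>, it should report the settings listed above together with their previous values (<code>None</code> where the default was commented out).</p>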
<ul>
<li>Now I need to read the PostgreSQL documentation about these options, and watch the memory usage in Munin, etc.</li>
<li>For what it’s worth, the REST API does seem faster now (presumably because of these PostgreSQL tweaks):</li>
</ul>
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.474
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
2.141
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.685
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.995
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.786
</code></pre>
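<p>The repeated curl calls above can also be scripted in Python to take and summarize measurements. This is a sketch under a few assumptions: it uses only the standard library, it times the full fetch including connection setup (roughly comparable to curl’s <code>%{time_total}</code>), and the helper names are hypothetical. The summary is applied to the five timings recorded above:</p>
<pre><code>import statistics
import time
import urllib.request

URL = "https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all"


def time_request(url, timeout=60):
    """Seconds taken to fetch url and read the whole response body."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # discard the body, like curl -o /dev/null
    return time.perf_counter() - start


def summarize(timings):
    """Return (mean, median) of a list of timings in seconds."""
    return statistics.mean(timings), statistics.median(timings)


# Timings recorded with curl on 2015-12-02
dec02 = [1.474, 2.141, 1.685, 1.995, 1.786]
print("mean %.3f s, median %.3f s" % summarize(dec02))
</code></pre>
<p>For these five requests the mean is about 1.82 seconds and the median 1.79 seconds. Calling <code>time_request(URL)</code> a few times would take fresh measurements.</p>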
<ul>
<li>Last week it averaged 8 seconds… now it is about <sup>1</sup>⁄<sub>4</sub> of that</li>
<li>CCAFS noticed that one of their items displays only the Atmire statlets: <a href="https://cgspace.cgiar.org/handle/10568/42445">https://cgspace.cgiar.org/handle/10568/42445</a></li>
</ul>
<p><img src="/cgspace-notes/2015/12/ccafs-item-no-metadata.png" alt="CCAFS item" /></p>
<ul>
<li>The authorizations for the item are all public READ, and I don’t see any errors in dspace.log when browsing that item</li>
<li>I filed a ticket on Atmire’s issue tracker</li>
<li>I also filed a ticket on Atmire’s issue tracker for the PostgreSQL stuff</li>
</ul>
<h2 id="2015-12-03">2015-12-03</h2>
<ul>
<li>CGSpace is very slow, and monitoring is emailing me to say it’s down, even though I can load the page (very slowly)</li>
<li>Idle PostgreSQL connections look like this (with no change in DSpace database settings lately):</li>
</ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
29
</code></pre>
<ul>
<li>I restarted Tomcat and PostgreSQL…</li>
<li>Atmire commented that we should raise the JVM heap size by ~500M, so it is now <code>-Xms3584m -Xmx3584m</code></li>
<li>We weren’t running out of heap yet, but it’s probably fair that the DSpace 5 upgrade (and the new Atmire modules) needs more memory, so that’s OK</li>
<li>A possible side effect: the REST API is now about twice as fast for the request above:</li>
</ul>
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.368
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.968
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.006
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.849
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.806
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.854
</code></pre>
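<p>To check the “twice as fast” observation against the numbers, here is a quick Python comparison of the 2015-12-02 and 2015-12-03 timings for the same request. With only five or six samples per run, and Tomcat and PostgreSQL restarted in between, treat it as a rough indication rather than a benchmark:</p>
<pre><code>from statistics import mean, median

# REST API timings in seconds for the same request, copied from these notes
dec02 = [1.474, 2.141, 1.685, 1.995, 1.786]
dec03 = [1.368, 0.968, 1.006, 0.849, 0.806, 0.854]

for label, stat in [("mean", mean), ("median", median)]:
    speedup = stat(dec02) / stat(dec03)
    print(f"{label} speedup: {speedup:.2f}x")
</code></pre>
<p>This prints a speedup of about 1.86x by mean and 1.96x by median, so “twice as fast” is a fair approximation.</p>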
<h2 id="2015-12-05">2015-12-05</h2>
<ul>
<li>CGSpace has been up and down all day and the REST API is completely unresponsive</li>
<li>PostgreSQL idle connections are currently:</li>
</ul>
<pre><code>postgres@linode01:~$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
28
</code></pre>
<ul>
<li>I have reverted all the pgtune tweaks from the other day, as they didn’t fix the stability issues, and I’d rather not have them introducing more variables into the equation</li>
<li>The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid-to-late November</li>
</ul>
<p><img src="/cgspace-notes/2015/12/postgres_bgwriter-year.png" alt="PostgreSQL bgwriter (year)" />
<img src="/cgspace-notes/2015/12/postgres_cache_cgspace-year.png" alt="PostgreSQL cache (year)" />
<img src="/cgspace-notes/2015/12/postgres_locks_cgspace-year.png" alt="PostgreSQL locks (year)" />
<img src="/cgspace-notes/2015/12/postgres_scans_cgspace-year.png" alt="PostgreSQL scans (year)" /></p>
<h2 id="2015-12-07">2015-12-07</h2>
<ul>
<li>Atmire sent <a href="https://github.com/ilri/DSpace/pull/161">some fixes</a> to DSpace’s REST API code that was leaving contexts open (causing the slow performance and database issues)</li>
<li>After deploying the fix to CGSpace, the REST API is consistently faster:</li>
</ul>
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.675
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.599
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.588
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.566
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.497
</code></pre>
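<p>Why leaked contexts hurt: each open context holds a database connection, so a code path that returns or throws without closing its context leaves a connection behind. That would be consistent with the idle PostgreSQL connections seen earlier this month. In DSpace’s Java code the general remedy is to make sure every <code>Context</code> ends with <code>complete()</code> or <code>abort()</code>. The Python sketch below only illustrates the pattern with a hypothetical <code>Context</code> class; it is neither DSpace’s API nor Atmire’s patch:</p>
<pre><code>import contextlib


class Context:
    """Hypothetical per-request context that holds one pooled database
    connection until close() is called. Not DSpace's real Context API."""

    open_count = 0  # contexts currently holding a connection

    def __init__(self):
        Context.open_count += 1
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            Context.open_count -= 1


def leaky_handler(fail=False):
    """Closes the context only on the happy path."""
    ctx = Context()
    if fail:
        raise ValueError("lookup failed")  # ctx is never closed here
    ctx.close()
    return "ok"


@contextlib.contextmanager
def request_context():
    ctx = Context()
    try:
        yield ctx
    finally:
        ctx.close()  # runs on normal return and on exceptions


def fixed_handler(fail=False):
    """Same work, but the context is always closed."""
    with request_context():
        if fail:
            raise ValueError("lookup failed")
        return "ok"
</code></pre>
<p>Each failed call to <code>leaky_handler</code> leaves <code>Context.open_count</code> one higher, while <code>fixed_handler</code> always brings it back to zero, because the <code>finally</code> block runs on both the normal and the exception path.</p>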
<h2 id="2015-12-08">2015-12-08</h2>
<ul>
<li>Switch CGSpace log compression cron jobs from using lzop to xz; the compression isn’t as good, but it’s much faster and causes less IO/CPU load</li>
<li>Since we figured out (and fixed) the cause of the performance issue, I reverted Googlebot’s crawl rate to the “Let Google optimize” setting</li>
</ul>
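<p>A minimal Python sketch of an xz-based log compression job, for illustration only; the actual cron job presumably calls the <code>xz</code> command-line tool. It assumes daily logs named <code>dspace.log.YYYY-MM-DD</code> (as in the listing from 2015-12-02), skips today’s log because Tomcat is still writing to it, and deletes each original after compressing it, which is also what <code>xz</code> does by default. The function name is hypothetical:</p>
<pre><code>import lzma
import pathlib
import shutil
from datetime import date

PREFIX = "dspace.log."


def compress_old_logs(log_dir, today=None):
    """Compress every dated DSpace log except today's to .xz and delete
    the originals. Returns the names of the .xz files that were written."""
    today = today or date.today()
    written = []
    # Matches dspace.log.2015-11-18 but not the .lzo or .xz files
    for path in sorted(pathlib.Path(log_dir).glob(PREFIX + "????-??-??")):
        try:
            log_date = date.fromisoformat(path.name[len(PREFIX):])
        except ValueError:
            continue  # not a real date, leave the file alone
        if log_date == today:
            continue  # still being written to
        target = path.with_name(path.name + ".xz")
        with path.open("rb") as src, lzma.open(target, "wb", preset=6) as dst:
            shutil.copyfileobj(src, dst)  # preset 6 is xz's default level
        path.unlink()
        written.append(target.name)
    return written
</code></pre>
<p>For example, <code>compress_old_logs("/home/dspacetest.cgiar.org/log")</code> would compress every older uncompressed daily log in the DSpace Test log directory.</p>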
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li><a href="/cgspace-notes/2018-02/">February, 2018</a></li>
<li><a href="/cgspace-notes/2018-01/">January, 2018</a></li>
<li><a href="/cgspace-notes/2017-12/">December, 2017</a></li>
<li><a href="/cgspace-notes/2017-11/">November, 2017</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p>
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>