<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
< meta property = "og:title" content = "December, 2015" / >
< meta property = "og:description" content = "2015-12-02
Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
# cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
" />
< meta property = "og:type" content = "article" / >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/2015-12/" / >
< meta property = "article:published_time" content = "2015-12-02T13:18:00+03:00" / >
< meta property = "article:modified_time" content = "2017-01-09T16:18:07+02:00" / >
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:text:title" content = "December, 2015" / >
< meta name = "twitter:title" content = "December, 2015" / >
< meta name = "twitter:description" content = "2015-12-02
Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
# cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
"/>
< meta name = "generator" content = "Hugo 0.20.2" / >
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "December, 2015",
"url": "https://alanorth.github.io/cgspace-notes/2015-12/",
"wordCount": "753",
"datePublished": "2015-12-02T13:18:00+03:00",
"dateModified": "2017-01-09T16:18:07+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/2015-12/" >
< title > December, 2015 | CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.css" rel = "stylesheet" integrity = "sha384-CBHEXFKdMsTRFhEu0HSP9oETZoVpnz1mozAPqhfpxMQkda7lNJlqsQdYB30287Ka" crossorigin = "anonymous" >
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" > < a href = "https://alanorth.github.io/cgspace-notes/2015-12/" > December, 2015< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2015-12-02T13:18:00+03:00" > Wed Dec 02, 2015< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2015-12-02" > 2015-12-02< / h2 >
< ul >
< li > Replace < code > lzop< / code > with < code > xz< / code > in the log compression cron jobs on DSpace Test, because it produces smaller files:< / li >
< / ul >
< pre > < code > # cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
< / code > < / pre >
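< p > The cron job itself is not shown in these notes. A minimal sketch of what such a job might run, assuming the < code > dspace.log.YYYY-MM-DD< / code > naming from the listing above (the function name is hypothetical, not taken from the actual cron job):< / p >

```shell
#!/bin/sh
# Hypothetical helper: compress dated DSpace logs in a directory with xz.
# Assumes files are named dspace.log.YYYY-MM-DD, as in the listing above.
# $1 = log directory
compress_old_logs() {
    log_dir=$1
    today=$(date +%Y-%m-%d)
    for f in "$log_dir"/dspace.log.????-??-??; do
        # Skip the literal pattern when the glob matched nothing
        if [ ! -f "$f" ]; then
            continue
        fi
        # Skip today's log, which DSpace is still writing to
        if [ "$f" = "$log_dir/dspace.log.$today" ]; then
            continue
        fi
        # xz replaces the file with a compressed copy named "$f.xz"
        xz "$f"
    done
}
```

< p > It could be called from cron as < code > compress_old_logs /home/dspacetest.cgiar.org/log< / code > . The glob only matches uncompressed files, so logs that already end in < code > .lzo< / code > or < code > .xz< / code > are left alone.< / p >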
< p > < / p >
< ul >
< li > I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar wrapper< / li >
< li > Need to remember to check that everything is OK in a few days, and then make the same change on CGSpace< / li >
< li > CGSpace went down again (due to PostgreSQL idle connections of course)< / li >
< li > Current database settings for DSpace are < code > db.maxconnections = 30< / code > and < code > db.maxidle = 8< / code > , yet idle connections are exceeding this:< / li >
< / ul >
< pre > < code > $ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
39
< / code > < / pre >
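< p > The < code > grep -c idle< / code > count matches the word "idle" anywhere in a row, including "idle in transaction" and query text. A sketch that asks PostgreSQL for a per-state breakdown instead, assuming the database is named < code > cgspace< / code > (inferred from the grep above) and PostgreSQL 9.2 or later, where < code > pg_stat_activity< / code > has a < code > state< / code > column. Both function names are hypothetical:< / p >

```shell
#!/bin/sh
# Hypothetical helper: count connections per state for one database.
# $1 = database name (assumed to be "cgspace" here)
connections_by_state() {
    psql -At -c "SELECT state, count(*) FROM pg_stat_activity WHERE datname = '$1' GROUP BY state ORDER BY state;"
}

# Compare an observed idle count against a configured limit.
# $1 = observed idle connections, $2 = limit (for example db.maxidle)
check_idle() {
    if [ "$1" -gt "$2" ]; then
        echo "WARNING: $1 idle connections exceeds limit of $2"
        return 1
    fi
    echo "OK: $1 idle connections within limit of $2"
}
```

< p > With the numbers above, < code > check_idle 39 8< / code > prints a warning and returns a non-zero status, which makes it usable from a monitoring script.< / p >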
< ul >
< li > I restarted PostgreSQL and Tomcat and it’s back< / li >
< li > On the related question of why CGSpace is so slow, I finally decided to try the < code > pgtune< / code > script to tune the PostgreSQL settings:< / li >
< / ul >
< pre > < code > # apt-get install pgtune
# pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune
# mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig
# mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf
< / code > < / pre >
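< p > The two < code > mv< / code > steps above can be wrapped so that an existing backup is never overwritten and the change is easy to undo. This is only a sketch with hypothetical function names, not part of pgtune:< / p >

```shell
#!/bin/sh
# Hypothetical helpers for swapping a config file in and out.

# install_config NEW TARGET: keep TARGET as TARGET.orig, then put NEW in place.
install_config() {
    if [ -e "$2.orig" ]; then
        echo "refusing to overwrite existing backup $2.orig" >&2
        return 1
    fi
    cp -p "$2" "$2.orig" || return 1
    mv "$1" "$2"
}

# revert_config TARGET: restore TARGET from TARGET.orig.
revert_config() {
    if [ ! -e "$1.orig" ]; then
        echo "no backup found at $1.orig" >&2
        return 1
    fi
    mv "$1.orig" "$1"
}
```

< p > For example, < code > install_config postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf< / code > , and later < code > revert_config /etc/postgresql/9.3/main/postgresql.conf< / code > to go back. PostgreSQL still has to be restarted separately for the new file to take effect.< / p >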
< ul >
< li > It introduced the following new settings:< / li >
< / ul >
< pre > < code > default_statistics_target = 50
maintenance_work_mem = 480MB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 5632MB
work_mem = 48MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 1920MB
max_connections = 80
< / code > < / pre >
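< p > Some of these settings (for example < code > shared_buffers< / code > and < code > max_connections< / code > ) only take effect after a PostgreSQL restart, so it can be worth confirming what the server is actually running with. A sketch using the SQL < code > SHOW< / code > command; the function names are hypothetical, and < code > psql< / code > is assumed to connect without extra options, as in the commands above:< / p >

```shell
#!/bin/sh
# Print the setting names from "name = value" lines, skipping comments
# and blank lines. Reads the files given as arguments, or stdin if none.
setting_names() {
    awk -F '=' '/^[ \t]*#/ { next } NF >= 2 { gsub(/[ \t]/, "", $1); print $1 }' "$@"
}

# Hypothetical check: for each setting named in a config file ($1),
# print the value the running server reports.
show_running_values() {
    setting_names "$1" | while read -r name; do
        printf '%s = %s\n' "$name" "$(psql -At -c "SHOW $name;")"
    done
}
```

< p > Running < code > show_running_values< / code > on a file containing only the ten lines above would print one < code > name = value< / code > line per setting, which can then be compared with the file by eye.< / p >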
< ul >
< li > Now I need to read the PostgreSQL docs about these options, and watch memory usage in Munin etc< / li >
< li > For what it’s worth, the REST API should be faster now (because of these PostgreSQL tweaks):< / li >
< / ul >
< pre > < code > $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.474
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
2.141
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.685
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.995
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.786
< / code > < / pre >
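< p > The five timings above average to about 1.8 seconds. Rather than repeating the command by hand, the requests and the averaging can be scripted. A minimal sketch with hypothetical function names, using the same < code > curl< / code > options as above:< / p >

```shell
#!/bin/sh
# Average a column of timings (one number per line) read from stdin.
mean_time() {
    awk '{ sum += $1; n++ } END { if (n) printf "%.3f\n", sum / n }'
}

# Hypothetical wrapper: request a URL several times and print the mean
# total time in seconds.
# $1 = URL, $2 = number of requests (default 5)
time_url() {
    i=0
    while [ "$i" -lt "${2:-5}" ]; do
        curl -o /dev/null -s -w '%{time_total}\n' "$1"
        i=$((i + 1))
    done | mean_time
}
```

< p > For example, < code > time_url 'https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all' 5< / code > . Quoting the URL keeps the shell from treating the < code > ?< / code > as a glob character.< / p >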
< ul >
< li > Last week the average was 8 seconds… now it is about < sup > 1< / sup > ⁄ < sub > 4< / sub > of that< / li >
< li > CCAFS noticed that one of their items displays only the Atmire statlets: < a href = "https://cgspace.cgiar.org/handle/10568/42445" > https://cgspace.cgiar.org/handle/10568/42445< / a > < / li >
< / ul >
< p > < img src = "/cgspace-notes/2015/12/ccafs-item-no-metadata.png" alt = "CCAFS item" / > < / p >
< ul >
< li > The authorizations for the item are all public READ, and I don’t see any errors in dspace.log when browsing that item< / li >
< li > I filed a ticket on Atmire’s issue tracker< / li >
< li > I also filed a ticket on Atmire’s issue tracker for the PostgreSQL stuff< / li >
< / ul >
< h2 id = "2015-12-03" > 2015-12-03< / h2 >
< ul >
< li > CGSpace is very slow, and monitoring is emailing me to say it’s down, even though I can load the page (very slowly)< / li >
< li > Idle postgres connections look like this (with no change in DSpace db settings lately):< / li >
< / ul >
< pre > < code > $ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
29
< / code > < / pre >
< ul >
< li > I restarted Tomcat and postgres… < / li >
< li > Atmire commented that we should raise the JVM heap size by ~500M, so it is now < code > -Xms3584m -Xmx3584m< / code > < / li >
< li > We weren’t out of heap yet, but it’s fair enough that the DSpace 5 upgrade (and the new Atmire modules) requires more memory, so it’s OK< / li >
< li > A possible side effect is that the REST API is now about twice as fast for the request above:< / li >
< / ul >
< pre > < code > $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.368
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.968
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.006
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.849
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.806
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.854
< / code > < / pre >
< h2 id = "2015-12-05" > 2015-12-05< / h2 >
< ul >
< li > CGSpace has been up and down all day and REST API is completely unresponsive< / li >
< li > PostgreSQL idle connections are currently:< / li >
< / ul >
< pre > < code > postgres@linode01:~$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
28
< / code > < / pre >
< ul >
< li > I have reverted all the pgtune tweaks from the other day, as they didn’t fix the stability issues, so I’d rather not have them introducing more variables into the equation< / li >
< li > The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid–late November< / li >
< / ul >
< p > < img src = "/cgspace-notes/2015/12/postgres_bgwriter-year.png" alt = "PostgreSQL bgwriter (year)" / >
< img src = "/cgspace-notes/2015/12/postgres_cache_cgspace-year.png" alt = "PostgreSQL cache (year)" / >
< img src = "/cgspace-notes/2015/12/postgres_locks_cgspace-year.png" alt = "PostgreSQL locks (year)" / >
< img src = "/cgspace-notes/2015/12/postgres_scans_cgspace-year.png" alt = "PostgreSQL scans (year)" / > < / p >
< h2 id = "2015-12-07" > 2015-12-07< / h2 >
< ul >
< li > Atmire sent < a href = "https://github.com/ilri/DSpace/pull/161" > some fixes< / a > to DSpace’s REST API code, which was leaving contexts open (causing the slow performance and database issues)< / li >
< li > After deploying the fix to CGSpace the REST API is consistently faster:< / li >
< / ul >
< pre > < code > $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.675
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.599
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.588
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.566
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.497
< / code > < / pre >
< h2 id = "2015-12-08" > 2015-12-08< / h2 >
< ul >
< li > Switch CGSpace log compression cron jobs from lzop to xz: xz is slower and uses more CPU than lzop, but it compresses much better, as the DSpace Test comparison from 2015-12-02 showed< / li >
< li > Since we figured out (and fixed) the cause of the performance issue, I reverted Google Bot’s crawl rate to the “Let Google optimize” setting< / li >
< / ul >
< / article >
< / div > <!-- /.blog-main -->
< aside class = "col-sm-3 offset-sm-1 blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2017-04/" > April, 2017< / a > < / li >
< li > < a href = "/cgspace-notes/2017-03/" > March, 2017< / a > < / li >
< li > < a href = "/cgspace-notes/2017-02/" > February, 2017< / a > < / li >
< li > < a href = "/cgspace-notes/2017-01/" > January, 2017< / a > < / li >
< li > < a href = "/cgspace-notes/2016-12/" > December, 2016< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >