cgspace-notes/public/index.xml

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>CGSpace Notes</title>
    <link>/cgspace-notes/</link>
    <description>Recent content on CGSpace Notes</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 02 Dec 2015 13:18:00 +0300</lastBuildDate>
    <atom:link href="/cgspace-notes/index.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>December, 2015</title>
      <link>/cgspace-notes/2015-12/</link>
      <pubDate>Wed, 02 Dec 2015 13:18:00 +0300</pubDate>
      
      <guid>/cgspace-notes/2015-12/</guid>
      <description>

&lt;h2 id=&#34;2015-12-02:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-02&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;lzop&lt;/code&gt; with &lt;code&gt;xz&lt;/code&gt; in log compression cron jobs on DSpace Test—it uses less space:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;# cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar wrapper&lt;/li&gt;
&lt;li&gt;Need to remember to go check if everything is ok in a few days and then change CGSpace&lt;/li&gt;
&lt;li&gt;CGSpace went down again (due to PostgreSQL idle connections of course)&lt;/li&gt;
&lt;li&gt;Current database settings for DSpace are &lt;code&gt;db.maxconnections = 30&lt;/code&gt; and &lt;code&gt;db.maxidle = 8&lt;/code&gt;, yet idle connections are exceeding this:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
39
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;I restarted PostgreSQL and Tomcat and it&amp;rsquo;s back&lt;/li&gt;
&lt;li&gt;On a related note of why CGSpace is so slow, I decided to finally try the &lt;code&gt;pgtune&lt;/code&gt; script to tune the postgres settings:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;# apt-get install pgtune
# pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune
# mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig 
# mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;It introduced the following new settings:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;default_statistics_target = 50
maintenance_work_mem = 480MB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 5632MB
work_mem = 48MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 1920MB
max_connections = 80
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;Now I need to go read PostgreSQL docs about these options, and watch memory settings in munin etc&lt;/li&gt;
&lt;li&gt;For what it&amp;rsquo;s worth, now the REST API should be faster (because of these PostgreSQL tweaks):&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.474
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
2.141
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.685
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.995
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.786
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;Last week it was an average of 8 seconds&amp;hellip; now this is &lt;sup&gt;1&lt;/sup&gt;&amp;frasl;&lt;sub&gt;4&lt;/sub&gt; of that&lt;/li&gt;
&lt;li&gt;CCAFS noticed that one of their items displays only the Atmire statlets: &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/42445&#34;&gt;https://cgspace.cgiar.org/handle/10568/42445&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&#34;../images/ccafs-item-no-metadata.png&#34; alt=&#34;CCAFS item&#34; /&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The authorizations for the item are all public READ, and I don&amp;rsquo;t see any errors in dspace.log when browsing that item&lt;/li&gt;
&lt;li&gt;I filed a ticket on Atmire&amp;rsquo;s issue tracker&lt;/li&gt;
&lt;li&gt;I also filed a ticket on Atmire&amp;rsquo;s issue tracker for the PostgreSQL stuff&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;2015-12-03:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-03&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CGSpace very slow, and monitoring emailing me to say its down, even though I can load the page (very slowly)&lt;/li&gt;
&lt;li&gt;Idle postgres connections look like this (with no change in DSpace db settings lately):&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
29
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;I restarted Tomcat and postgres&amp;hellip;&lt;/li&gt;
&lt;li&gt;Atmire commented that we should raise the JVM heap size by ~500M, so it is now &lt;code&gt;-Xms3584m -Xmx3584m&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;We weren&amp;rsquo;t out of heap yet, but it&amp;rsquo;s probably fair enough that the DSpace 5 upgrade (and new Atmire modules) requires more memory so it&amp;rsquo;s ok&lt;/li&gt;
&lt;li&gt;A possible side effect is that I see that the REST API is twice as fast for the request above now:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.368
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.968
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.006
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.849
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.806
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.854
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;2015-12-05:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-05&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CGSpace has been up and down all day and REST API is completely unresponsive&lt;/li&gt;
&lt;li&gt;PostgreSQL idle connections are currently:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;postgres@linode01:~$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
28
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;I have reverted all the pgtune tweaks from the other day, as they didn&amp;rsquo;t fix the stability issues, so I&amp;rsquo;d rather not have them introducing more variables into the equation&lt;/li&gt;
&lt;li&gt;The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid–late November&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&#34;../images/2015/12/postgres_bgwriter-year.png&#34; alt=&#34;PostgreSQL bgwriter (year)&#34; /&gt;
&lt;img src=&#34;../images/2015/12/postgres_cache_cgspace-year.png&#34; alt=&#34;PostgreSQL cache (year)&#34; /&gt;
&lt;img src=&#34;../images/2015/12/postgres_locks_cgspace-year.png&#34; alt=&#34;PostgreSQL locks (year)&#34; /&gt;
&lt;img src=&#34;../images/2015/12/postgres_scans_cgspace-year.png&#34; alt=&#34;PostgreSQL scans *year)&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>November, 2015</title>
      <link>/cgspace-notes/2015-11/</link>
      <pubDate>Mon, 23 Nov 2015 17:00:57 +0300</pubDate>
      
      <guid>/cgspace-notes/2015-11/</guid>
      <description>

&lt;h2 id=&#34;2015-11-22:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-22&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CGSpace went down&lt;/li&gt;
&lt;li&gt;Looks like DSpace exhausted its PostgreSQL connection pool&lt;/li&gt;
&lt;li&gt;Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
78
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;For now I have increased the limit from 60 to 90, run updates, and rebooted the server&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;2015-11-24:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-24&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CGSpace went down again&lt;/li&gt;
&lt;li&gt;Getting emails from uptimeRobot and uptimeButler that it&amp;rsquo;s down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors&lt;/li&gt;
&lt;li&gt;Looks like there are still a bunch of idle PostgreSQL connections:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
96
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;For some reason the number of idle connections is very high since we upgraded to DSpace 5&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;2015-11-25:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-25&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config&lt;/li&gt;
&lt;li&gt;The OAI application requests stylesheets and javascript files with the path &lt;code&gt;/oai/static/css&lt;/code&gt;, which gets matched here:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;# static assets we can load from the file system directly with nginx
location ~ /(themes|static|aspects/ReportingSuite) {
    try_files $uri @tomcat;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;The document root is relative to the xmlui app, so this gets a 404—I&amp;rsquo;m not sure why it doesn&amp;rsquo;t pass to &lt;code&gt;@tomcat&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Anyways, I can&amp;rsquo;t find any URIs with path &lt;code&gt;/static&lt;/code&gt;, and the more important point is to handle all the static theme assets, so we can just remove &lt;code&gt;static&lt;/code&gt; from the regex for now (who cares if we can&amp;rsquo;t use nginx to send Etags for OAI CSS!)&lt;/li&gt;
&lt;li&gt;Also, I noticed we aren&amp;rsquo;t setting CSP headers on the static assets, because in nginx headers are inherited in child blocks, but if you use &lt;code&gt;add_header&lt;/code&gt; in a child block it doesn&amp;rsquo;t inherit the others&lt;/li&gt;
&lt;li&gt;We simply need to add &lt;code&gt;include extra-security.conf;&lt;/code&gt; to the above location block (but research and test first)&lt;/li&gt;
&lt;li&gt;We should add WOFF assets to the list of things to set expires for:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;location ~* \.(?:ico|css|js|gif|jpe?g|png|woff)$ {
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;We should also add &lt;code&gt;aspects/Statistics&lt;/code&gt; to the location block for static assets (minus &lt;code&gt;static&lt;/code&gt; from above):&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;location ~ /(themes|aspects/ReportingSuite|aspects/Statistics) {
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;Need to check &lt;code&gt;/about&lt;/code&gt; on CGSpace, as it&amp;rsquo;s blank on my local test server and we might need to add something there&lt;/li&gt;
&lt;li&gt;CGSpace has been up and down all day due to PostgreSQL idle connections (current DSpace pool is 90):&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
93
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;I looked closer at the idle connections and saw that many have been idle for hours (current time on server is &lt;code&gt;2015-11-25T20:20:42+0000&lt;/code&gt;):&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | less -S
datid | datname  |  pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |
-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+---
20951 | cgspace  | 10966 |    18205 | cgspace  |                  | 127.0.0.1   |                 |       37731 | 2015-11-25 13:13:02.837624+00 |                               | 20
20951 | cgspace  | 10967 |    18205 | cgspace  |                  | 127.0.0.1   |                 |       37737 | 2015-11-25 13:13:03.069421+00 |                               | 20
...
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;There is a relevant Jira issue about this: &lt;a href=&#34;https://jira.duraspace.org/browse/DS-1458&#34;&gt;https://jira.duraspace.org/browse/DS-1458&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;It seems there is some sense changing DSpace&amp;rsquo;s default &lt;code&gt;db.maxidle&lt;/code&gt; from unlimited (-1) to something like 8 (Tomcat default) or 10 (Confluence default)&lt;/li&gt;
&lt;li&gt;Change &lt;code&gt;db.maxidle&lt;/code&gt; from -1 to 10, reduce &lt;code&gt;db.maxconnections&lt;/code&gt; from 90 to 50, and restart postgres and tomcat7&lt;/li&gt;
&lt;li&gt;Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well&lt;/li&gt;
&lt;li&gt;Also deploy the nginx fixes for the &lt;code&gt;try_files&lt;/code&gt; location block as well as the expires block&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;2015-11-26:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-26&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;CGSpace behaving much better since changing &lt;code&gt;db.maxidle&lt;/code&gt; yesterday, but still two up/down notices from monitoring this morning (better than 50!)&lt;/li&gt;
&lt;li&gt;CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item&lt;/li&gt;
&lt;li&gt;Not as bad for me, but still unsustainable if you have to get many:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
8.415
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;Monitoring e-mailed in the evening to say CGSpace was down&lt;/li&gt;
&lt;li&gt;Idle connections in PostgreSQL again:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
66
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;At the time, the current DSpace pool size was 50&amp;hellip;&lt;/li&gt;
&lt;li&gt;I reduced the pool back to the default of 30, and reduced the &lt;code&gt;db.maxidle&lt;/code&gt; settings from 10 to 8&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;2015-11-29:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-29&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Still more alerts that CGSpace has been up and down all day&lt;/li&gt;
&lt;li&gt;Current database settings for DSpace:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;db.maxconnections = 30
db.maxwait = 5000
db.maxidle = 8
db.statementpool = true
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;And idle connections:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
49
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;Perhaps I need to start drastically increasing the connection limits—like to 300—to see if DSpace&amp;rsquo;s thirst can ever be quenched&lt;/li&gt;
&lt;li&gt;On another note, SUNScholar&amp;rsquo;s notes suggest adjusting some other postgres variables: &lt;a href=&#34;http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database&#34;&gt;http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;This might help with REST API speed (which I mentioned above and still need to do real tests)&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
  </channel>
</rss>
-												Update generated files in public

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 11:47:41 +01:00
+								<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
 								<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
 								  <channel>
 								    <title>CGSpace Notes</title>
-												Regenerate site

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 14:13:49 +01:00
+								    <link>/cgspace-notes/</link>
-												Update generated files in public

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 11:47:41 +01:00
+								    <description>Recent content on CGSpace Notes</description>
 								    <generator>Hugo -- gohugo.io</generator>
 								    <language>en-us</language>
-												Add December, 2015 page

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-02 12:25:34 +01:00
+								    <lastBuildDate>Wed, 02 Dec 2015 13:18:00 +0300</lastBuildDate>
-												Regenerate site

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 14:13:49 +01:00
+								    <atom:link href="/cgspace-notes/index.xml" rel="self" type="application/rss+xml" />
-												Update generated files in public

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 11:47:41 +01:00
-												Add December, 2015 page

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-02 12:25:34 +01:00
+								    <item>
 								      <title>December, 2015</title>
 								      <link>/cgspace-notes/2015-12/</link>
 								      <pubDate>Wed, 02 Dec 2015 13:18:00 +0300</pubDate>
 								      <guid>/cgspace-notes/2015-12/</guid>
 								      <description>
 								&lt;h2 id=&#34;2015-12-02:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-02&lt;/h2&gt;
 								&lt;ul&gt;
 								&lt;li&gt;Replace &lt;code&gt;lzop&lt;/code&gt; with &lt;code&gt;xz&lt;/code&gt; in log compression cron jobs on DSpace Test—it uses less space:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;# cd /home/dspacetest.cgiar.org/log
 								# ls -lh dspace.log.2015-11-18*
 								-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
 								-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
 								-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar wrapper&lt;/li&gt;
 								&lt;li&gt;Need to remember to go check if everything is ok in a few days and then change CGSpace&lt;/li&gt;
-												Add more notes for 2015-12

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-02 18:16:44 +01:00
+								&lt;li&gt;CGSpace went down again (due to PostgreSQL idle connections of course)&lt;/li&gt;
 								&lt;li&gt;Current database settings for DSpace are &lt;code&gt;db.maxconnections = 30&lt;/code&gt; and &lt;code&gt;db.maxidle = 8&lt;/code&gt;, yet idle connections are exceeding this:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
 
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;I restarted PostgreSQL and Tomcat and it&amp;rsquo;s back&lt;/li&gt;
 								&lt;li&gt;On a related note of why CGSpace is so slow, I decided to finally try the &lt;code&gt;pgtune&lt;/code&gt; script to tune the postgres settings:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;# apt-get install pgtune
 								# pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune
 								# mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig
 								# mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;It introduced the following new settings:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;default_statistics_target = 50
 								maintenance_work_mem = 480MB
 								constraint_exclusion = on
 								checkpoint_completion_target = 0.9
 								effective_cache_size = 5632MB
 								work_mem = 48MB
 								wal_buffers = 8MB
 								checkpoint_segments = 16
 								shared_buffers = 1920MB
 								max_connections = 80
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;Now I need to go read PostgreSQL docs about these options, and watch memory settings in munin etc&lt;/li&gt;
-												Update notes for 2015-12-02

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-02 20:11:28 +01:00
+								&lt;li&gt;For what it&amp;rsquo;s worth, now the REST API should be faster (because of these PostgreSQL tweaks):&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .474
 								$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .141
 								$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .685
 								$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .995
 								$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .786
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;Last week it was an average of 8 seconds&amp;hellip; now this is &lt;sup&gt;1&lt;/sup&gt;&amp;frasl;&lt;sub&gt;4&lt;/sub&gt; of that&lt;/li&gt;
-												Update notes for 2015-12

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-02 21:19:56 +01:00
+								&lt;li&gt;CCAFS noticed that one of their items displays only the Atmire statlets: &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/42445&#34;&gt;https://cgspace.cgiar.org/handle/10568/42445&lt;/a&gt;&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;p&gt;&lt;img src=&#34;../images/ccafs-item-no-metadata.png&#34; alt=&#34;CCAFS item&#34; /&gt;&lt;/p&gt;
 								&lt;ul&gt;
 								&lt;li&gt;The authorizations for the item are all public READ, and I don&amp;rsquo;t see any errors in dspace.log when browsing that item&lt;/li&gt;
 								&lt;li&gt;I filed a ticket on Atmire&amp;rsquo;s issue tracker&lt;/li&gt;
 								&lt;li&gt;I also filed a ticket on Atmire&amp;rsquo;s issue tracker for the PostgreSQL stuff&lt;/li&gt;
-												Add December, 2015 page

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-02 12:25:34 +01:00
+								&lt;/ul&gt;
-												Add notes for 2015-12-03

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-03 10:08:14 +01:00
 								&lt;h2 id=&#34;2015-12-03:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-03&lt;/h2&gt;
 								&lt;ul&gt;
 								&lt;li&gt;CGSpace very slow, and monitoring emailing me to say its down, even though I can load the page (very slowly)&lt;/li&gt;
 								&lt;li&gt;Idle postgres connections look like this (with no change in DSpace db settings lately):&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
 
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;I restarted Tomcat and postgres&amp;hellip;&lt;/li&gt;
-												Update notes for 2015-12-03

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-03 23:08:49 +01:00
+								&lt;li&gt;Atmire commented that we should raise the JVM heap size by ~500M, so it is now &lt;code&gt;-Xms3584m -Xmx3584m&lt;/code&gt;&lt;/li&gt;
 								&lt;li&gt;We weren&amp;rsquo;t out of heap yet, but it&amp;rsquo;s probably fair enough that the DSpace 5 upgrade (and new Atmire modules) requires more memory so it&amp;rsquo;s ok&lt;/li&gt;
-												Update notes for 2015-12-03

I keep forgetting that `hugo serve` doesn't regenerate the files
on the disk. :)

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-03 23:09:41 +01:00
+								&lt;li&gt;A possible side effect is that I see that the REST API is twice as fast for the request above now:&lt;/li&gt;
-												Add notes for 2015-12-03

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-03 10:08:14 +01:00
+								&lt;/ul&gt;
-												Update notes for 2015-12-03

I keep forgetting that `hugo serve` doesn't regenerate the files
on the disk. :)

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-03 23:09:41 +01:00
 								&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .368
 								$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .968
 								$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .006
 								$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .849
 								$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .806
 								$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .854
 								&lt;/code&gt;&lt;/pre&gt;
-												Add notes and pictures for 2015-12-05

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-05 16:42:56 +01:00
 								&lt;h2 id=&#34;2015-12-05:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-05&lt;/h2&gt;
 								&lt;ul&gt;
 								&lt;li&gt;CGSpace has been up and down all day and REST API is completely unresponsive&lt;/li&gt;
 								&lt;li&gt;PostgreSQL idle connections are currently:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;postgres@linode01:~$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
 
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;I have reverted all the pgtune tweaks from the other day, as they didn&amp;rsquo;t fix the stability issues, so I&amp;rsquo;d rather not have them introducing more variables into the equation&lt;/li&gt;
 								&lt;li&gt;The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid–late November&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;p&gt;&lt;img src=&#34;../images/2015/12/postgres_bgwriter-year.png&#34; alt=&#34;PostgreSQL bgwriter (year)&#34; /&gt;
 								&lt;img src=&#34;../images/2015/12/postgres_cache_cgspace-year.png&#34; alt=&#34;PostgreSQL cache (year)&#34; /&gt;
 								&lt;img src=&#34;../images/2015/12/postgres_locks_cgspace-year.png&#34; alt=&#34;PostgreSQL locks (year)&#34; /&gt;
 								&lt;img src=&#34;../images/2015/12/postgres_scans_cgspace-year.png&#34; alt=&#34;PostgreSQL scans *year)&#34; /&gt;&lt;/p&gt;
-												Add December, 2015 page

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-12-02 12:25:34 +01:00
+								</description>
 								    </item>
-												Update generated files in public

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 11:47:41 +01:00
+								    <item>
 								      <title>November, 2015</title>
-												Regenerate site

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 14:13:49 +01:00
+								      <link>/cgspace-notes/2015-11/</link>
-												Update generated files in public

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 11:47:41 +01:00
+								      <pubDate>Mon, 23 Nov 2015 17:00:57 +0300</pubDate>
-												Regenerate site

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 14:13:49 +01:00
+								      <guid>/cgspace-notes/2015-11/</guid>
-												Update generated files in public

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 11:47:41 +01:00
+								      <description>
 								&lt;h2 id=&#34;2015-11-22:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-22&lt;/h2&gt;
 								&lt;ul&gt;
 								&lt;li&gt;CGSpace went down&lt;/li&gt;
 								&lt;li&gt;Looks like DSpace exhausted its PostgreSQL connection pool&lt;/li&gt;
 								&lt;li&gt;Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
 
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;For now I have increased the limit from 60 to 90, run updates, and rebooted the server&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;h2 id=&#34;2015-11-24:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-24&lt;/h2&gt;
 								&lt;ul&gt;
 								&lt;li&gt;CGSpace went down again&lt;/li&gt;
 								&lt;li&gt;Getting emails from uptimeRobot and uptimeButler that it&amp;rsquo;s down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors&lt;/li&gt;
 								&lt;li&gt;Looks like there are still a bunch of idle PostgreSQL connections:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
 
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;For some reason the number of idle connections is very high since we upgraded to DSpace 5&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;h2 id=&#34;2015-11-25:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-25&lt;/h2&gt;
 								&lt;ul&gt;
 								&lt;li&gt;Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config&lt;/li&gt;
 								&lt;li&gt;The OAI application requests stylesheets and javascript files with the path &lt;code&gt;/oai/static/css&lt;/code&gt;, which gets matched here:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;# static assets we can load from the file system directly with nginx
 								location ~ /(themes|static|aspects/ReportingSuite) {
 								    try_files $uri @tomcat;
 								...
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;The document root is relative to the xmlui app, so this gets a 404—I&amp;rsquo;m not sure why it doesn&amp;rsquo;t pass to &lt;code&gt;@tomcat&lt;/code&gt;&lt;/li&gt;
 								&lt;li&gt;Anyways, I can&amp;rsquo;t find any URIs with path &lt;code&gt;/static&lt;/code&gt;, and the more important point is to handle all the static theme assets, so we can just remove &lt;code&gt;static&lt;/code&gt; from the regex for now (who cares if we can&amp;rsquo;t use nginx to send Etags for OAI CSS!)&lt;/li&gt;
 								&lt;li&gt;Also, I noticed we aren&amp;rsquo;t setting CSP headers on the static assets, because in nginx headers are inherited in child blocks, but if you use &lt;code&gt;add_header&lt;/code&gt; in a child block it doesn&amp;rsquo;t inherit the others&lt;/li&gt;
 								&lt;li&gt;We simply need to add &lt;code&gt;include extra-security.conf;&lt;/code&gt; to the above location block (but research and test first)&lt;/li&gt;
 								&lt;li&gt;We should add WOFF assets to the list of things to set expires for:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;location ~* \.(?:ico|css|js|gif|jpe?g|png|woff)$ {
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;We should also add &lt;code&gt;aspects/Statistics&lt;/code&gt; to the location block for static assets (minus &lt;code&gt;static&lt;/code&gt; from above):&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;location ~ /(themes|aspects/ReportingSuite|aspects/Statistics) {
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;Need to check &lt;code&gt;/about&lt;/code&gt; on CGSpace, as it&amp;rsquo;s blank on my local test server and we might need to add something there&lt;/li&gt;
 								&lt;li&gt;CGSpace has been up and down all day due to PostgreSQL idle connections (current DSpace pool is 90):&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
 
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;I looked closer at the idle connections and saw that many have been idle for hours (current time on server is &lt;code&gt;2015-11-25T20:20:42+0000&lt;/code&gt;):&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | less -S
 								datid | datname  |  pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |
 								-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+---
 | cgspace  | 10966 |    18205 | cgspace  |                  | 127.0.0.1   |                 |       37731 | 2015-11-25 13:13:02.837624+00 |                               | 20
 | cgspace  | 10967 |    18205 | cgspace  |                  | 127.0.0.1   |                 |       37737 | 2015-11-25 13:13:03.069421+00 |                               | 20
 								...
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;There is a relevant Jira issue about this: &lt;a href=&#34;https://jira.duraspace.org/browse/DS-1458&#34;&gt;https://jira.duraspace.org/browse/DS-1458&lt;/a&gt;&lt;/li&gt;
 								&lt;li&gt;It seems there is some sense changing DSpace&amp;rsquo;s default &lt;code&gt;db.maxidle&lt;/code&gt; from unlimited (-1) to something like 8 (Tomcat default) or 10 (Confluence default)&lt;/li&gt;
 								&lt;li&gt;Change &lt;code&gt;db.maxidle&lt;/code&gt; from -1 to 10, reduce &lt;code&gt;db.maxconnections&lt;/code&gt; from 90 to 50, and restart postgres and tomcat7&lt;/li&gt;
 								&lt;li&gt;Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well&lt;/li&gt;
 								&lt;li&gt;Also deploy the nginx fixes for the &lt;code&gt;try_files&lt;/code&gt; location block as well as the expires block&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;h2 id=&#34;2015-11-26:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-26&lt;/h2&gt;
 								&lt;ul&gt;
 								&lt;li&gt;CGSpace behaving much better since changing &lt;code&gt;db.maxidle&lt;/code&gt; yesterday, but still two up/down notices from monitoring this morning (better than 50!)&lt;/li&gt;
 								&lt;li&gt;CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item&lt;/li&gt;
 								&lt;li&gt;Not as bad for me, but still unsustainable if you have to get many:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
 .415
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;Monitoring e-mailed in the evening to say CGSpace was down&lt;/li&gt;
 								&lt;li&gt;Idle connections in PostgreSQL again:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
 
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;At the time, the current DSpace pool size was 50&amp;hellip;&lt;/li&gt;
 								&lt;li&gt;I reduced the pool back to the default of 30, and reduced the &lt;code&gt;db.maxidle&lt;/code&gt; settings from 10 to 8&lt;/li&gt;
 								&lt;/ul&gt;
-												Regenerate site

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 14:13:49 +01:00
+								&lt;h2 id=&#34;2015-11-29:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-29&lt;/h2&gt;
-												Update generated files in public

Signed-off-by: Alan Orth <alan.orth@gmail.com>

											
										
										
											2015-11-30 11:47:41 +01:00
 								&lt;ul&gt;
 								&lt;li&gt;Still more alerts that CGSpace has been up and down all day&lt;/li&gt;
 								&lt;li&gt;Current database settings for DSpace:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;db.maxconnections = 30
 								db.maxwait = 5000
 								db.maxidle = 8
 								db.statementpool = true
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;And idle connections:&lt;/li&gt;
 								&lt;/ul&gt;
 								&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
 
 								&lt;/code&gt;&lt;/pre&gt;
 								&lt;ul&gt;
 								&lt;li&gt;Perhaps I need to start drastically increasing the connection limits—like to 300—to see if DSpace&amp;rsquo;s thirst can ever be quenched&lt;/li&gt;
 								&lt;li&gt;On another note, SUNScholar&amp;rsquo;s notes suggest adjusting some other postgres variables: &lt;a href=&#34;http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database&#34;&gt;http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database&lt;/a&gt;&lt;/li&gt;
 								&lt;li&gt;This might help with REST API speed (which I mentioned above and still need to do real tests)&lt;/li&gt;
 								&lt;/ul&gt;
 								</description>
 								    </item>
 								  </channel>
 								</rss>