<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Notes on CGSpace Notes</title>
<link>/cgspace-notes/tags/notes/</link>
<description>Recent content in Notes on CGSpace Notes</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Fri, 05 Feb 2016 13:18:00 +0300</lastBuildDate>
<atom:link href="/cgspace-notes/tags/notes/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>February, 2016</title>
<link>/cgspace-notes/2016-02/</link>
<pubDate>Fri, 05 Feb 2016 13:18:00 +0300</pubDate>
<guid>/cgspace-notes/2016-02/</guid>
<description>
&lt;h2 id=&#34;2016-02-05:124a59adbaa8ef13e1518d003fc03981&#34;&gt;2016-02-05&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at some DAGRIS data for Abenet Yabowork&lt;/li&gt;
&lt;li&gt;Lots of issues with spaces, newlines, etc causing the import to fail&lt;/li&gt;
&lt;li&gt;I noticed we have a very &lt;em&gt;interesting&lt;/em&gt; list of countries on CGSpace:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;../images/2016/02/cgspace-countries.png&#34; alt=&#34;CGSpace country list&#34; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Not only are there 49,000 countries, we have some blanks (25)&amp;hellip;&lt;/li&gt;
&lt;li&gt;Also, lots of things like &amp;ldquo;COTE D`LVOIRE&amp;rdquo; and &amp;ldquo;COTE D IVOIRE&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-02-06:124a59adbaa8ef13e1518d003fc03981&#34;&gt;2016-02-06&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Found a way to get items with null/empty metadata values from SQL&lt;/li&gt;
&lt;li&gt;First, find the &lt;code&gt;metadata_field_id&lt;/code&gt; for the field you want from the &lt;code&gt;metadatafieldregistry&lt;/code&gt; table:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspacetest=# select * from metadatafieldregistry;
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;In this case our country field is 78&lt;/li&gt;
&lt;li&gt;Now find all resources with type 2 (item) that have null/empty values for that field:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspacetest=# select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=78 and (text_value=&#39;&#39; OR text_value IS NULL);
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Then you can find the handle that owns it from its &lt;code&gt;resource_id&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = &#39;22678&#39;;
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;It&amp;rsquo;s 25 items so editing in the web UI is annoying, let&amp;rsquo;s try SQL!&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspacetest=# delete from metadatavalue where metadata_field_id=78 and text_value=&#39;&#39;;
DELETE 25
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;After that perhaps a regular &lt;code&gt;dspace index-discovery&lt;/code&gt; (no -b) &lt;em&gt;should&lt;/em&gt; suffice&amp;hellip;&lt;/li&gt;
&lt;li&gt;Hmm, I indexed, cleared the Cocoon cache, and restarted Tomcat but the 25 &amp;ldquo;|||&amp;rdquo; countries are still there&lt;/li&gt;
&lt;li&gt;Maybe I need to do a full re-index&amp;hellip;&lt;/li&gt;
&lt;li&gt;Yep! The full re-index seems to work.&lt;/li&gt;
&lt;li&gt;Process the empty countries on CGSpace&lt;/li&gt;
&lt;/ul&gt;
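&lt;p&gt;The null/empty lookup above can be tried without a DSpace database. This is only a sketch: it assumes a simplified, hypothetical &lt;code&gt;metadatavalue&lt;/code&gt; table in in-memory SQLite rather than the real PostgreSQL schema, and the sample rows (including field 64) are made up.&lt;/p&gt;

```python
import sqlite3

# Hypothetical, simplified stand-in for DSpace's metadatavalue table,
# using in-memory SQLite instead of PostgreSQL (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue ("
    "resource_id INTEGER, resource_type_id INTEGER, "
    "metadata_field_id INTEGER, text_value TEXT)"
)
rows = [
    (22678, 2, 78, ""),        # empty country
    (22679, 2, 78, None),      # NULL country
    (22680, 2, 78, "KENYA"),   # valid country
    (22681, 2, 64, ""),        # empty value, but a different (made-up) field
]
conn.executemany("INSERT INTO metadatavalue VALUES (?, ?, ?, ?)", rows)


def find_empty(conn, field_id, resource_type_id=2):
    """Return resource_ids whose value for field_id is empty or NULL."""
    cursor = conn.execute(
        "SELECT resource_id FROM metadatavalue "
        "WHERE resource_type_id = ? AND metadata_field_id = ? "
        "AND (text_value = '' OR text_value IS NULL) "
        "ORDER BY resource_id",
        (resource_type_id, field_id),
    )
    return [row[0] for row in cursor]


empty_items = find_empty(conn, 78)
print(empty_items)
```

&lt;p&gt;Note that the &lt;code&gt;delete&lt;/code&gt; in the notes only matches empty strings, while this lookup also returns NULL values, so the two counts may differ on real data.&lt;/p&gt;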
&lt;h2 id=&#34;2016-02-07:124a59adbaa8ef13e1518d003fc03981&#34;&gt;2016-02-07&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Working on cleaning up Abenet&amp;rsquo;s DAGRIS data with OpenRefine&lt;/li&gt;
&lt;li&gt;I discovered two really nice functions in OpenRefine: &lt;code&gt;value.trim()&lt;/code&gt; and &lt;code&gt;value.escape(&amp;quot;javascript&amp;quot;)&lt;/code&gt; which shows whitespace characters like &lt;code&gt;\r\n&lt;/code&gt;!&lt;/li&gt;
&lt;li&gt;For some reason, when you import an Excel file into OpenRefine it exports dates like 1949 as 1949.0 in the CSV&lt;/li&gt;
&lt;li&gt;I re-import the resulting CSV and run a GREL on the date issued column: &lt;code&gt;value.replace(&amp;quot;\.0&amp;quot;, &amp;quot;&amp;quot;)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;I need to start running DSpace in Mac OS X instead of a Linux VM&lt;/li&gt;
&lt;li&gt;Install PostgreSQL from homebrew, then configure and import CGSpace database dump:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ postgres -D /opt/brew/var/postgres
$ createuser --superuser postgres
$ createuser --pwprompt dspacetest
$ createdb -O dspacetest --encoding=UNICODE dspacetest
$ psql postgres
postgres=# alter user dspacetest createuser;
postgres=# \q
$ pg_restore -O -U dspacetest -d dspacetest ~/Downloads/cgspace_2016-02-07.backup
$ psql postgres
postgres=# alter user dspacetest nocreateuser;
postgres=# \q
$ vacuumdb dspacetest
$ psql -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest -h localhost
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;After building and running a &lt;code&gt;fresh_install&lt;/code&gt; I symlinked the webapps into Tomcat&amp;rsquo;s webapps folder:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ mv /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT.orig
$ ln -sfv ~/dspace/webapps/xmlui /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT
$ ln -sfv ~/dspace/webapps/rest /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/rest
$ ln -sfv ~/dspace/webapps/jspui /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/jspui
$ ln -sfv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/oai
$ ln -sfv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/solr
$ /opt/brew/Cellar/tomcat/8.0.30/bin/catalina start
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Add CATALINA_OPTS in &lt;code&gt;/opt/brew/Cellar/tomcat/8.0.30/libexec/bin/setenv.sh&lt;/code&gt;, as this script is sourced by the &lt;code&gt;catalina&lt;/code&gt; startup script&lt;/li&gt;
&lt;li&gt;For example:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;CATALINA_OPTS=&amp;quot;-Djava.awt.headless=true -Xms2048m -Xmx2048m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;After verifying that the site is working, start a full index:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ ~/dspace/bin/dspace index-discovery -b
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;2016-02-08:124a59adbaa8ef13e1518d003fc03981&#34;&gt;2016-02-08&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Finish cleaning up and importing ~400 DAGRIS items into CGSpace&lt;/li&gt;
&lt;li&gt;Whip up some quick CSS to make the button in the submission workflow use the XMLUI theme&amp;rsquo;s brand colors (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/154&#34;&gt;#154&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;../images/2016/02/submit-button-ilri.png&#34; alt=&#34;ILRI submission buttons&#34; /&gt;
&lt;img src=&#34;../images/2016/02/submit-button-drylands.png&#34; alt=&#34;Drylands submission buttons&#34; /&gt;&lt;/p&gt;
&lt;h2 id=&#34;2016-02-09:124a59adbaa8ef13e1518d003fc03981&#34;&gt;2016-02-09&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Re-sync DSpace Test with CGSpace&lt;/li&gt;
&lt;li&gt;Help Sisay with OpenRefine&lt;/li&gt;
&lt;li&gt;Enable HTTPS on DSpace Test using Let&amp;rsquo;s Encrypt:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ cd ~/src/git
$ git clone https://github.com/letsencrypt/letsencrypt
$ cd letsencrypt
$ sudo service nginx stop
# add port 443 to firewall rules
$ ./letsencrypt-auto certonly --standalone -d dspacetest.cgiar.org
$ sudo service nginx start
$ ansible-playbook dspace.yml -l linode02 -t nginx,firewall -u aorth --ask-become-pass
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;We should install it in /opt/letsencrypt and then script the renewal, but first we have to wire up some variables and template stuff based on the script here: &lt;a href=&#34;https://letsencrypt.org/howitworks/&#34;&gt;https://letsencrypt.org/howitworks/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;I had to export some CIAT items that were being cleaned up on the test server and I noticed their &lt;code&gt;dc.contributor.author&lt;/code&gt; fields have DSpace 5 authority index UUIDs&amp;hellip;&lt;/li&gt;
&lt;li&gt;To clean those up in OpenRefine I used this GREL expression: &lt;code&gt;value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,&amp;quot;&amp;quot;)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Getting more and more hangs on DSpace Test, seemingly random but also during CSV import&lt;/li&gt;
&lt;li&gt;Logs don&amp;rsquo;t always show anything right when it fails, but eventually one of these appears:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;org.dspace.discovery.SearchServiceException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;or&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Right now DSpace Test&amp;rsquo;s Tomcat heap is set to 1536m and we have quite a bit of free RAM:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# free -m
             total       used       free     shared    buffers     cached
Mem:          3950       3902         48          9         37       1311
-/+ buffers/cache:       2552       1397
Swap:          255         57        198
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;So I&amp;rsquo;ll bump up the Tomcat heap to 2048m (CGSpace production server is using 3GB)&lt;/li&gt;
&lt;/ul&gt;
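&lt;p&gt;For reference, the OpenRefine clean-ups from this month (showing hidden whitespace, dropping the trailing .0 from dates, and removing authority UUIDs) could be approximated outside OpenRefine. This is a rough Python sketch, not the GREL itself: the function names are made up, the sample author is hypothetical, and the date pattern is anchored to the end of the value, which is an assumption.&lt;/p&gt;

```python
import re

# Same pattern as the GREL expression in these notes: a DSpace 5 authority
# UUID suffix such as "::01234567-89ab-cdef-0123-456789abcdef::600".
AUTHORITY_SUFFIX = re.compile(r"::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600")


def show_whitespace(value):
    """Make hidden characters such as carriage returns and newlines visible."""
    return value.encode("unicode_escape").decode("ascii")


def clean_date(value):
    """Trim surrounding whitespace and drop a trailing ".0" left by the Excel import."""
    return re.sub(r"\.0$", "", value.strip())


def strip_authority(value):
    """Remove the authority UUID suffix from an author value."""
    return AUTHORITY_SUFFIX.sub("", value)


print(show_whitespace("1949.0\r\n"))
print(clean_date(" 1949.0\r\n"))
print(strip_authority("Doe, J.::01234567-89ab-cdef-0123-456789abcdef::600"))
```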
</description>
</item>
<item>
<title>January, 2016</title>
<link>/cgspace-notes/2016-01/</link>
<pubDate>Wed, 13 Jan 2016 13:18:00 +0300</pubDate>
<guid>/cgspace-notes/2016-01/</guid>
<description>
&lt;h2 id=&#34;2016-01-13:3846b7fcbca60cdedafd373cb39cd76d&#34;&gt;2016-01-13&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Move ILRI collection &lt;code&gt;10568/12503&lt;/code&gt; from &lt;code&gt;10568/27869&lt;/code&gt; to &lt;code&gt;10568/27629&lt;/code&gt; using the &lt;a href=&#34;https://gist.github.com/alanorth/392c4660e8b022d99dfa&#34;&gt;move_collections.sh&lt;/a&gt; script I wrote last year.&lt;/li&gt;
&lt;li&gt;I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.&lt;/li&gt;
&lt;li&gt;Update GitHub wiki for documentation of &lt;a href=&#34;https://github.com/ilri/DSpace/wiki/Maintenance-Tasks&#34;&gt;maintenance tasks&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-01-14:3846b7fcbca60cdedafd373cb39cd76d&#34;&gt;2016-01-14&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Update CCAFS project identifiers in input-forms.xml&lt;/li&gt;
&lt;li&gt;Run system updates and restart the server&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-01-18:3846b7fcbca60cdedafd373cb39cd76d&#34;&gt;2016-01-18&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Change &amp;ldquo;Extension material&amp;rdquo; to &amp;ldquo;Extension Material&amp;rdquo; in input-forms.xml (a mistake that fell through the cracks when we fixed the others in DSpace 4 era)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-01-19:3846b7fcbca60cdedafd373cb39cd76d&#34;&gt;2016-01-19&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Work on tweaks and updates for the social sharing icons on item pages: add Delicious and Mendeley (from Academicons), make links open in new windows, and set the icon color to the theme&amp;rsquo;s primary color (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/157&#34;&gt;#157&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Tweak date-based facets to show more values in drill-down ranges (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/162&#34;&gt;#162&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Need to remember to clear the Cocoon cache after deployment or else you don&amp;rsquo;t see the new ranges immediately&lt;/li&gt;
&lt;li&gt;Set up recipe on IFTTT to tweet new items from the CGSpace Atom feed to my twitter account&lt;/li&gt;
&lt;li&gt;Altmetrics&amp;rsquo; support for Handles is kinda weak, so they can&amp;rsquo;t associate our items with DOIs until they are tweeted or blogged, etc first.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-01-21:3846b7fcbca60cdedafd373cb39cd76d&#34;&gt;2016-01-21&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Still waiting for my IFTTT recipe to fire, two days later&lt;/li&gt;
&lt;li&gt;It looks like the Atom feed on CGSpace hasn&amp;rsquo;t changed in two days, but there have definitely been new items&lt;/li&gt;
&lt;li&gt;The RSS feed is nearly as old, but has different old items there&lt;/li&gt;
&lt;li&gt;On a hunch I cleared the Cocoon cache and now the feeds are fresh&lt;/li&gt;
&lt;li&gt;Looks like there is a configuration option related to this, &lt;code&gt;webui.feed.cache.age&lt;/code&gt;, which defaults to 48 hours, though I&amp;rsquo;m not sure what relation it has to the Cocoon cache&lt;/li&gt;
&lt;li&gt;In any case, we should change this cache to be something more like 6 hours, as we publish new items several times per day.&lt;/li&gt;
&lt;li&gt;Work around a CSS issue with long URLs in the item view (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/172&#34;&gt;#172&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-01-25:3846b7fcbca60cdedafd373cb39cd76d&#34;&gt;2016-01-25&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Re-deploy CGSpace and DSpace Test with latest &lt;code&gt;5_x-prod&lt;/code&gt; branch&lt;/li&gt;
&lt;li&gt;This included the social icon fixes/updates, date-based facet tweaks, reducing the feed cache age, and fixing a layout issue in XMLUI item view when an item had long URLs&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-01-26:3846b7fcbca60cdedafd373cb39cd76d&#34;&gt;2016-01-26&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Run nginx updates on CGSpace and DSpace Test (&lt;a href=&#34;http://mailman.nginx.org/pipermail/nginx/2016-January/049700.html&#34;&gt;1.8.1 and 1.9.10, respectively&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Run updates on DSpace Test and reboot for new Linode kernel &lt;code&gt;Linux 4.4.0-x86_64-linode63&lt;/code&gt; (first update in months)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-01-28:3846b7fcbca60cdedafd373cb39cd76d&#34;&gt;2016-01-28&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Start looking at importing some Bioversity data that had been prepared earlier this week&lt;/li&gt;
&lt;li&gt;&lt;p&gt;While checking the data I noticed something strange: there are 79 items but only 8 unique PDFs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ ls SimpleArchiveForBio/ | wc -l
79
$ find SimpleArchiveForBio/ -iname &amp;quot;*.pdf&amp;quot; -exec basename {} \; | sort -u | wc -l
8
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-01-29:3846b7fcbca60cdedafd373cb39cd76d&#34;&gt;2016-01-29&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add five missing center-specific subjects to XMLUI item view (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/174&#34;&gt;#174&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;This &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/67062&#34;&gt;CCAFS item&lt;/a&gt;, before:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;../images/2016/01/xmlui-subjects-before.png&#34; alt=&#34;XMLUI subjects before&#34; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;After:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;../images/2016/01/xmlui-subjects-after.png&#34; alt=&#34;XMLUI subjects after&#34; /&gt;&lt;/p&gt;
</description>
</item>
<item>
<title>December, 2015</title>
<link>/cgspace-notes/2015-12/</link>
<pubDate>Wed, 02 Dec 2015 13:18:00 +0300</pubDate>
<guid>/cgspace-notes/2015-12/</guid>
<description>
&lt;h2 id=&#34;2015-12-02:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;lzop&lt;/code&gt; with &lt;code&gt;xz&lt;/code&gt; in log compression cron jobs on DSpace Test—it uses less space:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar wrapper&lt;/li&gt;
&lt;li&gt;Need to remember to go check if everything is ok in a few days and then change CGSpace&lt;/li&gt;
&lt;li&gt;CGSpace went down again (due to PostgreSQL idle connections of course)&lt;/li&gt;
&lt;li&gt;Current database settings for DSpace are &lt;code&gt;db.maxconnections = 30&lt;/code&gt; and &lt;code&gt;db.maxidle = 8&lt;/code&gt;, yet idle connections are exceeding this:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
39
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;I restarted PostgreSQL and Tomcat and it&amp;rsquo;s back&lt;/li&gt;
&lt;li&gt;On a related note of why CGSpace is so slow, I decided to finally try the &lt;code&gt;pgtune&lt;/code&gt; script to tune the postgres settings:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# apt-get install pgtune
# pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune
# mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig
# mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;It introduced the following new settings:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;default_statistics_target = 50
maintenance_work_mem = 480MB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 5632MB
work_mem = 48MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 1920MB
max_connections = 80
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Now I need to go read PostgreSQL docs about these options, and watch memory settings in munin etc&lt;/li&gt;
&lt;li&gt;For what it&amp;rsquo;s worth, now the REST API should be faster (because of these PostgreSQL tweaks):&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.474
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
2.141
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.685
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.995
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.786
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Last week it was an average of 8 seconds&amp;hellip; now this is &lt;sup&gt;1&lt;/sup&gt;&amp;frasl;&lt;sub&gt;4&lt;/sub&gt; of that&lt;/li&gt;
&lt;li&gt;CCAFS noticed that one of their items displays only the Atmire statlets: &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/42445&#34;&gt;https://cgspace.cgiar.org/handle/10568/42445&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;../images/2015/12/ccafs-item-no-metadata.png&#34; alt=&#34;CCAFS item&#34; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The authorizations for the item are all public READ, and I don&amp;rsquo;t see any errors in dspace.log when browsing that item&lt;/li&gt;
&lt;li&gt;I filed a ticket on Atmire&amp;rsquo;s issue tracker&lt;/li&gt;
&lt;li&gt;I also filed a ticket on Atmire&amp;rsquo;s issue tracker for the PostgreSQL stuff&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2015-12-03:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace is very slow, and monitoring is emailing me to say it&amp;rsquo;s down, even though I can load the page (very slowly)&lt;/li&gt;
&lt;li&gt;Idle postgres connections look like this (with no change in DSpace db settings lately):&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
29
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;I restarted Tomcat and postgres&amp;hellip;&lt;/li&gt;
&lt;li&gt;Atmire commented that we should raise the JVM heap size by ~500M, so it is now &lt;code&gt;-Xms3584m -Xmx3584m&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;We weren&amp;rsquo;t out of heap yet, but it&amp;rsquo;s probably fair enough that the DSpace 5 upgrade (and new Atmire modules) requires more memory so it&amp;rsquo;s ok&lt;/li&gt;
&lt;li&gt;A possible side effect is that I see that the REST API is twice as fast for the request above now:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.368
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.968
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.006
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.849
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.806
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.854
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;2015-12-05:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-05&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace has been up and down all day and REST API is completely unresponsive&lt;/li&gt;
&lt;li&gt;PostgreSQL idle connections are currently:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;postgres@linode01:~$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
28
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;I have reverted all the pgtune tweaks from the other day, as they didn&amp;rsquo;t fix the stability issues, so I&amp;rsquo;d rather not have them introducing more variables into the equation&lt;/li&gt;
&lt;li&gt;The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid-to-late November&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;../images/2015/12/postgres_bgwriter-year.png&#34; alt=&#34;PostgreSQL bgwriter (year)&#34; /&gt;
&lt;img src=&#34;../images/2015/12/postgres_cache_cgspace-year.png&#34; alt=&#34;PostgreSQL cache (year)&#34; /&gt;
&lt;img src=&#34;../images/2015/12/postgres_locks_cgspace-year.png&#34; alt=&#34;PostgreSQL locks (year)&#34; /&gt;
&lt;img src=&#34;../images/2015/12/postgres_scans_cgspace-year.png&#34; alt=&#34;PostgreSQL scans (year)&#34; /&gt;&lt;/p&gt;
&lt;h2 id=&#34;2015-12-07:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-07&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Atmire sent &lt;a href=&#34;https://github.com/ilri/DSpace/pull/161&#34;&gt;some fixes&lt;/a&gt; to DSpace&amp;rsquo;s REST API code that was leaving contexts open (causing the slow performance and database issues)&lt;/li&gt;
&lt;li&gt;After deploying the fix to CGSpace the REST API is consistently faster:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.675
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.599
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.588
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.566
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.497
&lt;/code&gt;&lt;/pre&gt;
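&lt;p&gt;To put the three sets of REST API timings in this month&amp;rsquo;s notes side by side, here is a small sketch that averages them. The numbers are copied from the curl runs above; the labels are just shorthand for what changed before each run.&lt;/p&gt;

```python
from statistics import mean

# curl time_total values (seconds) copied from the notes above.
timings = {
    "2015-12-02 (pgtune)": [1.474, 2.141, 1.685, 1.995, 1.786],
    "2015-12-03 (larger heap)": [1.368, 0.968, 1.006, 0.849, 0.806, 0.854],
    "2015-12-07 (REST fix)": [0.675, 0.599, 0.588, 0.566, 0.497],
}

# Average each run so the three days can be compared at a glance.
averages = {label: mean(values) for label, values in timings.items()}

for label, avg in averages.items():
    print(f"{label}: {avg:.3f}s")
```

&lt;p&gt;With only five or six requests per run these averages are rough, but they do show the response time dropping at each step.&lt;/p&gt;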
&lt;h2 id=&#34;2015-12-08:012a628feed6d64ae1151cbd6151ccd6&#34;&gt;2015-12-08&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Switch CGSpace log compression cron jobs from lzop to xz, as tested on DSpace Test on 2015-12-02: xz is slower than lzop, but the compressed logs are less than half the size&lt;/li&gt;
&lt;li&gt;Since we figured out (and fixed) the cause of the performance issue, I reverted Google Bot&amp;rsquo;s crawl rate to the &amp;ldquo;Let Google optimize&amp;rdquo; setting&lt;/li&gt;
&lt;/ul&gt;
</description>
</item>
<item>
<title>November, 2015</title>
<link>/cgspace-notes/2015-11/</link>
<pubDate>Mon, 23 Nov 2015 17:00:57 +0300</pubDate>
<guid>/cgspace-notes/2015-11/</guid>
<description>
&lt;h2 id=&#34;2015-11-22:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-22&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace went down&lt;/li&gt;
&lt;li&gt;Looks like DSpace exhausted its PostgreSQL connection pool&lt;/li&gt;
&lt;li&gt;Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
78
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;For now I have increased the limit from 60 to 90, run updates, and rebooted the server&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2015-11-24:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-24&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace went down again&lt;/li&gt;
&lt;li&gt;Getting emails from uptimeRobot and uptimeButler that it&amp;rsquo;s down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors&lt;/li&gt;
&lt;li&gt;Looks like there are still a bunch of idle PostgreSQL connections:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
96
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;For some reason the number of idle connections is very high since we upgraded to DSpace 5&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2015-11-25:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-25&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config&lt;/li&gt;
&lt;li&gt;The OAI application requests stylesheets and javascript files with the path &lt;code&gt;/oai/static/css&lt;/code&gt;, which gets matched here:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# static assets we can load from the file system directly with nginx
location ~ /(themes|static|aspects/ReportingSuite) {
    try_files $uri @tomcat;
...
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The document root is relative to the xmlui app, so this gets a 404—I&amp;rsquo;m not sure why it doesn&amp;rsquo;t pass to &lt;code&gt;@tomcat&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Anyways, I can&amp;rsquo;t find any URIs with path &lt;code&gt;/static&lt;/code&gt;, and the more important point is to handle all the static theme assets, so we can just remove &lt;code&gt;static&lt;/code&gt; from the regex for now (who cares if we can&amp;rsquo;t use nginx to send Etags for OAI CSS!)&lt;/li&gt;
&lt;li&gt;Also, I noticed we aren&amp;rsquo;t setting CSP headers on the static assets, because in nginx headers are inherited in child blocks, but if you use &lt;code&gt;add_header&lt;/code&gt; in a child block it doesn&amp;rsquo;t inherit the others&lt;/li&gt;
&lt;li&gt;We simply need to add &lt;code&gt;include extra-security.conf;&lt;/code&gt; to the above location block (but research and test first)&lt;/li&gt;
&lt;li&gt;We should add WOFF assets to the list of things to set expires for:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;location ~* \.(?:ico|css|js|gif|jpe?g|png|woff)$ {
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;We should also add &lt;code&gt;aspects/Statistics&lt;/code&gt; to the location block for static assets (minus &lt;code&gt;static&lt;/code&gt; from above):&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;location ~ /(themes|aspects/ReportingSuite|aspects/Statistics) {
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Need to check &lt;code&gt;/about&lt;/code&gt; on CGSpace, as it&amp;rsquo;s blank on my local test server and we might need to add something there&lt;/li&gt;
&lt;li&gt;CGSpace has been up and down all day due to PostgreSQL idle connections (current DSpace pool is 90):&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
93
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;I looked closer at the idle connections and saw that many have been idle for hours (current time on server is &lt;code&gt;2015-11-25T20:20:42+0000&lt;/code&gt;):&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | less -S
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start |
-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+---
20951 | cgspace | 10966 | 18205 | cgspace | | 127.0.0.1 | | 37731 | 2015-11-25 13:13:02.837624+00 | | 20
20951 | cgspace | 10967 | 18205 | cgspace | | 127.0.0.1 | | 37737 | 2015-11-25 13:13:03.069421+00 | | 20
...
&lt;/code&gt;&lt;/pre&gt;
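&lt;p&gt;As a quick check of the &lt;em&gt;idle for hours&lt;/em&gt; observation, the age of each connection can be worked out from &lt;code&gt;backend_start&lt;/code&gt; and the server time quoted above. This is a sketch with a made-up helper name, and it measures time since the connection was opened, which is only an upper bound on how long it has actually been idle.&lt;/p&gt;

```python
from datetime import datetime, timezone


def connection_age_hours(backend_start, now):
    """Hours between a pg_stat_activity backend_start value and now."""
    # psql prints the UTC offset as "+00"; pad it to "+0000" for strptime's %z.
    # This assumes the offset is always given in whole hours.
    started = datetime.strptime(backend_start + "00", "%Y-%m-%d %H:%M:%S.%f%z")
    return (now - started).total_seconds() / 3600


# Server time and backend_start values taken from the notes above.
now = datetime(2015, 11, 25, 20, 20, 42, tzinfo=timezone.utc)
for start in ("2015-11-25 13:13:02.837624+00", "2015-11-25 13:13:03.069421+00"):
    print(f"{start}: {connection_age_hours(start, now):.2f} hours")
```

&lt;p&gt;Both sample connections come out at roughly seven hours old.&lt;/p&gt;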
&lt;ul&gt;
&lt;li&gt;There is a relevant Jira issue about this: &lt;a href=&#34;https://jira.duraspace.org/browse/DS-1458&#34;&gt;https://jira.duraspace.org/browse/DS-1458&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;It seems there is some sense in changing DSpace&amp;rsquo;s default &lt;code&gt;db.maxidle&lt;/code&gt; from unlimited (-1) to something like 8 (Tomcat default) or 10 (Confluence default)&lt;/li&gt;
&lt;li&gt;Change &lt;code&gt;db.maxidle&lt;/code&gt; from -1 to 10, reduce &lt;code&gt;db.maxconnections&lt;/code&gt; from 90 to 50, and restart postgres and tomcat7&lt;/li&gt;
&lt;li&gt;Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well&lt;/li&gt;
&lt;li&gt;Also deploy the nginx fixes for the &lt;code&gt;try_files&lt;/code&gt; location block as well as the expires block&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2015-11-26:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-26&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace behaving much better since changing &lt;code&gt;db.maxidle&lt;/code&gt; yesterday, but still two up/down notices from monitoring this morning (better than 50!)&lt;/li&gt;
&lt;li&gt;CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item&lt;/li&gt;
&lt;li&gt;Not as bad for me, but still unsustainable if you have to get many:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
8.415
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Monitoring e-mailed in the evening to say CGSpace was down&lt;/li&gt;
&lt;li&gt;Idle connections in PostgreSQL again:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
66
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;At the time, the current DSpace pool size was 50&amp;hellip;&lt;/li&gt;
&lt;li&gt;I reduced the pool back to the default of 30, and reduced the &lt;code&gt;db.maxidle&lt;/code&gt; setting from 10 to 8&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2015-11-29:3d03b850f8126f80d8144c2e17ea0ae7&#34;&gt;2015-11-29&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Still more alerts that CGSpace has been up and down all day&lt;/li&gt;
&lt;li&gt;Current database settings for DSpace:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;db.maxconnections = 30
db.maxwait = 5000
db.maxidle = 8
db.statementpool = true
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;And idle connections:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
49
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Perhaps I need to start drastically increasing the connection limits—like to 300—to see if DSpace&amp;rsquo;s thirst can ever be quenched&lt;/li&gt;
&lt;li&gt;On another note, SUNScholar&amp;rsquo;s notes suggest adjusting some other postgres variables: &lt;a href=&#34;http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database&#34;&gt;http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;This might help with REST API speed (which I mentioned above and still need to do real tests)&lt;/li&gt;
&lt;/ul&gt;
</description>
</item>
</channel>
</rss>