Add notes for 2017-02-07

2017-02-07 07:07:44 -08:00
parent 8ee893ad95
commit b6c03e1ab6
28 changed files with 591 additions and 599 deletions


@ -13,7 +13,7 @@
<meta property="og:updated_time" content="2017-01-02T10:43:00&#43;03:00"/>
<meta property="og:updated_time" content="2017-02-07T07:04:52-08:00"/>
@ -43,7 +43,7 @@
},
"dateModified": "2017-01-02T10:43:00+03:00",
"dateModified": "2017-02-07T07:04:52-08:00",
@ -105,6 +105,39 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2017-02/">February, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-02-07T07:04:52-08:00">Tue Feb 07, 2017</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was erroneously mapped to the same collection twice again, so I had to remove one of the mappings manually:</li>
</ul>
<pre><code>dspace=# select * from collection2item where item_id = '80278';
  id   | collection_id | item_id
-------+---------------+---------
 92551 |           313 |   80278
 92550 |           313 |   80278
 90774 |          1051 |   80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
</code></pre>
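<ul>
<li>A sketch of how other duplicate mappings could be found, assuming the same <code>collection2item</code> columns as above (this query is an illustration and was not part of the original session):</li>
</ul>
<pre><code>dspace=# select collection_id, item_id, count(*) from collection2item group by collection_id, item_id having count(*) &gt; 1;
</code></pre>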
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2017-01/">January, 2017</a></h2>
@ -371,32 +404,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-04/">April, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-04-04T11:06:00&#43;03:00">Mon Apr 04, 2016</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file usage on CGSpace, I noticed that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I&rsquo;ve never needed to look in any log file other than dspace.log, let alone one from last year!</li>
<li>Excluding those logs from the backups will save us a few gigs of backup space we&rsquo;re paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul>
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a>
</article>
<nav class="blog-pagination">
@ -421,6 +428,8 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2017-02/">February, 2017</a></li>
<li><a href="/cgspace-notes/2017-01/">January, 2017</a></li>
<li><a href="/cgspace-notes/2016-12/">December, 2016</a></li>
@ -429,8 +438,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<li><a href="/cgspace-notes/2016-10/">October, 2016</a></li>
<li><a href="/cgspace-notes/2016-09/">September, 2016</a></li>
</ol>
</section>


@ -6,9 +6,35 @@
<description>Recent content in Posts on CGSpace Notes</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Mon, 02 Jan 2017 10:43:00 +0300</lastBuildDate>
<lastBuildDate>Tue, 07 Feb 2017 07:04:52 -0800</lastBuildDate>
<atom:link href="https://alanorth.github.io/cgspace-notes/post/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>February, 2017</title>
<link>https://alanorth.github.io/cgspace-notes/2017-02/</link>
<pubDate>Tue, 07 Feb 2017 07:04:52 -0800</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-02/</guid>
<description>&lt;h2 id=&#34;2017-02-07&#34;&gt;2017-02-07&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;An item was erroneously mapped to the same collection twice again, so I had to remove one of the mappings manually:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspace=# select * from collection2item where item_id = &#39;80278&#39;;
  id   | collection_id | item_id
-------+---------------+---------
 92551 |           313 |   80278
 92550 |           313 |   80278
 90774 |          1051 |   80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;/p&gt;</description>
</item>
<item>
<title>January, 2017</title>
<link>https://alanorth.github.io/cgspace-notes/2017-01/</link>
@ -5168,160 +5194,6 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
&lt;ul&gt;
&lt;li&gt;Switch CGSpace log compression cron jobs from using lzop to xz—the compression isn&amp;rsquo;t as good, but it&amp;rsquo;s much faster and causes less IO/CPU load&lt;/li&gt;
&lt;li&gt;Since we figured out (and fixed) the cause of the performance issue, I reverted Google Bot&amp;rsquo;s crawl rate to the &amp;ldquo;Let Google optimize&amp;rdquo; setting&lt;/li&gt;
&lt;/ul&gt;</description>
</item>
<item>
<title>November, 2015</title>
<link>https://alanorth.github.io/cgspace-notes/2015-11/</link>
<pubDate>Mon, 23 Nov 2015 17:00:57 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2015-11/</guid>
<description>&lt;h2 id=&#34;2015-11-22&#34;&gt;2015-11-22&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace went down&lt;/li&gt;
&lt;li&gt;Looks like DSpace exhausted its PostgreSQL connection pool&lt;/li&gt;
&lt;li&gt;Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
78
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For now I have increased the limit from 60 to 90, run updates, and rebooted the server&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2015-11-24&#34;&gt;2015-11-24&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace went down again&lt;/li&gt;
&lt;li&gt;Getting emails from uptimeRobot and uptimeButler that it&amp;rsquo;s down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors&lt;/li&gt;
&lt;li&gt;Looks like there are still a bunch of idle PostgreSQL connections:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
96
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;For some reason the number of idle connections has been very high since we upgraded to DSpace 5&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2015-11-25&#34;&gt;2015-11-25&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config&lt;/li&gt;
&lt;li&gt;The OAI application requests stylesheets and javascript files with the path &lt;code&gt;/oai/static/css&lt;/code&gt;, which gets matched here:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;# static assets we can load from the file system directly with nginx
location ~ /(themes|static|aspects/ReportingSuite) {
try_files $uri @tomcat;
...
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The document root is relative to the xmlui app, so this gets a 404—I&amp;rsquo;m not sure why it doesn&amp;rsquo;t pass to &lt;code&gt;@tomcat&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Anyways, I can&amp;rsquo;t find any URIs with path &lt;code&gt;/static&lt;/code&gt;, and the more important point is to handle all the static theme assets, so we can just remove &lt;code&gt;static&lt;/code&gt; from the regex for now (who cares if we can&amp;rsquo;t use nginx to send Etags for OAI CSS!)&lt;/li&gt;
&lt;li&gt;Also, I noticed we aren&amp;rsquo;t setting CSP headers on the static assets, because in nginx headers are inherited in child blocks, but if you use &lt;code&gt;add_header&lt;/code&gt; in a child block it doesn&amp;rsquo;t inherit the others&lt;/li&gt;
&lt;li&gt;We simply need to add &lt;code&gt;include extra-security.conf;&lt;/code&gt; to the above location block (but research and test first)&lt;/li&gt;
&lt;li&gt;We should add WOFF assets to the list of things to set expires for:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;location ~* \.(?:ico|css|js|gif|jpe?g|png|woff)$ {
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;We should also add &lt;code&gt;aspects/Statistics&lt;/code&gt; to the location block for static assets (minus &lt;code&gt;static&lt;/code&gt; from above):&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;location ~ /(themes|aspects/ReportingSuite|aspects/Statistics) {
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Need to check &lt;code&gt;/about&lt;/code&gt; on CGSpace, as it&amp;rsquo;s blank on my local test server and we might need to add something there&lt;/li&gt;
&lt;li&gt;CGSpace has been up and down all day due to PostgreSQL idle connections (current DSpace pool is 90):&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
93
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;I looked closer at the idle connections and saw that many have been idle for hours (current time on server is &lt;code&gt;2015-11-25T20:20:42+0000&lt;/code&gt;):&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | less -S
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start |
-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+---
20951 | cgspace | 10966 | 18205 | cgspace | | 127.0.0.1 | | 37731 | 2015-11-25 13:13:02.837624+00 | | 20
20951 | cgspace | 10967 | 18205 | cgspace | | 127.0.0.1 | | 37737 | 2015-11-25 13:13:03.069421+00 | | 20
...
&lt;/code&gt;&lt;/pre&gt;
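&lt;ul&gt;
&lt;li&gt;A sketch of a more direct check, assuming PostgreSQL 9.2 or newer, where &lt;code&gt;pg_stat_activity&lt;/code&gt; has &lt;code&gt;state&lt;/code&gt; and &lt;code&gt;state_change&lt;/code&gt; columns (this query is an illustration and was not part of the original session); unlike &lt;code&gt;grep idle&lt;/code&gt;, it does not also match &lt;code&gt;idle in transaction&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#34;SELECT pid, usename, now() - state_change AS idle_for FROM pg_stat_activity WHERE state = &#39;idle&#39; ORDER BY idle_for DESC;&#34;
&lt;/code&gt;&lt;/pre&gt;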
&lt;ul&gt;
&lt;li&gt;There is a relevant Jira issue about this: &lt;a href=&#34;https://jira.duraspace.org/browse/DS-1458&#34;&gt;https://jira.duraspace.org/browse/DS-1458&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;It seems there is some sense in changing DSpace&amp;rsquo;s default &lt;code&gt;db.maxidle&lt;/code&gt; from unlimited (-1) to something like 8 (Tomcat default) or 10 (Confluence default)&lt;/li&gt;
&lt;li&gt;Change &lt;code&gt;db.maxidle&lt;/code&gt; from -1 to 10, reduce &lt;code&gt;db.maxconnections&lt;/code&gt; from 90 to 50, and restart postgres and tomcat7&lt;/li&gt;
&lt;li&gt;Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well&lt;/li&gt;
&lt;li&gt;Also deploy the nginx fixes for the &lt;code&gt;try_files&lt;/code&gt; location block as well as the expires block&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2015-11-26&#34;&gt;2015-11-26&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace behaving much better since changing &lt;code&gt;db.maxidle&lt;/code&gt; yesterday, but still two up/down notices from monitoring this morning (better than 50!)&lt;/li&gt;
&lt;li&gt;CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item&lt;/li&gt;
&lt;li&gt;Not as bad for me, but still unsustainable if you have to get many:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
8.415
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Monitoring e-mailed in the evening to say CGSpace was down&lt;/li&gt;
&lt;li&gt;Idle connections in PostgreSQL again:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
66
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;At the time, the current DSpace pool size was 50&amp;hellip;&lt;/li&gt;
&lt;li&gt;I reduced the pool back to the default of 30, and reduced the &lt;code&gt;db.maxidle&lt;/code&gt; setting from 10 to 8&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2015-11-29&#34;&gt;2015-11-29&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Still more alerts that CGSpace has been up and down all day&lt;/li&gt;
&lt;li&gt;Current database settings for DSpace:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;db.maxconnections = 30
db.maxwait = 5000
db.maxidle = 8
db.statementpool = true
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;And idle connections:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep cgspace | grep -c idle
49
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Perhaps I need to start drastically increasing the connection limits—like to 300—to see if DSpace&amp;rsquo;s thirst can ever be quenched&lt;/li&gt;
&lt;li&gt;On another note, SUNScholar&amp;rsquo;s notes suggest adjusting some other postgres variables: &lt;a href=&#34;http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database&#34;&gt;http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;This might help with REST API speed (which I mentioned above and for which I still need to do real tests)&lt;/li&gt;
&lt;/ul&gt;</description>
</item>


@ -13,7 +13,7 @@
<meta property="og:updated_time" content="2016-03-02T16:50:00&#43;03:00"/>
<meta property="og:updated_time" content="2016-04-04T11:06:00&#43;03:00"/>
@ -43,7 +43,7 @@
},
"dateModified": "2016-03-02T16:50:00+03:00",
"dateModified": "2016-04-04T11:06:00+03:00",
@ -105,6 +105,32 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-04/">April, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-04-04T11:06:00&#43;03:00">Mon Apr 04, 2016</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file usage on CGSpace, I noticed that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I&rsquo;ve never needed to look in any log file other than dspace.log, let alone one from last year!</li>
<li>Excluding those logs from the backups will save us a few gigs of backup space we&rsquo;re paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul>
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-03/">March, 2016</a></h2>
@ -265,6 +291,8 @@
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2017-02/">February, 2017</a></li>
<li><a href="/cgspace-notes/2017-01/">January, 2017</a></li>
<li><a href="/cgspace-notes/2016-12/">December, 2016</a></li>
@ -273,8 +301,6 @@
<li><a href="/cgspace-notes/2016-10/">October, 2016</a></li>
<li><a href="/cgspace-notes/2016-09/">September, 2016</a></li>
</ol>
</section>