Add notes for 2019-05-05

This commit is contained in:
2019-05-05 16:45:12 +03:00
parent cfa5f3ddfb
commit 96d6602775
76 changed files with 10839 additions and 11300 deletions

View File

@ -59,7 +59,7 @@ This was due to newline characters in the dc.description.abstract column, which
I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
"/>
<meta name="generator" content="Hugo 0.55.3" />
<meta name="generator" content="Hugo 0.55.5" />
@ -220,14 +220,13 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<li>Had to do some quality checks and column renames before importing, as either Sisay or Abenet renamed a few columns and the metadata importer wanted to remove/add new metadata for title, abstract, etc.</li>
<li>Also I applied the HTML entities unescape transform on the abstract column in Open Refine</li>
<li>I need to get an author list from the database for only the CGIAR Library community to send to Peter</li>
<li>It turns out that I had already used this SQL query in <a href="/cgspace-notes/2017-05">May, 2017</a> to get the authors from CGIAR Library:</li>
</ul>
<li><p>It turns out that I had already used this SQL query in <a href="/cgspace-notes/2017-05">May, 2017</a> to get the authors from CGIAR Library:</p>
<pre><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9'))) group by text_value order by count desc) to /tmp/cgiar-library-authors.csv with csv;
</code></pre>
</code></pre></li>
<ul>
<li>Meeting with Peter and CGSpace team
<li><p>Meeting with Peter and CGSpace team</p>
<ul>
<li>Alan to follow up with ICARDA about depositing in CGSpace, we want ICARD and Drylands legacy content but not duplicates</li>
@ -235,8 +234,10 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<li>Alan to follow up with Atmire about a dedicated field for ORCIDs, based on the discussion in the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+June+2017">June, 2017 DCAT meeting</a></li>
<li>Alan to ask about how to query external services like AGROVOC in the DSpace submission form</li>
</ul></li>
<li>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID metadata in DSpace</a></li>
<li>Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates</li>
<li><p>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID metadata in DSpace</a></p></li>
<li><p>Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates</p></li>
</ul>
<h2 id="2017-08-11">2017-08-11</h2>
@ -254,29 +255,29 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<ul>
<li>I sent a message to the mailing list about the duplicate content issue with <code>/rest</code> and <code>/bitstream</code> URLs</li>
<li>Looking at the logs for the REST API on <code>/rest</code>, it looks like there is someone hammering doing testing or something on it&hellip;</li>
</ul>
<li><p>Looking at the logs for the REST API on <code>/rest</code>, it looks like there is someone hammering doing testing or something on it&hellip;</p>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 5
140 66.249.66.91
404 66.249.66.90
1479 50.116.102.77
9794 45.5.184.196
85736 70.32.83.92
</code></pre>
140 66.249.66.91
404 66.249.66.90
1479 50.116.102.77
9794 45.5.184.196
85736 70.32.83.92
</code></pre></li>
<ul>
<li>The top offender is 70.32.83.92 which is actually the same IP as ccafs.cgiar.org, so I will email the Macaroni Bros to see if they can test on DSpace Test instead</li>
<li>I&rsquo;ve enabled logging of <code>/oai</code> requests on nginx as well so we can potentially determine bad actors here (also to see if anyone is actually using OAI!)</li>
<li><p>The top offender is 70.32.83.92 which is actually the same IP as ccafs.cgiar.org, so I will email the Macaroni Bros to see if they can test on DSpace Test instead</p></li>
<li><p>I&rsquo;ve enabled logging of <code>/oai</code> requests on nginx as well so we can potentially determine bad actors here (also to see if anyone is actually using OAI!)</p>
<pre><code># log oai requests
location /oai {
access_log /var/log/nginx/oai.log;
proxy_pass http://tomcat_http;
}
</code></pre></li>
</ul>
<pre><code> # log oai requests
location /oai {
access_log /var/log/nginx/oai.log;
proxy_pass http://tomcat_http;
}
</code></pre>
<h2 id="2017-08-13">2017-08-13</h2>
<ul>
@ -287,27 +288,25 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<h2 id="2017-08-14">2017-08-14</h2>
<ul>
<li>Run author corrections on CGIAR Library community from Peter</li>
</ul>
<li><p>Run author corrections on CGIAR Library community from Peter</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/authors-fix-523.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p fuuuu
</code></pre>
</code></pre></li>
<ul>
<li>There were only three deletions so I just did them manually:</li>
</ul>
<li><p>There were only three deletions so I just did them manually:</p>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='C';
DELETE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='WSSD';
</code></pre>
</code></pre></li>
<ul>
<li>Generate a new list of authors from the CGIAR Library community for Peter to look through now that the initial corrections have been done</li>
<li>Thinking about resource limits for PostgreSQL again after last week&rsquo;s CGSpace crash and related to a recently discussion I had in the comments of the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting notes</a></li>
<li>In that thread Chris Wilper suggests a new default of 35 max connections for <code>db.maxconnections</code> (from the current default of 30), knowing that <em>each DSpace web application</em> gets to use up to this many on its own</li>
<li>It would be good to approximate what the theoretical maximum number of connections on a busy server would be, perhaps by looking to see which apps use SQL:</li>
</ul>
<li><p>Generate a new list of authors from the CGIAR Library community for Peter to look through now that the initial corrections have been done</p></li>
<li><p>Thinking about resource limits for PostgreSQL again after last week&rsquo;s CGSpace crash and related to a recently discussion I had in the comments of the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting notes</a></p></li>
<li><p>In that thread Chris Wilper suggests a new default of 35 max connections for <code>db.maxconnections</code> (from the current default of 30), knowing that <em>each DSpace web application</em> gets to use up to this many on its own</p></li>
<li><p>It would be good to approximate what the theoretical maximum number of connections on a busy server would be, perhaps by looking to see which apps use SQL:</p>
<pre><code>$ grep -rsI SQLException dspace-jspui | wc -l
473
@ -319,18 +318,25 @@ $ grep -rsI SQLException dspace-solr | wc -l
0
$ grep -rsI SQLException dspace-xmlui | wc -l
866
</code></pre>
</code></pre></li>
<ul>
<li>Of those five applications we&rsquo;re running, only <code>solr</code> appears not to use the database directly</li>
<li>And JSPUI is only used internally (so it doesn&rsquo;t really count), leaving us with OAI, REST, and XMLUI</li>
<li>Assuming each takes a theoretical maximum of 35 connections during a heavy load (35 * 3 = 105), that would put the connections well above PostgreSQL&rsquo;s default max of 100 connections (remember a handful of connections are reserved for the PostgreSQL super user, see <code>superuser_reserved_connections</code>)</li>
<li>So we should adjust PostgreSQL&rsquo;s max connections to be DSpace&rsquo;s <code>db.maxconnections</code> * 3 + 3</li>
<li>This would allow each application to use up to <code>db.maxconnections</code> and not to go over the system&rsquo;s PostgreSQL limit</li>
<li>Perhaps since CGSpace is a busy site with lots of resources we could actually use something like 40 for <code>db.maxconnections</code></li>
<li>Also worth looking into is to set up a database pool using JNDI, as apparently DSpace&rsquo;s <code>db.poolname</code> hasn&rsquo;t been used since around DSpace 1.7 (according to Chris Wilper&rsquo;s comments in the thread)</li>
<li>Need to go check the PostgreSQL connection stats in Munin on CGSpace from the past week to get an idea if 40 is appropriate</li>
<li>Looks like connections hover around 50:</li>
<li><p>Of those five applications we&rsquo;re running, only <code>solr</code> appears not to use the database directly</p></li>
<li><p>And JSPUI is only used internally (so it doesn&rsquo;t really count), leaving us with OAI, REST, and XMLUI</p></li>
<li><p>Assuming each takes a theoretical maximum of 35 connections during a heavy load (35 * 3 = 105), that would put the connections well above PostgreSQL&rsquo;s default max of 100 connections (remember a handful of connections are reserved for the PostgreSQL super user, see <code>superuser_reserved_connections</code>)</p></li>
<li><p>So we should adjust PostgreSQL&rsquo;s max connections to be DSpace&rsquo;s <code>db.maxconnections</code> * 3 + 3</p></li>
<li><p>This would allow each application to use up to <code>db.maxconnections</code> and not to go over the system&rsquo;s PostgreSQL limit</p></li>
<li><p>Perhaps since CGSpace is a busy site with lots of resources we could actually use something like 40 for <code>db.maxconnections</code></p></li>
<li><p>Also worth looking into is to set up a database pool using JNDI, as apparently DSpace&rsquo;s <code>db.poolname</code> hasn&rsquo;t been used since around DSpace 1.7 (according to Chris Wilper&rsquo;s comments in the thread)</p></li>
<li><p>Need to go check the PostgreSQL connection stats in Munin on CGSpace from the past week to get an idea if 40 is appropriate</p></li>
<li><p>Looks like connections hover around 50:</p></li>
</ul>
<p><img src="/cgspace-notes/2017/08/postgresql-connections-cgspace.png" alt="PostgreSQL connections 2017-08" /></p>
@ -356,67 +362,61 @@ $ grep -rsI SQLException dspace-xmlui | wc -l
<h2 id="2017-08-16">2017-08-16</h2>
<ul>
<li>I wanted to merge the various field variations like <code>cg.subject.system</code> and <code>cg.subject.system[en_US]</code> in OpenRefine but I realized it would be easier in PostgreSQL:</li>
</ul>
<li><p>I wanted to merge the various field variations like <code>cg.subject.system</code> and <code>cg.subject.system[en_US]</code> in OpenRefine but I realized it would be easier in PostgreSQL:</p>
<pre><code>dspace=# select distinct text_value, text_lang from metadatavalue where resource_type_id=2 and metadata_field_id=254;
</code></pre>
</code></pre></li>
<ul>
<li>And actually, we can do it for other generic fields for items in those collections, for example <code>dc.description.abstract</code>:</li>
</ul>
<li><p>And actually, we can do it for other generic fields for items in those collections, for example <code>dc.description.abstract</code>:</p>
<pre><code>dspace=# update metadatavalue set text_lang='en_US' where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'abstract') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9')))
</code></pre>
</code></pre></li>
<ul>
<li>And on others like <code>dc.language.iso</code>, <code>dc.relation.ispartofseries</code>, <code>dc.type</code>, <code>dc.title</code>, etc&hellip;</li>
<li>Also, to move fields from <code>dc.identifier.url</code> to <code>cg.identifier.url[en_US]</code> (because we don&rsquo;t use the Dublin Core one for some reason):</li>
</ul>
<li><p>And on others like <code>dc.language.iso</code>, <code>dc.relation.ispartofseries</code>, <code>dc.type</code>, <code>dc.title</code>, etc&hellip;</p></li>
<li><p>Also, to move fields from <code>dc.identifier.url</code> to <code>cg.identifier.url[en_US]</code> (because we don&rsquo;t use the Dublin Core one for some reason):</p>
<pre><code>dspace=# update metadatavalue set metadata_field_id = 219, text_lang = 'en_US' where resource_type_id = 2 AND metadata_field_id = 237;
UPDATE 15
</code></pre>
</code></pre></li>
<ul>
<li>Set the text_lang of all <code>dc.identifier.uri</code> (Handle) fields to be NULL, just like default DSpace does:</li>
</ul>
<li><p>Set the text_lang of all <code>dc.identifier.uri</code> (Handle) fields to be NULL, just like default DSpace does:</p>
<pre><code>dspace=# update metadatavalue set text_lang=NULL where resource_type_id = 2 and metadata_field_id = 25 and text_value like 'http://hdl.handle.net/10947/%';
UPDATE 4248
</code></pre>
</code></pre></li>
<ul>
<li>Also update the text_lang of <code>dc.contributor.author</code> fields for metadata in these collections:</li>
</ul>
<li><p>Also update the text_lang of <code>dc.contributor.author</code> fields for metadata in these collections:</p>
<pre><code>dspace=# update metadatavalue set text_lang=NULL where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9')));
UPDATE 4899
</code></pre>
</code></pre></li>
<ul>
<li>Wow, I just wrote this baller regex facet to find duplicate authors:</li>
</ul>
<li><p>Wow, I just wrote this baller regex facet to find duplicate authors:</p>
<pre><code>isNotNull(value.match(/(CGIAR .+?)\|\|\1/))
</code></pre>
</code></pre></li>
<ul>
<li>This would be true if the authors were like <code>CGIAR System Management Office||CGIAR System Management Office</code>, which some of the CGIAR Library&rsquo;s were</li>
<li>Unfortunately when you fix these in OpenRefine and then submit the metadata to DSpace it doesn&rsquo;t detect any changes, so you have to edit them all manually via DSpace&rsquo;s &ldquo;Edit Item&rdquo;</li>
<li>Ooh! And an even more interesting regex would match <em>any</em> duplicated author:</li>
</ul>
<li><p>This would be true if the authors were like <code>CGIAR System Management Office||CGIAR System Management Office</code>, which some of the CGIAR Library&rsquo;s were</p></li>
<li><p>Unfortunately when you fix these in OpenRefine and then submit the metadata to DSpace it doesn&rsquo;t detect any changes, so you have to edit them all manually via DSpace&rsquo;s &ldquo;Edit Item&rdquo;</p></li>
<li><p>Ooh! And an even more interesting regex would match <em>any</em> duplicated author:</p>
<pre><code>isNotNull(value.match(/(.+?)\|\|\1/))
</code></pre>
</code></pre></li>
<ul>
<li>Which means it can also be used to find items with duplicate <code>dc.subject</code> fields&hellip;</li>
<li>Finally sent Peter the final dump of the CGIAR System Organization community so he can have a last look at it</li>
<li>Post a message to the dspace-tech mailing list to ask about querying the AGROVOC API from the submission form</li>
<li>Abenet was asking if there was some way to hide certain internal items from the &ldquo;ILRI Research Outputs&rdquo; RSS feed (which is the top-level ILRI community feed), because Shirley was complaining</li>
<li>I think we could use <code>harvest.includerestricted.rss = false</code> but the items might need to be 100% restricted, not just the metadata</li>
<li>Adjust Ansible postgres role to use <code>max_connections</code> from a template variable and deploy a new limit of 123 on CGSpace</li>
<li><p>Which means it can also be used to find items with duplicate <code>dc.subject</code> fields&hellip;</p></li>
<li><p>Finally sent Peter the final dump of the CGIAR System Organization community so he can have a last look at it</p></li>
<li><p>Post a message to the dspace-tech mailing list to ask about querying the AGROVOC API from the submission form</p></li>
<li><p>Abenet was asking if there was some way to hide certain internal items from the &ldquo;ILRI Research Outputs&rdquo; RSS feed (which is the top-level ILRI community feed), because Shirley was complaining</p></li>
<li><p>I think we could use <code>harvest.includerestricted.rss = false</code> but the items might need to be 100% restricted, not just the metadata</p></li>
<li><p>Adjust Ansible postgres role to use <code>max_connections</code> from a template variable and deploy a new limit of 123 on CGSpace</p></li>
</ul>
<h2 id="2017-08-17">2017-08-17</h2>
@ -424,16 +424,14 @@ UPDATE 4899
<ul>
<li>Run Peter&rsquo;s edits to the CGIAR System Organization community on DSpace Test</li>
<li>Uptime Robot said CGSpace went down for 1 minute, not sure why</li>
<li>Looking in <code>dspace.log.2017-08-17</code> I see some weird errors that might be related?</li>
</ul>
<li><p>Looking in <code>dspace.log.2017-08-17</code> I see some weird errors that might be related?</p>
<pre><code>2017-08-17 07:55:31,396 ERROR net.sf.ehcache.store.DiskStore @ cocoon-ehcacheCache: Could not read disk store element for key PK_G-aspect-cocoon://DRI/12/handle/10568/65885?pipelinehash=823411183535858997_T-Navigation-3368194896954203241. Error was invalid stream header: 00000000
java.io.StreamCorruptedException: invalid stream header: 00000000
</code></pre>
</code></pre></li>
<ul>
<li>Weird that these errors seem to have started on August 11th, the same day we had capacity issues with PostgreSQL:</li>
</ul>
<li><p>Weird that these errors seem to have started on August 11th, the same day we had capacity issues with PostgreSQL:</p>
<pre><code># grep -c &quot;ERROR net.sf.ehcache.store.DiskStore&quot; dspace.log.2017-08-*
dspace.log.2017-08-01:0
@ -453,14 +451,17 @@ dspace.log.2017-08-14:2135
dspace.log.2017-08-15:1506
dspace.log.2017-08-16:1935
dspace.log.2017-08-17:584
</code></pre>
</code></pre></li>
<ul>
<li>There are none in 2017-07 either&hellip;</li>
<li>A few posts on the dspace-tech mailing list say this is related to the Cocoon cache somehow</li>
<li>I will clear the XMLUI cache for now and see if the errors continue (though perpaps shutting down Tomcat and removing the cache is more effective somehow?)</li>
<li>We tested the option for limiting restricted items from the RSS feeds on DSpace Test</li>
<li>I created four items, and only the two with public metadata showed up in the community&rsquo;s RSS feed:
<li><p>There are none in 2017-07 either&hellip;</p></li>
<li><p>A few posts on the dspace-tech mailing list say this is related to the Cocoon cache somehow</p></li>
<li><p>I will clear the XMLUI cache for now and see if the errors continue (though perpaps shutting down Tomcat and removing the cache is more effective somehow?)</p></li>
<li><p>We tested the option for limiting restricted items from the RSS feeds on DSpace Test</p></li>
<li><p>I created four items, and only the two with public metadata showed up in the community&rsquo;s RSS feed:</p>
<ul>
<li>Public metadata, public bitstream ✓</li>
@ -468,7 +469,8 @@ dspace.log.2017-08-17:584
<li>Restricted metadata, restricted bitstream ✗</li>
<li>Private item ✗</li>
</ul></li>
<li>Peter responded and said that he doesn&rsquo;t want to limit items to be restricted just so we can change the RSS feeds</li>
<li><p>Peter responded and said that he doesn&rsquo;t want to limit items to be restricted just so we can change the RSS feeds</p></li>
</ul>
<h2 id="2017-08-18">2017-08-18</h2>
@ -479,16 +481,16 @@ dspace.log.2017-08-17:584
<li>I wired it up to the <code>dc.subject</code> field of the submission interface using the &ldquo;lookup&rdquo; type and it works!</li>
<li>I think we can use this example to get a working AGROVOC query</li>
<li>More information about authority framework: <a href="https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values">https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values</a></li>
<li>Wow, I&rsquo;m playing with the AGROVOC SPARQL endpoint using the <a href="https://github.com/tialaramex/sparql-query">sparql-query tool</a>:</li>
</ul>
<li><p>Wow, I&rsquo;m playing with the AGROVOC SPARQL endpoint using the <a href="https://github.com/tialaramex/sparql-query">sparql-query tool</a>:</p>
<pre><code>$ ./sparql-query http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc
sparql$ PREFIX skos: &lt;http://www.w3.org/2004/02/skos/core#&gt;
SELECT
?label
?label
WHERE {
{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . }
FILTER regex(str(?label), &quot;^fish&quot;, &quot;i&quot;) .
{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . }
FILTER regex(str(?label), &quot;^fish&quot;, &quot;i&quot;) .
} LIMIT 10;
┌───────────────────────┐
@ -505,12 +507,13 @@ WHERE {
│ fishing times │
│ fish passes │
└───────────────────────┘
</code></pre>
</code></pre></li>
<ul>
<li>More examples about SPARQL syntax: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></li>
<li>I found this blog post about speeding up the Tomcat startup time: <a href="http://skybert.net/java/improve-tomcat-startup-time/">http://skybert.net/java/improve-tomcat-startup-time/</a></li>
<li>The startup time went from ~80s to 40s!</li>
<li><p>More examples about SPARQL syntax: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></p></li>
<li><p>I found this blog post about speeding up the Tomcat startup time: <a href="http://skybert.net/java/improve-tomcat-startup-time/">http://skybert.net/java/improve-tomcat-startup-time/</a></p></li>
<li><p>The startup time went from ~80s to 40s!</p></li>
</ul>
<h2 id="2017-08-19">2017-08-19</h2>
@ -526,35 +529,35 @@ WHERE {
<ul>
<li>Since I cleared the XMLUI cache on 2017-08-17 there haven&rsquo;t been any more <code>ERROR net.sf.ehcache.store.DiskStore</code> errors</li>
<li>Look at the CGIAR Library to see if I can find the items that have been submitted since May:</li>
</ul>
<li><p>Look at the CGIAR Library to see if I can find the items that have been submitted since May:</p>
<pre><code>dspace=# select * from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z';
metadata_value_id | item_id | metadata_field_id | text_value | text_lang | place | authority | confidence
metadata_value_id | item_id | metadata_field_id | text_value | text_lang | place | authority | confidence
-------------------+---------+-------------------+----------------------+-----------+-------+-----------+------------
123117 | 5872 | 11 | 2017-06-28T13:05:18Z | | 1 | | -1
123042 | 5869 | 11 | 2017-05-15T03:29:23Z | | 1 | | -1
123056 | 5870 | 11 | 2017-05-22T11:27:15Z | | 1 | | -1
123072 | 5871 | 11 | 2017-06-06T07:46:01Z | | 1 | | -1
123171 | 5874 | 11 | 2017-08-04T07:51:20Z | | 1 | | -1
123117 | 5872 | 11 | 2017-06-28T13:05:18Z | | 1 | | -1
123042 | 5869 | 11 | 2017-05-15T03:29:23Z | | 1 | | -1
123056 | 5870 | 11 | 2017-05-22T11:27:15Z | | 1 | | -1
123072 | 5871 | 11 | 2017-06-06T07:46:01Z | | 1 | | -1
123171 | 5874 | 11 | 2017-08-04T07:51:20Z | | 1 | | -1
(5 rows)
</code></pre>
</code></pre></li>
<ul>
<li>According to <code>dc.date.accessioned</code> (metadata field id 11) there have only been five items submitted since May</li>
<li>These are their handles:</li>
</ul>
<li><p>According to <code>dc.date.accessioned</code> (metadata field id 11) there have only been five items submitted since May</p></li>
<li><p>These are their handles:</p>
<pre><code>dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z');
handle
handle
------------
10947/4658
10947/4659
10947/4660
10947/4661
10947/4664
10947/4658
10947/4659
10947/4660
10947/4661
10947/4664
(5 rows)
</code></pre>
</code></pre></li>
</ul>
<h2 id="2017-08-23">2017-08-23</h2>
@ -575,17 +578,18 @@ WHERE {
<li>I notice that in many WLE collections Marianne Gadeberg is in the edit or approval steps, but she is also in the groups for those steps.</li>
<li>I think we need to have a process to go back and check / fix some of these scenarios—to remove her user from the step and instead add her to the group—because we have way too many authorizations and in late 2016 we had <a href="https://github.com/ilri/rmg-ansible-public/commit/358b5ea43f9e5820986f897c9d560937c702ac6e">performance issues with Solr</a> because of this</li>
<li>I asked Sisay about this and hinted that he should go back and fix these things, but let&rsquo;s see what he says</li>
<li>Saw CGSpace go down briefly today and noticed SQL connection pool errors in the dspace log file:</li>
</ul>
<li><p>Saw CGSpace go down briefly today and noticed SQL connection pool errors in the dspace log file:</p>
<pre><code>ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
</code></pre>
</code></pre></li>
<ul>
<li>Looking at the logs I see we have been having hundreds or thousands of these errors a few times per week in 2017-07 and almost every day in 2017-08</li>
<li>It seems that I changed the <code>db.maxconnections</code> setting from 70 to 40 around 2017-08-14, but Macaroni Bros also reduced their hourly hammering of the REST API then</li>
<li>Nevertheless, it seems like a connection limit is not enough and that I should increase it (as well as the system&rsquo;s PostgreSQL <code>max_connections</code>)</li>
<li><p>Looking at the logs I see we have been having hundreds or thousands of these errors a few times per week in 2017-07 and almost every day in 2017-08</p></li>
<li><p>It seems that I changed the <code>db.maxconnections</code> setting from 70 to 40 around 2017-08-14, but Macaroni Bros also reduced their hourly hammering of the REST API then</p></li>
<li><p>Nevertheless, it seems like a connection limit is not enough and that I should increase it (as well as the system&rsquo;s PostgreSQL <code>max_connections</code>)</p></li>
</ul>