mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2019-05-05
@@ -59,7 +59,7 @@ This was due to newline characters in the dc.description.abstract column, which
|
||||
I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
|
||||
Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@@ -220,14 +220,13 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<li>Had to do some quality checks and column renames before importing, as either Sisay or Abenet renamed a few columns and the metadata importer wanted to remove/add new metadata for title, abstract, etc.</li>
|
||||
<li>Also I applied the HTML entities unescape transform on the abstract column in OpenRefine</li>
|
||||
<li>I need to get an author list from the database for only the CGIAR Library community to send to Peter</li>
|
||||
<li>It turns out that I had already used this SQL query in <a href="/cgspace-notes/2017-05">May, 2017</a> to get the authors from CGIAR Library:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It turns out that I had already used this SQL query in <a href="/cgspace-notes/2017-05">May, 2017</a> to get the authors from CGIAR Library:</p>
|
||||
|
||||
<pre><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9'))) group by text_value order by count desc) to /tmp/cgiar-library-authors.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Meeting with Peter and CGSpace team
|
||||
<li><p>Meeting with Peter and CGSpace team</p>
|
||||
|
||||
<ul>
|
||||
<li>Alan to follow up with ICARDA about depositing in CGSpace; we want ICARDA and Drylands legacy content, but not duplicates</li>
|
||||
@@ -235,8 +234,10 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<li>Alan to follow up with Atmire about a dedicated field for ORCIDs, based on the discussion in the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+June+2017">June, 2017 DCAT meeting</a></li>
|
||||
<li>Alan to ask about how to query external services like AGROVOC in the DSpace submission form</li>
|
||||
</ul></li>
|
||||
<li>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID metadata in DSpace</a></li>
|
||||
<li>Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates</li>
|
||||
|
||||
<li><p>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID metadata in DSpace</a></p></li>
|
||||
|
||||
<li><p>Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-08-11">2017-08-11</h2>
|
||||
@@ -254,29 +255,29 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
|
||||
<ul>
|
||||
<li>I sent a message to the mailing list about the duplicate content issue with <code>/rest</code> and <code>/bitstream</code> URLs</li>
|
||||
<li>Looking at the logs for the REST API on <code>/rest</code>, it looks like someone is hammering it, perhaps doing testing or something…</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looking at the logs for the REST API on <code>/rest</code>, it looks like someone is hammering it, perhaps doing testing or something…</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 5
|
||||
140 66.249.66.91
|
||||
404 66.249.66.90
|
||||
1479 50.116.102.77
|
||||
9794 45.5.184.196
|
||||
85736 70.32.83.92
|
||||
</code></pre>
|
||||
140 66.249.66.91
|
||||
404 66.249.66.90
|
||||
1479 50.116.102.77
|
||||
9794 45.5.184.196
|
||||
85736 70.32.83.92
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The top offender is 70.32.83.92 which is actually the same IP as ccafs.cgiar.org, so I will email the Macaroni Bros to see if they can test on DSpace Test instead</li>
|
||||
<li>I’ve enabled logging of <code>/oai</code> requests on nginx as well so we can potentially determine bad actors here (also to see if anyone is actually using OAI!)</li>
|
||||
<li><p>The top offender is 70.32.83.92 which is actually the same IP as ccafs.cgiar.org, so I will email the Macaroni Bros to see if they can test on DSpace Test instead</p></li>
|
||||
|
||||
<li><p>I’ve enabled logging of <code>/oai</code> requests on nginx as well so we can potentially determine bad actors here (also to see if anyone is actually using OAI!)</p>
|
||||
|
||||
<pre><code># log oai requests
|
||||
location /oai {
|
||||
access_log /var/log/nginx/oai.log;
|
||||
proxy_pass http://tomcat_http;
|
||||
}
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<pre><code> # log oai requests
|
||||
location /oai {
|
||||
access_log /var/log/nginx/oai.log;
|
||||
proxy_pass http://tomcat_http;
|
||||
}
|
||||
</code></pre>
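<p>The <code>awk | sort | uniq -c</code> pipeline used on <code>rest.log.1</code> can also be sketched in Python, which is handy if the counts need further processing. This is a minimal sketch, assuming the client IP is the first whitespace-separated field of each log line (as in the <code>awk '{print $1}'</code> call); the function name <code>top_clients</code> and the sample lines are hypothetical.</p>

```python
from collections import Counter


def top_clients(lines, n=5):
    """Return the n most frequent client IPs as (ip, count) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)


# Hypothetical sample lines; a real run would iterate over an open log file.
sample = [
    '70.32.83.92 - - "GET /rest/items HTTP/1.1" 200',
    '45.5.184.196 - - "GET /rest/collections HTTP/1.1" 200',
    '70.32.83.92 - - "GET /rest/items/1 HTTP/1.1" 200',
]
print(top_clients(sample, n=2))
```

<p>Passing an open file object instead of <code>sample</code> should work the same way, since the function only iterates over lines.</p>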
|
||||
|
||||
<h2 id="2017-08-13">2017-08-13</h2>
|
||||
|
||||
<ul>
|
||||
@@ -287,27 +288,25 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<h2 id="2017-08-14">2017-08-14</h2>
|
||||
|
||||
<ul>
|
||||
<li>Run author corrections on CGIAR Library community from Peter</li>
|
||||
</ul>
|
||||
<li><p>Run author corrections on CGIAR Library community from Peter</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/authors-fix-523.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p fuuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There were only three deletions so I just did them manually:</li>
|
||||
</ul>
|
||||
<li><p>There were only three deletions so I just did them manually:</p>
|
||||
|
||||
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='C';
|
||||
DELETE 1
|
||||
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='WSSD';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Generate a new list of authors from the CGIAR Library community for Peter to look through now that the initial corrections have been done</li>
|
||||
<li>Thinking about resource limits for PostgreSQL again after last week’s CGSpace crash and related to a recent discussion I had in the comments of the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting notes</a></li>
|
||||
<li>In that thread Chris Wilper suggests a new default of 35 max connections for <code>db.maxconnections</code> (from the current default of 30), knowing that <em>each DSpace web application</em> gets to use up to this many on its own</li>
|
||||
<li>It would be good to approximate what the theoretical maximum number of connections on a busy server would be, perhaps by looking to see which apps use SQL:</li>
|
||||
</ul>
|
||||
<li><p>Generate a new list of authors from the CGIAR Library community for Peter to look through now that the initial corrections have been done</p></li>
|
||||
|
||||
<li><p>Thinking about resource limits for PostgreSQL again after last week’s CGSpace crash and related to a recent discussion I had in the comments of the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting notes</a></p></li>
|
||||
|
||||
<li><p>In that thread Chris Wilper suggests a new default of 35 max connections for <code>db.maxconnections</code> (from the current default of 30), knowing that <em>each DSpace web application</em> gets to use up to this many on its own</p></li>
|
||||
|
||||
<li><p>It would be good to approximate what the theoretical maximum number of connections on a busy server would be, perhaps by looking to see which apps use SQL:</p>
|
||||
|
||||
<pre><code>$ grep -rsI SQLException dspace-jspui | wc -l
|
||||
473
|
||||
@@ -319,18 +318,25 @@ $ grep -rsI SQLException dspace-solr | wc -l
|
||||
0
|
||||
$ grep -rsI SQLException dspace-xmlui | wc -l
|
||||
866
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Of those five applications we’re running, only <code>solr</code> appears not to use the database directly</li>
|
||||
<li>And JSPUI is only used internally (so it doesn’t really count), leaving us with OAI, REST, and XMLUI</li>
|
||||
<li>Assuming each takes a theoretical maximum of 35 connections during heavy load (35 * 3 = 105), that would put the connections above PostgreSQL’s default max of 100 connections (remember that a handful of connections are reserved for the PostgreSQL superuser, see <code>superuser_reserved_connections</code>)</li>
|
||||
<li>So we should adjust PostgreSQL’s max connections to be DSpace’s <code>db.maxconnections</code> * 3 + 3</li>
|
||||
<li>This would allow each application to use up to <code>db.maxconnections</code> and not to go over the system’s PostgreSQL limit</li>
|
||||
<li>Perhaps since CGSpace is a busy site with lots of resources we could actually use something like 40 for <code>db.maxconnections</code></li>
|
||||
<li>Also worth looking into: setting up a database pool using JNDI, as apparently DSpace’s <code>db.poolname</code> hasn’t been used since around DSpace 1.7 (according to Chris Wilper’s comments in the thread)</li>
|
||||
<li>Need to go check the PostgreSQL connection stats in Munin on CGSpace from the past week to get an idea if 40 is appropriate</li>
|
||||
<li>Looks like connections hover around 50:</li>
|
||||
<li><p>Of those five applications we’re running, only <code>solr</code> appears not to use the database directly</p></li>
|
||||
|
||||
<li><p>And JSPUI is only used internally (so it doesn’t really count), leaving us with OAI, REST, and XMLUI</p></li>
|
||||
|
||||
<li><p>Assuming each takes a theoretical maximum of 35 connections during heavy load (35 * 3 = 105), that would put the connections above PostgreSQL’s default max of 100 connections (remember that a handful of connections are reserved for the PostgreSQL superuser, see <code>superuser_reserved_connections</code>)</p></li>
|
||||
|
||||
<li><p>So we should adjust PostgreSQL’s max connections to be DSpace’s <code>db.maxconnections</code> * 3 + 3</p></li>
|
||||
|
||||
<li><p>This would allow each application to use up to <code>db.maxconnections</code> and not to go over the system’s PostgreSQL limit</p></li>
|
||||
|
||||
<li><p>Perhaps since CGSpace is a busy site with lots of resources we could actually use something like 40 for <code>db.maxconnections</code></p></li>
|
||||
|
||||
<li><p>Also worth looking into: setting up a database pool using JNDI, as apparently DSpace’s <code>db.poolname</code> hasn’t been used since around DSpace 1.7 (according to Chris Wilper’s comments in the thread)</p></li>
|
||||
|
||||
<li><p>Need to go check the PostgreSQL connection stats in Munin on CGSpace from the past week to get an idea if 40 is appropriate</p></li>
|
||||
|
||||
<li><p>Looks like connections hover around 50:</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2017/08/postgresql-connections-cgspace.png" alt="PostgreSQL connections 2017-08" /></p>
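<p>The sizing rule from the notes (<code>db.maxconnections</code> * 3 + 3) can be written out as a small calculation. This is only a sketch: the function name <code>postgres_max_connections</code> is hypothetical, <code>apps=3</code> stands for OAI, REST, and XMLUI, and <code>reserved=3</code> assumes PostgreSQL’s default value for <code>superuser_reserved_connections</code>.</p>

```python
def postgres_max_connections(db_maxconnections, apps=3, reserved=3):
    """Pool size per web application times the number of applications,
    plus the connections reserved for the PostgreSQL superuser."""
    return db_maxconnections * apps + reserved


# Compare the suggested default of 35 with the 40 considered for CGSpace.
for pool_size in (35, 40):
    print(pool_size, postgres_max_connections(pool_size))
```

<p>With a pool size of 40 this gives 123, the same value as the <code>max_connections</code> limit deployed on CGSpace on 2017-08-16.</p>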
|
||||
@@ -356,67 +362,61 @@ $ grep -rsI SQLException dspace-xmlui | wc -l
|
||||
<h2 id="2017-08-16">2017-08-16</h2>
|
||||
|
||||
<ul>
|
||||
<li>I wanted to merge the various field variations like <code>cg.subject.system</code> and <code>cg.subject.system[en_US]</code> in OpenRefine but I realized it would be easier in PostgreSQL:</li>
|
||||
</ul>
|
||||
<li><p>I wanted to merge the various field variations like <code>cg.subject.system</code> and <code>cg.subject.system[en_US]</code> in OpenRefine but I realized it would be easier in PostgreSQL:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, text_lang from metadatavalue where resource_type_id=2 and metadata_field_id=254;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And actually, we can do it for other generic fields for items in those collections, for example <code>dc.description.abstract</code>:</li>
|
||||
</ul>
|
||||
<li><p>And actually, we can do it for other generic fields for items in those collections, for example <code>dc.description.abstract</code>:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_lang='en_US' where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'abstract') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9')))
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And on others like <code>dc.language.iso</code>, <code>dc.relation.ispartofseries</code>, <code>dc.type</code>, <code>dc.title</code>, etc…</li>
|
||||
<li>Also, to move fields from <code>dc.identifier.url</code> to <code>cg.identifier.url[en_US]</code> (because we don’t use the Dublin Core one for some reason):</li>
|
||||
</ul>
|
||||
<li><p>And on others like <code>dc.language.iso</code>, <code>dc.relation.ispartofseries</code>, <code>dc.type</code>, <code>dc.title</code>, etc…</p></li>
|
||||
|
||||
<li><p>Also, to move fields from <code>dc.identifier.url</code> to <code>cg.identifier.url[en_US]</code> (because we don’t use the Dublin Core one for some reason):</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set metadata_field_id = 219, text_lang = 'en_US' where resource_type_id = 2 AND metadata_field_id = 237;
|
||||
UPDATE 15
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Set the text_lang of all <code>dc.identifier.uri</code> (Handle) fields to be NULL, just like default DSpace does:</li>
|
||||
</ul>
|
||||
<li><p>Set the text_lang of all <code>dc.identifier.uri</code> (Handle) fields to be NULL, just like default DSpace does:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_lang=NULL where resource_type_id = 2 and metadata_field_id = 25 and text_value like 'http://hdl.handle.net/10947/%';
|
||||
UPDATE 4248
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also update the text_lang of <code>dc.contributor.author</code> fields for metadata in these collections:</li>
|
||||
</ul>
|
||||
<li><p>Also update the text_lang of <code>dc.contributor.author</code> fields for metadata in these collections:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_lang=NULL where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9')));
|
||||
UPDATE 4899
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Wow, I just wrote this baller regex facet to find duplicate authors:</li>
|
||||
</ul>
|
||||
<li><p>Wow, I just wrote this baller regex facet to find duplicate authors:</p>
|
||||
|
||||
<pre><code>isNotNull(value.match(/(CGIAR .+?)\|\|\1/))
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This would be true if the authors were like <code>CGIAR System Management Office||CGIAR System Management Office</code>, which some of the CGIAR Library’s were</li>
|
||||
<li>Unfortunately when you fix these in OpenRefine and then submit the metadata to DSpace it doesn’t detect any changes, so you have to edit them all manually via DSpace’s “Edit Item”</li>
|
||||
<li>Ooh! And an even more interesting regex would match <em>any</em> duplicated author:</li>
|
||||
</ul>
|
||||
<li><p>This would be true if the authors were like <code>CGIAR System Management Office||CGIAR System Management Office</code>, which some of the CGIAR Library’s were</p></li>
|
||||
|
||||
<li><p>Unfortunately when you fix these in OpenRefine and then submit the metadata to DSpace it doesn’t detect any changes, so you have to edit them all manually via DSpace’s “Edit Item”</p></li>
|
||||
|
||||
<li><p>Ooh! And an even more interesting regex would match <em>any</em> duplicated author:</p>
|
||||
|
||||
<pre><code>isNotNull(value.match(/(.+?)\|\|\1/))
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Which means it can also be used to find items with duplicate <code>dc.subject</code> fields…</li>
|
||||
<li>Finally sent Peter the final dump of the CGIAR System Organization community so he can have a last look at it</li>
|
||||
<li>Post a message to the dspace-tech mailing list to ask about querying the AGROVOC API from the submission form</li>
|
||||
<li>Abenet was asking if there was some way to hide certain internal items from the “ILRI Research Outputs” RSS feed (which is the top-level ILRI community feed), because Shirley was complaining</li>
|
||||
<li>I think we could use <code>harvest.includerestricted.rss = false</code> but the items might need to be 100% restricted, not just the metadata</li>
|
||||
<li>Adjust Ansible postgres role to use <code>max_connections</code> from a template variable and deploy a new limit of 123 on CGSpace</li>
|
||||
<li><p>Which means it can also be used to find items with duplicate <code>dc.subject</code> fields…</p></li>
|
||||
|
||||
<li><p>Finally sent Peter the final dump of the CGIAR System Organization community so he can have a last look at it</p></li>
|
||||
|
||||
<li><p>Post a message to the dspace-tech mailing list to ask about querying the AGROVOC API from the submission form</p></li>
|
||||
|
||||
<li><p>Abenet was asking if there was some way to hide certain internal items from the “ILRI Research Outputs” RSS feed (which is the top-level ILRI community feed), because Shirley was complaining</p></li>
|
||||
|
||||
<li><p>I think we could use <code>harvest.includerestricted.rss = false</code> but the items might need to be 100% restricted, not just the metadata</p></li>
|
||||
|
||||
<li><p>Adjust Ansible postgres role to use <code>max_connections</code> from a template variable and deploy a new limit of 123 on CGSpace</p></li>
|
||||
</ul>
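<p>For checking the duplicate-author facet outside OpenRefine, here is a rough Python equivalent. It assumes that GREL’s <code>match()</code> tests the whole cell value (my reading of the OpenRefine documentation), so it uses <code>re.fullmatch</code>; under that assumption the pattern only catches cells that consist of exactly one value repeated twice. The split-based helper is not from the original facet and is added as a stricter alternative. Both function names are hypothetical.</p>

```python
import re

# Same pattern as the second facet: some text, then "||", then the same text.
DUPLICATE_PAIR = re.compile(r"(.+?)\|\|\1")


def has_duplicate_regex(value):
    """Mirror the facet, assuming match() must cover the entire value."""
    return DUPLICATE_PAIR.fullmatch(value) is not None


def has_duplicate_exact(value, separator="||"):
    """Split on the separator and report any value that occurs twice."""
    parts = value.split(separator)
    return len(parts) != len(set(parts))


cell = "CGIAR System Management Office||CGIAR System Management Office"
print(has_duplicate_regex(cell), has_duplicate_exact(cell))
```

<p>The split-based check also flags cells with three or more values where only some are repeated, which the whole-value regex does not.</p>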
|
||||
|
||||
<h2 id="2017-08-17">2017-08-17</h2>
|
||||
@@ -424,16 +424,14 @@ UPDATE 4899
|
||||
<ul>
|
||||
<li>Run Peter’s edits to the CGIAR System Organization community on DSpace Test</li>
|
||||
<li>Uptime Robot said CGSpace went down for 1 minute, not sure why</li>
|
||||
<li>Looking in <code>dspace.log.2017-08-17</code> I see some weird errors that might be related?</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looking in <code>dspace.log.2017-08-17</code> I see some weird errors that might be related?</p>
|
||||
|
||||
<pre><code>2017-08-17 07:55:31,396 ERROR net.sf.ehcache.store.DiskStore @ cocoon-ehcacheCache: Could not read disk store element for key PK_G-aspect-cocoon://DRI/12/handle/10568/65885?pipelinehash=823411183535858997_T-Navigation-3368194896954203241. Error was invalid stream header: 00000000
|
||||
java.io.StreamCorruptedException: invalid stream header: 00000000
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Weird that these errors seem to have started on August 11th, the same day we had capacity issues with PostgreSQL:</li>
|
||||
</ul>
|
||||
<li><p>Weird that these errors seem to have started on August 11th, the same day we had capacity issues with PostgreSQL:</p>
|
||||
|
||||
<pre><code># grep -c "ERROR net.sf.ehcache.store.DiskStore" dspace.log.2017-08-*
|
||||
dspace.log.2017-08-01:0
|
||||
@@ -453,14 +451,17 @@ dspace.log.2017-08-14:2135
|
||||
dspace.log.2017-08-15:1506
|
||||
dspace.log.2017-08-16:1935
|
||||
dspace.log.2017-08-17:584
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There are none in 2017-07 either…</li>
|
||||
<li>A few posts on the dspace-tech mailing list say this is related to the Cocoon cache somehow</li>
|
||||
<li>I will clear the XMLUI cache for now and see if the errors continue (though perhaps shutting down Tomcat and removing the cache is more effective somehow?)</li>
|
||||
<li>We tested the option for limiting restricted items from the RSS feeds on DSpace Test</li>
|
||||
<li>I created four items, and only the two with public metadata showed up in the community’s RSS feed:
|
||||
<li><p>There are none in 2017-07 either…</p></li>
|
||||
|
||||
<li><p>A few posts on the dspace-tech mailing list say this is related to the Cocoon cache somehow</p></li>
|
||||
|
||||
<li><p>I will clear the XMLUI cache for now and see if the errors continue (though perhaps shutting down Tomcat and removing the cache is more effective somehow?)</p></li>
|
||||
|
||||
<li><p>We tested the option for limiting restricted items from the RSS feeds on DSpace Test</p></li>
|
||||
|
||||
<li><p>I created four items, and only the two with public metadata showed up in the community’s RSS feed:</p>
|
||||
|
||||
<ul>
|
||||
<li>Public metadata, public bitstream ✓</li>
|
||||
@@ -468,7 +469,8 @@ dspace.log.2017-08-17:584
|
||||
<li>Restricted metadata, restricted bitstream ✗</li>
|
||||
<li>Private item ✗</li>
|
||||
</ul></li>
|
||||
<li>Peter responded and said that he doesn’t want to limit items to be restricted just so we can change the RSS feeds</li>
|
||||
|
||||
<li><p>Peter responded and said that he doesn’t want to limit items to be restricted just so we can change the RSS feeds</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-08-18">2017-08-18</h2>
|
||||
@@ -479,16 +481,16 @@ dspace.log.2017-08-17:584
|
||||
<li>I wired it up to the <code>dc.subject</code> field of the submission interface using the “lookup” type and it works!</li>
|
||||
<li>I think we can use this example to get a working AGROVOC query</li>
|
||||
<li>More information about authority framework: <a href="https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values">https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values</a></li>
|
||||
<li>Wow, I’m playing with the AGROVOC SPARQL endpoint using the <a href="https://github.com/tialaramex/sparql-query">sparql-query tool</a>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Wow, I’m playing with the AGROVOC SPARQL endpoint using the <a href="https://github.com/tialaramex/sparql-query">sparql-query tool</a>:</p>
|
||||
|
||||
<pre><code>$ ./sparql-query http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc
|
||||
sparql$ PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
|
||||
SELECT
|
||||
?label
|
||||
?label
|
||||
WHERE {
|
||||
{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . }
|
||||
FILTER regex(str(?label), "^fish", "i") .
|
||||
{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . }
|
||||
FILTER regex(str(?label), "^fish", "i") .
|
||||
} LIMIT 10;
|
||||
|
||||
┌───────────────────────┐
|
||||
@@ -505,12 +507,13 @@ WHERE {
|
||||
│ fishing times │
|
||||
│ fish passes │
|
||||
└───────────────────────┘
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>More examples about SPARQL syntax: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></li>
|
||||
<li>I found this blog post about speeding up the Tomcat startup time: <a href="http://skybert.net/java/improve-tomcat-startup-time/">http://skybert.net/java/improve-tomcat-startup-time/</a></li>
|
||||
<li>The startup time went from ~80s to 40s!</li>
|
||||
<li><p>More examples about SPARQL syntax: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></p></li>
|
||||
|
||||
<li><p>I found this blog post about speeding up the Tomcat startup time: <a href="http://skybert.net/java/improve-tomcat-startup-time/">http://skybert.net/java/improve-tomcat-startup-time/</a></p></li>
|
||||
|
||||
<li><p>The startup time went from ~80s to 40s!</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-08-19">2017-08-19</h2>
|
||||
@@ -526,35 +529,35 @@ WHERE {
|
||||
|
||||
<ul>
|
||||
<li>Since I cleared the XMLUI cache on 2017-08-17 there haven’t been any more <code>ERROR net.sf.ehcache.store.DiskStore</code> errors</li>
|
||||
<li>Look at the CGIAR Library to see if I can find the items that have been submitted since May:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Look at the CGIAR Library to see if I can find the items that have been submitted since May:</p>
|
||||
|
||||
<pre><code>dspace=# select * from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z';
|
||||
metadata_value_id | item_id | metadata_field_id | text_value | text_lang | place | authority | confidence
|
||||
metadata_value_id | item_id | metadata_field_id | text_value | text_lang | place | authority | confidence
|
||||
-------------------+---------+-------------------+----------------------+-----------+-------+-----------+------------
|
||||
123117 | 5872 | 11 | 2017-06-28T13:05:18Z | | 1 | | -1
|
||||
123042 | 5869 | 11 | 2017-05-15T03:29:23Z | | 1 | | -1
|
||||
123056 | 5870 | 11 | 2017-05-22T11:27:15Z | | 1 | | -1
|
||||
123072 | 5871 | 11 | 2017-06-06T07:46:01Z | | 1 | | -1
|
||||
123171 | 5874 | 11 | 2017-08-04T07:51:20Z | | 1 | | -1
|
||||
123117 | 5872 | 11 | 2017-06-28T13:05:18Z | | 1 | | -1
|
||||
123042 | 5869 | 11 | 2017-05-15T03:29:23Z | | 1 | | -1
|
||||
123056 | 5870 | 11 | 2017-05-22T11:27:15Z | | 1 | | -1
|
||||
123072 | 5871 | 11 | 2017-06-06T07:46:01Z | | 1 | | -1
|
||||
123171 | 5874 | 11 | 2017-08-04T07:51:20Z | | 1 | | -1
|
||||
(5 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>According to <code>dc.date.accessioned</code> (metadata field id 11) there have only been five items submitted since May</li>
|
||||
<li>These are their handles:</li>
|
||||
</ul>
|
||||
<li><p>According to <code>dc.date.accessioned</code> (metadata field id 11) there have only been five items submitted since May</p></li>
|
||||
|
||||
<li><p>These are their handles:</p>
|
||||
|
||||
<pre><code>dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z');
|
||||
handle
|
||||
handle
|
||||
------------
|
||||
10947/4658
|
||||
10947/4659
|
||||
10947/4660
|
||||
10947/4661
|
||||
10947/4664
|
||||
10947/4658
|
||||
10947/4659
|
||||
10947/4660
|
||||
10947/4661
|
||||
10947/4664
|
||||
(5 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-08-23">2017-08-23</h2>
|
||||
|
||||
@@ -575,17 +578,18 @@ WHERE {
|
||||
<li>I notice that in many WLE collections Marianne Gadeberg is in the edit or approval steps, but she is also in the groups for those steps.</li>
|
||||
<li>I think we need to have a process to go back and check / fix some of these scenarios—to remove her user from the step and instead add her to the group—because we have way too many authorizations and in late 2016 we had <a href="https://github.com/ilri/rmg-ansible-public/commit/358b5ea43f9e5820986f897c9d560937c702ac6e">performance issues with Solr</a> because of this</li>
|
||||
<li>I asked Sisay about this and hinted that he should go back and fix these things, but let’s see what he says</li>
|
||||
<li>Saw CGSpace go down briefly today and noticed SQL connection pool errors in the dspace log file:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Saw CGSpace go down briefly today and noticed SQL connection pool errors in the dspace log file:</p>
|
||||
|
||||
<pre><code>ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error
|
||||
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking at the logs I see we have been having hundreds or thousands of these errors a few times per week in 2017-07 and almost every day in 2017-08</li>
|
||||
<li>It seems that I changed the <code>db.maxconnections</code> setting from 70 to 40 around 2017-08-14, but Macaroni Bros also reduced their hourly hammering of the REST API then</li>
|
||||
<li>Nevertheless, it seems like a connection limit is not enough and that I should increase it (as well as the system’s PostgreSQL <code>max_connections</code>)</li>
|
||||
<li><p>Looking at the logs I see we have been having hundreds or thousands of these errors a few times per week in 2017-07 and almost every day in 2017-08</p></li>
|
||||
|
||||
<li><p>It seems that I changed the <code>db.maxconnections</code> setting from 70 to 40 around 2017-08-14, but Macaroni Bros also reduced their hourly hammering of the REST API then</p></li>
|
||||
|
||||
<li><p>Nevertheless, it seems like a connection limit is not enough and that I should increase it (as well as the system’s PostgreSQL <code>max_connections</code>)</p></li>
|
||||
</ul>
|
||||
|
||||
|
||||
|