Add notes for 2020-01-27

2020-01-27 16:20:44 +02:00
parent 207ace0883
commit 8feb93be39
112 changed files with 11466 additions and 5158 deletions


@@ -10,8 +10,8 @@
Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
-After running DSpace for over five years I've never needed to look in any other log file than dspace.log, leave alone one from last year!
-This will save us a few gigs of backup space we're paying for on S3
+After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, leave alone one from last year!
+This will save us a few gigs of backup space we’re paying for on S3
Also, I noticed the checker log has some errors we should pay attention to:
" />
<meta property="og:type" content="article" />
@@ -25,11 +25,11 @@ Also, I noticed the checker log has some errors we should pay attention to:
Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
-After running DSpace for over five years I&#39;ve never needed to look in any other log file than dspace.log, leave alone one from last year!
-This will save us a few gigs of backup space we&#39;re paying for on S3
+After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, leave alone one from last year!
+This will save us a few gigs of backup space we&rsquo;re paying for on S3
Also, I noticed the checker log has some errors we should pay attention to:
"/>
<meta name="generator" content="Hugo 0.62.2" />
<meta name="generator" content="Hugo 0.63.1" />
@@ -59,7 +59,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
<!-- combined, minified CSS -->
-<link href="https://alanorth.github.io/cgspace-notes/css/style.a20c1a4367639632cdb341d23c27ca44fedcc75b0f8b3cbea6203010da153d3c.css" rel="stylesheet" integrity="sha256-ogwaQ2djljLNs0HSPCfKRP7cx1sPizy&#43;piAwENoVPTw=" crossorigin="anonymous">
+<link href="https://alanorth.github.io/cgspace-notes/css/style.23e2c3298bcc8c1136c19aba330c211ec94c36f7c4454ea15cf4d3548370042a.css" rel="stylesheet" integrity="sha256-I&#43;LDKYvMjBE2wZq6MwwhHslMNvfERU6hXPTTVINwBCo=" crossorigin="anonymous">
<!-- RSS 2.0 feed -->
@@ -107,7 +107,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-04/">April, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-04-04T11:06:00&#43;03:00">Mon Apr 04, 2016</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
@@ -115,8 +115,8 @@ Also, I noticed the checker log has some errors we should pay attention to:
<ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
-<li>After running DSpace for over five years I've never needed to look in any other log file than dspace.log, leave alone one from last year!</li>
-<li>This will save us a few gigs of backup space we're paying for on S3</li>
+<li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, leave alone one from last year!</li>
+<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul>
<pre><code>Run start time: 03/06/2016 04:00:22
@@ -143,13 +143,13 @@ java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290
******************************************************
</code></pre><ul>
<li>So this would be the <code>tomcat7</code> Unix user, who seems to have a default limit of 1024 files in its shell</li>
-<li>For what it's worth, we have been setting the actual Tomcat 7 process&rsquo; limit to 16384 for a few years (in <code>/etc/default/tomcat7</code>)</li>
+<li>For what it&rsquo;s worth, we have been setting the actual Tomcat 7 process&rsquo; limit to 16384 for a few years (in <code>/etc/default/tomcat7</code>)</li>
<li>Looks like cron will read limits from <code>/etc/security/limits.*</code>, so we can raise the open-file limit for the tomcat7 user there (see the sketch after this list)</li>
<li>Submit pull request for Tomcat 7 limits in Ansible dspace role (<a href="https://github.com/ilri/rmg-ansible-public/pull/30">#30</a>)</li>
</ul>
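<ul>
<li>A minimal sketch of what that could look like, assuming a drop-in file under <code>/etc/security/limits.d/</code> (the file name is hypothetical; 16384 mirrors the Tomcat process limit mentioned above):</li>
</ul>
<pre><code># /etc/security/limits.d/tomcat7.conf (hypothetical path)
tomcat7 soft nofile 16384
tomcat7 hard nofile 16384
</code></pre>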
<h2 id="2016-04-05">2016-04-05</h2>
<ul>
-<li>Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don't need!</li>
+<li>Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don&rsquo;t need!</li>
</ul>
<pre><code># s3cmd ls s3://cgspace.cgiar.org/log/ &gt; /tmp/s3-logs.txt
# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
@@ -184,8 +184,8 @@ UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51258
</code></pre><h2 id="2016-04-08">2016-04-08</h2>
<ul>
-<li>Discuss metadata renaming with Abenet, we decided it's better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF</li>
-<li>I've e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change</li>
+<li>Discuss metadata renaming with Abenet, we decided it&rsquo;s better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF</li>
+<li>I&rsquo;ve e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change</li>
</ul>
<h2 id="2016-04-10">2016-04-10</h2>
<ul>
@@ -208,7 +208,7 @@ dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and t
<h2 id="2016-04-11">2016-04-11</h2>
<ul>
<li>The donut is already updated and shows the correct number now</li>
-<li>CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we'd do it tentatively on Monday the 18th.</li>
+<li>CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we&rsquo;d do it tentatively on Monday the 18th.</li>
</ul>
<h2 id="2016-04-12">2016-04-12</h2>
<ul>
@@ -217,7 +217,7 @@ dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and t
<pre><code>dspacetest=# select text_value, count(*) from metadatavalue where metadata_field_id=217 group by text_value order by count(*) desc;
</code></pre><ul>
<li>Listings and Reports is still not returning reliable data for <code>dc.type</code></li>
-<li>I think we need to ask Atmire, as their documentation isn't too clear on the format of the filter configs</li>
+<li>I think we need to ask Atmire, as their documentation isn&rsquo;t too clear on the format of the filter configs</li>
<li>Alternatively, I want to see if I move all the data from <code>dc.type.output</code> to <code>dc.type</code> and then re-index, if it behaves better</li>
<li>Looking at our <code>input-forms.xml</code> I see we have two sets of ILRI subjects, but one has a few extra subjects</li>
<li>Remove one set of ILRI subjects and remove duplicate <code>VALUE CHAINS</code> from existing list (<a href="https://github.com/ilri/DSpace/pull/216">#216</a>)</li>
@@ -231,9 +231,9 @@ dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and t
<pre><code>dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 226
</code></pre><ul>
-<li>I deleted them on CGSpace but I'll wait to do the re-index as we're going to be doing one in a few days for the metadata changes anyways</li>
+<li>I deleted them on CGSpace but I&rsquo;ll wait to do the re-index as we&rsquo;re going to be doing one in a few days for the metadata changes anyways</li>
<li>In other news, moving the <code>dc.type.output</code> to <code>dc.type</code> and re-indexing seems to have fixed the Listings and Reports issue from above</li>
-<li>Unfortunately this isn't a very good solution, because Listings and Reports config should allow us to filter on <code>dc.type.*</code> but the documentation isn't very clear and I couldn't reach Atmire today</li>
+<li>Unfortunately this isn&rsquo;t a very good solution, because Listings and Reports config should allow us to filter on <code>dc.type.*</code> but the documentation isn&rsquo;t very clear and I couldn&rsquo;t reach Atmire today</li>
<li>We want to do the <code>dc.type.output</code> move on CGSpace anyways, but we should wait as it might affect other external people!</li>
</ul>
<h2 id="2016-04-14">2016-04-14</h2>
@@ -289,7 +289,7 @@ UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075
$ JAVA_OPTS=&quot;-Xms512m -Xmx512m -Dfile.encoding=UTF-8&quot; ~/dspace/bin/dspace index-discovery -bf
</code></pre><ul>
-<li>CGSpace was down but I'm not sure why, this was in <code>catalina.out</code>:</li>
+<li>CGSpace was down but I&rsquo;m not sure why, this was in <code>catalina.out</code>:</li>
</ul>
<pre><code>Apr 18, 2016 7:32:26 PM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
@@ -334,14 +334,14 @@ javax.ws.rs.WebApplicationException
<pre><code># delete from metadatavalue where resource_type_id=2 and metadata_field_id=96;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=83;
</code></pre><ul>
-<li>They are old ICRAF fields and we haven't used them since 2011 or so</li>
+<li>They are old ICRAF fields and we haven&rsquo;t used them since 2011 or so</li>
<li>Also delete them from the metadata registry</li>
<li>CGSpace went down again, <code>dspace.log</code> had this:</li>
</ul>
<pre><code>2016-04-19 15:02:17,025 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
</code></pre><ul>
-<li>I restarted Tomcat and PostgreSQL and now it's back up</li>
+<li>I restarted Tomcat and PostgreSQL and now it&rsquo;s back up</li>
<li>I bet this is the same crash as yesterday, but I only saw the errors in <code>catalina.out</code></li>
<li>Looks to be related to this, from <code>dspace.log</code>:</li>
</ul>
@@ -383,24 +383,24 @@ UPDATE 46075
<pre><code>$ grep -c &quot;Aborting context in finally statement&quot; dspace.log.2016-04-20
21252
</code></pre><ul>
-<li>I found a recent discussion on the DSpace mailing list and I've asked for advice there</li>
-<li>Looks like this issue was noted and fixed in DSpace 5.5 (we're on 5.1): <a href="https://jira.duraspace.org/browse/DS-2936">https://jira.duraspace.org/browse/DS-2936</a></li>
-<li>I've sent a message to Atmire asking about compatibility with DSpace 5.5</li>
+<li>I found a recent discussion on the DSpace mailing list and I&rsquo;ve asked for advice there</li>
+<li>Looks like this issue was noted and fixed in DSpace 5.5 (we&rsquo;re on 5.1): <a href="https://jira.duraspace.org/browse/DS-2936">https://jira.duraspace.org/browse/DS-2936</a></li>
+<li>I&rsquo;ve sent a message to Atmire asking about compatibility with DSpace 5.5</li>
</ul>
<h2 id="2016-04-21">2016-04-21</h2>
<ul>
<li>Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)</li>
-<li>Atmire responded with DSpace 5.5 compatible versions for their modules, so I'll start testing those in a few weeks</li>
+<li>Atmire responded with DSpace 5.5 compatible versions for their modules, so I&rsquo;ll start testing those in a few weeks</li>
</ul>
<h2 id="2016-04-22">2016-04-22</h2>
<ul>
-<li>Import 95 records into <a href="https://cgspace.cgiar.org/handle/10568/42219">CTA's Agrodok collection</a></li>
+<li>Import 95 records into <a href="https://cgspace.cgiar.org/handle/10568/42219">CTA&rsquo;s Agrodok collection</a></li>
</ul>
<h2 id="2016-04-26">2016-04-26</h2>
<ul>
<li>Test embargo during item upload</li>
<li>Seems to be working but the help text is misleading as to the date format</li>
-<li>It turns out the <code>robots.txt</code> issue we thought we solved last month isn't solved because you can't use wildcards in URL patterns: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
+<li>It turns out the <code>robots.txt</code> issue we thought we solved last month isn&rsquo;t solved because you can&rsquo;t use wildcards in URL patterns: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>Write some nginx rules to add <code>X-Robots-Tag</code> HTTP headers to the dynamic requests disallowed in <code>robots.txt</code> instead (a hypothetical sketch follows below)</li>
<li>A few URLs to test with:
<ul>
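<ul>
<li>A hypothetical sketch of the kind of nginx rule meant above, using the standard <code>add_header</code> directive (the location pattern is an assumption, not the actual rule deployed):</li>
</ul>
<pre><code>location ~ /(discover|search-filter) {
    # existing proxy configuration for these URLs would remain here
    add_header X-Robots-Tag &quot;none&quot;;
}
</code></pre>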
@@ -449,17 +449,17 @@ dspace.log.2016-04-27:7271
<li>Add Spanish XMLUI strings so those users see &ldquo;CGSpace&rdquo; instead of &ldquo;DSpace&rdquo; in the user interface (<a href="https://github.com/ilri/DSpace/pull/222">#222</a>)</li>
<li>Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: <a href="https://jira.duraspace.org/browse/DS-3172">https://jira.duraspace.org/browse/DS-3172</a></li>
<li>Update infrastructure playbooks for nginx 1.10.x (stable) release: <a href="https://github.com/ilri/rmg-ansible-public/issues/32">https://github.com/ilri/rmg-ansible-public/issues/32</a></li>
-<li>Currently running on DSpace Test, we'll give it a few days before we adjust CGSpace</li>
-<li>CGSpace down, restarted tomcat and it's back up</li>
+<li>Currently running on DSpace Test, we&rsquo;ll give it a few days before we adjust CGSpace</li>
+<li>CGSpace down, restarted tomcat and it&rsquo;s back up</li>
</ul>
<h2 id="2016-04-28">2016-04-28</h2>
<ul>
-<li>Problems with stability again. I've blocked access to <code>/rest</code> for now to see if the number of errors in the log files drop</li>
+<li>Problems with stability again. I&rsquo;ve blocked access to <code>/rest</code> for now to see if the number of errors in the log files drop</li>
<li>Later we could maybe start logging access to <code>/rest</code> and perhaps whitelist some IPs&hellip;</li>
</ul>
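<ul>
<li>For reference, a minimal sketch of how such a temporary block could look in nginx (an assumption; the actual rule is not shown in these notes):</li>
</ul>
<pre><code>location /rest {
    return 403;
}
</code></pre>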
<h2 id="2016-04-30">2016-04-30</h2>
<ul>
-<li>Logs for today and yesterday have zero references to this REST error, so I'm going to open back up the REST API but log all requests</li>
+<li>Logs for today and yesterday have zero references to this REST error, so I&rsquo;m going to open back up the REST API but log all requests</li>
</ul>
<pre><code>location /rest {
access_log /var/log/nginx/rest.log;
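    # hypothetical sketch of the IP whitelisting idea mentioned above,
    # using nginx's standard allow/deny directives (the IP is an example):
    #allow 192.0.2.10;
    #deny all;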