Add notes for 2022-03-04

2022-03-04 15:30:06 +03:00
parent 7453499827
commit 27acbac859
115 changed files with 6550 additions and 6444 deletions

@@ -32,7 +32,7 @@ After running DSpace for over five years I’ve never needed to look in any
 This will save us a few gigs of backup space we’re paying for on S3
 Also, I noticed the checker log has some errors we should pay attention to:
 "/>
-<meta name="generator" content="Hugo 0.92.2" />
+<meta name="generator" content="Hugo 0.93.1" />
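Apart from the generator tag, most of the changed lines shown here differ only in how Hugo 0.93.1 escapes quotes inside code blocks: `'` is now written as `&#39;` and `&quot;` as `&#34;`. Both spellings decode to the same characters in a browser, so one way to review a regeneration like this is to normalize the two entities before diffing. This is a minimal sketch under that assumption; the function name `normalize_entities` is hypothetical, and other changes (such as `'` becoming `&rsquo;` in prose) are deliberately left visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the entities Hugo 0.93.1 emits in code blocks
# back to the 0.92.2 spelling so old and new renders can be compared.
# Assumption: only &#39; (') and &#34; (&quot;) need normalizing.
normalize_entities() {
  # In a sed replacement a bare & means "the matched text", so it is escaped
  sed -e "s/&#39;/'/g" -e 's/&#34;/\&quot;/g' "$@"
}
```

For example, `diff <(normalize_entities old/index.html) <(normalize_entities new/index.html)` (hypothetical paths) would then show only the version bump and the typographic changes.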
@@ -150,7 +150,7 @@ java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290
 ******************************************************
 </code></pre><ul>
 <li>So this would be the <code>tomcat7</code> Unix user, who seems to have a default limit of 1024 files in its shell</li>
-<li>For what it&rsquo;s worth, we have been setting the actual Tomcat 7 process' limit to 16384 for a few years (in <code>/etc/default/tomcat7</code>)</li>
+<li>For what it&rsquo;s worth, we have been setting the actual Tomcat 7 process&rsquo; limit to 16384 for a few years (in <code>/etc/default/tomcat7</code>)</li>
 <li>Looks like cron will read limits from <code>/etc/security/limits.*</code> so we can do something for the tomcat7 user there</li>
 <li>Submit pull request for Tomcat 7 limits in Ansible dspace role (<a href="https://github.com/ilri/rmg-ansible-public/pull/30">#30</a>)</li>
 </ul>
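The actual change lives in the Ansible role (PR #30, whose contents aren't shown here). As a rough sketch of the same idea, assuming the open-file limit (`nofile`) is what needs raising and reusing the 16384 value from `/etc/default/tomcat7`, a drop-in file under `/etc/security/limits.d/` is read by `pam_limits` for new sessions of the `tomcat7` user, including its cron jobs. The function name `write_tomcat_limits` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumption): give the tomcat7 user the same open-file limit that
# /etc/default/tomcat7 sets for the Tomcat process, so cron jobs running as
# tomcat7 stop hitting the 1024 default. pam_limits reads limits.conf plus
# any *.conf file in /etc/security/limits.d/.
write_tomcat_limits() {
  local target="$1"          # e.g. /etc/security/limits.d/tomcat7.conf
  local limit="${2:-16384}"  # defaults to the value used for the Tomcat process
  cat > "$target" <<EOF
# <domain> <type> <item>  <value>
tomcat7    soft   nofile  ${limit}
tomcat7    hard   nofile  ${limit}
EOF
}
```

Run it as root with `/etc/security/limits.d/tomcat7.conf` as the target. The limits only apply to sessions opened afterwards, so processes that are already running keep their old limit.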
@@ -159,10 +159,10 @@ java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290
 <li>Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don&rsquo;t need!</li>
 </ul>
 <pre tabindex="0"><code># s3cmd ls s3://cgspace.cgiar.org/log/ &gt; /tmp/s3-logs.txt
-# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
-# grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
-# grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
-# grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
+# grep checker.log /tmp/s3-logs.txt | awk &#39;{print $4}&#39; | xargs s3cmd del
+# grep cocoon.log /tmp/s3-logs.txt | awk &#39;{print $4}&#39; | xargs s3cmd del
+# grep handle-plugin.log /tmp/s3-logs.txt | awk &#39;{print $4}&#39; | xargs s3cmd del
+# grep solr.log /tmp/s3-logs.txt | awk &#39;{print $4}&#39; | xargs s3cmd del
 </code></pre><ul>
 <li>Also, adjust the cron jobs for backups so they only backup <code>dspace.log</code> and some stats files (.dat)</li>
 <li>Try to do some metadata field migrations using the Atmire batch UI (<code>dc.Species</code> → <code>cg.species</code>) but it took several hours and even missed a few records</li>
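To see which log types account for the 46 GB before deleting anything, the saved `s3cmd ls` listing can be totalled per file-name prefix. This is a minimal sketch, assuming the listing was made without `--human-readable-sizes`, so the third column is the size in bytes and the fourth is the object URI (the same `$4` the commands above pass to `s3cmd del`). The function name `sum_log_sizes` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: total bytes per log type in a saved `s3cmd ls` listing.
# Assumes lines look like: DATE TIME SIZE_IN_BYTES s3://bucket/path/name
sum_log_sizes() {
  awk '
    $4 ~ /^s3:\/\// {
      # Take the object name after the last "/" ...
      n = split($4, parts, "/")
      name = parts[n]
      # ... and keep everything before the first dot, e.g. "checker"
      sub(/\..*/, "", name)
      bytes[name] += $3
    }
    END {
      for (name in bytes) printf "%s %.0f\n", name, bytes[name]
    }
  ' "$1" | sort
}
```

Usage: `sum_log_sizes /tmp/s3-logs.txt` prints one `type bytes` line per log type. Lines without an `s3://` URI in the fourth column (such as `DIR` entries) are skipped.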
@@ -199,13 +199,13 @@ UPDATE 51258
 <li>Looking at the DOI issue <a href="https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=678507860">reported by Leroy from CIAT a few weeks ago</a></li>
 <li>It seems the <code>dx.doi.org</code> URLs are much more proper in our repository!</li>
 </ul>
-<pre tabindex="0"><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%';
+<pre tabindex="0"><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like &#39;http://dx.doi.org%&#39;;
 count
 -------
 5638
 (1 row)
-dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%';
+dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like &#39;http://doi.org%&#39;;
 count
 -------
 3
@@ -231,11 +231,11 @@ dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and t
 <li>I decided to keep the set of subjects that had <code>FMD</code> and <code>RANGELANDS</code> added, as it appears to have been requested to have been added, and might be the newer list</li>
 <li>I found 226 blank metadatavalues:</li>
 </ul>
-<pre tabindex="0"><code>dspacetest# select * from metadatavalue where resource_type_id=2 and text_value='';
+<pre tabindex="0"><code>dspacetest# select * from metadatavalue where resource_type_id=2 and text_value=&#39;&#39;;
 </code></pre><ul>
 <li>I think we should delete them and do a full re-index:</li>
 </ul>
-<pre tabindex="0"><code>dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
+<pre tabindex="0"><code>dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value=&#39;&#39;;
 DELETE 226
 </code></pre><ul>
 <li>I deleted them on CGSpace but I&rsquo;ll wait to do the re-index as we&rsquo;re going to be doing one in a few days for the metadata changes anyways</li>
@@ -294,7 +294,7 @@ UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
 UPDATE 3872
 UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
 UPDATE 46075
-$ JAVA_OPTS=&quot;-Xms512m -Xmx512m -Dfile.encoding=UTF-8&quot; ~/dspace/bin/dspace index-discovery -bf
+$ JAVA_OPTS=&#34;-Xms512m -Xmx512m -Dfile.encoding=UTF-8&#34; ~/dspace/bin/dspace index-discovery -bf
 </code></pre><ul>
 <li>CGSpace was down but I&rsquo;m not sure why, this was in <code>catalina.out</code>:</li>
 </ul>
@@ -387,7 +387,7 @@ UPDATE 46075
 <li>Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)</li>
 <li>Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:</li>
 </ul>
-<pre tabindex="0"><code>$ grep -c &quot;Aborting context in finally statement&quot; dspace.log.2016-04-20
+<pre tabindex="0"><code>$ grep -c &#34;Aborting context in finally statement&#34; dspace.log.2016-04-20
 21252
 </code></pre><ul>
 <li>I found a recent discussion on the DSpace mailing list and I&rsquo;ve asked for advice there</li>
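When `grep -c` is given a month of daily logs it prints one `file:count` line per file, zeros included, which makes it harder to see when the REST errors started. A small sketch that keeps only the days with matches; `-H` forces the `file:count` form even for a single file, and the function name `nonzero_counts` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: like `grep -c PATTERN FILES...`, but print only files with matches.
# -H forces the "file:count" form even when a single file is given.
nonzero_counts() {
  local pattern="$1"
  shift
  grep -c -H -e "$pattern" -- "$@" | awk -F: '$NF > 0'
}
```

Usage: `nonzero_counts "Aborting context in finally statement" dspace.log.2016-04-*` lists only the days on which the message was logged.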
@@ -423,7 +423,7 @@ UPDATE 46075
 <li>Looks like the last one was &ldquo;down&rdquo; from about four hours ago</li>
 <li>I think there must be something with this REST stuff:</li>
 </ul>
-<pre tabindex="0"><code># grep -c &quot;Aborting context in finally statement&quot; dspace.log.2016-04-*
+<pre tabindex="0"><code># grep -c &#34;Aborting context in finally statement&#34; dspace.log.2016-04-*
 dspace.log.2016-04-01:0
 dspace.log.2016-04-02:0
 dspace.log.2016-04-03:0