Add notes for 2019-05-05

This commit is contained in:
2019-05-05 16:45:12 +03:00
parent cfa5f3ddfb
commit 96d6602775
76 changed files with 10839 additions and 11300 deletions

View File

@ -10,8 +10,8 @@
CGSpace was down for five hours in the morning while I was sleeping
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
@ -20,9 +20,10 @@ While looking in the logs for errors, I see tons of warnings about Atmire MQM:
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
I’ve raised a ticket with Atmire to ask
Another worrying error from dspace.log is:
" />
<meta property="og:type" content="article" />
@ -36,8 +37,8 @@ Another worrying error from dspace.log is:
CGSpace was down for five hours in the morning while I was sleeping
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
@ -46,12 +47,13 @@ While looking in the logs for errors, I see tons of warnings about Atmire MQM:
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade
I&rsquo;ve raised a ticket with Atmire to ask
Another worrying error from dspace.log is:
"/>
<meta name="generator" content="Hugo 0.55.3" />
<meta name="generator" content="Hugo 0.55.5" />
@ -134,20 +136,21 @@ Another worrying error from dspace.log is:
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
</ul>
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p>
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
</code></pre>
</code></pre></li>
<ul>
<li>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</li>
<li>I&rsquo;ve raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
<li><p>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</p></li>
<li><p>I&rsquo;ve raised a ticket with Atmire to ask</p></li>
<li><p>Another worrying error from dspace.log is:</p></li>
</ul>
<pre><code>org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
@ -239,35 +242,35 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
</code></pre>
<ul>
<li>The first error I see in dspace.log this morning is:</li>
</ul>
<li><p>The first error I see in dspace.log this morning is:</p>
<pre><code>2016-12-02 03:00:46,656 ERROR org.dspace.authority.AuthorityValueFinder @ anonymous::Error while retrieving AuthorityValue from solr:query\colon; id\colon;&quot;b0b541c1-ec15-48bf-9209-6dbe8e338cdc&quot;
org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8081/solr/authority
</code></pre>
</code></pre></li>
<ul>
<li>Looking through DSpace&rsquo;s solr log I see that about 20 seconds before this, there were a few 30+ KiB solr queries</li>
<li>The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:</li>
</ul>
<li><p>Looking through DSpace&rsquo;s solr log I see that about 20 seconds before this, there were a few 30+ KiB solr queries</p></li>
<li><p>The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:</p>
<pre><code>2016-12-02 03:00:42,606 INFO org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={q=containerItem:72828+AND+type:0&amp;shards=localhost:8081/solr/statistics-2010,localhost:8081/solr/statistics&amp;fq=-isInternal:true&amp;fq=-(author_mtdt:&quot;CGIAR\+Institutional\+Learning\+and\+Change\+Initiative&quot;++AND+subject_mtdt:&quot;PARTNERSHIPS&quot;+AND+subject_mtdt:&quot;RESEARCH&quot;+AND+subject_mtdt:&quot;AGRICULTURE&quot;+AND+subject_mtdt:&quot;DEVELOPMENT&quot;++AND+iso_mtdt:&quot;en&quot;+)&amp;rows=0&amp;wt=javabin&amp;version=2} hits=0 status=0 QTime=19
2016-12-02 08:28:23,908 INFO org.apache.solr.servlet.SolrDispatchFilter @ SolrDispatchFilter.init()
</code></pre>
</code></pre></li>
<ul>
<li>DSpace&rsquo;s own Solr logs don&rsquo;t give IP addresses, so I will have to enable Nginx&rsquo;s logging of <code>/solr</code> so I can see where this request came from</li>
<li>I enabled logging of <code>/rest/</code> and I think I&rsquo;ll leave it on for good</li>
<li>Also, the disk is nearly full because of log file issues, so I&rsquo;m running some compression on DSpace logs</li>
<li>Normally these stay uncompressed for a month just in case we need to look at them, so now I&rsquo;ve just compressed anything older than 2 weeks so we can get some disk space back</li>
<li><p>DSpace&rsquo;s own Solr logs don&rsquo;t give IP addresses, so I will have to enable Nginx&rsquo;s logging of <code>/solr</code> so I can see where this request came from</p></li>
<li><p>I enabled logging of <code>/rest/</code> and I think I&rsquo;ll leave it on for good</p></li>
<li><p>Also, the disk is nearly full because of log file issues, so I&rsquo;m running some compression on DSpace logs</p></li>
<li><p>Normally these stay uncompressed for a month just in case we need to look at them, so now I&rsquo;ve just compressed anything older than 2 weeks so we can get some disk space back</p></li>
</ul>
<h2 id="2016-12-04">2016-12-04</h2>
<ul>
<li>I got a weird report from the CGSpace checksum checker this morning</li>
<li>It says 732 bitstreams have potential issues, for example:</li>
</ul>
<li><p>It says 732 bitstreams have potential issues, for example:</p>
<pre><code>------------------------------------------------
Bitstream Id = 6
@ -286,14 +289,15 @@ Checksum Expected = 9959301aa4ca808d00957dff88214e38
Checksum Calculated =
Result = The bitstream could not be found
-----------------------------------------------
</code></pre>
</code></pre></li>
<ul>
<li>The first one seems ok, but I don&rsquo;t know what to make of the second one&hellip;</li>
<li>I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in <code>[dspace-dir]/assetstore/99/59/30/...</code>)</li>
<li>For what it&rsquo;s worth, there is no item on DSpace Test or S3 backups with that checksum either&hellip;</li>
<li>In other news, I&rsquo;m looking at JVM settings from the Solr 4.10.2 release, from <code>bin/solr.in.sh</code>:</li>
</ul>
<li><p>The first one seems ok, but I don&rsquo;t know what to make of the second one&hellip;</p></li>
<li><p>I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in <code>[dspace-dir]/assetstore/99/59/30/...</code>)</p></li>
<li><p>For what it&rsquo;s worth, there is no item on DSpace Test or S3 backups with that checksum either&hellip;</p></li>
<li><p>In other news, I&rsquo;m looking at JVM settings from the Solr 4.10.2 release, from <code>bin/solr.in.sh</code>:</p>
<pre><code># These GC settings have shown to work well for a number of common Solr workloads
GC_TUNE=&quot;-XX:-UseSuperWord \
@ -314,11 +318,11 @@ GC_TUNE=&quot;-XX:-UseSuperWord \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled \
-XX:+AggressiveOpts&quot;
</code></pre>
</code></pre></li>
<ul>
<li>I need to try these because they are recommended by the Solr project itself</li>
<li>Also, as always, I need to read <a href="https://wiki.apache.org/solr/ShawnHeisey">Shawn Heisey&rsquo;s wiki page on Solr</a></li>
<li><p>I need to try these because they are recommended by the Solr project itself</p></li>
<li><p>Also, as always, I need to read <a href="https://wiki.apache.org/solr/ShawnHeisey">Shawn Heisey&rsquo;s wiki page on Solr</a></p></li>
</ul>
<h2 id="2016-12-05">2016-12-05</h2>
@ -330,21 +334,19 @@ GC_TUNE=&quot;-XX:-UseSuperWord \
<li>I did a few traceroutes from Jordan and Kenya and it seems that Linode&rsquo;s Frankfurt datacenter is a few less hops and perhaps less packet loss than the London one, so I put the new server in Frankfurt</li>
<li>Do initial provisioning</li>
<li>Atmire responded about the MQM warnings in the DSpace logs</li>
<li>Apparently we need to change the batch edit consumers in <code>dspace/config/dspace.cfg</code>:</li>
</ul>
<li><p>Apparently we need to change the batch edit consumers in <code>dspace/config/dspace.cfg</code>:</p>
<pre><code>event.consumer.batchedit.filters = Community|Collection+Create
</code></pre>
</code></pre></li>
<ul>
<li>I haven&rsquo;t tested it yet, but I created a pull request: <a href="https://github.com/ilri/DSpace/pull/289">#289</a></li>
<li><p>I haven&rsquo;t tested it yet, but I created a pull request: <a href="https://github.com/ilri/DSpace/pull/289">#289</a></p></li>
</ul>
<h2 id="2016-12-06">2016-12-06</h2>
<ul>
<li>Some author authority corrections and name standardizations for Peter:</li>
</ul>
<li><p>Some author authority corrections and name standardizations for Peter:</p>
<pre><code>dspace=# update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
UPDATE 11
@ -358,47 +360,55 @@ dspace=# update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183ac
UPDATE 360
dspace=# update metadatavalue set text_value='Grace, Delia', authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 561
</code></pre>
</code></pre></li>
<ul>
<li>Pay attention to the regex to prevent false positives in tricky cases with Dutch names!</li>
<li>I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week</li>
<li>More work on the KM4Dev Journal article</li>
<li>In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work</li>
<li>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</li>
<li>Paola from CCAFS mentioned she also has the &ldquo;take task&rdquo; bug on CGSpace</li>
<li>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</li>
<li>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</li>
<li>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn&rsquo;t dedicated (also runs Solr, which can benefit from OS cache) so let&rsquo;s try 1024MB</li>
<li>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</li>
</ul>
<li><p>Pay attention to the regex to prevent false positives in tricky cases with Dutch names!</p></li>
<li><p>I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week</p></li>
<li><p>More work on the KM4Dev Journal article</p></li>
<li><p>In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work</p></li>
<li><p>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</p></li>
<li><p>Paola from CCAFS mentioned she also has the &ldquo;take task&rdquo; bug on CGSpace</p></li>
<li><p>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</p></li>
<li><p>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</p></li>
<li><p>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn&rsquo;t dedicated (also runs Solr, which can benefit from OS cache) so let&rsquo;s try 1024MB</p></li>
<li><p>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</p>
<pre><code>$ time JAVA_OPTS=&quot;-Xms768m -Xmx768m -Dfile.encoding=UTF-8&quot; /home/dspacetest.cgiar.org/bin/dspace index-authority
Retrieving all data
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
Exception: null
java.lang.NullPointerException
at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
real 8m39.913s
user 1m54.190s
sys 0m22.647s
</code></pre>
</code></pre></li>
</ul>
<h2 id="2016-12-07">2016-12-07</h2>
@ -407,108 +417,107 @@ sys 0m22.647s
<li>I will have to test more</li>
<li>Anyways, I noticed that some of the authority values I set actually have versions of author names we don&rsquo;t want, ie &ldquo;Grace, D.&rdquo;</li>
<li>For example, do a Solr query for &ldquo;first_name:Grace&rdquo; and look at the results</li>
<li>Querying that ID shows the fields that need to be changed:</li>
</ul>
<li><p>Querying that ID shows the fields that need to be changed:</p>
<pre><code>{
&quot;responseHeader&quot;: {
&quot;status&quot;: 0,
&quot;QTime&quot;: 1,
&quot;params&quot;: {
&quot;q&quot;: &quot;id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;,
&quot;indent&quot;: &quot;true&quot;,
&quot;wt&quot;: &quot;json&quot;,
&quot;_&quot;: &quot;1481102189244&quot;
}
},
&quot;response&quot;: {
&quot;numFound&quot;: 1,
&quot;start&quot;: 0,
&quot;docs&quot;: [
{
&quot;id&quot;: &quot;0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;,
&quot;field&quot;: &quot;dc_contributor_author&quot;,
&quot;value&quot;: &quot;Grace, D.&quot;,
&quot;deleted&quot;: false,
&quot;creation_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;,
&quot;last_modified_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;,
&quot;authority_type&quot;: &quot;person&quot;,
&quot;first_name&quot;: &quot;D.&quot;,
&quot;last_name&quot;: &quot;Grace&quot;
}
]
}
&quot;responseHeader&quot;: {
&quot;status&quot;: 0,
&quot;QTime&quot;: 1,
&quot;params&quot;: {
&quot;q&quot;: &quot;id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;,
&quot;indent&quot;: &quot;true&quot;,
&quot;wt&quot;: &quot;json&quot;,
&quot;_&quot;: &quot;1481102189244&quot;
}
</code></pre>
},
&quot;response&quot;: {
&quot;numFound&quot;: 1,
&quot;start&quot;: 0,
&quot;docs&quot;: [
{
&quot;id&quot;: &quot;0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;,
&quot;field&quot;: &quot;dc_contributor_author&quot;,
&quot;value&quot;: &quot;Grace, D.&quot;,
&quot;deleted&quot;: false,
&quot;creation_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;,
&quot;last_modified_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;,
&quot;authority_type&quot;: &quot;person&quot;,
&quot;first_name&quot;: &quot;D.&quot;,
&quot;last_name&quot;: &quot;Grace&quot;
}
]
}
}
</code></pre></li>
<ul>
<li>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields&hellip;</li>
<li>The update syntax should be something like this, but I&rsquo;m getting errors from Solr:</li>
</ul>
<li><p>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields&hellip;</p></li>
<li><p>The update syntax should be something like this, but I&rsquo;m getting errors from Solr:</p>
<pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&amp;wt=json&amp;indent=true' -H 'Content-type:application/json' -d '[{&quot;id&quot;:&quot;1&quot;,&quot;price&quot;:{&quot;set&quot;:100}}]'
{
&quot;responseHeader&quot;:{
&quot;status&quot;:400,
&quot;QTime&quot;:0},
&quot;error&quot;:{
&quot;msg&quot;:&quot;Unexpected character '[' (code 91) in prolog; expected '&lt;'\n at [row,col {unknown-source}]: [1,1]&quot;,
&quot;code&quot;:400}}
</code></pre>
&quot;responseHeader&quot;:{
&quot;status&quot;:400,
&quot;QTime&quot;:0},
&quot;error&quot;:{
&quot;msg&quot;:&quot;Unexpected character '[' (code 91) in prolog; expected '&lt;'\n at [row,col {unknown-source}]: [1,1]&quot;,
&quot;code&quot;:400}}
</code></pre></li>
<ul>
<li>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</li>
<li>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</li>
</ul>
<li><p>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</p></li>
<li><p>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</p>
<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 561
</code></pre>
</code></pre></li>
<ul>
<li>Then I&rsquo;ll reindex discovery and authority and see how the authority Solr core looks</li>
<li>After this, now there are authorities for some of the &ldquo;Grace, D.&rdquo; and &ldquo;Grace, Delia&rdquo; text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</li>
</ul>
<li><p>Then I&rsquo;ll reindex discovery and authority and see how the authority Solr core looks</p></li>
<li><p>After this, now there are authorities for some of the &ldquo;Grace, D.&rdquo; and &ldquo;Grace, Delia&rdquo; text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</p>
<pre><code>$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&amp;wt=json&amp;indent=true'
{
&quot;responseHeader&quot;:{
&quot;status&quot;:0,
&quot;QTime&quot;:0,
&quot;params&quot;:{
&quot;q&quot;:&quot;id:18ea1525-2513-430a-8817-a834cd733fbc&quot;,
&quot;indent&quot;:&quot;true&quot;,
&quot;wt&quot;:&quot;json&quot;}},
&quot;response&quot;:{&quot;numFound&quot;:1,&quot;start&quot;:0,&quot;docs&quot;:[
{
&quot;id&quot;:&quot;18ea1525-2513-430a-8817-a834cd733fbc&quot;,
&quot;field&quot;:&quot;dc_contributor_author&quot;,
&quot;value&quot;:&quot;Grace, Delia&quot;,
&quot;deleted&quot;:false,
&quot;creation_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;,
&quot;last_modified_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;,
&quot;authority_type&quot;:&quot;person&quot;,
&quot;first_name&quot;:&quot;Delia&quot;,
&quot;last_name&quot;:&quot;Grace&quot;}]
}}
</code></pre>
&quot;responseHeader&quot;:{
&quot;status&quot;:0,
&quot;QTime&quot;:0,
&quot;params&quot;:{
&quot;q&quot;:&quot;id:18ea1525-2513-430a-8817-a834cd733fbc&quot;,
&quot;indent&quot;:&quot;true&quot;,
&quot;wt&quot;:&quot;json&quot;}},
&quot;response&quot;:{&quot;numFound&quot;:1,&quot;start&quot;:0,&quot;docs&quot;:[
{
&quot;id&quot;:&quot;18ea1525-2513-430a-8817-a834cd733fbc&quot;,
&quot;field&quot;:&quot;dc_contributor_author&quot;,
&quot;value&quot;:&quot;Grace, Delia&quot;,
&quot;deleted&quot;:false,
&quot;creation_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;,
&quot;last_modified_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;,
&quot;authority_type&quot;:&quot;person&quot;,
&quot;first_name&quot;:&quot;Delia&quot;,
&quot;last_name&quot;:&quot;Grace&quot;}]
}}
</code></pre></li>
<ul>
<li>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</li>
<li>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</li>
<li>Better to use:</li>
</ul>
<li><p>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</p></li>
<li><p>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</p></li>
<li><p>Better to use:</p>
<pre><code>dspace#= update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
</code></pre>
</code></pre></li>
<ul>
<li>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</li>
<li>Perhaps another way is to just add our own UUID to the authority field for the text_value we like, then re-index authority so they get synced from PostgreSQL to Solr, then set the other text_values to use that authority ID</li>
<li>Deploy MQM WARN fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/289">#289</a>)</li>
<li>Deploy &ldquo;take task&rdquo; hack/fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/290">#290</a>)</li>
<li>I ran the following author corrections and then reindexed discovery:</li>
</ul>
<li><p>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</p></li>
<li><p>Perhaps another way is to just add our own UUID to the authority field for the text_value we like, then re-index authority so they get synced from PostgreSQL to Solr, then set the other text_values to use that authority ID</p></li>
<li><p>Deploy MQM WARN fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/289">#289</a>)</p></li>
<li><p>Deploy &ldquo;take task&rdquo; hack/fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/290">#290</a>)</p></li>
<li><p>I ran the following author corrections and then reindexed discovery:</p>
<pre><code>update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
@ -516,68 +525,63 @@ update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-
update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
</code></pre>
</code></pre></li>
</ul>
<h2 id="2016-12-08">2016-12-08</h2>
<ul>
<li>Something weird happened and Peter Thorne&rsquo;s names all ended up as &ldquo;Thorne&rdquo;, I guess because the original authority had that as its name value:</li>
</ul>
<li><p>Something weird happened and Peter Thorne&rsquo;s names all ended up as &ldquo;Thorne&rdquo;, I guess because the original authority had that as its name value:</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne%';
text_value | authority | confidence
text_value | authority | confidence
------------------+--------------------------------------+------------
Thorne, P.J. | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
Thorne | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
Thorne-Lyman, A. | 0781e13a-1dc8-4e3f-82e8-5c422b44a344 | -1
Thorne, M. D. | 54c52649-cefd-438d-893f-3bcef3702f07 | -1
Thorne, P.J | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
Thorne, P. | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
Thorne, P.J. | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
Thorne | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
Thorne-Lyman, A. | 0781e13a-1dc8-4e3f-82e8-5c422b44a344 | -1
Thorne, M. D. | 54c52649-cefd-438d-893f-3bcef3702f07 | -1
Thorne, P.J | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
Thorne, P. | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
(6 rows)
</code></pre>
</code></pre></li>
<ul>
<li>I generated a new UUID using <code>uuidgen | tr [A-Z] [a-z]</code> and set it along with correct name variation for all records:</li>
</ul>
<li><p>I generated a new UUID using <code>uuidgen | tr [A-Z] [a-z]</code> and set it along with correct name variation for all records:</p>
<pre><code>dspace=# update metadatavalue set authority='b2f7603d-2fb5-4018-923a-c4ec8d85b3bb', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812';
UPDATE 43
</code></pre>
</code></pre></li>
<ul>
<li>Apparently we also need to normalize Phil Thornton&rsquo;s names to <code>Thornton, Philip K.</code>:
<br /></li>
</ul>
<li><p>Apparently we also need to normalize Phil Thornton&rsquo;s names to <code>Thornton, Philip K.</code>:</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
text_value | authority | confidence
text_value | authority | confidence
---------------------+--------------------------------------+------------
Thornton, P | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton. P.K. | 3e1e6639-d4fb-449e-9fce-ce06b5b0f702 | -1
Thornton, P K . | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P.K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P.K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, Philip K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, Philip K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P. K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton. P.K. | 3e1e6639-d4fb-449e-9fce-ce06b5b0f702 | -1
Thornton, P K . | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P.K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P.K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, Philip K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, Philip K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
Thornton, P. K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
(10 rows)
</code></pre>
</code></pre></li>
<ul>
<li>Seems his original authorities are using an incorrect version of the name so I need to generate another UUID and tie it to the correct name, then reindex:</li>
</ul>
<li><p>Seems his original authorities are using an incorrect version of the name so I need to generate another UUID and tie it to the correct name, then reindex:</p>
<pre><code>dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
UPDATE 362
</code></pre>
</code></pre></li>
<ul>
<li>It seems that, when you are messing with authority and author text values in the database, it is better to run authority reindex first (postgres→solr authority core) and then Discovery reindex (postgres→solr Discovery core)</li>
<li>Everything looks ok after authority and discovery reindex</li>
<li>In other news, I think we should really be using more RAM for PostgreSQL&rsquo;s <code>shared_buffers</code></li>
<li>The <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html">PostgreSQL documentation</a> recommends using 25% of the system&rsquo;s RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache</li>
<li><p>It seems that, when you are messing with authority and author text values in the database, it is better to run authority reindex first (postgres→solr authority core) and then Discovery reindex (postgres→solr Discovery core)</p></li>
<li><p>Everything looks ok after authority and discovery reindex</p></li>
<li><p>In other news, I think we should really be using more RAM for PostgreSQL&rsquo;s <code>shared_buffers</code></p></li>
<li><p>The <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html">PostgreSQL documentation</a> recommends using 25% of the system&rsquo;s RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache</p></li>
</ul>
<h2 id="2016-12-09">2016-12-09</h2>
@ -585,15 +589,14 @@ UPDATE 362
<ul>
<li>More work on finishing rough draft of KM4Dev article</li>
<li>Set PostgreSQL&rsquo;s <code>shared_buffers</code> on CGSpace to 10% of system RAM (1200MB)</li>
<li>Run the following author corrections on CGSpace:</li>
</ul>
<li><p>Run the following author corrections on CGSpace:</p>
<pre><code>dspace=# update metadatavalue set authority='34df639a-42d8-4867-a3f2-1892075fcb3f', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812' or authority='021cd183-946b-42bb-964e-522ebff02993';
dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
</code></pre>
</code></pre></li>
<ul>
<li>The authority IDs were different now than when I was looking a few days ago so I had to adjust them here</li>
<li><p>The authority IDs were different now than when I was looking a few days ago so I had to adjust them here</p></li>
</ul>
<h2 id="2016-12-11">2016-12-11</h2>
@ -606,40 +609,38 @@ dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab76
<img src="/cgspace-notes/2016/12/postgres_connections_ALL-week.png" alt="postgres_connections_ALL-week" /></p>
<ul>
<li>Looking at CIAT records from last week again, they have a lot of double authors like:</li>
</ul>
<li><p>Looking at CIAT records from last week again, they have a lot of double authors like:</p>
<pre><code>International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
</code></pre>
</code></pre></li>
<ul>
<li>Some in the same <code>dc.contributor.author</code> field, and some in others like <code>dc.contributor.author[en_US]</code> etc</li>
<li>Removing the duplicates in OpenRefine and uploading a CSV to DSpace says &ldquo;no changes detected&rdquo;</li>
<li>Seems like the only way to sortof clean these up would be to start in SQL:</li>
</ul>
<li><p>Some in the same <code>dc.contributor.author</code> field, and some in others like <code>dc.contributor.author[en_US]</code> etc</p></li>
<li><p>Removing the duplicates in OpenRefine and uploading a CSV to DSpace says &ldquo;no changes detected&rdquo;</p></li>
<li><p>Seems like the only way to sortof clean these up would be to start in SQL:</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture';
text_value | authority | confidence
text_value | authority | confidence
-----------------------------------------------+--------------------------------------+------------
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | -1
International Center for Tropical Agriculture | | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 500
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 600
International Center for Tropical Agriculture | | -1
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 500
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | -1
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 0
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | -1
International Center for Tropical Agriculture | | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 500
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 600
International Center for Tropical Agriculture | | -1
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 500
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | -1
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 0
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
UPDATE 1693
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%';
UPDATE 35
</code></pre>
</code></pre></li>
<ul>
<li>Work on article for KM4Dev journal</li>
<li><p>Work on article for KM4Dev journal</p></li>
</ul>
<h2 id="2016-12-13">2016-12-13</h2>
@ -659,31 +660,34 @@ UPDATE 35
<li>Would probably be better to make custom logrotate files for them in the future</li>
<li>Clean up some unneeded log files from 2014 (they weren&rsquo;t large, just don&rsquo;t need them)</li>
<li>So basically, new cron jobs for logs should look something like this:</li>
<li>Find any file named <code>*.log*</code> that isn&rsquo;t <code>dspace.log*</code>, isn&rsquo;t already zipped, and is older than one day, and zip it:</li>
</ul>
<li><p>Find any file named <code>*.log*</code> that isn&rsquo;t <code>dspace.log*</code>, isn&rsquo;t already zipped, and is older than one day, and zip it:</p>
<pre><code># find /home/dspacetest.cgiar.org/log -regextype posix-extended -iregex &quot;.*\.log.*&quot; ! -iregex &quot;.*dspace\.log.*&quot; ! -iregex &quot;.*\.(gz|lrz|lzo|xz)&quot; ! -newermt &quot;Yesterday&quot; -exec schedtool -B -e ionice -c2 -n7 xz {} \;
</code></pre>
</code></pre></li>
<ul>
<li>Since there is <code>xzgrep</code> and <code>xzless</code> we can actually just zip them after one day, why not?!</li>
<li>We can keep the zipped ones for two weeks just in case we need to look for errors, etc, and delete them after that</li>
<li>I use <code>schedtool -B</code> and <code>ionice -c2 -n7</code> to set the CPU scheduling to <code>SCHED_BATCH</code> and the IO to best effort which should, in theory, impact important system processes like Tomcat and PostgreSQL less</li>
<li>When the tasks are running you can see that the policies do apply:</li>
</ul>
<li><p>Since there is <code>xzgrep</code> and <code>xzless</code> we can actually just zip them after one day, why not?!</p></li>
<li><p>We can keep the zipped ones for two weeks just in case we need to look for errors, etc, and delete them after that</p></li>
<li><p>I use <code>schedtool -B</code> and <code>ionice -c2 -n7</code> to set the CPU scheduling to <code>SCHED_BATCH</code> and the IO to best effort which should, in theory, impact important system processes like Tomcat and PostgreSQL less</p></li>
<li><p>When the tasks are running you can see that the policies do apply:</p>
<pre><code>$ schedtool $(ps aux | grep &quot;xz /home&quot; | grep -v grep | awk '{print $2}') &amp;&amp; ionice -p $(ps aux | grep &quot;xz /home&quot; | grep -v grep | awk '{print $2}')
PID 17049: PRIO 0, POLICY B: SCHED_BATCH , NICE 0, AFFINITY 0xf
best-effort: prio 7
</code></pre>
</code></pre></li>
<ul>
<li>All in all this should free up a few gigs (we were at 9.3GB free when I started)</li>
<li>Next thing to look at is whether we need Tomcat&rsquo;s access logs</li>
<li>I just looked and it seems that we saved 10GB by zipping these logs</li>
<li>Some users pointed out issues with the &ldquo;most popular&rdquo; stats on a community or collection</li>
<li>This error appears in the logs when you try to view them:</li>
</ul>
<li><p>All in all this should free up a few gigs (we were at 9.3GB free when I started)</p></li>
<li><p>Next thing to look at is whether we need Tomcat&rsquo;s access logs</p></li>
<li><p>I just looked and it seems that we saved 10GB by zipping these logs</p></li>
<li><p>Some users pointed out issues with the &ldquo;most popular&rdquo; stats on a community or collection</p></li>
<li><p>This error appears in the logs when you try to view them:</p>
<pre><code>2016-12-13 21:17:37,486 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
@ -735,11 +739,11 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
</code></pre>
</code></pre></li>
<ul>
<li>It happens on development and production, so I will have to ask Atmire</li>
<li>Most likely an issue with installation/configuration</li>
<li><p>It happens on development and production, so I will have to ask Atmire</p></li>
<li><p>Most likely an issue with installation/configuration</p></li>
</ul>
<h2 id="2016-12-14">2016-12-14</h2>
@ -777,8 +781,8 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
<li>Last week, when we asked CGNET to update the DNS records this weekend, they misunderstood and did it immediately</li>
<li>We quickly told them to undo it, but I just realized they didn&rsquo;t undo the IPv6 AAAA record!</li>
<li>None of our users in African institutes will have IPv6, but some Europeans might, so I need to check if any submissions have been added since then</li>
<li>Update some names and authorities in the database:</li>
</ul>
<li><p>Update some names and authorities in the database:</p>
<pre><code>dspace=# update metadatavalue set authority='5ff35043-942e-4d0a-b377-4daed6e3c1a3', confidence=600, text_value='Duncan, Alan' where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*Duncan,? A.*';
UPDATE 204
@ -786,15 +790,17 @@ dspace=# update metadatavalue set authority='46804b53-ea30-4a85-9ccf-b79a35816fa
UPDATE 89
dspace=# update metadatavalue set authority='f840da02-26e7-4a74-b7ba-3e2b723f3684', confidence=600, text_value='Lukuyu, Ben A.' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Lukuyu, B%';
UPDATE 140
</code></pre>
</code></pre></li>
<ul>
<li>Generated a new UUID for Ben using <code>uuidgen | tr [A-Z] [a-z]</code> as the one in Solr had his ORCID but the name format was incorrect</li>
<li>In theory DSpace should be able to check names from ORCID and update the records in the database, but I find that this doesn&rsquo;t work (see Jira bug <a href="https://jira.duraspace.org/browse/DS-3302">DS-3302</a>)</li>
<li>I need to run these updates along with the other one for CIAT that I found last week</li>
<li>Enable OCSP stapling for hosts &gt;= Ubuntu 16.04 in our Ansible playbooks (<a href="https://github.com/ilri/rmg-ansible-public/pull/76">#76</a>)</li>
<li>Working for DSpace Test on the second response:</li>
</ul>
<li><p>Generated a new UUID for Ben using <code>uuidgen | tr [A-Z] [a-z]</code> as the one in Solr had his ORCID but the name format was incorrect</p></li>
<li><p>In theory DSpace should be able to check names from ORCID and update the records in the database, but I find that this doesn&rsquo;t work (see Jira bug <a href="https://jira.duraspace.org/browse/DS-3302">DS-3302</a>)</p></li>
<li><p>I need to run these updates along with the other one for CIAT that I found last week</p></li>
<li><p>Enable OCSP stapling for hosts &gt;= Ubuntu 16.04 in our Ansible playbooks (<a href="https://github.com/ilri/rmg-ansible-public/pull/76">#76</a>)</p></li>
<li><p>Working for DSpace Test on the second response:</p>
<pre><code>$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
...
@ -803,21 +809,18 @@ $ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgia
...
OCSP Response Data:
...
Cert Status: good
</code></pre>
Cert Status: good
</code></pre></li>
<ul>
<li>Migrate CGSpace to new server, roughly following these steps:</li>
<li>On old server:</li>
</ul>
<li><p>Migrate CGSpace to new server, roughly following these steps:</p></li>
<li><p>On old server:</p>
<pre><code># service tomcat7 stop
# /home/backup/scripts/postgres_backup.sh
</code></pre>
</code></pre></li>
<ul>
<li>On new server:</li>
</ul>
<li><p>On new server:</p>
<pre><code># systemctl stop tomcat7
# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/assetstore/ /home/cgspace.cgiar.org/assetstore/
@ -843,10 +846,9 @@ $ cd src/git/DSpace/dspace/target/dspace-installer
$ ant update clean_backups
$ exit
# systemctl start tomcat7
</code></pre>
</code></pre></li>
<ul>
<li>It took about twenty minutes and afterwards I had to check a few things, like:
<li><p>It took about twenty minutes and afterwards I had to check a few things, like:</p>
<ul>
<li>check and enable systemd timer for let&rsquo;s encrypt</li>
@ -862,11 +864,12 @@ $ exit
<ul>
<li>Abenet wanted a CSV of the IITA community, but the web export doesn&rsquo;t include the <code>dc.date.accessioned</code> field</li>
<li>I had to export it from the command line using the <code>-a</code> flag:</li>
</ul>
<li><p>I had to export it from the command line using the <code>-a</code> flag:</p>
<pre><code>$ [dspace]/bin/dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
</code></pre>
</code></pre></li>
</ul>
<h2 id="2016-12-28">2016-12-28</h2>