2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
<li><p>DSpace’s own Solr logs don’t give IP addresses, so I will have to enable Nginx’s logging of <code>/solr</code> so I can see where this request came from</p></li>
<li><p>I enabled logging of <code>/rest/</code> and I think I’ll leave it on for good</p></li>
<li><p>Also, the disk is nearly full because of log file issues, so I’m running some compression on DSpace logs</p></li>
<li><p>Normally these stay uncompressed for a month just in case we need to look at them, so now I’ve just compressed anything older than 2 weeks so we can get some disk space back</p></li>
<li><p>The first one seems ok, but I don’t know what to make of the second one…</p></li>
<li><p>I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in <code>[dspace-dir]/assetstore/99/59/30/...</code>)</p></li>
<li><p>For what it’s worth, there is no item on DSpace Test or S3 backups with that checksum either…</p></li>
<li><p>In other news, I’m looking at JVM settings from the Solr 4.10.2 release, from <code>bin/solr.in.sh</code>:</p>
<li>I did some basic benchmarking on a local DSpace before and after the JVM settings above, but there wasn’t anything amazingly obvious</li>
<li>I want to make the changes on DSpace Test and monitor the JVM heap graphs for a few days to see if they change the JVM GC patterns or anything (munin graphs)</li>
<li>Spin up new CGSpace server on Linode</li>
<li>I did a few traceroutes from Jordan and Kenya and it seems that Linode’s Frankfurt datacenter is a few less hops and perhaps less packet loss than the London one, so I put the new server in Frankfurt</li>
<li>Do initial provisioning</li>
<li>Atmire responded about the MQM warnings in the DSpace logs</li>
<pre><code>dspace=# update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
UPDATE 11
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
UPDATE 36
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
UPDATE 14
dspace=# update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
UPDATE 42
dspace=# update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
UPDATE 360
dspace=# update metadatavalue set text_value='Grace, Delia', authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
<li><p>Pay attention to the regex to prevent false positives in tricky cases with Dutch names!</p></li>
<li><p>I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week</p></li>
<li><p>More work on the KM4Dev Journal article</p></li>
<li><p>In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work</p></li>
<li><p>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</p></li>
<li><p>Paola from CCAFS mentioned she also has the “take task” bug on CGSpace</p></li>
<li><p>Reading about <ahref="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</p></li>
<li><p>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</p></li>
<li><p>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn’t dedicated (also runs Solr, which can benefit from OS cache) so let’s try 1024MB</p></li>
<li><p>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</p>
<li>For what it’s worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li>
<li>I will have to test more</li>
<li>Anyways, I noticed that some of the authority values I set actually have versions of author names we don’t want, ie “Grace, D.”</li>
<li>For example, do a Solr query for “first_name:Grace” and look at the results</li>
<li><p>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</p></li>
<li><p>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</p>
<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
<li><p>Then I’ll reindex discovery and authority and see how the authority Solr core looks</p></li>
<li><p>After this, now there are authorities for some of the “Grace, D.” and “Grace, Delia” text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</p>
<pre><code>dspace#= update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
<li><p>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</p></li>
<li><p>Perhaps another way is to just add our own UUID to the authority field for the text_value we like, then re-index authority so they get synced from PostgreSQL to Solr, then set the other text_values to use that authority ID</p></li>
<li><p>Deploy MQM WARN fix on CGSpace (<ahref="https://github.com/ilri/DSpace/pull/289">#289</a>)</p></li>
<li><p>Deploy “take task” hack/fix on CGSpace (<ahref="https://github.com/ilri/DSpace/pull/290">#290</a>)</p></li>
<li><p>I ran the following author corrections and then reindexed discovery:</p>
<pre><code>update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
<li><p>Something weird happened and Peter Thorne’s names all ended up as “Thorne”, I guess because the original authority had that as its name value:</p>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne%';
<pre><code>dspace=# update metadatavalue set authority='b2f7603d-2fb5-4018-923a-c4ec8d85b3bb', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812';
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
<li><p>Seems his original authorities are using an incorrect version of the name so I need to generate another UUID and tie it to the correct name, then reindex:</p>
<pre><code>dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
<li><p>It seems that, when you are messing with authority and author text values in the database, it is better to run authority reindex first (postgres→solr authority core) and then Discovery reindex (postgres→solr Discovery core)</p></li>
<li><p>Everything looks ok after authority and discovery reindex</p></li>
<li><p>In other news, I think we should really be using more RAM for PostgreSQL’s <code>shared_buffers</code></p></li>
<li><p>The <ahref="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html">PostgreSQL documentation</a> recommends using 25% of the system’s RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache</p></li>
<pre><code>dspace=# update metadatavalue set authority='34df639a-42d8-4867-a3f2-1892075fcb3f', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812' or authority='021cd183-946b-42bb-964e-522ebff02993';
dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
<li>After enabling a sizable <code>shared_buffers</code> for CGSpace’s PostgreSQL configuration the number of connections to the database dropped significantly</li>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture';
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
UPDATE 1693
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%';
<li>Looking at logs, it seems we need to evaluate which logs we keep and for how long</li>
<li>Basically the only ones we <em>need</em> are <code>dspace.log</code> because those are used for legacy statistics (need to keep for 1 month)</li>
<li>Other logs will be an issue because they don’t have date stamps</li>
<li>I will add date stamps to the logs we’re storing from the tomcat7 user’s cron jobs at least, using: <code>$(date --iso-8601)</code></li>
<li>Would probably be better to make custom logrotate files for them in the future</li>
<li>Clean up some unneeded log files from 2014 (they weren’t large, just don’t need them)</li>
<li>So basically, new cron jobs for logs should look something like this:</li>
<li><p>Find any file named <code>*.log*</code> that isn’t <code>dspace.log*</code>, isn’t already zipped, and is older than one day, and zip it:</p>
<li><p>Since there is <code>xzgrep</code> and <code>xzless</code> we can actually just zip them after one day, why not?!</p></li>
<li><p>We can keep the zipped ones for two weeks just in case we need to look for errors, etc, and delete them after that</p></li>
<li><p>I use <code>schedtool -B</code> and <code>ionice -c2 -n7</code> to set the CPU scheduling to <code>SCHED_BATCH</code> and the IO to best effort which should, in theory, impact important system processes like Tomcat and PostgreSQL less</p></li>
<li><p>When the tasks are running you can see that the policies do apply:</p>
<li>Atmire sent a quick fix for the <code>last-update.txt</code> file not found error</li>
<li>After applying pull request <ahref="https://github.com/ilri/DSpace/pull/291">#291</a> on DSpace Test I no longer see the error in the logs after the <code>UpdateSolrStorageReports</code> task runs</li>
<li>Also, I’m toying with the idea of moving the <code>tomcat7</code> user’s cron jobs to <code>/etc/cron.d</code> so we can manage them in Ansible</li>
<li>Made a pull request with a template for the cron jobs (<ahref="https://github.com/ilri/rmg-ansible-public/pull/75">#75</a>)</li>
<li>Testing SMTP from the new CGSpace server and it’s not working, I’ll have to tell James</li>
</ul>
<h2id="2016-12-15">2016-12-15</h2>
<ul>
<li>Start planning for server migration this weekend, letting users know</li>
<li>I am trying to figure out what the process is to <ahref="http://handle.net/hnr_support.html">update the server’s IP in the Handle system</a>, and emailing the hdladmin account bounces(!)</li>
<li>I will contact the Jane Euler directly as I know I’ve corresponded with her in the past</li>
<li>She said that I should indeed just re-run the <code>[dspace]/bin/dspace make-handle-config</code> command and submit the new <code>sitebndl.zip</code> file to the CNRI website</li>
<li>Also I was troubleshooting some workflow issues from Bizuwork</li>
<li>I re-created the same scenario by adding a non-admin account and submitting an item, but I was able to successfully approve and commit it</li>
<li>So it turns out it’s not a bug, it’s just that Peter was added as a reviewer/admin AFTER the items were submitted</li>
<li>This is how DSpace works, and I need to ask if there is a way to override someone’s submission, as the other reviewer seems to not be paying attention, or has perhaps taken the item from the task pool?</li>
<li>Run a batch edit to add “RANGELANDS” ILRI subject to all items containing the word “RANGELANDS” in their metadata for Peter Ballantyne</li>
</ul>
<p><imgsrc="/cgspace-notes/2016/12/batch-edit1.png"alt="Select all items with "rangelands" in metadata"/>
<li>Add four new CRP subjects for 2017 and sort the input forms alphabetically (<ahref="https://github.com/ilri/DSpace/pull/294">#294</a>)</li>
<li>Test the SMTP on the new server and it’s working</li>
<li>Last week, when we asked CGNET to update the DNS records this weekend, they misunderstood and did it immediately</li>
<li>We quickly told them to undo it, but I just realized they didn’t undo the IPv6 AAAA record!</li>
<li>None of our users in African institutes will have IPv6, but some Europeans might, so I need to check if any submissions have been added since then</li>
<pre><code>dspace=# update metadatavalue set authority='5ff35043-942e-4d0a-b377-4daed6e3c1a3', confidence=600, text_value='Duncan, Alan' where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*Duncan,? A.*';
UPDATE 204
dspace=# update metadatavalue set authority='46804b53-ea30-4a85-9ccf-b79a35816fa9', confidence=600, text_value='Mekonnen, Kindu' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Mekonnen, K%';
UPDATE 89
dspace=# update metadatavalue set authority='f840da02-26e7-4a74-b7ba-3e2b723f3684', confidence=600, text_value='Lukuyu, Ben A.' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Lukuyu, B%';
<li><p>Generated a new UUID for Ben using <code>uuidgen | tr [A-Z] [a-z]</code> as the one in Solr had his ORCID but the name format was incorrect</p></li>
<li><p>In theory DSpace should be able to check names from ORCID and update the records in the database, but I find that this doesn’t work (see Jira bug <ahref="https://jira.duraspace.org/browse/DS-3302">DS-3302</a>)</p></li>
<li><p>I need to run these updates along with the other one for CIAT that I found last week</p></li>
<li><p>Enable OCSP stapling for hosts >= Ubuntu 16.04 in our Ansible playbooks (<ahref="https://github.com/ilri/rmg-ansible-public/pull/76">#76</a>)</p></li>
<li><p>Working for DSpace Test on the second response:</p>