<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
<pre><code>dspacetest=# update metadatavalue set authority='c3a22456-8d6a-41f9-bba0-de51ef564d45', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
<pre><code>dspacetest=# update metadatavalue set authority='4f04ca06-9a76-4206-bd9c-917ca75d278e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, A%';
dspacetest=# update metadatavalue set authority='1a1943a0-3f87-402f-9afe-e52fb46a513e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, %';
<pre><code>dspacetest=# update metadatavalue set authority='0e414b4c-4671-4a23-b570-6077aca647d8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
UPDATE 166
dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
<li>If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8</li>
<li><p>We should definitely clean filenames so they don’t use characters that are tricky to process in CSV and shell scripts, like: <code>,</code>, <code>'</code>, and <code>"</code></p>
<li><p>I need to write a Python script to match that for renaming files in the file system</p></li>
<li><p>When importing SAF bundles it seems you can specify the target collection on the command line using <code>-c 10568/4003</code> or in the <code>collections</code> file inside each item in the bundle</p></li>
<li><p>Seems that the latter method causes a null pointer exception, so I will just have to use the former method</p></li>
<li><p>In the end I was able to import the files after unzipping them ONLY on Linux</p>
<li>The CSV file was giving file names in UTF-8, and unzipping the zip on Mac OS X and transferring it was converting the file names to Unicode equivalence like I saw above</li>
<li><p>Import CIAT Gender Network records to CGSpace, first creating the SAF bundles as my user, then importing as the <code>tomcat7</code> user, and deleting the bundle, for each collection’s items:</p>
<li>Erase and rebuild DSpace Test based on latest Ubuntu 16.04, PostgreSQL 9.5, and Java 8 stuff</li>
<li>Reading about PostgreSQL maintenance and it seems manual vacuuming is only for certain workloads, such as heavy update/write loads</li>
<li>I suggest we disable our nightly manual vacuum task, as we’re a mostly read workload, and I’d rather stick as close to the documentation as possible since we haven’t done any testing/observation of PostgreSQL</li>
<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
<li><p>Perhaps these particular issues <em>are</em> memory issues, the munin graphs definitely show some weird purging/allocating behavior starting this week</p></li>
<li>And really, we did reduce the memory of CGSpace in late 2015, so maybe we should just increase it again, now that our usage is higher and we are having memory errors in the logs</li>
<li>Oh great, the configuration on the actual server is different than in configuration management!</li>
<li>Looking at Google Webmaster Tools again, it seems the work I did on URL query parameters and blocking via the <code>X-Robots-Tag</code> HTTP header in March, 2016 seem to have had a positive effect on Google’s index for CGSpace</li>
</ul>
<p><imgsrc="/cgspace-notes/2016/09/google-webmaster-tools-index.png"alt="Google Webmaster Tools for CGSpace"/></p>
<li><p>CGSpace crashed again, and there are TONS of heap space errors but the datestamps aren’t on those lines so I’m not sure if they were yesterday:</p>
Exception in thread "http-bio-127.0.0.1-8081-exec-247" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-241" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-243" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-258" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-268" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-263" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-280" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-54216" org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 7feaa95d-8e1f-4f45-80bb
-e14ef82ee224 to the index; possible analysis error.
<li><p>After that we need to take the top ~300 and make a controlled vocabulary for it</p></li>
<li><p>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<ahref="https://github.com/ilri/DSpace/pull/267">#267</a>)</p></li>
<li>Run all system updates on DSpace Test and reboot the server</li>
<li>Merge changes for sponsorship and affiliation controlled vocabularies (<ahref="https://github.com/ilri/DSpace/pull/267">#267</a>, <ahref="https://github.com/ilri/DSpace/pull/268">#268</a>)</li>
<li>Merge minor changes to <code>messages.xml</code> to reconcile it with the stock DSpace 5.1 one (<ahref="https://github.com/ilri/DSpace/pull/269">#269</a>)</li>
<li>Peter asked about adding title search to Discovery</li>
<li>The index was already defined, so I just added it to the search filters</li>
<li>It works but CGSpace apparently uses <code>OR</code> for search terms, which makes the search results basically useless</li>
<li>I need to read the docs and ask on the mailing list to see if we can tweak that</li>
<li>Generate a new list of sponsors from the database for Peter Ballantyne so we can clean them up and update the controlled vocabulary</li>
</ul>
<h2id="2016-09-21">2016-09-21</h2>
<ul>
<li>Turns out the Solr search logic switched from OR to AND in DSpace 6.0 and the change is easy to backport: <ahref="https://jira.duraspace.org/browse/DS-2809">https://jira.duraspace.org/browse/DS-2809</a></li>
<li>Update controlled vocabulary for sponsorship based on the latest corrected values from the database</li>
</ul>
<h2id="2016-09-25">2016-09-25</h2>
<ul>
<li>Merge accession date improvements for CUA module (<ahref="https://github.com/ilri/DSpace/pull/275">#275</a>)</li>
<li>Merge addition of accession date to Discovery search filters (<ahref="https://github.com/ilri/DSpace/pull/276">#276</a>)</li>
<li>Merge updates to sponsorship controlled vocabulary (<ahref="https://github.com/ilri/DSpace/pull/277">#277</a>)</li>
<li>I’ve been trying to add a search filter for <code>dc.description</code> so the IITA people can search for some tags they use there, but for some reason the filter never shows up in Atmire’s CUA</li>
<li>Not sure if it’s something like we already have too many filters there (30), or the filter name is reserved, etc…</li>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=203 group by text_value order by count desc) to /tmp/ilrisubjects.csv with csv;
<li><p>I’ve been monitoring this for almost two years in this GitHub issue: <ahref="https://github.com/ilri/DSpace/issues/38">https://github.com/ilri/DSpace/issues/38</a></p></li>
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeu
<pre><code>dspacetest=# update metadatavalue set authority='fe4b719f-6cc4-4d65-8504-7a83130b9f83w', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
<li><p>Hmm, now her name is missing from the authors facet and only shows the authority ID</p></li>
<li><p>On the production server there is an item with her ORCID but it is using a different authority: f01f7b7b-be3f-4df7-a61d-b73c067de88d</p></li>
<li><p>Maybe I used the wrong one… I need to look again at the production database</p></li>
<li><p>On a clean snapshot of the database I see the correct authority should be <code>f01f7b7b-be3f-4df7-a61d-b73c067de88d</code>, not <code>fe4b719f-6cc4-4d65-8504-7a83130b9f83</code></p></li>
<li><p>Updating her authorities again and reindexing:</p>
<pre><code>dspacetest=# update metadatavalue set authority='f01f7b7b-be3f-4df7-a61d-b73c067de88d', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
<li>Make a placeholder pull request for <code>discovery.xml</code> changes (<ahref="https://github.com/ilri/DSpace/pull/278">#278</a>), as I still need to test their effect on Atmire content analysis module</li>
<li>Make a placeholder pull request for Font Awesome changes (<ahref="https://github.com/ilri/DSpace/pull/279">#279</a>), which replaces the GitHub image in the footer with an icon, and add style for RSS and @ icons that I will start replacing in community/collection HTML intros</li>
<li>Had some issues with local test server after messing with Solr too much, had to blow everything away and re-install from CGSpace</li>
<li>Going to try to update Sonja Vermeulen’s authority to 2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0, as that seems to be one of her authorities that has an ORCID</li>
<li>Merge Font Awesome changes (<ahref="https://github.com/ilri/DSpace/pull/279">#279</a>)</li>
<li>Minor fix to a string in Atmire’s CUA module (<ahref="https://github.com/ilri/DSpace/pull/280">#280</a>)</li>
<li><p>This seems to be what I’ll need to do for Sonja Vermeulen (but with <code>2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0</code> instead on the live site):</p>
<pre><code>dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d2d6d2', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d2d6d2', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen SJ%';
<li><p>And then update Discovery and Authority indexes</p></li>
<li><p>Minor fix for “Subject” string in Discovery search and Atmire modules (<ahref="https://github.com/ilri/DSpace/pull/281">#281</a>)</p></li>
<li><p>Start testing batch fixes for ILRI subject from Peter:</p>
<pre><code>dspacetest=# select distinct text_value from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/5472', '10568/5473')));