<li>Some names that I thought I fixed in July seem not to be:</li>
</ul>
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
Poole, Elizabeth Jane | b6efa27f-8829-4b92-80fe-bc63e03e3ccb | 600
Poole, Elizabeth Jane | 41628f42-fc38-4b38-b473-93aec9196326 | 600
Poole, Elizabeth Jane | 83b82da0-f652-4ebc-babc-591af1697919 | 600
Poole, Elizabeth Jane | c3a22456-8d6a-41f9-bba0-de51ef564d45 | 600
Poole, E.J. | c3a22456-8d6a-41f9-bba0-de51ef564d45 | 600
Poole, E.J. | 0fbd91b9-1b71-4504-8828-e26885bf8b84 | 600
(6 rows)
</code></pre>
<ul>
<li>At least a few of these actually have the correct ORCID, but I will unify the authority to be c3a22456-8d6a-41f9-bba0-de51ef564d45</li>
</ul>
<pre><code>dspacetest=# update metadatavalue set authority='c3a22456-8d6a-41f9-bba0-de51ef564d45', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
UPDATE 69
</code></pre>
<ul>
<li>And for Peter Ballantyne:</li>
</ul>
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
Ballantyne, Peter | ba5f205b-b78b-43e5-8e80-0c9a1e1ad2ca | 600
Ballantyne, Peter | 20f21160-414c-4ecf-89ca-5f2cb64e75c1 | 600
(5 rows)
</code></pre>
<ul>
<li>Again, a few have the correct ORCID, but there should only be one authority…</li>
</ul>
<pre><code>dspacetest=# update metadatavalue set authority='4f04ca06-9a76-4206-bd9c-917ca75d278e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
UPDATE 58
</code></pre>
<ul>
<li>And for me:</li>
</ul>
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, A%';
Orth, Alan | 4884def0-4d7e-4256-9dd4-018cd60a5871 | 600
Orth, A. | 4884def0-4d7e-4256-9dd4-018cd60a5871 | 600
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
(3 rows)
dspacetest=# update metadatavalue set authority='1a1943a0-3f87-402f-9afe-e52fb46a513e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, %';
UPDATE 11
</code></pre>
<ul>
<li>And for CCAFS author Bruce Campbell that I had discussed with CCAFS earlier this week:</li>
</ul>
<pre><code>dspacetest=# update metadatavalue set authority='0e414b4c-4671-4a23-b570-6077aca647d8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
UPDATE 166
dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
<li>If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8</li>
<li>We should definitely clean filenames so they don’t use characters that are tricky to process in CSV and shell scripts, like: <code>,</code>, <code>'</code>, and <code>"</code></li>
<li>I need to write a Python script to match that for renaming files in the file system</li>
<li>When importing SAF bundles it seems you can specify the target collection on the command line using <code>-c 10568/4003</code> or in the <code>collections</code> file inside each item in the bundle</li>
<li>Seems that the latter method causes a null pointer exception, so I will just have to use the former method</li>
<li>In the end I was able to import the files after unzipping them ONLY on Linux
<ul>
<li>The CSV file was giving file names in UTF-8, and unzipping the zip on Mac OS X and transferring it was converting the file names to Unicode equivalence like I saw above</li>
<li>Import CIAT Gender Network records to CGSpace, first creating the SAF bundles as my user, then importing as the <code>tomcat7</code> user, and deleting the bundle, for each collection’s items:</li>
<li>Erase and rebuild DSpace Test based on latest Ubuntu 16.04, PostgreSQL 9.5, and Java 8 stuff</li>
<li>Reading about PostgreSQL maintenance and it seems manual vacuuming is only for certain workloads, such as heavy update/write loads</li>
<li>I suggest we disable our nightly manual vacuum task, as we’re a mostly read workload, and I’d rather stick as close to the documentation as possible since we haven’t done any testing/observation of PostgreSQL</li>
<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
Sep 14, 2016 11:32:22 AM com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator buildModelAndSchemas
SEVERE: Failed to generate the schema for the JAX-B elements
com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of IllegalAnnotationExceptions
java.util.Map is an interface, and JAXB can't handle interfaces.
this problem is related to the following location:
at java.util.Map
at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender()
at com.atmire.dspace.rest.common.Statlet
java.util.Map does not have a no-arg default constructor.
this problem is related to the following location:
at java.util.Map
at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender()
at com.atmire.dspace.rest.common.Statlet
</code></pre>
<ul>
<li>Then 20 minutes later another outOfMemoryError:</li>
</ul>
<pre><code>Exception in thread "http-bio-127.0.0.1-8081-exec-25" java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding.decode(StringCoding.java:215)
</code></pre>
<ul>
<li>Perhaps these particular issues <em>are</em> memory issues, the munin graphs definitely show some weird purging/allocating behavior starting this week</li>
<li>And really, we did reduce the memory of CGSpace in late 2015, so maybe we should just increase it again, now that our usage is higher and we are having memory errors in the logs</li>
<li>Oh great, the configuration on the actual server is different than in configuration management!</li>
<li>Seems we added a bunch of settings to the <code>/etc/default/tomcat7</code> in December, 2015 and never updated our ansible repository:</li>
<li>Looking at Google Webmaster Tools again, it seems the work I did on URL query parameters and blocking via the <code>X-Robots-Tag</code> HTTP header in March, 2016 seem to have had a positive effect on Google’s index for CGSpace</li>
</ul>
<p><imgsrc="../images/2016/09/google-webmaster-tools-index.png"alt="Google Webmaster Tools for CGSpace"/></p>
<li>CGSpace crashed again, and there are TONS of heap space errors but the datestamps aren’t on those lines so I’m not sure if they were yesterday:</li>
Exception in thread "http-bio-127.0.0.1-8081-exec-247" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-241" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-243" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-258" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-268" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-263" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-bio-127.0.0.1-8081-exec-280" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-54216" org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 7feaa95d-8e1f-4f45-80bb
-e14ef82ee224 to the index; possible analysis error.
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at com.atmire.statistics.SolrLogThread.run(SourceFile:25)
</code></pre>
<ul>
<li>I bumped the heap space from 4096m to 5120m to see if this is <em>really</em> about heap speace or not.</li>
<li>Looking into some of these errors that I’ve seen this week but haven’t noticed before:</li>
</ul>
<pre><code># zcat -f -- /var/log/tomcat7/catalina.* | grep -c 'Failed to generate the schema for the JAX-B elements'
113
</code></pre>
<ul>
<li>I’ve sent a message to Atmire about the Solr error to see if it’s related to their batch update module</li>
<li>After that we need to take the top ~300 and make a controlled vocabulary for it</li>
<li>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<ahref="https://github.com/ilri/DSpace/pull/267">#267</a>)</li>