<li>Looking at log file use on CGSpace, I notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I’ve never needed to look in any log file other than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we’re paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul>
<pre><code>Run start time: 03/06/2016 04:00:22
Error retrieving bitstream ID 71274 from asset store.
java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.&lt;init&gt;(FileInputStream.java:146)
at edu.sdsc.grid.io.local.LocalFileInputStream.open(LocalFileInputStream.java:171)
at edu.sdsc.grid.io.GeneralFileInputStream.&lt;init&gt;(GeneralFileInputStream.java:145)
at edu.sdsc.grid.io.local.LocalFileInputStream.&lt;init&gt;(LocalFileInputStream.java:139)
at edu.sdsc.grid.io.FileFactory.newFileInputStream(FileFactory.java:630)
at org.dspace.storage.bitstore.BitstreamStorageManager.retrieve(BitstreamStorageManager.java:525)
at org.dspace.checker.BitstreamDAO.getBitstream(BitstreamDAO.java:60)
at org.dspace.checker.CheckerCommand.processBitstream(CheckerCommand.java:303)
at org.dspace.checker.CheckerCommand.checkBitstream(CheckerCommand.java:171)
at org.dspace.checker.CheckerCommand.process(CheckerCommand.java:120)
at org.dspace.app.checker.ChecksumChecker.main(ChecksumChecker.java:236)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77)
</code></pre>
<ul>
<li>So this would be the <code>tomcat7</code> Unix user, who seems to have a default limit of 1024 files in its shell</li>
<li>For what it’s worth, we have been setting the actual Tomcat 7 process’ limit to 16384 for a few years (in <code>/etc/default/tomcat7</code>)</li>
<li>Looks like cron will read limits from <code>/etc/security/limits.*</code> so we can do something for the tomcat7 user there</li>
<li>Submit pull request for Tomcat 7 limits in Ansible dspace role (<a href="https://github.com/ilri/rmg-ansible-public/pull/30">#30</a>)</li>
<li><p>Also, adjust the cron jobs for backups so they only back up <code>dspace.log</code> and some stats files (.dat)</p></li>
<li><p>Try to do some metadata field migrations using the Atmire batch UI (<code>dc.Species</code> → <code>cg.species</code>) but it took several hours and even missed a few records</p></li>
<li><p>A better way to move metadata on this scale is via SQL, for example <code>dc.type.output</code> → <code>dc.type</code> (their IDs in the metadatafieldregistry are 66 and 109, respectively):</p>
<li>Write shell script to do the migration of fields: <a href="https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b">https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b</a></li>
<li>Discuss metadata renaming with Abenet; we decided it’s better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF</li>
<li>I’ve e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change</li>
</ul>
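<p>As a rough illustration of the SQL approach mentioned above, here is a minimal sketch of moving item metadata from field ID 66 (<code>dc.type.output</code>) to field ID 109 (<code>dc.type</code>). It is not the script from the gist. It runs against an in-memory SQLite database with a simplified, hypothetical stand-in for the <code>metadatavalue</code> table (the real one lives in PostgreSQL and has more columns), and the names <code>move_field</code> and <code>demo</code> are made up for this example.</p>

```python
import sqlite3

# Hypothetical, simplified stand-in for DSpace's metadatavalue table.
# Only the columns referenced in these notes are modelled here.
SCHEMA = """
CREATE TABLE metadatavalue (
    metadata_value_id INTEGER PRIMARY KEY,
    resource_id       INTEGER NOT NULL,
    resource_type_id  INTEGER NOT NULL,
    metadata_field_id INTEGER NOT NULL,
    text_value        TEXT
);
"""

ITEM = 2  # resource_type_id used for items in the queries in these notes


def move_field(conn, old_field_id, new_field_id):
    """Repoint item metadata values from one field ID to another.

    Returns the number of rows that were changed.
    """
    with conn:  # commits on success, rolls back if the UPDATE raises
        cur = conn.execute(
            "UPDATE metadatavalue SET metadata_field_id = ? "
            "WHERE resource_type_id = ? AND metadata_field_id = ?",
            (new_field_id, ITEM, old_field_id),
        )
    return cur.rowcount


def demo():
    """Build a tiny sample table, move field 66 to 109, and report counts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO metadatavalue "
        "(resource_id, resource_type_id, metadata_field_id, text_value) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, 2, 66, "Journal Article"),  # item value on the old field
            (2, 2, 66, "Report"),           # item value on the old field
            (3, 2, 109, "Book"),            # already on the new field
            (4, 3, 66, "not an item"),      # other resource type, left alone
        ],
    )
    moved = move_field(conn, old_field_id=66, new_field_id=109)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM metadatavalue "
        "WHERE resource_type_id = ? AND metadata_field_id = ?",
        (ITEM, 66),
    ).fetchone()[0]
    conn.close()
    return moved, remaining
```

<p>Restricting the update to <code>resource_type_id = 2</code> keeps the change limited to item metadata. On a real database it would be prudent to run the statement inside a transaction on a test server first, and a re-index would presumably still be needed afterwards.</p>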
<h2 id="2016-04-10">2016-04-10</h2>
<ul>
<li>Looking at the DOI issue <a href="https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=678507860">reported by Leroy from CIAT a few weeks ago</a></li>
<li><p>I will manually edit the <code>dc.identifier.doi</code> in <a href="https://cgspace.cgiar.org/handle/10568/72509?show=full">10568/72509</a> and tweet the link, then check back in a week to see if the donut gets updated</p></li>
<li>The donut is already updated and shows the correct number now</li>
<li>CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed that we tentatively do it on Monday the 18th.</li>
<li><p>Listings and Reports is still not returning reliable data for <code>dc.type</code></p></li>
<li><p>I think we need to ask Atmire, as their documentation isn’t too clear on the format of the filter configs</p></li>
<li><p>Alternatively, I want to see whether it behaves better if I move all the data from <code>dc.type.output</code> to <code>dc.type</code> and then re-index</p></li>
<li><p>Looking at our <code>input-forms.xml</code> I see we have two sets of ILRI subjects, but one has a few extra subjects</p></li>
<li><p>Remove one set of ILRI subjects and remove duplicate <code>VALUE CHAINS</code> from existing list (<a href="https://github.com/ilri/DSpace/pull/216">#216</a>)</p></li>
<li><p>I decided to keep the set of subjects that had <code>FMD</code> and <code>RANGELANDS</code> added, as those additions appear to have been requested, so it might be the newer list</p></li>
<li><p>I deleted them on CGSpace but I’ll wait to do the re-index as we’re going to be doing one in a few days for the metadata changes anyways</p></li>
<li><p>In other news, moving the <code>dc.type.output</code> to <code>dc.type</code> and re-indexing seems to have fixed the Listings and Reports issue from above</p></li>
<li><p>Unfortunately this isn’t a very good solution, because Listings and Reports config should allow us to filter on <code>dc.type.*</code> but the documentation isn’t very clear and I couldn’t reach Atmire today</p></li>
<li><p>We want to do the <code>dc.type.output</code> move on CGSpace anyways, but we should wait as it might affect other external people!</p></li>
</ul>
<pre><code>at org.dspace.rest.Resource.processFinally(Resource.java:163)
at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
</code></pre>
<pre><code># select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105);
</code></pre>
<ul>
<li><p>We have 18,000 of these errors right now…</p></li>
<li><p>Delete a few more old metadata values: <code>dc.Species.animal</code>, <code>dc.type.journal</code>, and <code>dc.publicationcategory</code>:</p>
<li><p>Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well</p></li>
<li><p>Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)</p></li>
<li><p>Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:</p>
<li><p>I found a recent discussion on the DSpace mailing list and I’ve asked for advice there</p></li>
<li><p>Looks like this issue was noted and fixed in DSpace 5.5 (we’re on 5.1): <a href="https://jira.duraspace.org/browse/DS-2936">https://jira.duraspace.org/browse/DS-2936</a></p></li>
<li><p>I’ve sent a message to Atmire asking about compatibility with DSpace 5.5</p></li>
<li>Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)</li>
<li>Atmire responded with DSpace 5.5 compatible versions for their modules, so I’ll start testing those in a few weeks</li>
</ul>
<h2 id="2016-04-22">2016-04-22</h2>
<ul>
<li>Import 95 records into <a href="https://cgspace.cgiar.org/handle/10568/42219">CTA’s Agrodok collection</a></li>
</ul>
<h2 id="2016-04-26">2016-04-26</h2>
<ul>
<li>Test embargo during item upload</li>
<li>Seems to be working but the help text is misleading as to the date format</li>
<li>It turns out the <code>robots.txt</code> issue we thought we solved last month isn’t solved because you can’t use wildcards in URL patterns: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>Write some nginx rules to add <code>X-Robots-Tag</code> HTTP headers to responses for the dynamic URLs listed in <code>robots.txt</code> instead</li>
<li><p>I restarted tomcat and it is back up</p></li>
<li><p>Add Spanish XMLUI strings so those users see “CGSpace” instead of “DSpace” in the user interface (<a href="https://github.com/ilri/DSpace/pull/222">#222</a>)</p></li>
<li><p>Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: <a href="https://jira.duraspace.org/browse/DS-3172">https://jira.duraspace.org/browse/DS-3172</a></p></li>
<li><p>Update infrastructure playbooks for nginx 1.10.x (stable) release: <a href="https://github.com/ilri/rmg-ansible-public/issues/32">https://github.com/ilri/rmg-ansible-public/issues/32</a></p></li>
<li><p>Currently running on DSpace Test, we’ll give it a few days before we adjust CGSpace</p></li>
<li><p>CGSpace down, restarted tomcat and it’s back up</p></li>