<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn’t work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years</li>
</ul>
<h2 id="2017-01-04">2017-01-04</h2>
<ul>
<li>I tried to shard my local dev instance and it fails the same way:</li>
</ul>
<pre><code>        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
        ... 10 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:659)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
</code></pre>
<ul>
<li>Very interesting… it creates the core and then fails somehow</li>
</ul>
<h2 id="2017-01-08">2017-01-08</h2>
<ul>
<li>Put Sisay’s <code>item-view.xsl</code> code to show mapped collections on CGSpace (<a href="https://github.com/ilri/DSpace/pull/295">#295</a>)</li>
</ul>
<h2 id="2017-01-09">2017-01-09</h2>
<ul>
<li>A user wrote to tell me that the new display of an item’s mappings had a crazy bug for at least one item: <a href="https://cgspace.cgiar.org/handle/10568/78596">https://cgspace.cgiar.org/handle/10568/78596</a></li>
<li>She said she only mapped it once, but it appears to be mapped 184 times</li>
<li>I tried to clean up the duplicate mappings by exporting the item’s metadata to CSV, editing, and re-importing, but DSpace said “no changes were detected”</li>
<li>I’ve asked on the dspace-tech mailing list to see if anyone can help</li>
<li>I found an old post on the mailing list discussing a similar issue, and listing some SQL commands that might help</li>
<li>For example, this shows 186 mappings for the item, the first three of which are real:</li>
</ul>
<pre><code>dspace=# select * from collection2item where item_id = '80596';
</code></pre>
<ul>
<li>Then I deleted the others:</li>
</ul>
<pre><code>dspace=# delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
</code></pre>
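<ul>
<li>The idea behind the delete above is to keep one real mapping per collection and drop the repeats. A minimal Python sketch of that selection, assuming rows shaped like <code>collection2item</code> (<code>id</code>, <code>collection_id</code>, <code>item_id</code>) and assuming the lowest <code>id</code> for each item/collection pair is the real one; the collection ids and the extra rows in the sample are hypothetical:</li>
</ul>

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """One row of DSpace's collection2item table (columns assumed)."""
    id: int
    collection_id: int
    item_id: int


def duplicate_mapping_ids(rows: list[Mapping]) -> list[int]:
    """Return ids of redundant rows, keeping the lowest id per (item, collection) pair."""
    keep: dict[tuple[int, int], int] = {}
    for row in sorted(rows, key=lambda r: r.id):
        # The first (lowest) id seen for a pair is treated as the real mapping.
        keep.setdefault((row.item_id, row.collection_id), row.id)
    kept = set(keep.values())
    return sorted(row.id for row in rows if row.id not in kept)


def delete_statement(item_id: int, keep_ids: list[int]) -> str:
    """Build a DELETE that removes every mapping of item_id except keep_ids."""
    ids = ", ".join(str(i) for i in sorted(keep_ids))
    return f"delete from collection2item where item_id = {item_id} and id not in ({ids});"


# Hypothetical sample: three real mappings plus two repeats of collection 2.
rows = [
    Mapping(90792, 1, 80596),
    Mapping(90806, 2, 80596),
    Mapping(90807, 3, 80596),
    Mapping(91000, 2, 80596),
    Mapping(91001, 2, 80596),
]
print(duplicate_mapping_ids(rows))  # [91000, 91001]
kept = sorted({r.id for r in rows} - set(duplicate_mapping_ids(rows)))
print(delete_statement(80596, kept))
```
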
<ul>
<li>And in the item view it now shows the correct mappings</li>
<li>I will have to ask the DSpace people if this is a valid approach</li>
<li>Finish looking at the Journal Title corrections of the top 500 Journal Titles so we can make a controlled vocabulary from it</li>
</ul>
<h2 id="2017-01-11">2017-01-11</h2>
<ul>
<li>Maria found another item with duplicate mappings: <a href="https://cgspace.cgiar.org/handle/10568/78658">https://cgspace.cgiar.org/handle/10568/78658</a></li>
<li>Error in <code>fix-metadata-values.py</code> when it tries to print the value for Entwicklung & Ländlicher Raum:</li>
</ul>
<pre><code>Traceback (most recent call last):
  File "./fix-metadata-values.py", line 80, in &lt;module&gt;
</code></pre>
<ul>
<li>I’m actually not sure if we need to <code>encode()</code> the strings to UTF-8 before writing them to the database… I’ve never had this issue before</li>
<li>Now back to cleaning up some journal titles so we can make the controlled vocabulary:</li>
</ul>
<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
</code></pre>
<ul>
<li>The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November</li>
<li>I will have to go through these and fix some more before making the controlled vocabulary</li>
<li>Added 30 or so more corrections; now there are 49 total, and I’ll have to regenerate the top 500 after applying them</li>
</ul>
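<ul>
<li>The traceback above is truncated, so this is an assumption: errors like this in Python scripts are often a <code>UnicodeEncodeError</code> raised while printing a non-ASCII value to an ASCII-configured stream, rather than a database problem. A minimal sketch showing where strict ASCII encoding fails on this value and how <code>errors="backslashreplace"</code> avoids the crash when printing; the helper names are hypothetical and this does not settle whether values must be encoded before writing them to the database:</li>
</ul>

```python
from typing import Optional


def first_unencodable(value: str, encoding: str) -> Optional[int]:
    """Return the index of the first character `encoding` cannot represent, or None."""
    try:
        value.encode(encoding)  # strict mode raises on the first unsupported character
    except UnicodeEncodeError as err:
        return err.start
    return None


def encode_for_output(value: str, encoding: str) -> bytes:
    """Encode for a stream in `encoding`, writing unsupported characters as escapes instead of raising."""
    return value.encode(encoding, errors="backslashreplace")


value = "Entwicklung & Ländlicher Raum"
print(first_unencodable(value, "ascii"))  # 15, the position of "ä"
print(first_unencodable(value, "utf-8"))  # None
print(encode_for_output(value, "ascii"))  # b'Entwicklung & L\\xe4ndlicher Raum'
```
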
<h2 id="2017-01-13">2017-01-13</h2>
<ul>
<li>Add <code>FOOD SYSTEMS</code> to CIAT subjects, waiting to merge: <a href="https://github.com/ilri/DSpace/pull/296">https://github.com/ilri/DSpace/pull/296</a></li>
</ul>
<h2 id="2017-01-16">2017-01-16</h2>
<ul>
<li>Fix the two items Maria found with duplicate mappings with this script:</li>
</ul>
<pre><code>/* 184 incorrect mappings: https://cgspace.cgiar.org/handle/10568/78596 */
delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
</code></pre>
<ul>
<li>Helping clean up some file names in the 232 CIAT records that Sisay worked on last week</li>
<li>There are about 30 files with <code>%20</code> (space) and Spanish accents in the file name</li>
<li>At first I thought we should fix these, but it turns out that the W3C <a href="https://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1">recommends converting these to UTF-8 and URL-encoding them</a>!</li>
<li>And the file names don’t really matter either, as long as the SAF Builder tool can read them—after that DSpace renames them with a hash in the assetstore</li>
<li>Seems like the only ones I should replace are the <code>'</code> apostrophe characters, as <code>%27</code>:</li>
</ul>
<pre><code>value.replace("'",'%27')
</code></pre>
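<ul>
<li>A minimal Python sketch contrasting the apostrophe-only replacement above with full percent-encoding of a file name as UTF-8 bytes via the standard library’s <code>urllib.parse.quote</code>; the file names are hypothetical examples:</li>
</ul>

```python
from urllib.parse import quote, unquote


def escape_apostrophes(name: str) -> str:
    """Same effect as value.replace("'", '%27'): only apostrophes are changed."""
    return name.replace("'", "%27")


def url_encode_filename(name: str) -> str:
    """Percent-encode the whole name as UTF-8 bytes (spaces, accents, apostrophes)."""
    return quote(name, safe="")


# Hypothetical file names, similar to the CIAT ones described above.
for name in ["CIAT's report.pdf", "Informe técnico 2016.pdf"]:
    encoded = url_encode_filename(name)
    # Decoding recovers the original name, so encoding loses nothing.
    assert unquote(encoded) == name
    print(escape_apostrophes(name), "|", encoded)
```
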
<ul>
<li>Add the item’s Type to the filename column as a hint to SAF Builder so it can set a more useful description field:</li>
<li>Many of the PDFs are 20, 30, 40, or 50+ MB, for a total of 4 GB</li>
<li>These are scanned from paper and likely have no compression, so we should test whether these compression techniques help without compromising the quality too much:</li>
</ul>
<pre><code>$ convert -compress Zip -density 150x150 input.pdf output.pdf
</code></pre>
<ul>
<li>Someone on the Internet suggested using a DPI of 144</li>
</ul>
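<ul>
<li>To check whether a conversion like the one above actually helps, a minimal Python sketch that compares each original against its converted copy and reports the relative size change; the script name and the argument layout (pairs of input and output paths) are hypothetical:</li>
</ul>

```python
import os
import sys


def relative_change(before: int, after: int) -> float:
    """Relative size change: negative means the converted file is smaller."""
    return (after - before) / before


def file_size_change(original: str, converted: str) -> float:
    """Compare the on-disk sizes of an original PDF and its converted copy."""
    return relative_change(os.path.getsize(original), os.path.getsize(converted))


if __name__ == "__main__":
    # Hypothetical usage: python size_check.py in1.pdf out1.pdf [in2.pdf out2.pdf ...]
    args = sys.argv[1:]
    for original, converted in zip(args[0::2], args[1::2]):
        change = file_size_change(original, converted)
        verdict = "keep original" if change >= 0 else "use converted"
        print(f"{original}: {change:+.1%} ({verdict})")
```
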
<h2 id="2017-01-19">2017-01-19</h2>
<ul>
<li>In testing a random sample of CIAT’s PDFs for compressibility, it looks like all of these methods generally increase the file size, so we will just import them as they are</li>
<li>Looking at some records that Sisay is having problems importing into DSpace Test (it seems to be because of copious whitespace and return characters from Excel’s CSV exporter)</li>
<li>There were also some issues with an invalid <code>dc.date.issued</code> field, and I trimmed leading/trailing whitespace and cleaned up some URLs with unneeded parameters like <code>?show=full</code></li>
</ul>
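<ul>
<li>The cleanup described above could be scripted roughly like this; a minimal Python sketch, not necessarily how it was done here. The accepted <code>dc.date.issued</code> forms (YYYY, YYYY-MM, YYYY-MM-DD) are an assumption, and only their shape is checked:</li>
</ul>

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

WHITESPACE = re.compile(r"\s+")
# Assumed accepted forms for dc.date.issued: YYYY, YYYY-MM, or YYYY-MM-DD.
ISSUED_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def clean_cell(value: str) -> str:
    """Collapse runs of whitespace (including CR/LF from Excel) and trim the ends."""
    return WHITESPACE.sub(" ", value).strip()


def drop_params(url: str, unwanted: frozenset[str] = frozenset({"show"})) -> str:
    """Remove unneeded query parameters such as ?show=full, keeping the others."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k not in unwanted]
    # urlencode re-serializes the remaining parameters (spaces become '+').
    return urlunsplit(parts._replace(query=urlencode(query)))


def valid_issued_date(value: str) -> bool:
    """Check the shape only; impossible months or days are not rejected."""
    return ISSUED_DATE.fullmatch(value) is not None


print(clean_cell("  Some title\r\n  with breaks "))
print(drop_params("https://cgspace.cgiar.org/handle/10568/78596?show=full"))
print(valid_issued_date("2016-12"), valid_issued_date("Dec 2016"))
```
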
<h2 id="2017-01-23">2017-01-23</h2>
<ul>
<li>I merged Atmire’s pull request into the development branch so they can deploy it on DSpace Test</li>
<li>Move some old ILRI Program communities to a new subcommunity for former programs (10568/79164):</li>
</ul>
<pre><code>$ for community in 10568/171 10568/27868 10568/231 10568/27869 10568/150 10568/230 10568/32724 10568/172; do /home/cgspace.cgiar.org/bin/dspace community-filiator --remove --parent=10568/27866 --child="$community" && /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/79164 --child="$community"; done
</code></pre>
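<ul>
<li>Before running a loop like this against production, it can help to print the commands first. A minimal dry-run sketch in Python that builds the same remove-then-set pairs, using only the paths, handles, and flags from the loop above; it prints the commands and executes nothing:</li>
</ul>

```python
import shlex

DSPACE = "/home/cgspace.cgiar.org/bin/dspace"
OLD_PARENT = "10568/27866"
NEW_PARENT = "10568/79164"
COMMUNITIES = [
    "10568/171", "10568/27868", "10568/231", "10568/27869",
    "10568/150", "10568/230", "10568/32724", "10568/172",
]


def filiator_commands(child: str) -> list[list[str]]:
    """Return the remove-then-set community-filiator invocations for one child community."""
    return [
        [DSPACE, "community-filiator", "--remove", f"--parent={OLD_PARENT}", f"--child={child}"],
        [DSPACE, "community-filiator", "--set", f"--parent={NEW_PARENT}", f"--child={child}"],
    ]


for community in COMMUNITIES:
    remove, attach = filiator_commands(community)
    # Joined with && so the set step only runs if the remove step succeeded, as in the loop.
    print(shlex.join(remove), "&&", shlex.join(attach))
```
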
<ul>
<li>Move some collections with <a href="https://gist.github.com/alanorth/e60b530ed4989df0c731afbb0c640515"><code>move-collections.sh</code></a> using the following config:</li>
</ul>
<pre><code>10568/42161 10568/171 10568/79341
10568/41914 10568/171 10568/79340
</code></pre>
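<ul>
<li>My reading of this config, which is an assumption and not confirmed from the script itself, is one collection per line: the collection’s handle, its current parent community, and its new parent community (10568/171 is one of the communities moved above). A minimal Python sketch that parses and sanity-checks such a file before running anything; the names <code>Move</code> and <code>parse_moves</code> are hypothetical:</li>
</ul>

```python
import re
from typing import NamedTuple

HANDLE = re.compile(r"\d+/\d+")


class Move(NamedTuple):
    collection: str
    old_parent: str
    new_parent: str


def parse_moves(config: str) -> list[Move]:
    """Parse whitespace-separated lines of three handles, skipping blank lines."""
    moves = []
    for lineno, line in enumerate(config.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3 or not all(HANDLE.fullmatch(f) for f in fields):
            raise ValueError(f"line {lineno}: expected three handles, got {line!r}")
        moves.append(Move(*fields))
    return moves


config = """10568/42161 10568/171 10568/79341
10568/41914 10568/171 10568/79340
"""
for move in parse_moves(config):
    print(f"{move.collection}: {move.old_parent} -> {move.new_parent}")
```
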
<h2 id="2017-01-24">2017-01-24</h2>
<ul>
<li>Run all updates on DSpace Test and reboot the server</li>
<li>Create a new list of the top 500 journal titles from the database:</li>
</ul>
<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
</code></pre>
<ul>
<li>Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup; see the pull request (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</li>
<li>This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (<a href="https://github.com/ilri/DSpace/pull/69">#69</a>)</li>
</ul>
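<ul>
<li>The XML markup was added by hand in the pull request; as an alternative, a minimal Python sketch that generates it from the CSV export, assuming DSpace’s controlled vocabulary layout of nested <code>node</code> elements inside <code>isComposedBy</code> (compare with an existing file in <code>dspace/config/controlled-vocabularies/</code> before relying on it). Using the title as the node <code>id</code>, the vocabulary id and label, and the sample rows are all hypothetical:</li>
</ul>

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_vocabulary(csv_text: str, vocab_id: str, label: str) -> ET.Element:
    """Build a node/isComposedBy tree from (title, count) CSV rows, sorted by title."""
    titles = {row[0].strip() for row in csv.reader(io.StringIO(csv_text)) if row and row[0].strip()}
    root = ET.Element("node", id=vocab_id, label=label)
    children = ET.SubElement(root, "isComposedBy")
    for title in sorted(titles, key=str.casefold):
        # Using the title itself as the id is an assumption; any unique string would do.
        ET.SubElement(children, "node", id=title, label=title)
    return root


# Hypothetical rows in the same shape as the \copy output (title, count).
sample = 'Journal B,12\n"Journal A",30\n Journal B ,1\n'
tree = build_vocabulary(sample, "journal-titles", "Journal Titles")
ET.indent(tree)  # pretty-printing needs Python 3.9+
print(ET.tostring(tree, encoding="unicode"))
```
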
<h2 id="2017-01-25">2017-01-25</h2>
<ul>
<li>Atmire says the <code>com.atmire.statistics.util.UpdateSolrStorageReports</code> and <code>com.atmire.utils.ReportSender</code> classes are no longer necessary because they are using a Spring scheduler for these tasks now</li>
<li>Pull request to remove them from the Ansible templates: <a href="https://github.com/ilri/rmg-ansible-public/pull/80">https://github.com/ilri/rmg-ansible-public/pull/80</a></li>
<li>Still testing the Atmire modules on DSpace Test, and it looks like a few issues we had reported are now fixed:
<ul>
<li>XLS Export from Content statistics</li>
<li>Most popular items</li>
<li>Show statistics on collection pages</li>
</ul></li>
<li>But now we have a new issue with the “Types” in Content statistics not being respected—we only get the defaults, despite having custom settings in <code>dspace/config/modules/atmire-cua.cfg</code></li>
</ul>
<h2 id="2017-01-27">2017-01-27</h2>
<ul>
<li>Magdalena pointed out that somehow the Anonymous group had been added to the Administrators group on CGSpace (!)</li>
<li>Discuss plans to update CCAFS metadata and communities for their new flagships and phase II project identifiers</li>
<li>The flagships are in <code>cg.subject.ccafs</code>, and we will probably need to make a new field for the phase II project identifiers</li>
</ul>
<h2 id="2017-01-28">2017-01-28</h2>
<ul>
<li>Merge controlled vocabulary for journal titles (<code>dc.source</code>) into CGSpace (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</li>
<li>Merge new CIAT subject into CGSpace (<a href="https://github.com/ilri/DSpace/pull/296">#296</a>)</li>
</ul>
<h2 id="2017-01-29">2017-01-29</h2>
<ul>
<li>Run all system updates on DSpace Test, redeploy DSpace code, and reboot the server</li>
<li>Run all system updates on CGSpace, redeploy DSpace code, and reboot the server</li>