- I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error
- I tested on DSpace Test as well and it doesn't work there either
- I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years
```
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
... 10 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:659)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
```
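- For reference, the yearly sharding can also be kicked off manually with DSpace's `stats-util` tool, which is handy for reproducing the error on DSpace Test (a sketch, assuming `-s` is the shard option on our DSpace 5 setup and `[dspace]` is the installation directory):
```
$ [dspace]/bin/dspace stats-util -s
```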
- A user wrote to tell me that the new display of an item's mappings had a crazy bug for at least one item: https://cgspace.cgiar.org/handle/10568/78596
- She said she only mapped it once, but it appears to be mapped 184 times
- I tried to clean up the duplicate mappings by exporting the item's metadata to CSV, editing, and re-importing, but DSpace said "no changes were detected"
- I've asked on the dspace-tech mailing list to see if anyone can help
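- The export/edit/re-import round trip can be done with DSpace's batch metadata editing CLI, roughly like this (a sketch; the file path and eperson address are placeholders, and `[dspace]` is the installation directory):
```
$ [dspace]/bin/dspace metadata-export -i 10568/78596 -f /tmp/78596.csv
$ # edit the collection mappings in /tmp/78596.csv, then re-import as an administrator:
$ [dspace]/bin/dspace metadata-import -f /tmp/78596.csv -e admin@example.com
```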
- Helping clean up some file names in the 232 CIAT records that Sisay worked on last week
- There are about 30 files with `%20` (space) and Spanish accents in the file name
- At first I thought we should fix these, but actually it is [recommended by the W3C to convert these to UTF-8 and URL encode them](https://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1)!
- And the file names don't really matter either, as long as the SAF Builder tool can read them—after that DSpace renames them with a hash in the assetstore
- Seems like the only ones I should replace are the `'` apostrophe characters, as `%27`:
```
value.replace("'",'%27')
```
- Add the item's Type to the filename column as a hint to SAF Builder so it can set a more useful description field:
```
value + "__description:" + cells["dc.type"].value
```
- Test importing of the new CIAT records (actually there are 232, not 234):
- Many of the PDFs are 20, 30, 40, 50+ MB, which makes a total of 4GB
- These are scanned from paper and likely have no compression, so we should test whether compression techniques like the following help without compromising the quality too much:
```
$ convert -compress Zip -density 150x150 input.pdf output.pdf
```
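- To run that over a random sample of the PDFs and compare sizes, a loop along these lines works (a sketch; the directory names are illustrative and the PDFs are assumed to be in the current directory):
```
$ mkdir -p /tmp/compressed
$ for pdf in *.pdf; do convert -compress Zip -density 150x150 "$pdf" "/tmp/compressed/$pdf"; done
$ du -sh . /tmp/compressed    # compare total size of originals vs compressed copies
```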
- In testing a random sample of CIAT's PDFs for compressibility, it looks like these methods generally increase the file size, so we will just import them as they are
- Looking at some records that Sisay is having problems importing into DSpace Test (it seems to be because of copious whitespace and carriage-return characters from Excel's CSV exporter)
- There were also some issues with an invalid dc.date.issued field, and I trimmed leading/trailing whitespace and cleaned up some URLs with unneeded parameters like `?show=full`
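- That kind of cleanup can also be scripted with sed before import (a sketch; the CSV file name is hypothetical, and fields with embedded newlines still need a CSV-aware tool like OpenRefine):
```
$ sed -i -e 's/\r//g' -e 's/[[:space:]]*$//' -e 's/?show=full//g' /tmp/ciat-records.csv    # strip CRs, trailing whitespace, and ?show=full
```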
- Create a new list of the top 500 journal titles from the database:
```
dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
```
- Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup ([pull request #298](https://github.com/ilri/DSpace/pull/298))
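- A rough way to generate the per-title entries for the vocabulary file from that CSV, assuming DSpace's usual controlled-vocabulary `<node>` layout (the column handling and output path are illustrative, and titles containing commas or other XML-special characters need manual attention either way):
```
$ cut -d, -f1 /tmp/journal-titles.csv | sed 's/&/\&amp;/g' | awk '{printf "<node id=\"%s\" label=\"%s\"/>\n", $0, $0}' > /tmp/journal-nodes.xml
```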
- This would be the last issue remaining to close the meta issue about switching to controlled vocabularies ([#69](https://github.com/ilri/DSpace/pull/69))
- Atmire says the `com.atmire.statistics.util.UpdateSolrStorageReports` and `com.atmire.utils.ReportSender` are no longer necessary because they are using a Spring scheduler for these tasks now
- Pull request to remove them from the Ansible templates: https://github.com/ilri/rmg-ansible-public/pull/80
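- For context, those two jobs are the sort of thing the cron templates invoke via `dspace dsrun` (a sketch; the actual schedule and wrapper in the Ansible templates may differ), which is now redundant since Atmire schedules them internally via Spring:
```
$ [dspace]/bin/dspace dsrun com.atmire.statistics.util.UpdateSolrStorageReports
$ [dspace]/bin/dspace dsrun com.atmire.utils.ReportSender
```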
- Still testing the Atmire modules on DSpace Test, and it looks like a few issues we had reported are now fixed:
- XLS Export from Content statistics
- Most popular items
- Show statistics on collection pages
- But now we have a new issue with the "Types" in Content statistics not being respected—we only get the defaults, despite having custom settings in `dspace/config/modules/atmire-cua.cfg`