- I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good
- I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc...
- Looking at the other half of Udana's WLE records from 2018-11
- I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)
- I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items
- Most worryingly, there are encoding errors in the abstracts for eleven items, for example:
- 68.15% <20> 9.45 instead of 68.15% ± 9.45
- 2003<30>2013 instead of 2003–2013
- I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
- DSpace's export function doesn't include the collections for some reason, so you need to import them somewhere first, then export the collection metadata and re-map the items to proper owning collections based on their types using OpenRefine or something
- After re-importing to CGSpace to apply the mappings, I deleted the collection on DSpace Test and ran the `dspace cleanup` script
- Generate a controlled vocabulary of 1187 AGROVOC subjects from the top 1500 that I checked last month, dumping the terms themselves using `csvcut` and then applying XML controlled vocabulary format in vim and then checking with tidy for good measure:
```
$ csvcut -c name 2019-02-22-subjects.csv > dspace/config/controlled-vocabularies/dc-contributor-author.xml
- Atmire noticed my message about the "solr_update_time_stamp" error on the dspace-tech mailing list and created an issue on their tracker to discuss it with me
- They say the error is harmless, but has nevertheless been fixed in their newer module versions