<li>I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc…</li>
<li>Looking at the other half of Udana’s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 2003–2013</li>
</ul></li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
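<p>A minimal sketch of how abstracts like these could be screened before import, assuming the metadata is already loaded as Python strings. The function name <code>find_suspicious_chars</code> is hypothetical, and the check only covers the Unicode replacement character and stray control characters, so it will not catch every kind of mojibake.</p>

```python
import unicodedata


def find_suspicious_chars(text):
    """Return (index, label) pairs for characters that often signal a bad copy and paste."""
    hits = []
    for i, ch in enumerate(text):
        if ch == "\ufffd":
            # U+FFFD is what decoders substitute for bytes they could not decode
            hits.append((i, "REPLACEMENT CHARACTER"))
        elif unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
            # Control characters other than normal whitespace rarely belong in an abstract
            hits.append((i, f"CONTROL U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    for sample in ["2003\ufffd2013", "2003–2013", "68.15% ± 9.45"]:
        print(repr(sample), find_suspicious_chars(sample))
```

<p>Any abstract that returns a non-empty list would be a candidate for sending back to be re-pasted.</p>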
<li>As I was inspecting the archive I noticed that there were some problems with the bitstreams:
<ul>
<li>First, Sisay didn’t include the bitstream descriptions</li>
<li>Second, only five items had bitstreams and I remember in the discussion with IITA that there should have been nine!</li>
<li>I had to refer to the original CSV from January to find the file names, then download and add them to the export contents manually!</li>
</ul></li>
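<p>The two problems above could be caught with a quick audit of each item's <code>contents</code> file. This is only a sketch: it assumes the Simple Archive Format layout of one bitstream per line with tab-separated <code>bundle:</code> and <code>description:</code> fields, and the function names and the list of expected file names are hypothetical.</p>

```python
def parse_contents_line(line):
    """Split one line of a SAF contents file into a filename plus key:value fields."""
    parts = line.rstrip("\n").split("\t")
    entry = {"filename": parts[0]}
    for part in parts[1:]:
        key, sep, value = part.partition(":")
        if sep:
            entry[key] = value
    return entry


def audit_contents(lines, expected_filenames):
    """Return (files missing from the contents, files listed without a description)."""
    entries = [parse_contents_line(line) for line in lines if line.strip()]
    present = {entry["filename"] for entry in entries}
    missing_files = sorted(set(expected_filenames) - present)
    no_description = sorted(
        entry["filename"] for entry in entries if not entry.get("description")
    )
    return missing_files, no_description


if __name__ == "__main__":
    # Made-up example lines, not taken from the real export
    lines = [
        "a.pdf\tbundle:ORIGINAL\tdescription:Report\n",
        "b.pdf\tbundle:ORIGINAL\n",
    ]
    print(audit_contents(lines, ["a.pdf", "b.pdf", "c.pdf"]))
```

<p>The expected file names would come from the original CSV. Lines for license bundles would also be reported as lacking a description, so they may need to be filtered out first.</p>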
<li>After adding the missing bitstreams and descriptions manually I tested them again locally, then imported them to a temporary collection on CGSpace:</li>
<li>DSpace’s export function doesn’t include the collections for some reason, so you need to import them somewhere first, then export the collection metadata and re-map the items to proper owning collections based on their types using OpenRefine or something</li>
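<p>The re-mapping step could also be scripted instead of done in OpenRefine. The sketch below assumes the exported metadata rows have <code>id</code> and <code>collection</code> columns plus a type column. The column name <code>dc.type</code>, the function name, and the handles in the mapping are placeholders, not the real CGSpace values.</p>

```python
# Hypothetical mapping from item type to owning collection handle
TYPE_TO_COLLECTION = {
    "Journal Article": "123456789/101",
    "Book Chapter": "123456789/102",
}


def remap_collections(rows, type_column, mapping):
    """Return (rows with the collection rewritten by type, ids whose type had no mapping)."""
    remapped = []
    unmatched = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        target = mapping.get(row.get(type_column, "").strip())
        if target is None:
            unmatched.append(row["id"])
        else:
            new_row["collection"] = target
        remapped.append(new_row)
    return remapped, unmatched


if __name__ == "__main__":
    rows = [
        {"id": "1", "collection": "123456789/999", "dc.type": "Journal Article"},
        {"id": "2", "collection": "123456789/999", "dc.type": "Poster"},
    ]
    print(remap_collections(rows, "dc.type", TYPE_TO_COLLECTION))
```

<p>Rows with an unmapped type keep their temporary collection and are reported separately, so they can be reviewed by hand before the CSV is re-imported.</p>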
<li>After re-importing to CGSpace to apply the mappings, I deleted the collection on DSpace Test and ran the <code>dspace cleanup</code> script</li>
<li>Merge the IITA research theme changes from last month to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/413">#413</a>)
<ul>
<li>I will deploy to CGSpace soon and then think about how to batch tag all IITA’s existing items with this metadata</li>
<li>Generate a controlled vocabulary of 1187 AGROVOC subjects from the top 1500 that I checked last month, dumping the terms with <code>csvcut</code>, applying the XML controlled vocabulary format in vim, and then checking the result with <code>tidy</code> for good measure:</li>
</ul></li>
<pre><code>$ csvcut -c name 2019-02-22-subjects.csv > dspace/config/controlled-vocabularies/dc-contributor-author.xml</code></pre>
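<p>The manual formatting step in vim could be replaced by a small script. This sketch assumes the flat <code>node</code> / <code>isComposedBy</code> layout used by DSpace's XML controlled vocabularies. The function name and the root <code>id</code> and <code>label</code> values are hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def build_vocabulary(terms, root_id="dc-subject", root_label="Subjects"):
    """Build a flat controlled vocabulary XML string from an iterable of terms."""
    root = ET.Element("node", id=root_id, label=root_label)
    composed = ET.SubElement(root, "isComposedBy")
    seen = set()
    for term in terms:
        term = term.strip()
        if not term or term in seen:
            continue  # skip blank lines and duplicate terms
        seen.add(term)
        ET.SubElement(composed, "node", id=term, label=term)
    # ElementTree escapes characters such as & in attribute values
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_vocabulary(["MAIZE", "SOIL FERTILITY", "MAIZE"]))
```

<p>The header line that <code>csvcut</code> prints for the column would need to be dropped before passing the terms in, and the output can still be run through <code>tidy</code> as a final check.</p>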
<li>Atmire noticed my message about the “solr_update_time_stamp” error on the dspace-tech mailing list and created an issue on their tracker to discuss it with me
<ul>
<li>They say the error is harmless, but has nevertheless been fixed in their newer module versions</li>