<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="March, 2019" /> <meta property="og:description" content="2019-03-01 I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc… Looking at the other half of Udana’s WLE records from 2018-11 I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC) I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items Most worryingly, there are encoding errors in the abstracts for eleven items, for example: 68.15% � 9.45 instead of 68.15% ± 9.45 2003�2013 instead of 2003–2013 I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs " /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-03/" /> <meta property="article:published_time" content="2019-03-01T12:16:30+01:00"/> <meta property="article:modified_time" content="2019-03-07T11:37:53+02:00"/> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="March, 2019"/> <meta name="twitter:description" content="2019-03-01 I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, 
etc… Looking at the other half of Udana’s WLE records from 2018-11 I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC) I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items Most worryingly, there are encoding errors in the abstracts for eleven items, for example: 68.15% � 9.45 instead of 68.15% ± 9.45 2003�2013 instead of 2003–2013 I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs "/> <meta name="generator" content="Hugo 0.54.0" /> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "March, 2019", "url": "https://alanorth.github.io/cgspace-notes/2019-03/", "wordCount": "617", "datePublished": "2019-03-01T12:16:30+01:00", "dateModified": "2019-03-07T11:37:53+02:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2019-03/"> <title>March, 2019 | CGSpace Notes</title> <!-- combined, minified CSS --> <link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-G5B34w7DFTumWTswxYzTX7NWfbvQEg1HbFFEg6ItN03uTAAoS2qkPS/fu3LhuuSA" crossorigin="anonymous"> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div 
class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2019-03/">March, 2019</a></h2> <p class="blog-post-meta"><time datetime="2019-03-01T12:16:30+01:00">Fri Mar 01, 2019</time> by Alan Orth in <i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a> </p> </header> <h2 id="2019-03-01">2019-03-01</h2> <ul> <li>I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li> <li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc…</li> <li>Looking at the other half of Udana’s WLE records from 2018-11 <ul> <li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li> <li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, and added rights information based on the publications page for a few items</li> <li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li> <li>68.15% � 9.45 instead of 68.15% ± 9.45</li> <li>2003�2013 instead of 2003–2013</li> </ul></li> <li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li> </ul> <h2 id="2019-03-03">2019-03-03</h2> <ul> <li>Trying to finally upload IITA’s 259 Feb 14 items to CGSpace so I exported them from DSpace Test:</li> </ul> <pre><code>$ mkdir 2019-03-03-IITA-Feb14
$ dspace export -i 10568/108684 -t COLLECTION -m -n 0 -d 2019-03-03-IITA-Feb14
</code></pre> <ul> <li>As I was inspecting the archive I noticed that there were some problems with the bitstreams: <ul> <li>First, Sisay didn’t include the bitstream descriptions</li> <li>Second, only five
items had bitstreams, and I remember from the discussion with IITA that there should have been nine!</li> <li>I had to refer to the original CSV from January to find the file names, then download and add them to the export contents manually!</li> </ul></li> <li>After adding the missing bitstreams and descriptions manually I tested them again locally, then imported them to a temporary collection on CGSpace:</li> </ul> <pre><code>$ dspace import -a -c 10568/99832 -e aorth@stfu.com -m 2019-03-03-IITA-Feb14.map -s /tmp/2019-03-03-IITA-Feb14
</code></pre> <ul> <li>DSpace’s export function doesn’t include the items’ owning collections for some reason, so you need to import them into a temporary collection first, then export that collection’s metadata and re-map the items to their proper owning collections based on their types, using OpenRefine or a similar tool</li> <li>After re-importing to CGSpace to apply the mappings, I deleted the collection on DSpace Test and ran the <code>dspace cleanup</code> script</li> <li>Merge the IITA research theme changes from last month to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/413">#413</a>) <ul> <li>I will deploy to CGSpace soon and then think about how to batch tag all IITA’s existing items with this metadata</li> </ul></li> <li>Deploy Tomcat 7.0.93 on CGSpace (linode18) after having tested it on DSpace Test (linode19) for a week</li> </ul> <h2 id="2019-03-06">2019-03-06</h2> <ul> <li>Abenet was having problems with a CIP user account; I think the user could not register</li> <li>I suspect it’s related to the email issue that ICT hasn’t responded to since last week</li> <li>As I thought, I still cannot send emails from CGSpace:</li> </ul> <pre><code>$ dspace test-email

About to send test email:
 - To: blah@stfu.com
 - Subject: DSpace test email
 - Server: smtp.office365.com

Error sending email:
 - Error: javax.mail.AuthenticationFailedException
</code></pre> <ul> <li>I will send a follow-up to ICT to ask them to reset the password</li> </ul>
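<ul> <li>The authentication failure above could also be confirmed outside of DSpace. What follows is only a minimal sketch using Python’s standard <code>smtplib</code>: the function name <code>check_smtp_login</code>, the injectable <code>smtp_factory</code> parameter, and the use of STARTTLS are assumptions for illustration, not part of DSpace or of the configuration shown in these notes:</li> </ul> <pre><code class="language-python">import smtplib


def check_smtp_login(host, port, user, password, smtp_factory=smtplib.SMTP):
    """Try an authenticated SMTP session and report what happened.

    Returns "ok" when the server accepts the credentials, or
    "auth-failed" when it rejects them.
    """
    # smtp_factory is a hypothetical hook so the function can be exercised
    # without a network connection; it defaults to smtplib.SMTP.
    server = smtp_factory(host, port, timeout=30)
    try:
        # Assumption: the server expects STARTTLS before authentication.
        server.starttls()
        try:
            server.login(user, password)
        except smtplib.SMTPAuthenticationError:
            return "auth-failed"
        return "ok"
    finally:
        # Always close the session, whether or not the login worked.
        server.quit()
</code></pre> <ul> <li>Calling it with the mail server from the test output, the submission port (587 is an assumption here), and the account’s user name and password would return <code>auth-failed</code> if the password itself is the problem, which separates a bad credential from a DSpace configuration issue</li> </ul>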
<h2 id="2019-03-07">2019-03-07</h2> <ul> <li>ICT reset the email password and I confirmed that it is working now</li> <li>Generate a controlled vocabulary of 1187 AGROVOC subjects from the top 1500 that I checked last month: dump the terms using <code>csvcut</code>, apply the XML controlled vocabulary format in vim, then check the result with <code>tidy</code> for good measure:</li> </ul> <pre><code>$ csvcut -c name 2019-02-22-subjects.csv > dspace/config/controlled-vocabularies/dc-subject.xml
$ # apply formatting in XML file
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.xml
</code></pre> <ul> <li>Atmire noticed my message about the “solr_update_time_stamp” error on the dspace-tech mailing list and created an issue on their tracker to discuss it with me <ul> <li>They say the error is harmless, but it has nevertheless been fixed in their newer module versions</li> </ul></li> </ul> <!-- vim: set sw=2 ts=2: --> </article> </div> <!-- /.blog-main --> <aside class="col-sm-3 ml-auto blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2019-03/">March, 2019</a></li> <li><a href="/cgspace-notes/2019-02/">February, 2019</a></li> <li><a href="/cgspace-notes/2019-01/">January, 2019</a></li> <li><a href="/cgspace-notes/2018-12/">December, 2018</a></li> <li><a href="/cgspace-notes/2018-11/">November, 2018</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> <p> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>