<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="June, 2017" />
<meta property="og:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg." />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-06/" />
<meta property="article:published_time" content="2017-06-01T10:14:52&#43;03:00"/>
<meta property="article:modified_time" content="2017-06-07T12:15:21&#43;03:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:text:title" content="June, 2017"/>
<meta name="twitter:title" content="June, 2017"/>
<meta name="twitter:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg."/>
<meta name="generator" content="Hugo 0.22-DEV" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "June, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-06/",
"wordCount": "868",
"datePublished": "2017-06-01T10:14:52&#43;03:00",
"dateModified": "2017-06-07T12:15:21&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2017-06/">
<title>June, 2017 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-j3n8sYdzztDYtVc80KiiuOXoCg5Bjz0zYyLGzDMW8RbfA0u5djbF0GO3bVOPoLyN" crossorigin="anonymous">
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2017-06/">June, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-06-01T10:14:52&#43;03:00">Thu Jun 01, 2017</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-06-01">2017-06-01</h2>
<ul>
<li>After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes</li>
<li>The <code>cg.identifier.wletheme</code> field will be used for both Phase I and Phase II Research Themes</li>
<li>Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there</li>
<li>The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo;</li>
<li>Tagged all items in the current Phase I collections with their appropriate themes</li>
<li>Create pull request to add Phase II research themes to the submission form: <a href="https://github.com/ilri/DSpace/pull/328">#328</a></li>
<li>Add <code>cg.subject.system</code> to CGSpace metadata registry, for subject from the upcoming CGIAR Library migration</li>
</ul>
<h2 id="2017-06-04">2017-06-04</h2>
<ul>
<li>After adding <code>cg.identifier.wletheme</code> to 1106 WLE items I can see the field on XMLUI but not in REST!</li>
<li>Strangely it happens on DSpace Test AND on CGSpace!</li>
<li>I tried to re-index Discovery but it didn&rsquo;t fix it</li>
<li>Run all system updates on DSpace Test and reboot the server</li>
<li>After rebooting the server (and therefore restarting Tomcat) the new metadata field is available</li>
<li>I&rsquo;ve sent a message to the dspace-tech mailing list to ask if this is a bug and whether I should file a Jira ticket</li>
</ul>
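<p>A rough Python sketch of how one might check whether a field shows up in a REST response. It assumes the item metadata endpoint returns a JSON list of objects with <code>key</code> and <code>value</code> properties; the sample payload and the <code>field_values</code> helper are made up for illustration and are not taken from CGSpace:</p>
<pre><code>import json


# Return every value stored under the given metadata key.
# Assumes a JSON list of objects with 'key' and 'value' properties.
def field_values(metadata_json, key):
    entries = json.loads(metadata_json)
    return [entry['value'] for entry in entries if entry.get('key') == key]


# Hypothetical payload, only for illustration.
sample = json.dumps([
    {'key': 'dc.title', 'value': 'Example item', 'language': 'en'},
    {'key': 'cg.identifier.wletheme', 'value': 'Example theme', 'language': 'en'},
])

print(field_values(sample, 'cg.identifier.wletheme'))
</code></pre>
<p>An empty list for <code>cg.identifier.wletheme</code> would match the symptom described above, where the field is visible in XMLUI but missing from REST.</p>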
<h2 id="2016-06-05">2017-06-05</h2>
<ul>
<li>Rename WLE&rsquo;s &ldquo;Research Themes&rdquo; sub-community to &ldquo;WLE Phase I Research Themes&rdquo; on DSpace Test so Macaroni Bros can continue their testing</li>
<li>Macaroni Bros tested it and said it&rsquo;s fine, so I renamed it on CGSpace as well</li>
<li>Working on how to automate the extraction of the CIAT Book chapters, doing some magic in OpenRefine to extract the starting page and the length of the page range from cg.identifier.url and dc.format.extent, respectively:
<ul>
<li>cg.identifier.url: <code>value.split(&quot;page=&quot;, &quot;&quot;)[1]</code></li>
<li>dc.format.extent: <code>value.replace(&quot;p. &quot;, &quot;&quot;).split(&quot;-&quot;)[1].toNumber() - value.replace(&quot;p. &quot;, &quot;&quot;).split(&quot;-&quot;)[0].toNumber()</code></li>
</ul></li>
<li>Finally, after some filtering to see which small outliers there were (based on dc.format.extent using &ldquo;p. 1-14&rdquo; vs &ldquo;29 p.&rdquo;), create a new column with the last page number:
<ul>
<li><code>cells[&quot;dc.page.from&quot;].value.toNumber() + cells[&quot;dc.format.pages&quot;].value.toNumber()</code></li>
</ul></li>
<li>Then create a new, unique file name to be used in the output, based on a SHA1 of the dc.title and with a description:
<ul>
<li>dc.page.to: <code>value.split(&quot; &quot;)[0].replace(&quot;,&quot;,&quot;&quot;).toLowercase() + &quot;-&quot; + sha1(value).get(1,9) + &quot;.pdf__description:&quot; + cells[&quot;dc.type&quot;].value</code></li>
</ul></li>
<li>Start processing 769 records that match the following filters (another 159 records use some other format, or have their own PDF, and I will process those later), using a modified <code>generate-thumbnails.py</code> script to read certain fields and pass them to Ghostscript:
<ul>
<li>cg.identifier.url: <code>value.contains(&quot;page=&quot;)</code></li>
<li>dc.format.extent: <code>or(value.contains(&quot;p. &quot;),value.contains(&quot; p.&quot;))</code></li>
<li>Command like: <code>$ gs -dNOPAUSE -dBATCH -dFirstPage=14 -dLastPage=27 -sDEVICE=pdfwrite -sOutputFile=beans.pdf -f 12605-1.pdf</code></li>
</ul></li>
<li>17 of the items have issues with incorrect page number ranges, and upon closer inspection they do not appear in the referenced PDF</li>
<li>I&rsquo;ve flagged them and proceeded without them (752 total) on DSpace Test:</li>
</ul>
<pre><code>$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/93843 --source /home/aorth/src/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &amp;&gt; /tmp/ciat-books.log
</code></pre>
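<p>For reference, the OpenRefine expressions used above for the page numbers and the output file name can be mirrored in plain Python. This is only a sketch of the same arithmetic, not the script that was actually run; the function names and the example URL, extent and title are made up for illustration:</p>
<pre><code>import hashlib


# Starting page: the number after 'page=' in the URL.
def first_page_from_url(url):
    return int(url.split('page=')[1])


# Length of the range: last page minus first page for an extent like 'p. 1-14'.
def page_span_from_extent(extent):
    first, last = extent.replace('p. ', '').split('-')
    return int(last) - int(first)


# Last page: starting page plus the length of the range.
def last_page(first_page, page_span):
    return first_page + page_span


# File name: lowercased first word of the title (commas removed), a slice of
# the SHA1 hex digest of the title, and the item type as a description.
def output_filename(title, item_type):
    first_word = title.split(' ')[0].replace(',', '').lower()
    digest = hashlib.sha1(title.encode('utf-8')).hexdigest()
    return first_word + '-' + digest[1:9] + '.pdf__description:' + item_type


start = first_page_from_url('https://example.org/book.pdf#page=14')
span = page_span_from_extent('p. 1-14')
print(start, last_page(start, span))
print(output_filename('Beans, an example title', 'Book Chapter'))
</code></pre>
<p>Extents written as &ldquo;29 p.&rdquo; do not fit <code>page_span_from_extent</code> and would need separate handling, which is why those outliers were filtered first.</p>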
<ul>
<li>I went and did some basic sanity checks on the remaining items in the CIAT Book Chapters and decided they are mostly fine (except one duplicate and the flagged ones), so I imported them to DSpace Test too (162 items)</li>
<li>The total number of items in CIAT Book Chapters is now 914; the rest were flagged for various reasons and we should send those back to CIAT</li>
<li>Restart Tomcat on CGSpace so that the <code>cg.identifier.wletheme</code> field is available on REST API for Macaroni Bros</li>
</ul>
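<p>The Ghostscript step from earlier in this entry could be scripted roughly like this. It is a minimal sketch that assumes <code>gs</code> is on the <code>PATH</code>; the helper names are hypothetical and this is not the modified <code>generate-thumbnails.py</code> itself. The flags are the ones from the example command above:</p>
<pre><code>import subprocess


# Build the gs argument list for copying a page range into a new PDF.
def ghostscript_args(source_pdf, first_page, last_page, output_pdf):
    return [
        'gs',
        '-dNOPAUSE',
        '-dBATCH',
        '-dFirstPage=' + str(first_page),
        '-dLastPage=' + str(last_page),
        '-sDEVICE=pdfwrite',
        '-sOutputFile=' + output_pdf,
        '-f',
        source_pdf,
    ]


# Run gs; check=True raises CalledProcessError on a non-zero exit status.
def extract_pages(source_pdf, first_page, last_page, output_pdf):
    args = ghostscript_args(source_pdf, first_page, last_page, output_pdf)
    subprocess.run(args, check=True)


print(' '.join(ghostscript_args('12605-1.pdf', 14, 27, 'beans.pdf')))
</code></pre>
<p>Passing the arguments as a list avoids shell quoting problems when file names contain spaces.</p>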
<h2 id="2017-06-07">2017-06-07</h2>
<ul>
<li>Testing <a href="https://github.com/ilri/DSpace/pull/319">Atmire&rsquo;s patch for the CUA Workflow Statistics again</a></li>
<li>Still doesn&rsquo;t seem to give results I&rsquo;d expect, like there are no results for Maria Garruccio, or for the ILRI community!</li>
<li>Then I&rsquo;ll file an update to the issue on Atmire&rsquo;s tracker</li>
<li>Created a new branch with just the relevant changes, so I can send it to them</li>
<li>One thing I noticed is that there is a failed database migration related to CUA:</li>
</ul>
<pre><code>+----------------+----------------------------+---------------------+---------+
| Version | Description | Installed on | State |
+----------------+----------------------------+---------------------+---------+
| 1.1 | Initial DSpace 1.1 databas | | PreInit |
| 1.2 | Upgrade to DSpace 1.2 sche | | PreInit |
| 1.3 | Upgrade to DSpace 1.3 sche | | PreInit |
| 1.3.9 | Drop constraint for DSpace | | PreInit |
| 1.4 | Upgrade to DSpace 1.4 sche | | PreInit |
| 1.5 | Upgrade to DSpace 1.5 sche | | PreInit |
| 1.5.9 | Drop constraint for DSpace | | PreInit |
| 1.6 | Upgrade to DSpace 1.6 sche | | PreInit |
| 1.7 | Upgrade to DSpace 1.7 sche | | PreInit |
| 1.8 | Upgrade to DSpace 1.8 sche | | PreInit |
| 3.0 | Upgrade to DSpace 3.x sche | | PreInit |
| 4.0 | Initializing from DSpace 4 | 2015-11-20 12:42:52 | Success |
| 5.0.2014.08.08 | DS-1945 Helpdesk Request a | 2015-11-20 12:42:53 | Success |
| 5.0.2014.09.25 | DS 1582 Metadata For All O | 2015-11-20 12:42:55 | Success |
| 5.0.2014.09.26 | DS-1582 Metadata For All O | 2015-11-20 12:42:55 | Success |
| 5.0.2015.01.27 | MigrateAtmireExtraMetadata | 2015-11-20 12:43:29 | Success |
| 5.0.2017.04.28 | CUA eperson metadata migra | 2017-06-07 11:07:28 | OutOrde |
| 5.5.2015.12.03 | Atmire CUA 4 migration | 2016-11-27 06:39:05 | OutOrde |
| 5.5.2015.12.03 | Atmire MQM migration | 2016-11-27 06:39:06 | OutOrde |
| 5.6.2016.08.08 | CUA emailreport migration | 2017-01-29 11:18:56 | OutOrde |
+----------------+----------------------------+---------------------+---------+
</code></pre>
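<p>To pick out the rows that need a closer look, the table text can be parsed with a few lines of Python. This is only a sketch with hypothetical helper names, and it treats the state column as opaque text, since the output above truncates the state names:</p>
<pre><code># Return (version, description, installed_on, state) tuples from the table text.
def parse_migration_rows(table_text):
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith('|'):
            continue  # skip the +----+ border lines
        cells = [cell.strip() for cell in line.strip('|').split('|')]
        if len(cells) != 4 or cells[0] == 'Version':
            continue  # skip the header row and anything malformed
        rows.append(tuple(cells))
    return rows


# Return rows whose state is not one of the states considered fine.
def rows_needing_attention(rows, ok_states=('Success', 'PreInit')):
    return [row for row in rows if row[3] not in ok_states]


sample = '\n'.join([
    '| Version | Description | Installed on | State |',
    '| 1.1 | Initial DSpace 1.1 databas | | PreInit |',
    '| 4.0 | Initializing from DSpace 4 | 2015-11-20 12:42:52 | Success |',
    '| 5.0.2017.04.28 | CUA eperson metadata migra | 2017-06-07 11:07:28 | OutOrde |',
])

for row in rows_needing_attention(parse_migration_rows(sample)):
    print(row[0], row[3])
</code></pre>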
<ul>
<li>Merge the pull request for <a href="https://github.com/ilri/DSpace/pull/328">WLE Phase II themes</a></li>
</ul>
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 offset-sm-1 blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2017-06/">June, 2017</a></li>
<li><a href="/cgspace-notes/2017-05/">May, 2017</a></li>
<li><a href="/cgspace-notes/2017-04/">April, 2017</a></li>
<li><a href="/cgspace-notes/2017-03/">March, 2017</a></li>
<li><a href="/cgspace-notes/2017-02/">February, 2017</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p>
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>