cgspace-notes/docs/2018-07/index.html

319 lines
8.8 KiB
HTML

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="July, 2018" />
<meta property="og:description" content="2018-07-01
I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:
$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
During the mvn package stage on the 5.8 branch I kept getting issues with java running out of memory:
There is insufficient memory for the Java Runtime Environment to continue.
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-07/" />
<meta property="article:published_time" content="2018-07-01T12:56:54&#43;03:00"/>
<meta property="article:modified_time" content="2018-07-02T17:33:38&#43;03:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="July, 2018"/>
<meta name="twitter:description" content="2018-07-01
I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:
$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
During the mvn package stage on the 5.8 branch I kept getting issues with java running out of memory:
There is insufficient memory for the Java Runtime Environment to continue.
"/>
<meta name="generator" content="Hugo 0.42.2" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "July, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-07/",
"wordCount": "469",
"datePublished": "2018-07-01T12:56:54&#43;03:00",
"dateModified": "2018-07-02T17:33:38&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2018-07/">
<title>July, 2018 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-TbfEhJn4HkgPUIZUhhHaAYsycYKHxSuIloCjZOiyCSpbVunRQxg5T5pxKVFwxilF" crossorigin="anonymous">
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-07/">July, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-07-01T12:56:54&#43;03:00">Sun Jul 01, 2018</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre>
<ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
<p></p>
<ul>
<li>As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process:</li>
</ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=dspacetest.cgiar.org -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2 clean package
</code></pre>
<ul>
<li>Then I stopped the Tomcat 7 service, ran the ant update, and manually ran the old and ignored SQL migrations:</li>
</ul>
<pre><code>$ sudo su - postgres
$ psql dspace
...
dspace=# begin;
BEGIN
dspace=# \i Atmire-DSpace-5.8-Schema-Migration.sql
DELETE 0
UPDATE 1
DELETE 1
dspace=# commit
dspace=# \q
$ exit
$ dspace database migrate ignored
</code></pre>
<ul>
<li>After that I started Tomcat 7 and DSpace seems to be working, now I need to tell our colleagues to try stuff and report issues they have</li>
</ul>
<h2 id="2018-07-02">2018-07-02</h2>
<ul>
<li>Discuss AgriKnowledge including our Handle identifier on their harvested items from CGSpace</li>
<li>They seem to be only interested in Gates-funded outputs, for example: <a href="https://www.agriknowledge.org/files/tm70mv21t">https://www.agriknowledge.org/files/tm70mv21t</a></li>
</ul>
<h2 id="2018-07-03">2018-07-03</h2>
<ul>
<li>Finally finish with the CIFOR Archive records (a total of 2448):
<ul>
<li>I mapped the 50 items that were duplicates from elsewhere in CGSpace into <a href="https://cgspace.cgiar.org/handle/10568/16702">CIFOR Archive</a></li>
<li>I did one last check of the remaining 2398 items and found eight who have a <code>cg.identifier.doi</code> that links to some URL other than a DOI so I moved those to <code>cg.identifier.url</code> and <code>cg.identifier.googleurl</code> as appropriate</li>
<li>Also, thirteen items had a DOI in their citation, but did not have a <code>cg.identifier.doi</code> field, so I added those</li>
<li>Then I imported those 2398 items in two batches (to deal with memory issues):</li>
</ul></li>
</ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive.csv
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive2.csv
</code></pre>
<ul>
<li>I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some missing HTTP entirely:</li>
</ul>
<pre><code>dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
count
-------
785
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
count
-------
4
</code></pre>
<ul>
<li>I think I should fix that as well as some other garbage values like &ldquo;test&rdquo; and &ldquo;dspace.ilri.org&rdquo; etc:</li>
</ul>
<pre><code>dspace=# begin;
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
UPDATE 785
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
UPDATE 4
dspace=# update metadatavalue set text_value='https://books.google.com/books?id=meF1CLdPSF4C' where resource_type_id=2 and metadata_field_id=222 and text_value='meF1CLdPSF4C';
UPDATE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
DELETE 4
dspace=# commit;
</code></pre>
<!-- vim: set sw=2 ts=2: -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2018-07/">July, 2018</a></li>
<li><a href="/cgspace-notes/2018-06/">June, 2018</a></li>
<li><a href="/cgspace-notes/2018-05/">May, 2018</a></li>
<li><a href="/cgspace-notes/2018-04/">April, 2018</a></li>
<li><a href="/cgspace-notes/2018-03/">March, 2018</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p>
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>