cgspace-notes/public/2018-02/index.html
2018-02-04 11:26:07 +02:00

269 lines
7.9 KiB
HTML

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="February, 2018" />
<meta property="og:description" content="2018-02-01
Peter gave feedback on the dc.rights proof of concept that I had sent him last week
We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list
Yesterday I figured out how to monitor DSpace sessions using JMX
I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu&rsquo;s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-02/" />
<meta property="article:published_time" content="2018-02-01T16:28:54&#43;02:00"/>
<meta property="article:modified_time" content="2018-02-04T00:00:51&#43;02:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="February, 2018"/>
<meta name="twitter:description" content="2018-02-01
Peter gave feedback on the dc.rights proof of concept that I had sent him last week
We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list
Yesterday I figured out how to monitor DSpace sessions using JMX
I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu&rsquo;s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
"/>
<meta name="generator" content="Hugo 0.35" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "February, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-02/",
"wordCount": "328",
"datePublished": "2018-02-01T16:28:54&#43;02:00",
"dateModified": "2018-02-04T00:00:51&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2018-02/">
<title>February, 2018 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-HjEPigLMLBzVQsUi6JWp9tmxJtBimdClDBxwZrwZR&#43;VE3s11/PtFYOrLClxIv2SG" crossorigin="anonymous">
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-02/">February, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-02-01T16:28:54&#43;02:00">Thu Feb 01, 2018</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul>
<p></p>
<p><img src="/cgspace-notes/2018/02/jmx_dspace_sessions-day.png" alt="DSpace Sessions" /></p>
<ul>
<li>Run all system updates and reboot DSpace Test</li>
<li>Wow, I packaged up the <code>jmx_dspace_sessions</code> stuff in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> and deployed it on CGSpace and it totally works:</li>
</ul>
<pre><code># munin-run jmx_dspace_sessions
v_.value 223
v_jspui.value 1
v_oai.value 0
</code></pre>
<h2 id="2018-02-03">2018-02-03</h2>
<ul>
<li>Bram from Atmire responded about the high load caused by the Solr updater script and said it will be fixed with the updates to DSpace 5.8 compatibility: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566</a></li>
<li>We will close that ticket for now and wait for the 5.8 stuff: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560</a></li>
<li>I finally took a look at the second round of cleanups Peter had sent me for author affiliations in mid January</li>
<li>After trimming whitespace and quickly scanning for encoding errors I applied them on CGSpace:</li>
</ul>
<pre><code>$ ./delete-metadata-values.py -i /tmp/2018-02-03-Affiliations-12-deletions.csv -f cg.contributor.affiliation -m 211 -d dspace -u dspace -p 'fuuu'
$ ./fix-metadata-values.py -i /tmp/2018-02-03-Affiliations-1116-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p 'fuuu'
</code></pre>
<ul>
<li>Then I started a full Discovery reindex:</li>
</ul>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 96m39.823s
user 14m10.975s
sys 2m29.088s
</code></pre>
<ul>
<li>Generate a new list of affiliations for Peter to sort through:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
COPY 3723
</code></pre>
<ul>
<li>Oh, and it looks like we processed over 3.1 million requests in January, up from 2.9 million in <a href="/cgspace-notes/2017-12/">December</a>:</li>
</ul>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2018&quot;
3126109
real 0m23.839s
user 0m27.225s
sys 0m1.905s
</code></pre>
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2018-02/">February, 2018</a></li>
<li><a href="/cgspace-notes/2018-01/">January, 2018</a></li>
<li><a href="/cgspace-notes/2017-12/">December, 2017</a></li>
<li><a href="/cgspace-notes/2017-11/">November, 2017</a></li>
<li><a href="/cgspace-notes/2017-10/">October, 2017</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p>
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>