cgspace-notes/docs/2021-11/index.html

225 lines
7.0 KiB
HTML
Raw Normal View History

2021-11-01 09:49:21 +01:00
<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="November, 2021" />
2021-11-03 14:56:15 +01:00
<meta property="og:description" content="2021-11-02
I experimented with manually sharding the Solr statistics on DSpace Test
First I exported all the 2019 stats from CGSpace:
$ ./run.sh -s http://localhost:8081/solr/statistics -f &#39;time:2019-*&#39; -a export -o statistics-2019.json -k uid
$ zstd statistics-2019.json
" />
2021-11-01 09:49:21 +01:00
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-11/" />
2021-11-03 14:56:15 +01:00
<meta property="article:published_time" content="2021-11-02T22:27:07+02:00" />
<meta property="article:modified_time" content="2021-11-01T10:49:21+02:00" />
2021-11-01 09:49:21 +01:00
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="November, 2021"/>
2021-11-03 14:56:15 +01:00
<meta name="twitter:description" content="2021-11-02
I experimented with manually sharding the Solr statistics on DSpace Test
First I exported all the 2019 stats from CGSpace:
$ ./run.sh -s http://localhost:8081/solr/statistics -f &#39;time:2019-*&#39; -a export -o statistics-2019.json -k uid
$ zstd statistics-2019.json
"/>
2021-11-01 09:49:21 +01:00
<meta name="generator" content="Hugo 0.88.1" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "November, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-11/",
2021-11-03 14:56:15 +01:00
"wordCount": "238",
"datePublished": "2021-11-02T22:27:07+02:00",
"dateModified": "2021-11-01T10:49:21+02:00",
2021-11-01 09:49:21 +01:00
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2021-11/">
<title>November, 2021 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC&#43;AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz&#43;lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2021-11/">November, 2021</a></h2>
<p class="blog-post-meta">
2021-11-03 14:56:15 +01:00
<time datetime="2021-11-02T22:27:07+02:00">Tue Nov 02, 2021</time>
2021-11-01 09:49:21 +01:00
in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
2021-11-03 14:56:15 +01:00
<h2 id="2021-11-02">2021-11-02</h2>
<ul>
<li>I experimented with manually sharding the Solr statistics on DSpace Test</li>
<li>First I exported all the 2019 stats from CGSpace:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./run.sh -s http://localhost:8081/solr/statistics -f 'time:2019-*' -a export -o statistics-2019.json -k uid
$ zstd statistics-2019.json
</code></pre><ul>
<li>Then on DSpace Test I created a <code>statistics-2019</code> core with the same instance dir as the main <code>statistics</code> core (as <a href="https://wiki.lyrasis.org/display/DSDOC6x/Testing+Solr+Shards">illustrated in the DSpace docs</a>)</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ mkdir -p /home/dspacetest.cgiar.org/solr/statistics-2019/data
# create core in Solr admin
$ curl -s &quot;http://localhost:8081/solr/statistics/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;time:2019-*&lt;/query&gt;&lt;/delete&gt;&quot;
$ ./run.sh -s http://localhost:8081/solr/statistics-2019 -a import -o statistics-2019.json -k uid
</code></pre><ul>
<li>The key thing above is that you create the core in the Solr admin UI, but the data directory must already exist so you have to do that first in the file system</li>
<li>I restarted the server after the import was done to see if the cores would come back up OK
<ul>
<li>I remember last time I tried this the manually created statistics cores didn&rsquo;t come back up after I rebooted, but this time they did</li>
</ul>
</li>
</ul>
<h2 id="2021-11-03">2021-11-03</h2>
<ul>
<li>While inspecting the stats for the new statistics-2019 shard on DSpace Test I noticed that I can&rsquo;t find any stats via the DSpace Statistics API for an item that <em>should</em> have some
<ul>
<li>I checked on CGSpace&rsquo;s and I can&rsquo;t find them there either, but I see them in Solr when I query in the admin UI</li>
<li>I need to debug that, but it doesn&rsquo;t seem to be related to the sharding&hellip;</li>
</ul>
</li>
</ul>
2021-11-01 09:49:21 +01:00
<!-- raw HTML omitted -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2021-11/">November, 2021</a></li>
<li><a href="/cgspace-notes/2021-10/">October, 2021</a></li>
<li><a href="/cgspace-notes/2021-09/">September, 2021</a></li>
<li><a href="/cgspace-notes/2021-08/">August, 2021</a></li>
<li><a href="/cgspace-notes/2021-07/">July, 2021</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>