<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="January, 2021" />
<meta property="og:description" content="2021-01-03
Peter notified me that some filters on AReS were broken again
It&rsquo;s the same issue with the field names getting .keyword appended to the end that I already filed an issue on OpenRXV about last month
I fixed the broken filters (careful to not edit any others, lest they break too!)
Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
The start page had been &ldquo;1&rdquo; in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
I adjusted it to default to 0 and added a note to the admin screen
I realized that this issue was actually causing the first page of 100 statistics to be missing&hellip;
For example, this item has 51 views on CGSpace, but 0 on AReS
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-01/" />
<meta property="article:published_time" content="2021-01-03T10:13:54+02:00" />
<meta property="article:modified_time" content="2021-01-04T20:09:02+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="January, 2021"/>
<meta name="twitter:description" content="2021-01-03
Peter notified me that some filters on AReS were broken again
It&rsquo;s the same issue with the field names getting .keyword appended to the end that I already filed an issue on OpenRXV about last month
I fixed the broken filters (careful to not edit any others, lest they break too!)
Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
The start page had been &ldquo;1&rdquo; in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
I adjusted it to default to 0 and added a note to the admin screen
I realized that this issue was actually causing the first page of 100 statistics to be missing&hellip;
For example, this item has 51 views on CGSpace, but 0 on AReS
"/>
<meta name="generator" content="Hugo 0.80.0" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "January, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-01/",
"wordCount": "514",
"datePublished": "2021-01-03T10:13:54+02:00",
"dateModified": "2021-01-04T20:09:02+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2021-01/">
<title>January, 2021 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.16633182cd803b52b9bf9e29ea1ef4b2e3d460deee0ded49466d7e16e449c158.css" rel="stylesheet" integrity="sha256-FmMxgs2AO1K5v54p6h70suPUYN7uDe1JRm1&#43;FuRJwVg=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.4ed405d7c7002b970d34cbe6026ff44a556b0808cb98a9db4008752110ed964b.js" integrity="sha256-TtQF18cAK5cNNMvmAm/0SlVrCAjLmKnbQAh1IRDtlks=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2021-01/">January, 2021</a></h2>
<p class="blog-post-meta">
<time datetime="2021-01-03T10:13:54+02:00">Sun Jan 03, 2021</time>
in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2021-01-03">2021-01-03</h2>
<ul>
<li>Peter notified me that some filters on AReS were broken again
<ul>
<li>It&rsquo;s the same issue I already <a href="https://github.com/ilri/OpenRXV/issues/66">filed on OpenRXV last month</a>: the field names are getting <code>.keyword</code> appended to the end</li>
<li>I fixed the broken filters (being careful not to edit any others, lest they break too!)</li>
</ul>
</li>
<li>Fix an issue with the start page number for the DSpace REST API and statistics API in OpenRXV
<ul>
<li>The start page had been &ldquo;1&rdquo; in the UI, but the backend was doing some gymnastics to convert it to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
<li>I adjusted it to default to 0 and added a note to the admin screen</li>
<li>I realized that this issue was actually causing the first page of 100 statistics to be missing&hellip;</li>
<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
</ul>
</li>
</ul>
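<ul>
<li>A minimal Python sketch of the effect, assuming a zero-based API where page <code>n</code> starts at offset <code>n * limit</code> with a limit of 100; <code>fetch_page</code>, <code>harvest</code>, and the 250 records are hypothetical stand-ins, not OpenRXV code:</li>
</ul>
<pre><code class="language-python" data-lang="python"># Hypothetical statistics: 250 records served by a zero-based API in pages of 100
RECORDS = list(range(250))
LIMIT = 100


def fetch_page(page, limit=LIMIT):
    """Return one page, assuming offset = page * limit (page 0 is the first page)."""
    offset = page * limit
    return RECORDS[offset:offset + limit]


def harvest(start_page):
    """Request consecutive pages from start_page until an empty page comes back."""
    harvested = []
    page = start_page
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        harvested.extend(batch)
        page += 1
    return harvested


print(len(harvest(0)))  # 250: every record
print(len(harvest(1)))  # 150: the first page of 100 is silently skipped
</code></pre>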
<ul>
<li>Start a re-index on AReS
<ul>
<li>First delete the old Elasticsearch temp index:</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
# start indexing in AReS
</code></pre><ul>
<li>Then, the next morning when it&rsquo;s done, check the results of the harvesting, back up the current <code>openrxv-items</code> index, and clone the <code>openrxv-items-temp</code> index to <code>openrxv-items</code>:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&amp;pretty'
{
&quot;count&quot; : 100278,
&quot;_shards&quot; : {
&quot;total&quot; : 1,
&quot;successful&quot; : 1,
&quot;skipped&quot; : 0,
&quot;failed&quot; : 0
}
}
$ curl -X PUT &quot;localhost:9200/openrxv-items/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2021-01-04
$ curl -XDELETE 'http://localhost:9200/openrxv-items'
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
$ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-01-04'
</code></pre><h2 id="2021-01-04">2021-01-04</h2>
<ul>
<li>There is one item that appears twice in AReS: <a href="https://cgspace.cgiar.org/handle/10568/66839">10568/66839</a>
<ul>
<li>If I use the Handle filter I see it twice&hellip; whereas other items only appear once</li>
<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/67">https://github.com/ilri/OpenRXV/issues/67</a></li>
</ul>
</li>
<li>Help Peter troubleshoot an issue with Altmetric badges on AReS
<ul>
<li>He generated a report of our repository from Altmetric and noticed that many items were missing scores despite having scores on their CGSpace item pages</li>
<li>AReS harvests Altmetric scores in batch using the Handle prefix (10568), while CGSpace uses the DOI if it is found and falls back to the Handle</li>
<li>I think it&rsquo;s because some items were never tweeted, so Altmetric never made the link between the DOI and the Handle</li>
<li>I tweeted five items as a test, and within an hour or so the DOI API link registered the associated Handle; within an hour or so the Handle API link was also live with the same score</li>
</ul>
</li>
</ul>
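<ul>
<li>A toy Python model of that hypothesis: Altmetric keys scores by DOI and only learns a Handle&rsquo;s DOI once the Handle is mentioned somewhere. The Altmetric v1 URL patterns are an assumption, and the DOIs, Handles, and scores are made up:</li>
</ul>
<pre><code class="language-python" data-lang="python">ALTMETRIC_API = 'https://api.altmetric.com/v1'  # assumed v1 API base URL


def lookup_url(handle, doi=None):
    """CGSpace-style lookup: use the DOI if the item has one, else the Handle."""
    if doi:
        return f'{ALTMETRIC_API}/doi/{doi}'
    return f'{ALTMETRIC_API}/handle/{handle}'


# Toy model with made-up data: scores are keyed by DOI, and a Handle is only
# linked to its DOI once the Handle URL has been mentioned (e.g. tweeted).
scores_by_doi = {'10.1234/a': 12, '10.1234/b': 7}
handle_to_doi = {'10568/1': '10.1234/a'}  # 10568/2 was never tweeted


def score_by_doi(doi):
    return scores_by_doi.get(doi)


def score_by_handle(handle):
    """Handle-based lookup, roughly what the AReS harvest relies on."""
    doi = handle_to_doi.get(handle)
    return scores_by_doi.get(doi) if doi else None


print(score_by_doi('10.1234/b'), score_by_handle('10568/2'))  # 7 None
print(lookup_url('10568/2', doi='10.1234/b'))
</code></pre>
<ul>
<li>The same item has a score when looked up by DOI but none by Handle until the link exists, which would match the missing scores in the AReS report</li>
</ul>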
<h2 id="2021-01-05">2021-01-05</h2>
<ul>
<li>A user sent me <a href="https://github.com/ilri/dspace-statistics-api/issues/12">feedback about the dspace-statistics-api</a>
<ul>
<li>He noticed that the indexer fails if there are unmigrated legacy records in Solr</li>
<li>I added a UUID filter to the queries in the indexer</li>
</ul>
</li>
<li>I generated a CSV of titles and Handles for 2019 and 2020 items for Peter to tweet
<ul>
<li>We need to make sure that Altmetric has linked them all with their DOIs</li>
<li>I wrote a quick and dirty script called <a href="https://gist.github.com/alanorth/281b7624301049e8fa91742b9b8c51b9">doi-to-handle.py</a> to read the DOIs from a text file, query the database, and save the handles and titles to a CSV</li>
</ul>
</li>
</ul>
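<ul>
<li>As a rough illustration of the idea (not the indexer&rsquo;s actual Solr query), a Python check that keeps only documents whose <code>id</code> parses as a UUID, assuming unmigrated legacy records still carry DSpace 5 style integer ids; the documents are hypothetical:</li>
</ul>
<pre><code class="language-python" data-lang="python">import uuid


def is_uuid(value):
    """True if value parses as a UUID (DSpace 6 ids); legacy integer ids do not."""
    try:
        uuid.UUID(str(value))
    except ValueError:
        return False
    return True


# Hypothetical Solr statistics documents: one migrated, one unmigrated legacy record
docs = [
    {'id': '0d0a7b6f-5f6a-4a4f-9d3b-2d6f4c1e8a11'},
    {'id': '12345'},
]
migrated = [doc for doc in docs if is_uuid(doc['id'])]
print(len(migrated))  # 1
</code></pre>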
<!-- raw HTML omitted -->
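<ul>
<li>A rough sketch of the same workflow, not the linked script: read DOIs line by line, look each one up, and write Handles and titles to a CSV. To keep it self-contained, an in-memory <code>sqlite3</code> database with a hypothetical flattened <code>items(doi, handle, title)</code> table stands in for the CGSpace PostgreSQL database, and DOIs are matched case-insensitively:</li>
</ul>
<pre><code class="language-python" data-lang="python">import csv
import io
import sqlite3


def dois_to_handles(doi_lines, conn, out):
    """Write a handle,title CSV row to out for every DOI found in the database."""
    writer = csv.writer(out)
    writer.writerow(['handle', 'title'])
    for line in doi_lines:
        doi = line.strip().lower()  # DOIs are case-insensitive
        if not doi:
            continue  # skip blank lines
        row = conn.execute(
            'SELECT handle, title FROM items WHERE lower(doi) = ?', (doi,)
        ).fetchone()
        if row:
            writer.writerow(row)  # DOIs with no match are silently skipped


# Hypothetical stand-in for the CGSpace database
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (doi TEXT, handle TEXT, title TEXT)')
conn.execute("INSERT INTO items VALUES ('10.1234/abc', '10568/1', 'Example title')")

out = io.StringIO()
dois_to_handles(['10.1234/ABC\n', '\n', '10.9999/missing\n'], conn, out)
print(out.getvalue())
</code></pre>
<ul>
<li>With real files, pass an open text file of DOIs as <code>doi_lines</code> and open the output CSV with <code>newline=''</code>, as the <code>csv</code> module expects</li>
</ul>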
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2021-01/">January, 2021</a></li>
<li><a href="/cgspace-notes/2020-12/">December, 2020</a></li>
<li><a href="/cgspace-notes/cgspace-dspace6-upgrade/">CGSpace DSpace 6 Upgrade</a></li>
<li><a href="/cgspace-notes/2020-11/">November, 2020</a></li>
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>