Add notes for 2021-02-10

Alan Orth 2021-02-11 09:36:09 +02:00
parent 04145d22b6
commit ef56896102
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
23 changed files with 200 additions and 29 deletions

@ -234,4 +234,89 @@ $ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
# start indexing in AReS
```
## 2021-02-08
- Finish rotating the AReS indexes after the harvesting last night:
```console
$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
{
"count" : 100983,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
$ curl -X PUT "localhost:9200/openrxv-items/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write":true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2021-02-08
$ curl -XDELETE 'http://localhost:9200/openrxv-items'
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
$ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-02-08'
```
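- The sequence above relies on the Elasticsearch clone API, which requires the source index to be write-blocked before it can be cloned. A minimal Python sketch of the same steps expressed as data, assuming the index names used here; `rotation_plan` and `ES` are hypothetical names for illustration, and the function only builds the requests rather than sending them:

```python
from typing import List, Tuple

ES = "http://localhost:9200"  # assumed Elasticsearch address, as in the notes


def rotation_plan(final: str, temp: str, backup: str) -> List[Tuple[str, str, str]]:
    """Return (method, url, body) steps mirroring the curl commands above."""
    # The clone API needs the source index to be read-only for writes
    block_writes = '{"settings": {"index.blocks.write": true}}'
    return [
        ("PUT", f"{ES}/{final}/_settings", block_writes),
        ("POST", f"{ES}/{final}/_clone/{backup}", ""),  # keep a dated backup
        ("DELETE", f"{ES}/{final}", ""),
        ("PUT", f"{ES}/{temp}/_settings", block_writes),
        ("POST", f"{ES}/{temp}/_clone/{final}", ""),    # promote temp to final
        ("DELETE", f"{ES}/{temp}", ""),
        ("DELETE", f"{ES}/{backup}", ""),               # drop the backup last
    ]


plan = rotation_plan("openrxv-items", "openrxv-items-temp", "openrxv-items-2021-02-08")
for method, url, body in plan:
    print(method, url, body)
```

Keeping the steps as a list makes it easy to review the order before anything is deleted.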
## 2021-02-10
- Talk to Abdullah from CodeObia about a few of the issues we filed on OpenRXV
  - Verify a fix he made for the issue with spaces in template file names
  - He says that the [Angular expressions support should be enabled](https://github.com/ilri/OpenRXV/issues/49), but I tried it and couldn't get a few simple examples working
- Atmire responded to a few issues today:
  - First, the one about a crash while exporting a community CSV, which appears to be a [vanilla DSpace issue with a patch in DSpace 6.4](https://jira.lyrasis.org/browse/DS-4211)
  - Second, the MQM batch consumer issue, which appears to be harmless log spam in *most* cases; they have sent a patch that adjusts the logging accordingly
  - Third, a version bump for CUA to fix the `java.lang.UnsupportedOperationException: Multiple update components target the same field:solr_update_time_stamp` error
- I cherry-picked the patches for DS-4211 and was finally able to export the ILRI community, but the export contains almost twice as many rows as there are items in the community!
  - Investigating with csvcut I see there are some ids that appear five, six, or even seven times!
```console
$ csvcut -c id /tmp/2021-02-10-ILRI.csv | sed '1d' | wc -l
30354
$ csvcut -c id /tmp/2021-02-10-ILRI.csv | sed '1d' | sort -u | wc -l
18555
$ csvcut -c id /tmp/2021-02-10-ILRI.csv | sed '1d' | sort | uniq -c | sort -h | tail
5 c21a79e5-e24e-4861-aa07-e06703d1deb7
5 c2460aa1-ae28-4003-9a99-2d7c5cd7fd38
5 d73fb3ae-9fac-4f7e-990f-e394f344246c
5 dc0e24fa-b7f5-437e-ac09-e15c0704be00
5 dc50bcca-0abf-473f-8770-69d5ab95cc33
5 e714bdf9-cc0f-4d9a-a808-d572e25c9238
6 7dfd1c61-9e8c-4677-8d41-e1c4b11d867d
6 fb76888c-03ae-4d53-b27d-87d7ca91371a
6 ff42d1e6-c489-492c-a40a-803cabd901ed
7 094e9e1d-09ff-40ca-a6b9-eca580936147
```
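- The same duplicate check can be sketched in Python with only the standard library. `id_counts` and the sample rows are made up for illustration; only the `id` column name comes from the export:

```python
import csv
import io
from collections import Counter


def id_counts(csv_text: str) -> Counter:
    """Count how many rows each value of the `id` column appears in."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["id"] for row in reader)


# Hypothetical stand-in for the exported CSV
sample = "id,dc.title\na,One\nb,Two\na,One\na,One\n"

counts = id_counts(sample)
total_rows = sum(counts.values())   # equivalent of `wc -l` after dropping the header
unique_ids = len(counts)            # equivalent of `sort -u | wc -l`
worst = counts.most_common(1)       # the most repeated id, like the tail of `uniq -c`
print(total_rows, unique_ids, worst)
```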
- I added a comment to that bug to ask if this is a side effect of the patch
- I started working on tagging pre-2010 ILRI items with license information, like we talked about with Peter and Abenet last week
  - Due to the export bug I had to sort and remove duplicates first, then use csvgrep to filter out books and journal articles:
```console
# -i (--invert-match) drops rows whose type is exactly one of the listed values
$ csvcut -c 'id,dc.date.issued,dc.date.issued[],dc.date.issued[en_US],dc.rights,dc.rights[],dc.rights[en],dc.rights[en_US],dc.publisher,dc.publisher[],dc.publisher[en_US],dc.type[en_US]' /tmp/2021-02-10-ILRI.csv | csvgrep -c 'dc.type[en_US]' -i -r '^(Journal Item|Journal Article|Book|Book Chapter)$'
```
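- A rough Python sketch of the two steps described here (drop repeated ids, then drop books and journal articles). It uses an explicit set of type names instead of a regular expression, which is easier to verify; `dedupe_and_filter` and the sample rows are hypothetical, and the column names are taken from the export:

```python
import csv
import io

# Types to leave out, as listed in the csvgrep pattern
EXCLUDED_TYPES = {"Journal Item", "Journal Article", "Book", "Book Chapter"}


def dedupe_and_filter(csv_text: str, type_column: str = "dc.type[en_US]") -> list:
    """Keep the first row per id, then drop rows with an excluded type."""
    seen = set()
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["id"] in seen:
            continue  # repeated id caused by the export bug
        seen.add(row["id"])
        if row[type_column] in EXCLUDED_TYPES:
            continue  # books and journal articles are handled elsewhere
        kept.append(row)
    return kept


# Hypothetical stand-in for the exported CSV
sample = (
    "id,dc.type[en_US]\n"
    "1,Report\n"
    "1,Report\n"
    "2,Book\n"
    "3,Brief\n"
    "2,Book\n"
)
kept_ids = [row["id"] for row in dedupe_and_filter(sample)]
print(kept_ids)
```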
- I imported the CSV into OpenRefine and converted the date text values to date types so I could facet by dates before 2010:
```console
if(diff(value,"01/01/2010".toDate(),"days")<0, true, false)
```
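- The facet above is true for values that fall before 1 January 2010. A minimal Python sketch of the same test, assuming the issued dates are ISO 8601 strings of the form `YYYY`, `YYYY-MM`, or `YYYY-MM-DD` (an assumption, the export may contain other forms); `issued_before_cutoff` and `CUTOFF` are hypothetical names:

```python
from datetime import date

CUTOFF = date(2010, 1, 1)


def issued_before_cutoff(issued: str) -> bool:
    """Return True when a (possibly partial) ISO date is earlier than the cutoff."""
    parts = [int(p) for p in issued.split("-")]
    # Pad a missing month or day with 1 so "2009" is read as 2009-01-01
    parts += [1] * (3 - len(parts))
    return date(*parts) < CUTOFF


for value in ("2009-12-31", "1998-05", "2010", "2010-01-01"):
    print(value, issued_before_cutoff(value))
```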
- Then I filtered by publisher to make sure they were only ours:
```console
or(
value.contains("International Livestock Research Institute"),
value.contains("ILRI"),
value.contains("International Livestock Centre for Africa"),
value.contains("ILCA"),
value.contains("ILRAD"),
value.contains("International Laboratory for Research on Animal Diseases")
)
```
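- The publisher facet is a plain substring test against a list of names. A minimal Python sketch of the same check, assuming case-sensitive matching like the expression above; `is_our_publisher` and `OUR_PUBLISHERS` are hypothetical names:

```python
# Names and acronyms taken from the facet expression above
OUR_PUBLISHERS = (
    "International Livestock Research Institute",
    "ILRI",
    "International Livestock Centre for Africa",
    "ILCA",
    "ILRAD",
    "International Laboratory for Research on Animal Diseases",
)


def is_our_publisher(publisher: str) -> bool:
    """Return True when the publisher text contains any of our names."""
    return any(name in publisher for name in OUR_PUBLISHERS)


for value in ("ILRI", "International Livestock Centre for Africa, Addis Ababa", "CIAT"):
    print(value, is_our_publisher(value))
```

Because this is a substring match, a short acronym can also match inside a longer unrelated word, so the facet results are worth a quick manual look.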
- I tagged these pre-2010 items with "Other" if they didn't already have a license
- I checked 2010 to 2015, and 2016 to date, but they were all tagged already!
- In the end I added the "Other" license to 1,523 items from before 2010
<!-- vim: set sw=2 ts=2: -->

@ -32,7 +32,7 @@ $ curl -s &#39;http://localhost:9200/openrxv-items-temp/_count?q=*&amp;pretty&#3
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-02/" />
<meta property="article:published_time" content="2021-02-01T10:13:54+02:00" />
<meta property="article:modified_time" content="2021-02-06T14:00:36+02:00" />
<meta property="article:modified_time" content="2021-02-07T16:27:36+02:00" />
@ -70,9 +70,9 @@ $ curl -s &#39;http://localhost:9200/openrxv-items-temp/_count?q=*&amp;pretty&#3
"@type": "BlogPosting",
"headline": "February, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-02/",
"wordCount": "1517",
"wordCount": "2017",
"datePublished": "2021-02-01T10:13:54+02:00",
"dateModified": "2021-02-06T14:00:36+02:00",
"dateModified": "2021-02-07T16:27:36+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -389,7 +389,93 @@ sys 2m26.050s
</ul>
<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
# start indexing in AReS
</code></pre><!-- raw HTML omitted -->
</code></pre><h2 id="2021-02-08">2021-02-08</h2>
<ul>
<li>Finish rotating the AReS indexes after the harvesting last night:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&amp;pretty'
{
&quot;count&quot; : 100983,
&quot;_shards&quot; : {
&quot;total&quot; : 1,
&quot;successful&quot; : 1,
&quot;skipped&quot; : 0,
&quot;failed&quot; : 0
}
}
$ curl -X PUT &quot;localhost:9200/openrxv-items/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;:true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2021-02-08
$ curl -XDELETE 'http://localhost:9200/openrxv-items'
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
$ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-02-08'
</code></pre><h2 id="2021-02-10">2021-02-10</h2>
<ul>
<li>Talk to Abdullah from CodeObia about a few of the issues we filed on OpenRXV
<ul>
<li>Verify a fix he made for the issue with spaces in template file names</li>
<li>He says that the <a href="https://github.com/ilri/OpenRXV/issues/49">Angular expressions support should be enabled</a>, but I tried it and couldn&rsquo;t get a few simple examples working</li>
</ul>
</li>
<li>Atmire responded to a few issues today:
<ul>
<li>First, the one about a crash while exporting a community CSV, which appears to be a <a href="https://jira.lyrasis.org/browse/DS-4211">vanilla DSpace issue with a patch in DSpace 6.4</a></li>
<li>Second, the MQM batch consumer issue, which appears to be harmless log spam in <em>most</em> cases; they have sent a patch that adjusts the logging accordingly</li>
<li>Third, a version bump for CUA to fix the <code>java.lang.UnsupportedOperationException: Multiple update components target the same field:solr_update_time_stamp</code> error</li>
</ul>
</li>
<li>I cherry-picked the patches for DS-4211 and was finally able to export the ILRI community, but the export contains almost twice as many rows as there are items in the community!
<ul>
<li>Investigating with csvcut I see there are some ids that appear up to five, six, or seven times!</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ csvcut -c id /tmp/2021-02-10-ILRI.csv | sed '1d' | wc -l
30354
$ csvcut -c id /tmp/2021-02-10-ILRI.csv | sed '1d' | sort -u | wc -l
18555
$ csvcut -c id /tmp/2021-02-10-ILRI.csv | sed '1d' | sort | uniq -c | sort -h | tail
5 c21a79e5-e24e-4861-aa07-e06703d1deb7
5 c2460aa1-ae28-4003-9a99-2d7c5cd7fd38
5 d73fb3ae-9fac-4f7e-990f-e394f344246c
5 dc0e24fa-b7f5-437e-ac09-e15c0704be00
5 dc50bcca-0abf-473f-8770-69d5ab95cc33
5 e714bdf9-cc0f-4d9a-a808-d572e25c9238
6 7dfd1c61-9e8c-4677-8d41-e1c4b11d867d
6 fb76888c-03ae-4d53-b27d-87d7ca91371a
6 ff42d1e6-c489-492c-a40a-803cabd901ed
7 094e9e1d-09ff-40ca-a6b9-eca580936147
</code></pre><ul>
<li>I added a comment to that bug to ask if this is a side effect of the patch</li>
<li>I started working on tagging pre-2010 ILRI items with license information, like we talked about with Peter and Abenet last week
<ul>
<li>Due to the export bug I had to sort and remove duplicates first, then use csvgrep to filter out books and journal articles:</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console"># -i (--invert-match) drops rows whose type is exactly one of the listed values
$ csvcut -c 'id,dc.date.issued,dc.date.issued[],dc.date.issued[en_US],dc.rights,dc.rights[],dc.rights[en],dc.rights[en_US],dc.publisher,dc.publisher[],dc.publisher[en_US],dc.type[en_US]' /tmp/2021-02-10-ILRI.csv | csvgrep -c 'dc.type[en_US]' -i -r '^(Journal Item|Journal Article|Book|Book Chapter)$'
</code></pre><ul>
<li>I imported the CSV into OpenRefine and converted the date text values to date types so I could facet by dates before 2010:</li>
</ul>
<pre><code class="language-console" data-lang="console">if(diff(value,&quot;01/01/2010&quot;.toDate(),&quot;days&quot;)&lt;0, true, false)
</code></pre><ul>
<li>Then I filtered by publisher to make sure they were only ours:</li>
</ul>
<pre><code class="language-console" data-lang="console">or(
value.contains(&quot;International Livestock Research Institute&quot;),
value.contains(&quot;ILRI&quot;),
value.contains(&quot;International Livestock Centre for Africa&quot;),
value.contains(&quot;ILCA&quot;),
value.contains(&quot;ILRAD&quot;),
value.contains(&quot;International Laboratory for Research on Animal Diseases&quot;)
)
</code></pre><ul>
<li>I tagged these pre-2010 items with &ldquo;Other&rdquo; if they didn&rsquo;t already have a license</li>
<li>I checked 2010 to 2015, and 2016 to date, but they were all tagged already!</li>
<li>In the end I added the &ldquo;Other&rdquo; license to 1,523 items from before 2010</li>
</ul>
<!-- raw HTML omitted -->

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-02-06T14:00:36+02:00" />
<meta property="og:updated_time" content="2021-02-07T16:27:36+02:00" />

@ -4,27 +4,27 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2021-02-06T14:00:36+02:00</lastmod>
<lastmod>2021-02-07T16:27:36+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2021-02-06T14:00:36+02:00</lastmod>
<lastmod>2021-02-07T16:27:36+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2021-02/</loc>
<lastmod>2021-02-06T14:00:36+02:00</lastmod>
<lastmod>2021-02-07T16:27:36+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2021-02-06T14:00:36+02:00</lastmod>
<lastmod>2021-02-07T16:27:36+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2021-02-06T14:00:36+02:00</lastmod>
<lastmod>2021-02-07T16:27:36+02:00</lastmod>
</url>
<url>