Add notes for 2020-12-20

Alan Orth 2020-12-20 16:47:45 +02:00
parent a84f008b09
commit 461428c926
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
24 changed files with 154 additions and 32 deletions

View File

@ -535,4 +535,63 @@ $ ./fix-metadata-values.py -i 2020-12-17-update-ILRI-author.csv -db dspace -u ds
$ csvcut -c 'dc.identifier.citation[en_US],dc.identifier.uri,dc.identifier.uri[],dc.identifier.uri[en_US],dc.date.issued,dc.date.issued[],dc.date.issued[en_US],cg.identifier.status[en_US]' ~/Downloads/10568-80099.csv | csvgrep -c 'cg.identifier.status[en_US]' -m 'Limited Access' | csvgrep -c 'dc.date.issued' -m 2020 -c 'dc.date.issued[]' -m 2020 -c 'dc.date.issued[en_US]' -m 2020 > /tmp/limited-2020.csv
```
## 2020-12-18
- I added support for indexing community views and downloads to [dspace-statistics-api](https://github.com/ilri/dspace-statistics-api)
  - I still have to add the API endpoints to make the stats available
  - Also, I played a little bit with Swagger via [falcon-swagger-ui](https://github.com/rdidyk/falcon-swagger-ui) and I think I can get that working for better API documentation / testing
- Atmire sent some feedback on the DeduplicateValuesProcessor
  - They confirm that it should process _all_ duplicates, not just those in `owningComm` and `owningColl`
  - They asked me to try it again on DSpace Test now that I've resync'd the Solr statistics cores from production
  - I started processing the statistics core on DSpace Test
## 2020-12-20
- The DeduplicateValuesProcessor has been running on DSpace Test for two days and had almost completed its second twelve-hour run when it crashed near the end:
```console
...
Run 1 — 100% — 8,230,000/8,239,228 docs — 39s — 9h 8m 31s
Exception: Java heap space
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at org.noggit.CharArr.toString(CharArr.java:164)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:599)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:180)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:492)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocument(JavaBinCodec.java:360)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:219)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:492)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:374)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:125)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:43)
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:528)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.getNextSetOfSolrDocuments(SourceFile:392)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:157)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
```
- That was with a JVM heap of 512m
- I looked in Solr and found dozens of duplicates of each field again...
  - I sent [feedback to Atmire](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=839)
- I finished the technical work on adding community and collection support to the DSpace Statistics API
  - I still need to update the tests as well as the documentation
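
A minimal sketch of how the duplicate check in Solr could be scripted, assuming a statistics document has already been fetched as JSON and parsed into a Python dict. The function name `find_duplicate_values`, the sample document, and all of its values are hypothetical; only the field names `owningComm` and `owningColl` come from the notes above. This is not how the DeduplicateValuesProcessor itself works, just a way to spot repeated values in multi-valued fields:

```python
from collections import Counter


def find_duplicate_values(doc):
    """Return {field: {value: count}} for multi-valued fields with repeated values."""
    duplicates = {}
    for field, values in doc.items():
        if not isinstance(values, list):
            # Single-valued fields cannot contain duplicates
            continue
        counts = Counter(values)
        repeated = {value: n for value, n in counts.items() if n > 1}
        if repeated:
            duplicates[field] = repeated
    return duplicates


# Hypothetical document shaped like one entry of a Solr JSON response
sample_doc = {
    "uid": "hypothetical-uid",
    "owningComm": [1, 1, 1, 2],
    "owningColl": [10, 10],
    "exampleField": ["a", "b"],
}

print(find_duplicate_values(sample_doc))
```

Running this over a handful of documents from the statistics core would show which fields still carry repeated values after a deduplication run.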
<!-- vim: set sw=2 ts=2: -->

View File

@ -60,7 +60,7 @@ I don&rsquo;t see anything interesting in the web server logs around that time t
"@type": "BlogPosting",
"headline": "January, 2019",
"url": "https://alanorth.github.io/cgspace-notes/2019-01/",
"wordCount": "5532",
"wordCount": "5531",
"datePublished": "2019-01-02T09:48:30+02:00",
"dateModified": "2020-10-19T15:23:30+03:00",
"author": {
@ -949,7 +949,7 @@ $ http 'http://localhost:8081/solr/statistics/select?indent=on&amp;rows=0&amp;q=
<ul>
<li>Peter noticed that some goo.gl links in our tweets from Feedburner are broken, for example this one from last week:</li>
</ul>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/ILRI?src=hash&amp;ref_src=twsrc%5Etfw">#ILRI</a> research: Towards unlocking the potential of the hides and skins value chain in Somaliland <a href="https://t.co/EZH7ALW4dp">https://t.co/EZH7ALW4dp</a></p>&mdash; ILRI Communications (@ILRI) <a href="https://twitter.com/ILRI/status/1086330519904673793?ref_src=twsrc%5Etfw">January 18, 2019</a></blockquote>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/ILRI?src=hash&amp;ref_src=twsrc%5Etfw">#ILRI</a> research: Towards unlocking the potential of the hides and skins value chain in Somaliland <a href="https://t.co/EZH7ALW4dp">https://t.co/EZH7ALW4dp</a></p>&mdash; ILRI.org (@ILRI) <a href="https://twitter.com/ILRI/status/1086330519904673793?ref_src=twsrc%5Etfw">January 18, 2019</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<ul>

View File

@ -20,7 +20,7 @@ I started processing those (about 411,000 records):
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-12/" />
<meta property="article:published_time" content="2020-12-01T11:32:54+02:00" />
<meta property="article:modified_time" content="2020-12-16T12:08:00+02:00" />
<meta property="article:modified_time" content="2020-12-17T16:50:56+02:00" />
@ -46,9 +46,9 @@ I started processing those (about 411,000 records):
"@type": "BlogPosting",
"headline": "December, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-12/",
"wordCount": "2970",
"wordCount": "3242",
"datePublished": "2020-12-01T11:32:54+02:00",
"dateModified": "2020-12-16T12:08:00+02:00",
"dateModified": "2020-12-17T16:50:56+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -667,8 +667,71 @@ $ ./fix-metadata-values.py -i 2020-12-17-update-ILRI-author.csv -db dspace -u ds
</code></pre><p>$ csvcut -c &lsquo;dc.identifier.citation[en_US],dc.identifier.uri,dc.identifier.uri[],dc.identifier.uri[en_US],dc.date.issued,dc.date.issued[],dc.date.issued[en_US],cg.identifier.status[en_US]&rsquo; ~/Downloads/10568-80099.csv | csvgrep -c &lsquo;cg.identifier.status[en_US]&rsquo; -m &lsquo;Limited Access&rsquo; | csvgrep -c &lsquo;dc.date.issued&rsquo; -m 2020 -c &lsquo;dc.date.issued[]&rsquo; -m 2020 -c &lsquo;dc.date.issued[en_US]&rsquo; -m 2020 &gt; /tmp/limited-2020.csv</p>
<pre><code>
&lt;!-- vim: set sw=2 ts=2: --&gt;
</code></pre>
## 2020-12-18
- I added support for indexing community views and downloads to [dspace-statistics-api](https://github.com/ilri/dspace-statistics-api)
- I still have to add the API endpoints to make the stats available
- Also, I played a little bit with Swagger via [falcon-swagger-ui](https://github.com/rdidyk/falcon-swagger-ui) and I think I can get that working for better API documentation / testing
- Atmire sent some feedback on the DeduplicateValuesProcessor
- They confirm that it should process _all_ duplicates, not just those in `owningComm` and `owningColl`
- They asked me to try it again on DSpace Test now that I've resync'd the Solr statistics cores from production
- I started processing the statistics core on DSpace Test
## 2020-12-20
- The DeduplicateValuesProcessor has been running on DSpace Test for two days and had almost completed its second twelve-hour run when it crashed near the end:
```console
...
Run 1 — 100% — 8,230,000/8,239,228 docs — 39s — 9h 8m 31s
Exception: Java heap space
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.&lt;init&gt;(String.java:207)
at org.noggit.CharArr.toString(CharArr.java:164)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:599)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:180)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:492)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocument(JavaBinCodec.java:360)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:219)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:492)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:374)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:125)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:43)
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:528)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.getNextSetOfSolrDocuments(SourceFile:392)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:157)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
</code></pre><ul>
<li>That was with a JVM heap of 512m</li>
<li>I looked in Solr and found dozens of duplicates of each field again&hellip;
<ul>
<li>I sent <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=839">feedback to Atmire</a></li>
</ul>
</li>
<li>I finished the technical work on adding community and collection support to the DSpace Statistics API
<ul>
<li>I still need to update the tests as well as the documentation</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-12-16T12:08:00+02:00" />
<meta property="og:updated_time" content="2020-12-17T16:50:56+02:00" />

View File

@ -4,27 +4,27 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2020-12-16T12:08:00+02:00</lastmod>
<lastmod>2020-12-17T16:50:56+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2020-12-16T12:08:00+02:00</lastmod>
<lastmod>2020-12-17T16:50:56+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2020-12/</loc>
<lastmod>2020-12-16T12:08:00+02:00</lastmod>
<lastmod>2020-12-17T16:50:56+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2020-12-16T12:08:00+02:00</lastmod>
<lastmod>2020-12-17T16:50:56+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2020-12-16T12:08:00+02:00</lastmod>
<lastmod>2020-12-17T16:50:56+02:00</lastmod>
</url>
<url>