Add notes for 2018-11-20

This commit is contained in:
Alan Orth 2018-11-20 12:25:12 +02:00
parent 4e5e1ad4a6
commit d2e4a490ff
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 109 additions and 8 deletions

View File

@ -332,4 +332,50 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-11-19-top-1500-subject.csv WITH CSV HEADER;
```
## 2018-11-20
- The Discovery re-indexing on CGSpace never finished yesterday... the command died after six minutes
- The `dspace.log.2018-11-19` shows this at the time:
```
2018-11-19 15:23:04,221 ERROR com.atmire.dspace.discovery.AtmireSolrService @ DSpace kernel cannot be null
java.lang.IllegalStateException: DSpace kernel cannot be null
at org.dspace.utils.DSpace.getServiceManager(DSpace.java:63)
at org.dspace.utils.DSpace.getSingletonService(DSpace.java:87)
at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:102)
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:815)
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:884)
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
2018-11-19 15:23:04,223 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (4629 of 76007): 72731
```
- I looked in the Solr log around that time and I don't see anything...
- Working on Udana's WLE records from last month, first the sixteen records in [2018-11-20 RDL Temp](https://dspacetest.cgiar.org/handle/10568/108254)
- these items will go to the [Restoring Degraded Landscapes collection](https://dspacetest.cgiar.org/handle/10568/81592)
- a few items missing DOIs, but they are easily available on the publication page
- clean up DOIs to use "https://doi.org" format
- clean up some cg.identifier.url to remove unneccessary query strings
- remove columns with no metadata (river basin, place, target audience, isbn, uri, publisher, ispartofseries, subject)
- fix column with invalid spaces in metadata field name (cg. subject. wle)
- trim and collapse whitespace in all fields
- remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: `value.replace('<27>','')`
- add dc.rights to some fields that I noticed while checking DOIs
- Then the 24 records in [2018-11-20 VRC Temp](https://dspacetest.cgiar.org/handle/10568/108271)
- these items will go to the [Variability, Risks and Competing Uses collection](https://dspacetest.cgiar.org/handle/10568/81589)
- trim and collapse whitespace in all fields (lots in WLE subject!)
- clean up some cg.identifier.url fields that had unneccessary anchors in their links
- clean up DOIs to use "https://doi.org" format
- fix column with invalid spaces in metadata field name (cg. subject. wle)
- remove columns with no metadata (place, target audience, isbn, uri, publisher, ispartofseries, subject)
- remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: `value.replace('<27>','')`
- I notice a few items using DOIs pointing at ICARDA's DSpace like: https://doi.org/20.500.11766/8178, which then points at the "real" DOI on the publisher's site... these should be using the real DOI instead of ICARDA's "fake" Handle DOI
- Some items missing DOIs, but they clearly have them if you look at the publisher's site
<!-- vim: set sw=2 ts=2: -->

View File

@ -21,7 +21,7 @@ Today these are the top 10 IPs:
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-11/" /><meta property="article:published_time" content="2018-11-01T16:41:30&#43;02:00"/>
<meta property="article:modified_time" content="2018-11-19T17:17:04&#43;02:00"/>
<meta property="article:modified_time" content="2018-11-19T17:25:08&#43;02:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="November, 2018"/>
@ -48,9 +48,9 @@ Today these are the top 10 IPs:
"@type": "BlogPosting",
"headline": "November, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-11/",
"wordCount": "1774",
"wordCount": "2122",
"datePublished": "2018-11-01T16:41:30&#43;02:00",
"dateModified": "2018-11-19T17:17:04&#43;02:00",
"dateModified": "2018-11-19T17:25:08&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -490,6 +490,61 @@ $ ./delete-metadata-values.py -i 2018-11-19-delete-agrovoc.csv -f dc.subject -m
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-11-19-top-1500-subject.csv WITH CSV HEADER;
</code></pre>
<h2 id="2018-11-20">2018-11-20</h2>
<ul>
<li>The Discovery re-indexing on CGSpace never finished yesterday&hellip; the command died after six minutes</li>
<li>The <code>dspace.log.2018-11-19</code> shows this at the time:</li>
</ul>
<pre><code>2018-11-19 15:23:04,221 ERROR com.atmire.dspace.discovery.AtmireSolrService @ DSpace kernel cannot be null
java.lang.IllegalStateException: DSpace kernel cannot be null
at org.dspace.utils.DSpace.getServiceManager(DSpace.java:63)
at org.dspace.utils.DSpace.getSingletonService(DSpace.java:87)
at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:102)
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:815)
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:884)
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
2018-11-19 15:23:04,223 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (4629 of 76007): 72731
</code></pre>
<ul>
<li>I looked in the Solr log around that time and I don&rsquo;t see anything&hellip;</li>
<li>Working on Udana&rsquo;s WLE records from last month, first the sixteen records in <a href="https://dspacetest.cgiar.org/handle/10568/108254">2018-11-20 RDL Temp</a>
<ul>
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81592">Restoring Degraded Landscapes collection</a></li>
<li>a few items missing DOIs, but they are easily available on the publication page</li>
<li>clean up DOIs to use &ldquo;<a href="https://doi.org&quot;">https://doi.org&quot;</a> format</li>
<li>clean up some cg.identifier.url to remove unneccessary query strings</li>
<li>remove columns with no metadata (river basin, place, target audience, isbn, uri, publisher, ispartofseries, subject)</li>
<li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li>
<li>trim and collapse whitespace in all fields</li>
<li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('<27>','')</code></li>
<li>add dc.rights to some fields that I noticed while checking DOIs</li>
</ul></li>
<li>Then the 24 records in <a href="https://dspacetest.cgiar.org/handle/10568/108271">2018-11-20 VRC Temp</a>
<ul>
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81589">Variability, Risks and Competing Uses collection</a></li>
<li>trim and collapse whitespace in all fields (lots in WLE subject!)</li>
<li>clean up some cg.identifier.url fields that had unneccessary anchors in their links</li>
<li>clean up DOIs to use &ldquo;<a href="https://doi.org&quot;">https://doi.org&quot;</a> format</li>
<li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li>
<li>remove columns with no metadata (place, target audience, isbn, uri, publisher, ispartofseries, subject)</li>
<li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('<27>','')</code></li>
<li>I notice a few items using DOIs pointing at ICARDA&rsquo;s DSpace like: <a href="https://doi.org/20.500.11766/8178">https://doi.org/20.500.11766/8178</a>, which then points at the &ldquo;real&rdquo; DOI on the publisher&rsquo;s site&hellip; these should be using the real DOI instead of ICARDA&rsquo;s &ldquo;fake&rdquo; Handle DOI</li>
<li>Some items missing DOIs, but they clearly have them if you look at the publisher&rsquo;s site</li>
</ul></li>
</ul>
<!-- vim: set sw=2 ts=2: -->

View File

@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-11/</loc>
<lastmod>2018-11-19T17:17:04+02:00</lastmod>
<lastmod>2018-11-19T17:25:08+02:00</lastmod>
</url>
<url>
@ -194,7 +194,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-11-19T17:17:04+02:00</lastmod>
<lastmod>2018-11-19T17:25:08+02:00</lastmod>
<priority>0</priority>
</url>
@ -205,7 +205,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-11-19T17:17:04+02:00</lastmod>
<lastmod>2018-11-19T17:25:08+02:00</lastmod>
<priority>0</priority>
</url>
@ -217,13 +217,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-11-19T17:17:04+02:00</lastmod>
<lastmod>2018-11-19T17:25:08+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-11-19T17:17:04+02:00</lastmod>
<lastmod>2018-11-19T17:25:08+02:00</lastmod>
<priority>0</priority>
</url>