mirror of https://github.com/alanorth/cgspace-notes.git, synced 2024-12-24 05:54:29 +01:00
Add notes for 2018-11-20
parent 4e5e1ad4a6
commit d2e4a490ff
@@ -332,4 +332,50 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-11-19-top-1500-subject.csv WITH CSV HEADER;
```

## 2018-11-20

- The Discovery re-indexing on CGSpace never finished yesterday... the command died after six minutes
- The `dspace.log.2018-11-19` shows this at the time:

```
2018-11-19 15:23:04,221 ERROR com.atmire.dspace.discovery.AtmireSolrService @ DSpace kernel cannot be null
java.lang.IllegalStateException: DSpace kernel cannot be null
	at org.dspace.utils.DSpace.getServiceManager(DSpace.java:63)
	at org.dspace.utils.DSpace.getSingletonService(DSpace.java:87)
	at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:102)
	at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:815)
	at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:884)
	at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
	at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
	at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
2018-11-19 15:23:04,223 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (4629 of 76007): 72731
```
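
Checking what a log recorded "around that time" can be done by filtering on the leading timestamp. The following is only a sketch: the helper names (`parse_ts`, `lines_near`) and the 60-second default window are assumptions for illustration, and it relies on nothing more than the `YYYY-MM-DD HH:MM:SS,mmm` prefix visible in the excerpt above. Lines without a timestamp, such as stack trace frames, are kept or dropped together with the entry they follow.

```python
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the log excerpt, e.g. "2018-11-19 15:23:04,221"
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def lines_near(lines, when, window_seconds=60):
    """Yield the log lines logged within window_seconds of `when`.

    Lines without a timestamp (stack trace frames) inherit the decision
    made for the most recent timestamped line.
    """
    window = timedelta(seconds=window_seconds)
    keep = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keep = abs(ts - when) <= window
        if keep:
            yield line
```

Passing an open file handle for `dspace.log.2018-11-19` together with `datetime(2018, 11, 19, 15, 23, 4)` should yield the ERROR entry and its frames. Whether the Solr log uses a compatible timestamp prefix is not verified here.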
|
||||
|
||||
- I looked in the Solr log around that time and I don't see anything...
|
||||
- Working on Udana's WLE records from last month, first the sixteen records in [2018-11-20 RDL Temp](https://dspacetest.cgiar.org/handle/10568/108254)
|
||||
- these items will go to the [Restoring Degraded Landscapes collection](https://dspacetest.cgiar.org/handle/10568/81592)
|
||||
- a few items missing DOIs, but they are easily available on the publication page
|
||||
- clean up DOIs to use "https://doi.org" format
|
||||
- clean up some cg.identifier.url to remove unneccessary query strings
|
||||
- remove columns with no metadata (river basin, place, target audience, isbn, uri, publisher, ispartofseries, subject)
|
||||
- fix column with invalid spaces in metadata field name (cg. subject. wle)
|
||||
- trim and collapse whitespace in all fields
|
||||
- remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: `value.replace('<27>','')`
|
||||
- add dc.rights to some fields that I noticed while checking DOIs
|
||||
- Then the 24 records in [2018-11-20 VRC Temp](https://dspacetest.cgiar.org/handle/10568/108271)
|
||||
- these items will go to the [Variability, Risks and Competing Uses collection](https://dspacetest.cgiar.org/handle/10568/81589)
|
||||
- trim and collapse whitespace in all fields (lots in WLE subject!)
|
||||
- clean up some cg.identifier.url fields that had unneccessary anchors in their links
|
||||
- clean up DOIs to use "https://doi.org" format
|
||||
- fix column with invalid spaces in metadata field name (cg. subject. wle)
|
||||
- remove columns with no metadata (place, target audience, isbn, uri, publisher, ispartofseries, subject)
|
||||
- remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: `value.replace('<27>','')`
|
||||
- I notice a few items using DOIs pointing at ICARDA's DSpace like: https://doi.org/20.500.11766/8178, which then points at the "real" DOI on the publisher's site... these should be using the real DOI instead of ICARDA's "fake" Handle DOI
|
||||
- Some items missing DOIs, but they clearly have them if you look at the publisher's site
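
The cleanups above were done by hand in Open Refine. As a rough illustration only, the same steps could be scripted along these lines. Everything here is an assumption rather than the workflow actually used: the function names are made up, the DOI `10.1000/xyz123` in the usage note is a placeholder, and `strip_query_and_fragment` is deliberately blunt (it drops every query string and anchor, whereas the notes only removed the unnecessary ones).

```python
import re
from urllib.parse import urlsplit, urlunsplit

# U+FFFD, the "replacement character" left behind by bad decoding
REPLACEMENT_CHAR = "\ufffd"

# Common ways a DOI gets written before the bare "10.xxxx/..." part
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def clean_text(value):
    """Remove U+FFFD, then trim and collapse runs of whitespace."""
    value = value.replace(REPLACEMENT_CHAR, "")
    return re.sub(r"\s+", " ", value).strip()


def normalize_doi(value):
    """Rewrite a DOI into the https://doi.org/ form."""
    bare = DOI_PREFIX.sub("", value.strip())
    return "https://doi.org/" + bare


def strip_query_and_fragment(url):
    """Drop the query string and the anchor from a URL (blunt: drops all of them)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def looks_like_handle_doi(doi_url):
    """Flag doi.org links whose identifier does not start with the DOI prefix "10."."""
    bare = DOI_PREFIX.sub("", doi_url.strip())
    return not bare.startswith("10.")
```

For instance, `normalize_doi("http://dx.doi.org/10.1000/xyz123")` gives `https://doi.org/10.1000/xyz123`, and `looks_like_handle_doi("https://doi.org/20.500.11766/8178")` is true because real DOIs start with `10.`, so items like the ICARDA one can be listed for manual follow-up.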

<!-- vim: set sw=2 ts=2: -->
@@ -21,7 +21,7 @@ Today these are the top 10 IPs:
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-11/" /><meta property="article:published_time" content="2018-11-01T16:41:30+02:00"/>
<meta property="article:modified_time" content="2018-11-19T17:17:04+02:00"/>
<meta property="article:modified_time" content="2018-11-19T17:25:08+02:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="November, 2018"/>
@@ -48,9 +48,9 @@ Today these are the top 10 IPs:
"@type": "BlogPosting",
"headline": "November, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-11/",
"wordCount": "1774",
"wordCount": "2122",
"datePublished": "2018-11-01T16:41:30+02:00",
"dateModified": "2018-11-19T17:17:04+02:00",
"dateModified": "2018-11-19T17:25:08+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -490,6 +490,61 @@ $ ./delete-metadata-values.py -i 2018-11-19-delete-agrovoc.csv -f dc.subject -m
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-11-19-top-1500-subject.csv WITH CSV HEADER;
</code></pre>

<h2 id="2018-11-20">2018-11-20</h2>

<ul>
<li>The Discovery re-indexing on CGSpace never finished yesterday… the command died after six minutes</li>
<li>The <code>dspace.log.2018-11-19</code> shows this at the time:</li>
</ul>

<pre><code>2018-11-19 15:23:04,221 ERROR com.atmire.dspace.discovery.AtmireSolrService @ DSpace kernel cannot be null
java.lang.IllegalStateException: DSpace kernel cannot be null
	at org.dspace.utils.DSpace.getServiceManager(DSpace.java:63)
	at org.dspace.utils.DSpace.getSingletonService(DSpace.java:87)
	at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:102)
	at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:815)
	at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:884)
	at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
	at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
	at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
2018-11-19 15:23:04,223 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (4629 of 76007): 72731
</code></pre>
|
||||
|
||||
<ul>
|
||||
<li>I looked in the Solr log around that time and I don’t see anything…</li>
|
||||
<li>Working on Udana’s WLE records from last month, first the sixteen records in <a href="https://dspacetest.cgiar.org/handle/10568/108254">2018-11-20 RDL Temp</a>
|
||||
|
||||
<ul>
|
||||
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81592">Restoring Degraded Landscapes collection</a></li>
|
||||
<li>a few items missing DOIs, but they are easily available on the publication page</li>
|
||||
<li>clean up DOIs to use “<a href="https://doi.org"">https://doi.org"</a> format</li>
|
||||
<li>clean up some cg.identifier.url to remove unneccessary query strings</li>
|
||||
<li>remove columns with no metadata (river basin, place, target audience, isbn, uri, publisher, ispartofseries, subject)</li>
|
||||
<li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li>
|
||||
<li>trim and collapse whitespace in all fields</li>
|
||||
<li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('<27>','')</code></li>
|
||||
<li>add dc.rights to some fields that I noticed while checking DOIs</li>
|
||||
</ul></li>
|
||||
<li>Then the 24 records in <a href="https://dspacetest.cgiar.org/handle/10568/108271">2018-11-20 VRC Temp</a>
|
||||
|
||||
<ul>
|
||||
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81589">Variability, Risks and Competing Uses collection</a></li>
|
||||
<li>trim and collapse whitespace in all fields (lots in WLE subject!)</li>
|
||||
<li>clean up some cg.identifier.url fields that had unneccessary anchors in their links</li>
|
||||
<li>clean up DOIs to use “<a href="https://doi.org"">https://doi.org"</a> format</li>
|
||||
<li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li>
|
||||
<li>remove columns with no metadata (place, target audience, isbn, uri, publisher, ispartofseries, subject)</li>
|
||||
<li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('<27>','')</code></li>
|
||||
<li>I notice a few items using DOIs pointing at ICARDA’s DSpace like: <a href="https://doi.org/20.500.11766/8178">https://doi.org/20.500.11766/8178</a>, which then points at the “real” DOI on the publisher’s site… these should be using the real DOI instead of ICARDA’s “fake” Handle DOI</li>
|
||||
<li>Some items missing DOIs, but they clearly have them if you look at the publisher’s site</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
||||
|
||||
|
@@ -4,7 +4,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-11/</loc>
<lastmod>2018-11-19T17:17:04+02:00</lastmod>
<lastmod>2018-11-19T17:25:08+02:00</lastmod>
</url>

<url>
@@ -194,7 +194,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-11-19T17:17:04+02:00</lastmod>
<lastmod>2018-11-19T17:25:08+02:00</lastmod>
<priority>0</priority>
</url>

@@ -205,7 +205,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-11-19T17:17:04+02:00</lastmod>
<lastmod>2018-11-19T17:25:08+02:00</lastmod>
<priority>0</priority>
</url>

@@ -217,13 +217,13 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-11-19T17:17:04+02:00</lastmod>
<lastmod>2018-11-19T17:25:08+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-11-19T17:17:04+02:00</lastmod>
<lastmod>2018-11-19T17:25:08+02:00</lastmod>
<priority>0</priority>
</url>