Add notes for 2018-11-20

2018-11-20 12:25:12 +02:00
parent 4e5e1ad4a6
commit d2e4a490ff
3 changed files with 109 additions and 8 deletions


@@ -332,4 +332,50 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-11-19-top-1500-subject.csv WITH CSV HEADER;
```
## 2018-11-20
- The Discovery re-indexing on CGSpace never finished yesterday... the command died after six minutes
- The `dspace.log.2018-11-19` shows this at the time:
```
2018-11-19 15:23:04,221 ERROR com.atmire.dspace.discovery.AtmireSolrService @ DSpace kernel cannot be null
java.lang.IllegalStateException: DSpace kernel cannot be null
at org.dspace.utils.DSpace.getServiceManager(DSpace.java:63)
at org.dspace.utils.DSpace.getSingletonService(DSpace.java:87)
at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:102)
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:815)
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:884)
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
2018-11-19 15:23:04,223 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (4629 of 76007): 72731
```
- I looked in the Solr log around that time and I don't see anything...
- Working on Udana's WLE records from last month, first the sixteen records in [2018-11-20 RDL Temp](https://dspacetest.cgiar.org/handle/10568/108254)
- these items will go to the [Restoring Degraded Landscapes collection](https://dspacetest.cgiar.org/handle/10568/81592)
- a few items missing DOIs, but they are easily available on the publication page
- clean up DOIs to use "https://doi.org" format
- clean up some cg.identifier.url to remove unneccessary query strings
- remove columns with no metadata (river basin, place, target audience, isbn, uri, publisher, ispartofseries, subject)
- fix column with invalid spaces in metadata field name (cg. subject. wle)
- trim and collapse whitespace in all fields
- remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: `value.replace('<27>','')`
- add dc.rights to some fields that I noticed while checking DOIs
- Then the 24 records in [2018-11-20 VRC Temp](https://dspacetest.cgiar.org/handle/10568/108271)
- these items will go to the [Variability, Risks and Competing Uses collection](https://dspacetest.cgiar.org/handle/10568/81589)
- trim and collapse whitespace in all fields (lots in WLE subject!)
- clean up some cg.identifier.url fields that had unneccessary anchors in their links
- clean up DOIs to use "https://doi.org" format
- fix column with invalid spaces in metadata field name (cg. subject. wle)
- remove columns with no metadata (place, target audience, isbn, uri, publisher, ispartofseries, subject)
- remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: `value.replace('<27>','')`
- I notice a few items using DOIs pointing at ICARDA's DSpace like: https://doi.org/20.500.11766/8178, which then points at the "real" DOI on the publisher's site... these should be using the real DOI instead of ICARDA's "fake" Handle DOI
- Some items missing DOIs, but they clearly have them if you look at the publisher's site
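
The cleanup steps above were done by hand in Open Refine. As a rough sketch only, the same checks could be scripted in Python roughly like this. The function names, the regular expressions, and the example values are my own assumptions for illustration, not part of any DSpace or Open Refine tooling, and stripping query strings blindly could break links that actually need them, so the results would still need a manual review:

```python
import re
from urllib.parse import urlsplit, urlunsplit

REPLACEMENT_CHAR = "\ufffd"  # the "weird" Unicode replacement character (0xfffd)


def clean_text(value: str) -> str:
    """Remove U+FFFD, then trim and collapse runs of whitespace."""
    value = value.replace(REPLACEMENT_CHAR, "")
    return re.sub(r"\s+", " ", value).strip()


def normalize_doi(value: str) -> str:
    """Rewrite common DOI spellings to the https://doi.org/ form.

    Values that do not look like a DOI (prefix "10.") are returned unchanged.
    The pattern is a simplification chosen for this sketch.
    """
    value = value.strip()
    match = re.match(
        r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)?(10\.\d{4,9}/\S+)$",
        value,
        flags=re.IGNORECASE,
    )
    return f"https://doi.org/{match.group(1)}" if match else value


def strip_query_and_fragment(url: str) -> str:
    """Drop the query string and the anchor (fragment) from a URL."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def is_handle_style_doi(value: str) -> bool:
    """Flag doi.org links whose prefix is not "10.", like ICARDA's 20.500.11766."""
    pattern = r"^https?://(?:dx\.)?doi\.org/(?!10\.)"
    return re.match(pattern, value.strip(), flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    # Hypothetical example values, apart from the ICARDA link noted above
    print(clean_text("  Soil \ufffd erosion\t in   Ethiopia "))
    print(normalize_doi("doi:10.1000/xyz123"))
    print(strip_query_and_fragment("https://example.org/paper/123?utm_source=x#abstract"))
    print(is_handle_style_doi("https://doi.org/20.500.11766/8178"))
```

The last check only flags the ICARDA-style links for follow-up. Finding the "real" DOI on the publisher's site is still a manual step.
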
<!-- vim: set sw=2 ts=2: -->