+++
date = "2016-12-02T10:43:00+03:00"
author = "Alan Orth"
title = "December, 2016"
tags = ["Notes"]

+++
## 2016-12-02

- CGSpace was down for five hours in the morning while I was sleeping
- While looking in the logs for errors, I see tons of warnings about Atmire MQM:

```
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
```
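
To see how far back warnings like the ones above go, they can be counted per log file. This is only a minimal sketch: it assumes the rotated DSpace logs are uncompressed text files, and the function name `count_batchedit_warnings` and the example path are made up for illustration.

```shell
#!/usr/bin/env bash
# Count BatchEditConsumer WARN lines in each log file given as an argument.
# Prints "<count> <file>" per file, in the order the files were passed.
count_batchedit_warnings() {
    local f
    for f in "$@"; do
        # grep -c prints the number of matching lines (0 when there are none)
        printf '%s %s\n' "$(grep -c 'WARN.*BatchEditConsumer' "$f")" "$f"
    done
}

# Hypothetical usage (the log directory is an assumption):
# count_batchedit_warnings /path/to/dspace/log/dspace.log.2016-*
```

If files from before an upgrade already show non-zero counts, the warnings predate that upgrade.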

- I see thousands of them in the logs for the last few months, so it's not related to the DSpace 5.5 upgrade
- I've raised a ticket with Atmire to ask
- Another worrying error from dspace.log is:

```
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:111)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:274)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at com.googlecode.psiprobe.Tomcat70AgentValve.invoke(Tomcat70AgentValve.java:44)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
    at com.atmire.statistics.generator.TopNDSODatasetGenerator.toDatasetQuery(SourceFile:39)
    at com.atmire.statistics.display.StatisticsDataVisitsMultidata.createDataset(SourceFile:108)
    at org.dspace.statistics.content.StatisticsDisplay.createDataset(SourceFile:384)
    at org.dspace.statistics.content.StatisticsDisplay.getDataset(SourceFile:404)
    at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generateJsonData(SourceFile:170)
    at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
    at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
    at sun.reflect.GeneratedMethodAccessor296.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy96.process(Unknown Source)
    at org.apache.cocoon.components.treeprocessor.sitemap.ReadNode.invoke(ReadNode.java:94)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
    at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
    at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
    at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
    at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
    at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
    at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
    at org.apache.cocoon.servlet.RequestProcessor.process(RequestProcessor.java:351)
    at org.apache.cocoon.servlet.RequestProcessor.service(RequestProcessor.java:169)
    at org.apache.cocoon.sitemap.SitemapServlet.service(SitemapServlet.java:84)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:468)
    at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
    at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
    at com.sun.proxy.$Proxy89.service(Unknown Source)
    at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
    at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1180)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:950)
    ... 35 more
```

- The first error I see in dspace.log this morning is:

```
2016-12-02 03:00:46,656 ERROR org.dspace.authority.AuthorityValueFinder @ anonymous::Error while retrieving AuthorityValue from solr:query\colon; id\colon;"b0b541c1-ec15-48bf-9209-6dbe8e338cdc"
org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8081/solr/authority
```

- Looking through DSpace's Solr log I see that about 20 seconds before this, there were a few 30+ KiB Solr queries
- The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:

```
2016-12-02 03:00:42,606 INFO org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={q=containerItem:72828+AND+type:0&shards=localhost:8081/solr/statistics-2010,localhost:8081/solr/statistics&fq=-isInternal:true&fq=-(author_mtdt:"CGIAR\+Institutional\+Learning\+and\+Change\+Initiative"++AND+subject_mtdt:"PARTNERSHIPS"+AND+subject_mtdt:"RESEARCH"+AND+subject_mtdt:"AGRICULTURE"+AND+subject_mtdt:"DEVELOPMENT"++AND+iso_mtdt:"en"+)&rows=0&wt=javabin&version=2} hits=0 status=0 QTime=19
2016-12-02 08:28:23,908 INFO org.apache.solr.servlet.SolrDispatchFilter @ SolrDispatchFilter.init()
```

- DSpace's own Solr logs don't give IP addresses, so I will have to enable Nginx's logging of `/solr` so I can see where this request came from
- I enabled logging of `/rest/` and I think I'll leave it on for good
- Also, the disk is nearly full because of log file issues, so I'm running some compression on DSpace logs
- Normally these stay uncompressed for a month just in case we need to look at them, so now I've just compressed anything older than 2 weeks so we can get some disk space back
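
As a rough sketch of that cleanup: the notes don't say which tool was actually used, so this assumes gzip, assumes the logs are named `dspace.log.*` in a single directory, and uses a hypothetical function name and path.

```shell
#!/usr/bin/env bash
# Compress DSpace logs in directory $1 that were last modified more than
# $2 days ago (default 14). gzip and the dspace.log.* naming are assumptions.
compress_old_logs() {
    local dir=$1 days=${2:-14}
    # -mtime +N matches files whose age in whole days is greater than N;
    # files that already end in .gz are skipped
    find "$dir" -maxdepth 1 -type f -name 'dspace.log.*' ! -name '*.gz' \
        -mtime +"$days" -exec gzip {} +
}

# Hypothetical usage (the log directory is an assumption):
# compress_old_logs /path/to/dspace/log 14
```

Running `find` with `-print` in place of `-exec gzip {} +` first shows which files would be touched.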

## 2016-12-04

- I got a weird report from the CGSpace checksum checker this morning
- It says 732 bitstreams have potential issues, for example:

```
------------------------------------------------
Bitstream Id = 6
Process Start Date = Dec 4, 2016
Process End Date = Dec 4, 2016
Checksum Expected = a1d9eef5e2d85f50f67ce04d0329e96a
Checksum Calculated = a1d9eef5e2d85f50f67ce04d0329e96a
Result = Bitstream marked deleted in bitstream table
-----------------------------------------------
...
------------------------------------------------
Bitstream Id = 77581
Process Start Date = Dec 4, 2016
Process End Date = Dec 4, 2016
Checksum Expected = 9959301aa4ca808d00957dff88214e38
Checksum Calculated =
Result = The bitstream could not be found
-----------------------------------------------
```
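
To get an overview of which kinds of problems make up those 732 results, the report can be tallied by its `Result` line. This sketch assumes the report has been saved to a text file and that each `Result = ...` line starts in the first column, as in the excerpt above; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Tally the "Result = ..." lines of a saved checksum checker report ($1).
# Prints "<count><TAB><result message>", most frequent message first.
summarize_checker_results() {
    awk '
        /^Result = / {
            sub(/^Result = /, "")   # keep only the message text
            count[$0]++
        }
        END {
            for (msg in count) printf "%d\t%s\n", count[msg], msg
        }
    ' "$1" | sort -k1,1nr -k2
}

# Hypothetical usage (the file name is an assumption):
# summarize_checker_results checker-report.txt
```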

- The first one seems ok, but I don't know what to make of the second one...
- I had a look and there is indeed no file with the second checksum in the assetstore (i.e., looking in `[dspace-dir]/assetstore/99/59/30/...`)
- For what it's worth, there is no item on DSpace Test or S3 backups with that checksum either...
- In other news, I'm looking at JVM settings from the Solr 4.10.2 release, from `bin/solr.in.sh`:

```
# These GC settings have shown to work well for a number of common Solr workloads
GC_TUNE="-XX:-UseSuperWord \
-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:CMSFullGCsBeforeCompaction=1 \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSTriggerPermRatio=80 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled \
-XX:+AggressiveOpts"
```

- I need to try these because they are recommended by the Solr project itself
- Also, as always, I need to read [Shawn Heisey's wiki page on Solr](https://wiki.apache.org/solr/ShawnHeisey)

## 2016-12-05

- I did some basic benchmarking on a local DSpace before and after the JVM settings above, but there wasn't any obvious difference
- I want to make the changes on DSpace Test and monitor the JVM heap graphs (in Munin) for a few days to see if they change the JVM GC patterns or anything
- Spin up new CGSpace server on Linode
- I did a few traceroutes from Jordan and Kenya and it seems that Linode's Frankfurt datacenter is a few hops closer, with perhaps less packet loss, than the London one, so I put the new server in Frankfurt
- Do initial provisioning
- Atmire responded about the MQM warnings in the DSpace logs
- Apparently we need to change the batch edit consumers in `dspace/config/dspace.cfg`:

```
event.consumer.batchedit.filters = Community|Collection+Create
```

- I haven't tested it yet, but I created a pull request: [#289](https://github.com/ilri/DSpace/pull/289)

## 2016-12-06

- Some author authority corrections and name standardizations for Peter:

```
dspace=# update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
UPDATE 11
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
UPDATE 36
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
UPDATE 14
dspace=# update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
UPDATE 42
dspace=# update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
UPDATE 360
dspace=# update metadatavalue set text_value='Grace, Delia', authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 561
```
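
The third update above combines `like '%an der Hoek%'` with `!~ '^.*W\.?$'`, which skips any value ending in `W` or `W.`. A quick way to sanity-check such a filter before running it is to try the same pattern on a few sample strings. This is a sketch using `grep -E`, which agrees with PostgreSQL's regex operator for a pattern this simple; the function name and the sample names are invented for illustration and are not taken from the database.

```shell
#!/usr/bin/env bash
# Read candidate author names on stdin and print the ones the third UPDATE
# would rewrite: they contain "an der Hoek" and do not end in "W" or "W.".
would_be_renamed() {
    grep -F 'an der Hoek' | grep -Ev '^.*W\.?$'
}

# Hypothetical sample values:
# printf '%s\n' 'van der Hoek, R.' 'van der Hoek, W.' | would_be_renamed
```

With the two sample values in the comment, only `van der Hoek, R.` is printed; the `W.` variant is left alone.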

- Pay attention to the regex to prevent false positives in tricky cases with Dutch names!
- I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week
- More work on the KM4Dev Journal article
- In other news, it seems the batch edit patch is working: there are no more WARN messages in the logs and the batch edit seems to work
- I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there
- Paola from CCAFS mentioned she also has the "take task" bug on CGSpace
- Reading about [`shared_buffers` in PostgreSQL configuration](https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html) (default is 128MB)
- Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres
- The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn't dedicated (also runs Solr, which can benefit from OS cache) so let's try 1024MB
- In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):

```
$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
Retrieving all data
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
Exception: null
java.lang.NullPointerException
    at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
    at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
    at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
    at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
    at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
    at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
    at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
    at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
    at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
    at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)

real 8m39.913s
user 1m54.190s
sys 0m22.647s
```

## 2016-12-07

- For what it's worth, after running the same SQL updates on my local test server, `index-authority` runs and completes just fine
- I will have to test more
- Anyways, I noticed that some of the authority values I set actually have versions of author names we don't want, i.e. "Grace, D."
- For example, do a Solr query for "first_name:Grace" and look at the results
- Querying that ID shows the fields that need to be changed:

```
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
      "indent": "true",
      "wt": "json",
      "_": "1481102189244"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
        "field": "dc_contributor_author",
        "value": "Grace, D.",
        "deleted": false,
        "creation_date": "2016-11-10T15:13:40.318Z",
        "last_modified_date": "2016-11-10T15:13:40.318Z",
        "authority_type": "person",
        "first_name": "D.",
        "last_name": "Grace"
      }
    ]
  }
}
```

- I think I can just update the `value`, `first_name`, and `last_name` fields...
- The update syntax should be something like this, but I'm getting errors from Solr:

```
$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
{
  "responseHeader":{
    "status":400,
    "QTime":0},
  "error":{
    "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
    "code":400}}
```
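
The "expected '<'" message above comes from an XML parser, which suggests the `/update` handler on this core read the JSON body as XML. Independently of which handler ends up accepting it, the atomic-update payload for the three fields of the `Grace` record would look roughly like the output of this sketch. The function name is hypothetical, and the plain `printf` does no JSON escaping, so it only suits simple values without quotes or backslashes.

```shell
#!/usr/bin/env bash
# Print a Solr atomic-update document that sets value, first_name and
# last_name on one authority record. Arguments: id, value, first, last.
# Note: values are inserted as-is, without JSON escaping.
authority_update_payload() {
    local id=$1 value=$2 first=$3 last=$4
    printf '[{"id":"%s","value":{"set":"%s"},"first_name":{"set":"%s"},"last_name":{"set":"%s"}}]\n' \
        "$id" "$value" "$first" "$last"
}

# Example with the ID and names from the query result above:
# authority_update_payload 0b4fcbc1-d930-4319-9b4d-ea1553cca70b 'Grace, Delia' Delia Grace
```

Whether the core accepts such an update at all depends on how its `solrconfig.xml` is set up, so this only illustrates the document shape.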

- When I try using the XML format I get an error that the `updateLog` needs to be configured for that core
- Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?

```
dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 561
```

- Then I'll reindex discovery and authority and see how the authority Solr core looks
- After this, now there are authorities for some of the "Grace, D." and "Grace, Delia" text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):

```
$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
      "indent":"true",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"18ea1525-2513-430a-8817-a834cd733fbc",
        "field":"dc_contributor_author",
        "value":"Grace, Delia",
        "deleted":false,
        "creation_date":"2016-12-07T10:54:34.356Z",
        "last_modified_date":"2016-12-07T10:54:34.356Z",
        "authority_type":"person",
        "first_name":"Delia",
        "last_name":"Grace"}]
  }}
```

- So now I could set them all to this ID and the name would be ok, but there has to be a better way!
- In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!
- Better to use:

```
dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
```

- This proves that unifying author name variants in authorities is easy, but fixing the name in the authority is tricky!