December, 2016
2016-12-02
- CGSpace was down for five hours in the morning while I was sleeping
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
2016-12-02 03:00:32,352 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
I’ve raised a ticket with Atmire to ask
Another worrying error from dspace.log is:
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:111)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:274)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at com.googlecode.psiprobe.Tomcat70AgentValve.invoke(Tomcat70AgentValve.java:44)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
at com.atmire.statistics.generator.TopNDSODatasetGenerator.toDatasetQuery(SourceFile:39)
at com.atmire.statistics.display.StatisticsDataVisitsMultidata.createDataset(SourceFile:108)
at org.dspace.statistics.content.StatisticsDisplay.createDataset(SourceFile:384)
at org.dspace.statistics.content.StatisticsDisplay.getDataset(SourceFile:404)
at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generateJsonData(SourceFile:170)
at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
at sun.reflect.GeneratedMethodAccessor296.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
at com.sun.proxy.$Proxy96.process(Unknown Source)
at org.apache.cocoon.components.treeprocessor.sitemap.ReadNode.invoke(ReadNode.java:94)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
at org.apache.cocoon.servlet.RequestProcessor.process(RequestProcessor.java:351)
at org.apache.cocoon.servlet.RequestProcessor.service(RequestProcessor.java:169)
at org.apache.cocoon.sitemap.SitemapServlet.service(SitemapServlet.java:84)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:468)
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at com.sun.proxy.$Proxy89.service(Unknown Source)
at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1180)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:950)
... 35 more
The first error I see in dspace.log this morning is:
2016-12-02 03:00:46,656 ERROR org.dspace.authority.AuthorityValueFinder @ anonymous::Error while retrieving AuthorityValue from solr:query\colon; id\colon;"b0b541c1-ec15-48bf-9209-6dbe8e338cdc" org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8081/solr/authority
Looking through DSpace’s solr log I see that about 20 seconds before this, there were a few 30+ KiB solr queries
The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:
2016-12-02 03:00:42,606 INFO  org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={q=containerItem:72828+AND+type:0&shards=localhost:8081/solr/statistics-2010,localhost:8081/solr/statistics&fq=-isInternal:true&fq=-(author_mtdt:"CGIAR\+Institutional\+Learning\+and\+Change\+Initiative"++AND+subject_mtdt:"PARTNERSHIPS"+AND+subject_mtdt:"RESEARCH"+AND+subject_mtdt:"AGRICULTURE"+AND+subject_mtdt:"DEVELOPMENT"++AND+iso_mtdt:"en"+)&rows=0&wt=javabin&version=2} hits=0 status=0 QTime=19
2016-12-02 08:28:23,908 INFO  org.apache.solr.servlet.SolrDispatchFilter @ SolrDispatchFilter.init()
DSpace’s own Solr logs don’t give IP addresses, so I will have to enable Nginx’s logging of `/solr` so I can see where this request came from
I enabled logging of `/rest/` and I think I’ll leave it on for good
Also, the disk is nearly full because of log file issues, so I’m running some compression on DSpace logs
Normally these stay uncompressed for a month just in case we need to look at them, so now I’ve just compressed anything older than 2 weeks so we can get some disk space back
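That compression step can be sketched as a self-contained demo (a throwaway directory and gzip here purely for illustration; the real job targets the actual log path and I'm using xz there):

```shell
# Demo of "compress anything older than 2 weeks": fake log files in a
# temporary directory stand in for the real [dspace]/log contents
logdir=$(mktemp -d)
touch -d '30 days ago' "$logdir/dspace.log.2016-11-01"
touch "$logdir/dspace.log.2016-12-01"

# Compress only files last modified more than 14 days ago
find "$logdir" -name 'dspace.log.*' ! -name '*.gz' -mtime +14 -exec gzip {} \;
```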
2016-12-04
- I got a weird report from the CGSpace checksum checker this morning
It says 732 bitstreams have potential issues, for example:
------------------------------------------------
Bitstream Id = 6
Process Start Date = Dec 4, 2016
Process End Date = Dec 4, 2016
Checksum Expected = a1d9eef5e2d85f50f67ce04d0329e96a
Checksum Calculated = a1d9eef5e2d85f50f67ce04d0329e96a
Result = Bitstream marked deleted in bitstream table
------------------------------------------------
...
------------------------------------------------
Bitstream Id = 77581
Process Start Date = Dec 4, 2016
Process End Date = Dec 4, 2016
Checksum Expected = 9959301aa4ca808d00957dff88214e38
Checksum Calculated =
Result = The bitstream could not be found
------------------------------------------------
The first one seems ok, but I don’t know what to make of the second one…
I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in `[dspace-dir]/assetstore/99/59/30/...`)
For what it’s worth, there is no item on DSpace Test or S3 backups with that checksum either…
In other news, I’m looking at JVM settings from the Solr 4.10.2 release, from `bin/solr.in.sh`:

# These GC settings have shown to work well for a number of common Solr workloads
GC_TUNE="-XX:-UseSuperWord \
-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:CMSFullGCsBeforeCompaction=1 \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSTriggerPermRatio=80 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled \
-XX:+AggressiveOpts"
I need to try these because they are recommended by the Solr project itself
Also, as always, I need to read Shawn Heisey’s wiki page on Solr
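If I do try them, wiring them into Tomcat would presumably look something like this in `bin/setenv.sh` (untested; only a subset of the flags above is shown, and our setup may differ):

```
# bin/setenv.sh -- append the Solr-recommended GC flags to Tomcat's options
GC_TUNE="-XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
JAVA_OPTS="$JAVA_OPTS $GC_TUNE"
```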
2016-12-05
- I did some basic benchmarking on a local DSpace before and after the JVM settings above, but there wasn’t anything amazingly obvious
- I want to make the changes on DSpace Test and monitor the JVM heap graphs for a few days to see if they change the JVM GC patterns or anything (munin graphs)
- Spin up new CGSpace server on Linode
- I did a few traceroutes from Jordan and Kenya and it seems that Linode’s Frankfurt datacenter is a few less hops and perhaps less packet loss than the London one, so I put the new server in Frankfurt
- Do initial provisioning
- Atmire responded about the MQM warnings in the DSpace logs
Apparently we need to change the batch edit consumers in `dspace/config/dspace.cfg`:

event.consumer.batchedit.filters = Community|Collection+Create
I haven’t tested it yet, but I created a pull request: #289
2016-12-06
Some author authority corrections and name standardizations for Peter:
dspace=# update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
UPDATE 11
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
UPDATE 36
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
UPDATE 14
dspace=# update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
UPDATE 42
dspace=# update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
UPDATE 360
dspace=# update metadatavalue set text_value='Grace, Delia', authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 561
Pay attention to the regex to prevent false positives in tricky cases with Dutch names!
I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week
More work on the KM4Dev Journal article
In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work
I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there
Paola from CCAFS mentioned she also has the “take task” bug on CGSpace
Reading about `shared_buffers` in PostgreSQL configuration (default is 128MB)
Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres
The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn’t dedicated (also runs Solr, which can benefit from OS cache) so let’s try 1024MB
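In `postgresql.conf` that would be a one-line change (a sketch; the exact value may get tuned later):

```
# default is 128MB; try ~10% of system RAM since the server isn't dedicated
shared_buffers = 1024MB
```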
In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):
$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
Retrieving all data
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
Exception: null
java.lang.NullPointerException
at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)

real    8m39.913s
user    1m54.190s
sys     0m22.647s
2016-12-07
- For what it’s worth, after running the same SQL updates on my local test server, `index-authority` runs and completes just fine - I will have to test more
- Anyways, I noticed that some of the authority values I set actually have versions of author names we don’t want, ie “Grace, D.”
- For example, do a Solr query for “first_name:Grace” and look at the results
Querying that ID shows the fields that need to be changed:
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
      "indent": "true",
      "wt": "json",
      "_": "1481102189244"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
        "field": "dc_contributor_author",
        "value": "Grace, D.",
        "deleted": false,
        "creation_date": "2016-11-10T15:13:40.318Z",
        "last_modified_date": "2016-11-10T15:13:40.318Z",
        "authority_type": "person",
        "first_name": "D.",
        "last_name": "Grace"
      }
    ]
  }
}
I think I can just update the `value`, `first_name`, and `last_name` fields…
The update syntax should be something like this, but I’m getting errors from Solr:
$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
{
  "responseHeader":{
    "status":400,
    "QTime":0},
  "error":{
    "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
    "code":400}}
When I try using the XML format I get an error that the `updateLog` needs to be configured for that core
Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?
dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 561
Then I’ll reindex discovery and authority and see how the authority Solr core looks
After this, there are now authorities for some of the “Grace, D.” and “Grace, Delia” text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):
$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
      "indent":"true",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"18ea1525-2513-430a-8817-a834cd733fbc",
        "field":"dc_contributor_author",
        "value":"Grace, Delia",
        "deleted":false,
        "creation_date":"2016-12-07T10:54:34.356Z",
        "last_modified_date":"2016-12-07T10:54:34.356Z",
        "authority_type":"person",
        "first_name":"Delia",
        "last_name":"Grace"}]
  }}
So now I could set them all to this ID and the name would be ok, but there has to be a better way!
In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!
Better to use:
dspace#= update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!
Perhaps another way is to just add our own UUID to the authority field for the text_value we like, then re-index authority so they get synced from PostgreSQL to Solr, then set the other text_values to use that authority ID
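That idea could be sketched in shell (the name and field IDs just mirror the examples above; nothing here touches the real database, the statement is only printed for piping to psql later):

```shell
# Generate a lowercase UUID to use as our own authority ID
uuid=$(uuidgen | tr '[:upper:]' '[:lower:]')

# Build the UPDATE that pins the preferred text_value to that authority
sql="update metadatavalue set authority='$uuid', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value='Grace, Delia';"
echo "$sql"
```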
Deploy MQM WARN fix on CGSpace (#289)
Deploy “take task” hack/fix on CGSpace (#290)
I ran the following author corrections and then reindexed discovery:
update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
2016-12-08
Something weird happened and Peter Thorne’s names all ended up as “Thorne”, I guess because the original authority had that as its name value:
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne%';
    text_value    |              authority               | confidence
------------------+--------------------------------------+------------
 Thorne, P.J.     | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
 Thorne           | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
 Thorne-Lyman, A. | 0781e13a-1dc8-4e3f-82e8-5c422b44a344 |         -1
 Thorne, M. D.    | 54c52649-cefd-438d-893f-3bcef3702f07 |         -1
 Thorne, P.J      | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
 Thorne, P.       | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
(6 rows)
I generated a new UUID using `uuidgen | tr [A-Z] [a-z]` and set it along with the correct name variation for all records:

dspace=# update metadatavalue set authority='b2f7603d-2fb5-4018-923a-c4ec8d85b3bb', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812';
UPDATE 43
Apparently we also need to normalize Phil Thornton’s names to `Thornton, Philip K.`:

dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
     text_value      |              authority               | confidence
---------------------+--------------------------------------+------------
 Thornton, P         | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P K.      | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P K       | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton. P.K.      | 3e1e6639-d4fb-449e-9fce-ce06b5b0f702 |         -1
 Thornton, P K .     | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P.K.      | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P.K       | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, Philip K  | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, Philip K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P. K.     | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
(10 rows)
Seems his original authorities are using an incorrect version of the name so I need to generate another UUID and tie it to the correct name, then reindex:
dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
UPDATE 362
It seems that, when you are messing with authority and author text values in the database, it is better to run authority reindex first (postgres→solr authority core) and then Discovery reindex (postgres→solr Discovery core)
Everything looks ok after authority and discovery reindex
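In command form, that order looks like this on DSpace Test (a sketch of the sequence, not captured output):

```
$ /home/dspacetest.cgiar.org/bin/dspace index-authority
$ /home/dspacetest.cgiar.org/bin/dspace index-discovery -b
```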
In other news, I think we should really be using more RAM for PostgreSQL’s `shared_buffers`
The PostgreSQL documentation recommends using 25% of the system’s RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache
2016-12-09
- More work on finishing rough draft of KM4Dev article
- Set PostgreSQL’s `shared_buffers` on CGSpace to 10% of system RAM (1200MB)
- Run the following author corrections on CGSpace:
dspace=# update metadatavalue set authority='34df639a-42d8-4867-a3f2-1892075fcb3f', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812' or authority='021cd183-946b-42bb-964e-522ebff02993';
dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
The authority IDs were different now than when I was looking a few days ago so I had to adjust them here
2016-12-11
- After enabling a sizable `shared_buffers` for CGSpace’s PostgreSQL configuration the number of connections to the database dropped significantly
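A quick way to spot-check the connection count (assuming local psql access; counts will vary):

```
$ psql -U postgres -c 'SELECT count(*) FROM pg_stat_activity;'
```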
Looking at CIAT records from last week again, they have a lot of double authors like:
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
Some are in the same `dc.contributor.author` field, and some are in others like `dc.contributor.author[en_US]` etc.
Removing the duplicates in OpenRefine and uploading a CSV to DSpace says “no changes detected”
Seems like the only way to sort of clean these up would be to start in SQL:
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture';
                  text_value                   |              authority               | confidence
-----------------------------------------------+--------------------------------------+------------
 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |         -1
 International Center for Tropical Agriculture |                                      |        600
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |        500
 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |        600
 International Center for Tropical Agriculture |                                      |         -1
 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |        500
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |        600
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |         -1
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |          0

dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
UPDATE 1693
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%';
UPDATE 35
Work on article for KM4Dev journal
2016-12-13
- Checking in on CGSpace postgres stats again, looks like the `shared_buffers` change from a few days ago really made a big impact
- Looking at logs, it seems we need to evaluate which logs we keep and for how long
- Basically the only ones we need are `dspace.log` because those are used for legacy statistics (need to keep for 1 month)
- Other logs will be an issue because they don’t have date stamps
- I will add date stamps to the logs we’re storing from the tomcat7 user’s cron jobs at least, using `$(date --iso-8601)`
- Would probably be better to make custom logrotate files for them in the future
- Clean up some unneeded log files from 2014 (they weren’t large, just don’t need them)
- So basically, new cron jobs for logs should look something like this:
Find any file named `*.log*` that isn’t `dspace.log*`, isn’t already zipped, and is older than one day, and zip it:

# find /home/dspacetest.cgiar.org/log -regextype posix-extended -iregex ".*\.log.*" ! -iregex ".*dspace\.log.*" ! -iregex ".*\.(gz|lrz|lzo|xz)" ! -newermt "Yesterday" -exec schedtool -B -e ionice -c2 -n7 xz {} \;
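The date-stamp idea with `$(date --iso-8601)` can be sketched like this (the script name is hypothetical, and a throwaway directory stands in for the real log location):

```shell
# Append an ISO date stamp so rotated cron-job logs sort and expire cleanly;
# $(date --iso-8601) expands to e.g. 2016-12-13
logdir=$(mktemp -d)
logfile="$logdir/statistics-task.log.$(date --iso-8601)"
echo "example cron job output" >> "$logfile"
```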
Since there is `xzgrep` and `xzless` we can actually just zip them after one day, why not?!
We can keep the zipped ones for two weeks just in case we need to look for errors, etc, and delete them after that
I use `schedtool -B` and `ionice -c2 -n7` to set the CPU scheduling to `SCHED_BATCH` and the IO to best effort, which should, in theory, impact important system processes like Tomcat and PostgreSQL less
When the tasks are running you can see that the policies do apply:
$ schedtool $(ps aux | grep "xz /home" | grep -v grep | awk '{print $2}') && ionice -p $(ps aux | grep "xz /home" | grep -v grep | awk '{print $2}')
PID 17049: PRIO 0, POLICY B: SCHED_BATCH , NICE 0, AFFINITY 0xf
best-effort: prio 7
All in all this should free up a few gigs (we were at 9.3GB free when I started)
Next thing to look at is whether we need Tomcat’s access logs
I just looked and it seems that we saved 10GB by zipping these logs
Some users pointed out issues with the “most popular” stats on a community or collection
This error appears in the logs when you try to view them:
```
2016-12-13 21:17:37,486 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
        at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:111)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:274)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:436)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
        at com.atmire.statistics.generator.TopNDSODatasetGenerator.toDatasetQuery(SourceFile:39)
        at com.atmire.statistics.display.StatisticsDataVisitsMultidata.createDataset(SourceFile:108)
        at org.dspace.statistics.content.StatisticsDisplay.createDataset(SourceFile:384)
        at org.dspace.statistics.content.StatisticsDisplay.getDataset(SourceFile:404)
        at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generateJsonData(SourceFile:170)
        at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
        at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
```
It happens on both development and production, so I will have to ask Atmire
Most likely an issue with installation/configuration
2016-12-14
- Atmire sent a quick fix for the `last-update.txt` file not found error
- After applying pull request #291 on DSpace Test I no longer see the error in the logs after the `UpdateSolrStorageReports` task runs
- Also, I’m toying with the idea of moving the `tomcat7` user’s cron jobs to `/etc/cron.d` so we can manage them in Ansible
- Made a pull request with a template for the cron jobs (#75)
- Testing SMTP from the new CGSpace server; it’s not working, so I’ll have to tell James
2016-12-15
- Start planning for server migration this weekend, letting users know
- I am trying to figure out what the process is to update the server’s IP in the Handle system, and emailing the hdladmin account bounces(!)
- I will contact Jane Euler directly, as I know I’ve corresponded with her in the past
- She said that I should indeed just re-run the `[dspace]/bin/dspace make-handle-config` command and submit the new `sitebndl.zip` file to the CNRI website
- Also I was troubleshooting some workflow issues from Bizuwork
- I re-created the same scenario by adding a non-admin account and submitting an item, but I was able to successfully approve and commit it
- So it turns out it’s not a bug, it’s just that Peter was added as a reviewer/admin AFTER the items were submitted
- This is how DSpace works; I need to ask if there is a way to override someone’s submission, as the other reviewer seems not to be paying attention, or has perhaps taken the item from the task pool
- Run a batch edit to add “RANGELANDS” ILRI subject to all items containing the word “RANGELANDS” in their metadata for Peter Ballantyne
2016-12-18
- Add four new CRP subjects for 2017 and sort the input forms alphabetically (#294)
- Test the SMTP on the new server and it’s working
- Last week, when we asked CGNET to update the DNS records this weekend, they misunderstood and did it immediately
- We quickly told them to undo it, but I just realized they didn’t undo the IPv6 AAAA record!
- None of our users in African institutes will have IPv6, but some Europeans might, so I need to check if any submissions have been added since then
Update some names and authorities in the database:
```
dspace=# update metadatavalue set authority='5ff35043-942e-4d0a-b377-4daed6e3c1a3', confidence=600, text_value='Duncan, Alan' where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*Duncan,? A.*';
UPDATE 204
dspace=# update metadatavalue set authority='46804b53-ea30-4a85-9ccf-b79a35816fa9', confidence=600, text_value='Mekonnen, Kindu' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Mekonnen, K%';
UPDATE 89
dspace=# update metadatavalue set authority='f840da02-26e7-4a74-b7ba-3e2b723f3684', confidence=600, text_value='Lukuyu, Ben A.' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Lukuyu, B%';
UPDATE 140
```
Generated a new UUID for Ben using `uuidgen | tr [A-Z] [a-z]` as the one in Solr had his ORCID but the name format was incorrect
In theory DSpace should be able to check names from ORCID and update the records in the database, but I find that this doesn’t work (see Jira bug DS-3302)
I need to run these updates along with the other one for CIAT that I found last week
Enable OCSP stapling for hosts >= Ubuntu 16.04 in our Ansible playbooks (#76)
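On the web server side, OCSP stapling boils down to a couple of directives; a minimal sketch, assuming nginx terminates TLS for these hosts (the resolver address is illustrative):

```
# Enable OCSP stapling and verification of stapled responses
ssl_stapling on;
ssl_stapling_verify on;
# Needed so nginx can resolve the OCSP responder's hostname (address illustrative)
resolver 8.8.8.8 valid=300s;
```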
It’s working for DSpace Test, though only on the second request:
```
$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
...
OCSP response: no response sent
$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
...
OCSP Response Data:
...
Cert Status: good
```
Migrate CGSpace to new server, roughly following these steps:
On old server:
```
# service tomcat7 stop
# /home/backup/scripts/postgres_backup.sh
```
On new server:
```
# systemctl stop tomcat7
# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/assetstore/ /home/cgspace.cgiar.org/assetstore/
# rsync -4 -av --delete 178.79.187.182:/home/backup/ /home/backup/
# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/solr/ /home/cgspace.cgiar.org/solr
# su - postgres
$ dropdb cgspace
$ createdb -O cgspace --encoding=UNICODE cgspace
$ psql cgspace -c 'alter user cgspace createuser;'
$ pg_restore -O -U cgspace -d cgspace -W -h localhost /home/backup/postgres/cgspace_2016-12-18.backup
$ psql cgspace -c 'alter user cgspace nocreateuser;'
$ psql -U cgspace -f ~tomcat7/src/git/DSpace/dspace/etc/postgres/update-sequences.sql cgspace -h localhost
$ vacuumdb cgspace
$ psql cgspace
postgres=# \i /tmp/author-authority-updates-2016-12-11.sql
postgres=# \q
$ exit
# chown -R tomcat7:tomcat7 /home/cgspace.cgiar.org
# rsync -4 -av 178.79.187.182:/home/cgspace.cgiar.org/log/*.dat /home/cgspace.cgiar.org/log/
# rsync -4 -av 178.79.187.182:/home/cgspace.cgiar.org/log/dspace.log.2016-1[12]* /home/cgspace.cgiar.org/log/
# su - tomcat7
$ cd src/git/DSpace/dspace/target/dspace-installer
$ ant update clean_backups
$ exit
# systemctl start tomcat7
```
It took about twenty minutes and afterwards I had to check a few things, like:
- check and enable systemd timer for let’s encrypt
- enable root cron jobs
- disable root cron jobs on old server after!
- enable tomcat7 cron jobs
- disable tomcat7 cron jobs on old server after!
- regenerate `sitebndl.zip` with the new IP for the handle server and submit it to Handle.net
2016-12-22
- Abenet wanted a CSV of the IITA community, but the web export doesn’t include the `dc.date.accessioned` field
- I had to export it from the command line using the `-a` flag:

```
$ [dspace]/bin/dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
```
2016-12-28
- We’ve been getting two alerts per day about CPU usage on the new server from Linode
- These are caused by the batch jobs for Solr etc that run in the early morning hours
- The Linode default is to alert at 90% CPU usage for two hours, but I see the old server was at 150%, so maybe we just need to adjust it
- Speaking of the old server (linode01), I think we can decommission it now
- I checked the S3 logs on the new server (linode18) to make sure the backups have been running and everything looks good
- In other news, I was looking at the Munin graphs for PostgreSQL on the new server and it looks slightly worrying:
- I will have to check later why the size keeps increasing