+++
date = "2016-12-02T10:43:00+03:00"
author = "Alan Orth"
title = "December, 2016"
tags = ["Notes"]
+++

2016-12-02

  • CGSpace was down for five hours in the morning while I was sleeping
  • While looking in the logs for errors, I see tons of warnings about Atmire MQM:
2016-12-02 03:00:32,352 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
  • I see thousands of them in the logs for the last few months, so it's not related to the DSpace 5.5 upgrade
  • I've raised a ticket with Atmire to ask
  • Another worrying error from dspace.log is:
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
        at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:111)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:274)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
        at com.googlecode.psiprobe.Tomcat70AgentValve.invoke(Tomcat70AgentValve.java:44)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
        at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
        at com.atmire.statistics.generator.TopNDSODatasetGenerator.toDatasetQuery(SourceFile:39)
        at com.atmire.statistics.display.StatisticsDataVisitsMultidata.createDataset(SourceFile:108)
        at org.dspace.statistics.content.StatisticsDisplay.createDataset(SourceFile:384)
        at org.dspace.statistics.content.StatisticsDisplay.getDataset(SourceFile:404)
        at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generateJsonData(SourceFile:170)
        at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
        at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
        at sun.reflect.GeneratedMethodAccessor296.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
        at com.sun.proxy.$Proxy96.process(Unknown Source)
        at org.apache.cocoon.components.treeprocessor.sitemap.ReadNode.invoke(ReadNode.java:94)
        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
        at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
        at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
        at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
        at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
        at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
        at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
        at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
        at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
        at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
        at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
        at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
        at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
        at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
        at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
        at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
        at org.apache.cocoon.servlet.RequestProcessor.process(RequestProcessor.java:351)
        at org.apache.cocoon.servlet.RequestProcessor.service(RequestProcessor.java:169)
        at org.apache.cocoon.sitemap.SitemapServlet.service(SitemapServlet.java:84)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
        at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:468)
        at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
        at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
        at com.sun.proxy.$Proxy89.service(Unknown Source)
        at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
        at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1180)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:950)
        ... 35 more
  • The first error I see in dspace.log this morning is:
2016-12-02 03:00:46,656 ERROR org.dspace.authority.AuthorityValueFinder @ anonymous::Error while retrieving AuthorityValue from solr:query\colon; id\colon;"b0b541c1-ec15-48bf-9209-6dbe8e338cdc"
org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8081/solr/authority
  • Looking through DSpace's solr log I see that about 20 seconds before this, there were a few 30+ KiB solr queries
  • The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:
2016-12-02 03:00:42,606 INFO  org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={q=containerItem:72828+AND+type:0&shards=localhost:8081/solr/statistics-2010,localhost:8081/solr/statistics&fq=-isInternal:true&fq=-(author_mtdt:"CGIAR\+Institutional\+Learning\+and\+Change\+Initiative"++AND+subject_mtdt:"PARTNERSHIPS"+AND+subject_mtdt:"RESEARCH"+AND+subject_mtdt:"AGRICULTURE"+AND+subject_mtdt:"DEVELOPMENT"++AND+iso_mtdt:"en"+)&rows=0&wt=javabin&version=2} hits=0 status=0 QTime=19
2016-12-02 08:28:23,908 INFO  org.apache.solr.servlet.SolrDispatchFilter @ SolrDispatchFilter.init()
  • DSpace's own Solr logs don't give IP addresses, so I will have to enable Nginx's logging of /solr so I can see where this request came from
  • I enabled logging of /rest/ and I think I'll leave it on for good
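  • In Nginx that's basically just an access_log directive in the relevant location block; a rough sketch only (the path, log file, and upstream here are illustrative, not our actual server block):
location /rest/ {
    access_log /var/log/nginx/rest.log;
    proxy_pass http://localhost:8443;
}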
  • Also, the disk is nearly full because of log file issues, so I'm running some compression on DSpace logs
  • Normally these stay uncompressed for a month in case we need to look at them, so for now I've just compressed anything older than two weeks to get some disk space back
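  • Something like this should do it (a rough sketch; adjust the path and age for the server in question):
# find /home/cgspace.cgiar.org/log -name "dspace.log.2016-*" ! -name "*.xz" -mtime +14 -exec xz {} \;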

2016-12-04

  • I got a weird report from the CGSpace checksum checker this morning
  • It says 732 bitstreams have potential issues, for example:
------------------------------------------------ 
Bitstream Id = 6
Process Start Date = Dec 4, 2016
Process End Date = Dec 4, 2016
Checksum Expected = a1d9eef5e2d85f50f67ce04d0329e96a
Checksum Calculated = a1d9eef5e2d85f50f67ce04d0329e96a
Result = Bitstream marked deleted in bitstream table
----------------------------------------------- 
...
------------------------------------------------ 
Bitstream Id = 77581
Process Start Date = Dec 4, 2016
Process End Date = Dec 4, 2016
Checksum Expected = 9959301aa4ca808d00957dff88214e38
Checksum Calculated = 
Result = The bitstream could not be found
----------------------------------------------- 
  • The first one seems ok, but I don't know what to make of the second one...
  • I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in [dspace-dir]/assetstore/99/59/30/...)
  • For what it's worth, there is no item on DSpace Test or S3 backups with that checksum either...
  • In other news, I'm looking at JVM settings from the Solr 4.10.2 release, from bin/solr.in.sh:
# These GC settings have shown to work well for a number of common Solr workloads
GC_TUNE="-XX:-UseSuperWord \
-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:CMSFullGCsBeforeCompaction=1 \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSTriggerPermRatio=80 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled \
-XX:+AggressiveOpts"

2016-12-05

  • I did some basic benchmarking on a local DSpace before and after the JVM settings above, but there wasn't anything amazingly obvious
  • I want to make the changes on DSpace Test and monitor the JVM heap graphs for a few days to see if they change the JVM GC patterns or anything (munin graphs)
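  • On Ubuntu one quick way to test them is to append a subset of the flags to JAVA_OPTS in /etc/default/tomcat7 and restart Tomcat (a sketch only; adjust to wherever the JVM options are actually set):
# append some of the GC_TUNE flags from solr.in.sh to the existing Tomcat options
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled"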
  • Spin up new CGSpace server on Linode
  • I did a few traceroutes from Jordan and Kenya and it seems that Linode's Frankfurt datacenter is a few fewer hops away, with perhaps less packet loss, than the London one, so I put the new server in Frankfurt
  • Do initial provisioning
  • Atmire responded about the MQM warnings in the DSpace logs
  • Apparently we need to change the batch edit consumers in dspace/config/dspace.cfg:
event.consumer.batchedit.filters = Community|Collection+Create
  • I haven't tested it yet, but I created a pull request: #289

2016-12-06

  • Some author authority corrections and name standardizations for Peter:
dspace=# update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
UPDATE 11
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
UPDATE 36
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
UPDATE 14
dspace=# update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
UPDATE 42
dspace=# update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
UPDATE 360
dspace=# update metadatavalue set text_value='Grace, Delia', authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 561
  • Pay attention to the regex to prevent false positives in tricky cases with Dutch names!
  • I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week
  • More work on the KM4Dev Journal article
  • In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work
  • I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there
  • Paola from CCAFS mentioned she also has the "take task" bug on CGSpace
  • Reading about shared_buffers in PostgreSQL configuration (default is 128MB)
  • Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres
  • The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn't dedicated (also runs Solr, which can benefit from OS cache) so let's try 1024MB
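  • If we try it, it's a one-line change in postgresql.conf (e.g. /etc/postgresql/9.5/main/postgresql.conf on Ubuntu, depending on the installed version), followed by a full PostgreSQL restart since shared_buffers can't be changed with a reload:
shared_buffers = 1024MB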
  • In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):
$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
Retrieving all data
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
Exception: null
java.lang.NullPointerException
        at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
        at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
        at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)

real    8m39.913s
user    1m54.190s
sys     0m22.647s

2016-12-07

  • For what it's worth, after running the same SQL updates on my local test server, index-authority runs and completes just fine
  • I will have to test more
  • Anyways, I noticed that some of the authority values I set actually have versions of author names we don't want, ie "Grace, D."
  • For example, do a Solr query for "first_name:Grace" and look at the results
  • Querying that ID shows the fields that need to be changed:
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
      "indent": "true",
      "wt": "json",
      "_": "1481102189244"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
        "field": "dc_contributor_author",
        "value": "Grace, D.",
        "deleted": false,
        "creation_date": "2016-11-10T15:13:40.318Z",
        "last_modified_date": "2016-11-10T15:13:40.318Z",
        "authority_type": "person",
        "first_name": "D.",
        "last_name": "Grace"
      }
    ]
  }
}
  • I think I can just update the value, first_name, and last_name fields...
  • The update syntax should be something like this, but I'm getting errors from Solr:
$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
{
  "responseHeader":{
    "status":400,
    "QTime":0},
  "error":{
    "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
    "code":400}}
  • When I try using the XML format I get an error that the updateLog needs to be configured for that core
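  • For reference, atomic updates in Solr 4.x need the transaction log enabled in the core's solrconfig.xml (plus a _version_ field in the schema); the stock snippet looks roughly like this, though I haven't tried enabling it on DSpace's authority core:
<!-- inside the <updateHandler> element in solrconfig.xml -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>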
  • Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?
dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 561
  • Then I'll reindex discovery and authority and see how the authority Solr core looks
  • After this, now there are authorities for some of the "Grace, D." and "Grace, Delia" text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):
$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
      "indent":"true",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"18ea1525-2513-430a-8817-a834cd733fbc",
        "field":"dc_contributor_author",
        "value":"Grace, Delia",
        "deleted":false,
        "creation_date":"2016-12-07T10:54:34.356Z",
        "last_modified_date":"2016-12-07T10:54:34.356Z",
        "authority_type":"person",
        "first_name":"Delia",
        "last_name":"Grace"}]
  }}
  • So now I could set them all to this ID and the name would be ok, but there has to be a better way!
  • In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!
  • Better to use:
dspace#= update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
  • This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!
  • Perhaps another way is to just add our own UUID to the authority field for the text_value we like, then re-index authority so they get synced from PostgreSQL to Solr, then set the other text_values to use that authority ID
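  • Roughly, as an untested sketch (with a placeholder UUID from uuidgen):
dspace=# update metadatavalue set authority='<new-uuid-from-uuidgen>', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value='Grace, Delia';
-- then run index-authority so the new ID gets a document in the authority core, and finally:
dspace=# update metadatavalue set text_value='Grace, Delia', authority='<new-uuid-from-uuidgen>', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';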
  • Deploy MQM WARN fix on CGSpace (#289)
  • Deploy "take task" hack/fix on CGSpace (#290)
  • I ran the following author corrections and then reindexed discovery:
update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';

2016-12-08

  • Something weird happened and Peter Thorne's names all ended up as "Thorne", I guess because the original authority had that as its name value:
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne%';
    text_value    |              authority               | confidence
------------------+--------------------------------------+------------
 Thorne, P.J.     | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
 Thorne           | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
 Thorne-Lyman, A. | 0781e13a-1dc8-4e3f-82e8-5c422b44a344 |         -1
 Thorne, M. D.    | 54c52649-cefd-438d-893f-3bcef3702f07 |         -1
 Thorne, P.J      | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
 Thorne, P.       | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
(6 rows)
  • I generated a new UUID using uuidgen | tr [A-Z] [a-z] and set it along with correct name variation for all records:
dspace=# update metadatavalue set authority='b2f7603d-2fb5-4018-923a-c4ec8d85b3bb', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812';
UPDATE 43
  • Apparently we also need to normalize Phil Thornton's names to Thornton, Philip K.:
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
     text_value      |              authority               | confidence
---------------------+--------------------------------------+------------
 Thornton, P         | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P K.      | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P K       | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton. P.K.      | 3e1e6639-d4fb-449e-9fce-ce06b5b0f702 |         -1
 Thornton, P K .     | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P.K.      | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P.K       | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, Philip K  | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, Philip K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P. K.     | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
(10 rows)
  • Seems his original authorities are using an incorrect version of the name so I need to generate another UUID and tie it to the correct name, then reindex:
dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
UPDATE 362
  • It seems that, when you are messing with authority and author text values in the database, it is better to run authority reindex first (postgres→solr authority core) and then Discovery reindex (postgres→solr Discovery core)
  • Everything looks ok after authority and discovery reindex
  • In other news, I think we should really be using more RAM for PostgreSQL's shared_buffers
  • The PostgreSQL documentation recommends using 25% of the system's RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache

2016-12-09

  • More work on finishing rough draft of KM4Dev article
  • Set PostgreSQL's shared_buffers on CGSpace to 10% of system RAM (1200MB)
  • Run the following author corrections on CGSpace:
dspace=# update metadatavalue set authority='34df639a-42d8-4867-a3f2-1892075fcb3f', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812' or authority='021cd183-946b-42bb-964e-522ebff02993';
dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
  • The authority IDs were different than when I was looking a few days ago, so I had to adjust them here

2016-12-11

  • After enabling a sizable shared_buffers for CGSpace's PostgreSQL configuration the number of connections to the database dropped significantly

(Munin graphs: postgres_bgwriter-week, postgres_connections_ALL-week)

  • Looking at CIAT records from last week again, they have a lot of double authors like:
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
  • Some in the same dc.contributor.author field, and some in others like dc.contributor.author[en_US] etc
  • Removing the duplicates in OpenRefine and uploading a CSV to DSpace says "no changes detected"
  • Seems like the only way to sort of clean these up would be to start in SQL:
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture';
                  text_value                   |              authority               | confidence
-----------------------------------------------+--------------------------------------+------------
 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |         -1
 International Center for Tropical Agriculture |                                      |        600
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |        500
 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |        600
 International Center for Tropical Agriculture |                                      |         -1
 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |        500
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |        600
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |         -1
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |          0
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
UPDATE 1693
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%';
UPDATE 35
  • Work on article for KM4Dev journal

2016-12-13

  • Checking in on CGSpace postgres stats again, looks like the shared_buffers change from a few days ago really made a big impact:

(Munin graphs: postgres_bgwriter-week, postgres_connections_ALL-week)

  • Looking at logs, it seems we need to evaluate which logs we keep and for how long
  • Basically the only ones we need are dspace.log because those are used for legacy statistics (need to keep for 1 month)
  • Other logs will be an issue because they don't have date stamps
  • I will add date stamps to the logs we're storing from the tomcat7 user's cron jobs at least, using: $(date --iso-8601)
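  • So a cron entry would just redirect to a dated file name, something like this (the job shown is hypothetical, just to illustrate the pattern):
0 4 * * * /home/cgspace.cgiar.org/bin/dspace filter-media > /home/cgspace.cgiar.org/log/filter-media.log.$(date --iso-8601) 2>&1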
  • Would probably be better to make custom logrotate files for them in the future
  • Clean up some unneeded log files from 2014 (they weren't large, just don't need them)
  • So basically, new cron jobs for logs should look something like this:
  • Find any file named *.log* that isn't dspace.log*, isn't already zipped, and is older than one day, and zip it:
# find /home/dspacetest.cgiar.org/log -regextype posix-extended -iregex ".*\.log.*" ! -iregex ".*dspace\.log.*" ! -iregex ".*\.(gz|lrz|lzo|xz)" ! -newermt "Yesterday" -exec schedtool -B -e ionice -c2 -n7 xz {} \;
  • Since there is xzgrep and xzless we can actually just zip them after one day, why not?!
  • We can keep the zipped ones for two weeks just in case we need to look for errors, etc, and delete them after that
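  • Deleting them could then be a second find along the same lines (a sketch, not in the cron jobs yet):
# find /home/dspacetest.cgiar.org/log -regextype posix-extended -iregex ".*\.log.*\.xz" ! -iregex ".*dspace\.log.*" ! -newermt "2 weeks ago" -delete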
  • I use schedtool -B and ionice -c2 -n7 to set the CPU scheduling to SCHED_BATCH and the IO to best effort which should, in theory, impact important system processes like Tomcat and PostgreSQL less
  • When the tasks are running you can see that the policies do apply:
$ schedtool $(ps aux | grep "xz /home" | grep -v grep | awk '{print $2}') && ionice -p $(ps aux | grep "xz /home" | grep -v grep | awk '{print $2}')
PID 17049: PRIO   0, POLICY B: SCHED_BATCH   , NICE   0, AFFINITY 0xf
best-effort: prio 7
  • All in all this should free up a few gigs (we were at 9.3GB free when I started)
  • Next thing to look at is whether we need Tomcat's access logs
  • I just looked and it seems that we saved 10GB by zipping these logs
  • Some users pointed out issues with the "most popular" stats on a community or collection
  • This error appears in the logs when you try to view them:
2016-12-13 21:17:37,486 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
	at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:111)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:274)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
	at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:436)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
	at com.atmire.statistics.generator.TopNDSODatasetGenerator.toDatasetQuery(SourceFile:39)
	at com.atmire.statistics.display.StatisticsDataVisitsMultidata.createDataset(SourceFile:108)
	at org.dspace.statistics.content.StatisticsDisplay.createDataset(SourceFile:384)
	at org.dspace.statistics.content.StatisticsDisplay.getDataset(SourceFile:404)
	at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generateJsonData(SourceFile:170)
	at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
	at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  • It happens on development and production, so I will have to ask Atmire
  • Most likely an issue with installation/configuration

2016-12-14

  • Atmire sent a quick fix for the last-update.txt file not found error
  • After applying pull request #291 on DSpace Test I no longer see the error in the logs after the UpdateSolrStorageReports task runs
  • Also, I'm toying with the idea of moving the tomcat7 user's cron jobs to /etc/cron.d so we can manage them in Ansible
  • Made a pull request with a template for the cron jobs (#75)
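  • The /etc/cron.d format is the same as a crontab except each line also names the user to run as, roughly like this sketch (the actual jobs are in the pull request):
SHELL=/bin/bash
0 0 * * * tomcat7 /home/cgspace.cgiar.org/bin/dspace cleanup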
  • Testing SMTP from the new CGSpace server and it's not working, I'll have to tell James

2016-12-15

  • Start planning for server migration this weekend, letting users know
  • I am trying to figure out what the process is to update the server's IP in the Handle system, and emailing the hdladmin account bounces(!)
  • I will contact Jane Euler directly, as I know I've corresponded with her in the past
  • She said that I should indeed just re-run the [dspace]/bin/dspace make-handle-server command and submit the new sitebndl.zip file to the CNRI website
  • Also I was troubleshooting some workflow issues from Bizuwork
  • I re-created the same scenario by adding a non-admin account and submitting an item, but I was able to successfully approve and commit it
  • So it turns out it's not a bug, it's just that Peter was added as a reviewer/admin AFTER the items were submitted
  • This is how DSpace works, so I need to ask if there is a way to override someone's submission, as the other reviewer seems not to be paying attention, or has perhaps taken the item from the task pool
  • Run a batch edit to add "RANGELANDS" ILRI subject to all items containing the word "RANGELANDS" in their metadata for Peter Ballantyne

(Screenshots: selecting all items with "rangelands" in metadata, then adding the RANGELANDS ILRI subject)

2016-12-18

  • Add four new CRP subjects for 2017 and sort the input forms alphabetically (#294)
  • Test the SMTP on the new server and it's working
  • Last week we asked CGNET to update the DNS records this coming weekend, but they misunderstood and did it immediately
  • We quickly told them to undo it, but I just realized they didn't undo the IPv6 AAAA record!
  • None of our users in African institutes will have IPv6, but some Europeans might, so I need to check if any submissions have been added since then
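  • A quick check could be a query on dc.date.accessioned (assuming it is metadata_field_id 11, as in a stock DSpace registry, and with the date adjusted to when the DNS change went out):
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=11 and text_value > '2016-12-15';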
  • Update some names and authorities in the database:
dspace=# update metadatavalue set authority='5ff35043-942e-4d0a-b377-4daed6e3c1a3', confidence=600, text_value='Duncan, Alan' where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*Duncan,? A.*';
UPDATE 204
dspace=# update metadatavalue set authority='46804b53-ea30-4a85-9ccf-b79a35816fa9', confidence=600, text_value='Mekonnen, Kindu' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Mekonnen, K%';
UPDATE 89
dspace=# update metadatavalue set authority='f840da02-26e7-4a74-b7ba-3e2b723f3684', confidence=600, text_value='Lukuyu, Ben A.' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Lukuyu, B%';
UPDATE 140
  • Generated a new UUID for Ben using uuidgen | tr [A-Z] [a-z] as the one in Solr had his ORCID but the name format was incorrect
  • In theory DSpace should be able to check names from ORCID and update the records in the database, but I find that this doesn't work (see Jira bug DS-3302)
  • I need to run these updates along with the other one for CIAT that I found last week
  • Enable OCSP stapling for hosts >= Ubuntu 16.04 in our Ansible playbooks (#76)
  • It's working for DSpace Test, though only on the second request:
$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
...
OCSP response: no response sent
$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
...
OCSP Response Data:
...
    Cert Status: good
  • Migrate CGSpace to new server, roughly following these steps:
  • On old server:
# service tomcat7 stop
# /home/backup/scripts/postgres_backup.sh
  • On new server:
# systemctl stop tomcat7
# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/assetstore/ /home/cgspace.cgiar.org/assetstore/
# rsync -4 -av --delete 178.79.187.182:/home/backup/ /home/backup/
# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/solr/ /home/cgspace.cgiar.org/solr
# su - postgres
$ dropdb cgspace
$ createdb -O cgspace --encoding=UNICODE cgspace
$ psql cgspace -c 'alter user cgspace createuser;'
$ pg_restore -O -U cgspace -d cgspace -W -h localhost /home/backup/postgres/cgspace_2016-12-18.backup
$ psql cgspace -c 'alter user cgspace nocreateuser;'
$ psql -U cgspace -f ~tomcat7/src/git/DSpace/dspace/etc/postgres/update-sequences.sql cgspace -h localhost
$ vacuumdb cgspace
$ psql cgspace
postgres=# \i /tmp/author-authority-updates-2016-12-11.sql
postgres=# \q
$ exit
# chown -R tomcat7:tomcat7 /home/cgspace.cgiar.org
# rsync -4 -av 178.79.187.182:/home/cgspace.cgiar.org/log/*.dat /home/cgspace.cgiar.org/log/
# rsync -4 -av 178.79.187.182:/home/cgspace.cgiar.org/log/dspace.log.2016-1[12]* /home/cgspace.cgiar.org/log/
# su - tomcat7
$ cd src/git/DSpace/dspace/target/dspace-installer
$ ant update clean_backups
$ exit
# systemctl start tomcat7
  • It took about twenty minutes and afterwards I had to check a few things, like:
    • check and enable systemd timer for let's encrypt
    • enable root cron jobs
    • disable root cron jobs on old server after!
    • enable tomcat7 cron jobs
    • disable tomcat7 cron jobs on old server after!
    • regenerate sitebndl.zip with new IP for handle server and submit it to Handle.net

2016-12-22

  • Abenet wanted a CSV of the IITA community, but the web export doesn't include the dc.date.accessioned field
  • I had to export it from the command line using the -a flag:
$ [dspace]/bin/dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616

2016-12-28

  • We've been getting two alerts per day about CPU usage on the new server from Linode
  • These are caused by the batch jobs for Solr etc that run in the early morning hours
  • The Linode default is to alert at 90% CPU usage for two hours, but I see the old server was at 150%, so maybe we just need to adjust it
  • Speaking of the old server (linode01), I think we can decommission it now
  • I checked the S3 logs on the new server (linode18) to make sure the backups have been running and everything looks good
  • In other news, I was looking at the Munin graphs for PostgreSQL on the new server and it looks slightly worrying:

(Munin graph: PostgreSQL stats)

  • I will have to check later why the size keeps increasing