mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-20 20:22:19 +01:00
978 lines
59 KiB
HTML
978 lines
59 KiB
HTML
<!DOCTYPE html>
|
|
<html lang="en">
|
|
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
|
|
|
|
<meta property="og:title" content="December, 2016" />
|
|
<meta property="og:description" content="2016-12-02
|
|
|
|
|
|
CGSpace was down for five hours in the morning while I was sleeping
|
|
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
|
|
|
|
|
|
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
|
|
|
|
|
|
|
I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
|
|
I’ve raised a ticket with Atmire to ask
|
|
Another worrying error from dspace.log is:
|
|
|
|
|
|
" />
|
|
<meta property="og:type" content="article" />
|
|
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-12/" />
|
|
|
|
|
|
|
|
<meta property="article:published_time" content="2016-12-02T10:43:00+03:00"/>
|
|
|
|
<meta property="article:modified_time" content="2018-03-09T22:10:33+02:00"/>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<meta name="twitter:card" content="summary"/>
|
|
<meta name="twitter:title" content="December, 2016"/>
|
|
<meta name="twitter:description" content="2016-12-02
|
|
|
|
|
|
CGSpace was down for five hours in the morning while I was sleeping
|
|
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
|
|
|
|
|
|
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
|
|
|
|
|
|
|
I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
|
|
I’ve raised a ticket with Atmire to ask
|
|
Another worrying error from dspace.log is:
|
|
|
|
|
|
"/>
|
|
<meta name="generator" content="Hugo 0.45.1" />
|
|
|
|
|
|
|
|
<script type="application/ld+json">
|
|
{
|
|
"@context": "http://schema.org",
|
|
"@type": "BlogPosting",
|
|
"headline": "December, 2016",
|
|
"url": "https://alanorth.github.io/cgspace-notes/2016-12/",
|
|
"wordCount": "4078",
|
|
"datePublished": "2016-12-02T10:43:00+03:00",
|
|
"dateModified": "2018-03-09T22:10:33+02:00",
|
|
"author": {
|
|
"@type": "Person",
|
|
"name": "Alan Orth"
|
|
},
|
|
"keywords": "Notes"
|
|
}
|
|
</script>
|
|
|
|
|
|
|
|
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2016-12/">
|
|
|
|
<title>December, 2016 | CGSpace Notes</title>
|
|
|
|
<!-- combined, minified CSS -->
|
|
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-Upm5uY/SXdvbjuIGH6fBjF5vOYUr9DguqBskM+EQpLBzO9U+9fMVmWEt+TTlGrWQ" crossorigin="anonymous">
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
</head>
|
|
|
|
<body>
|
|
|
|
|
|
<div class="blog-masthead">
|
|
<div class="container">
|
|
<nav class="nav blog-nav">
|
|
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
|
|
</nav>
|
|
</div>
|
|
</div>
|
|
|
|
|
|
|
|
<header class="blog-header">
|
|
<div class="container">
|
|
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
|
|
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
|
|
</div>
|
|
</header>
|
|
|
|
|
|
|
|
<div class="container">
|
|
<div class="row">
|
|
<div class="col-sm-8 blog-main">
|
|
|
|
|
|
|
|
|
|
<article class="blog-post">
|
|
<header>
|
|
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-12/">December, 2016</a></h2>
|
|
<p class="blog-post-meta"><time datetime="2016-12-02T10:43:00+03:00">Fri Dec 02, 2016</time> by Alan Orth in
|
|
|
|
<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
|
|
|
|
</p>
|
|
</header>
|
|
<h2 id="2016-12-02">2016-12-02</h2>
|
|
|
|
<ul>
|
|
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
|
|
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
|
|
</ul>
|
|
|
|
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
|
|
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
|
|
<li>I’ve raised a ticket with Atmire to ask</li>
|
|
<li>Another worrying error from dspace.log is:</li>
|
|
</ul>
|
|
|
|
<p></p>
|
|
|
|
<pre><code>org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
|
|
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
|
|
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
|
|
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
|
|
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
|
|
at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
|
|
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:111)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:274)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
|
|
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
|
|
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
|
|
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
|
|
at com.googlecode.psiprobe.Tomcat70AgentValve.invoke(Tomcat70AgentValve.java:44)
|
|
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
|
|
at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
|
|
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
|
|
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
|
|
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
|
|
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
|
|
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
|
|
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
|
|
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
|
|
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
|
|
at java.lang.Thread.run(Thread.java:745)
|
|
Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
|
|
at com.atmire.statistics.generator.TopNDSODatasetGenerator.toDatasetQuery(SourceFile:39)
|
|
at com.atmire.statistics.display.StatisticsDataVisitsMultidata.createDataset(SourceFile:108)
|
|
at org.dspace.statistics.content.StatisticsDisplay.createDataset(SourceFile:384)
|
|
at org.dspace.statistics.content.StatisticsDisplay.getDataset(SourceFile:404)
|
|
at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generateJsonData(SourceFile:170)
|
|
at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
|
|
at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
|
|
at sun.reflect.GeneratedMethodAccessor296.invoke(Unknown Source)
|
|
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
|
at java.lang.reflect.Method.invoke(Method.java:498)
|
|
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
|
|
at com.sun.proxy.$Proxy96.process(Unknown Source)
|
|
at org.apache.cocoon.components.treeprocessor.sitemap.ReadNode.invoke(ReadNode.java:94)
|
|
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
|
|
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
|
|
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
|
|
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
|
|
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
|
|
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
|
|
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
|
|
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
|
|
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
|
|
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
|
|
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
|
|
at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
|
|
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
|
|
at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
|
|
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
|
|
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
|
|
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
|
|
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
|
|
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
|
|
at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
|
|
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
|
|
at org.apache.cocoon.servlet.RequestProcessor.process(RequestProcessor.java:351)
|
|
at org.apache.cocoon.servlet.RequestProcessor.service(RequestProcessor.java:169)
|
|
at org.apache.cocoon.sitemap.SitemapServlet.service(SitemapServlet.java:84)
|
|
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
|
|
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:468)
|
|
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
|
|
at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
|
|
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
|
|
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
|
|
at com.sun.proxy.$Proxy89.service(Unknown Source)
|
|
at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
|
|
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1180)
|
|
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:950)
|
|
... 35 more
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>The first error I see in dspace.log this morning is:</li>
|
|
</ul>
|
|
|
|
<pre><code>2016-12-02 03:00:46,656 ERROR org.dspace.authority.AuthorityValueFinder @ anonymous::Error while retrieving AuthorityValue from solr:query\colon; id\colon;"b0b541c1-ec15-48bf-9209-6dbe8e338cdc"
|
|
org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8081/solr/authority
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Looking through DSpace’s solr log I see that about 20 seconds before this, there were a few 30+ KiB solr queries</li>
|
|
<li>The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:</li>
|
|
</ul>
|
|
|
|
<pre><code>2016-12-02 03:00:42,606 INFO org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={q=containerItem:72828+AND+type:0&shards=localhost:8081/solr/statistics-2010,localhost:8081/solr/statistics&fq=-isInternal:true&fq=-(author_mtdt:"CGIAR\+Institutional\+Learning\+and\+Change\+Initiative"++AND+subject_mtdt:"PARTNERSHIPS"+AND+subject_mtdt:"RESEARCH"+AND+subject_mtdt:"AGRICULTURE"+AND+subject_mtdt:"DEVELOPMENT"++AND+iso_mtdt:"en"+)&rows=0&wt=javabin&version=2} hits=0 status=0 QTime=19
|
|
2016-12-02 08:28:23,908 INFO org.apache.solr.servlet.SolrDispatchFilter @ SolrDispatchFilter.init()
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>DSpace’s own Solr logs don’t give IP addresses, so I will have to enable Nginx’s logging of <code>/solr</code> so I can see where this request came from</li>
|
|
<li>I enabled logging of <code>/rest/</code> and I think I’ll leave it on for good</li>
|
|
<li>Also, the disk is nearly full because of log file issues, so I’m running some compression on DSpace logs</li>
|
|
<li>Normally these stay uncompressed for a month just in case we need to look at them, so now I’ve just compressed anything older than 2 weeks so we can get some disk space back</li>
|
|
</ul>
|
|
|
|
<h2 id="2016-12-04">2016-12-04</h2>
|
|
|
|
<ul>
|
|
<li>I got a weird report from the CGSpace checksum checker this morning</li>
|
|
<li>It says 732 bitstreams have potential issues, for example:</li>
|
|
</ul>
|
|
|
|
<pre><code>------------------------------------------------
|
|
Bitstream Id = 6
|
|
Process Start Date = Dec 4, 2016
|
|
Process End Date = Dec 4, 2016
|
|
Checksum Expected = a1d9eef5e2d85f50f67ce04d0329e96a
|
|
Checksum Calculated = a1d9eef5e2d85f50f67ce04d0329e96a
|
|
Result = Bitstream marked deleted in bitstream table
|
|
-----------------------------------------------
|
|
...
|
|
------------------------------------------------
|
|
Bitstream Id = 77581
|
|
Process Start Date = Dec 4, 2016
|
|
Process End Date = Dec 4, 2016
|
|
Checksum Expected = 9959301aa4ca808d00957dff88214e38
|
|
Checksum Calculated =
|
|
Result = The bitstream could not be found
|
|
-----------------------------------------------
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>The first one seems ok, but I don’t know what to make of the second one…</li>
|
|
<li>I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in <code>[dspace-dir]/assetstore/99/59/30/...</code>)</li>
|
|
<li>For what it’s worth, there is no item on DSpace Test or S3 backups with that checksum either…</li>
|
|
<li>In other news, I’m looking at JVM settings from the Solr 4.10.2 release, from <code>bin/solr.in.sh</code>:</li>
|
|
</ul>
|
|
|
|
<pre><code># These GC settings have shown to work well for a number of common Solr workloads
|
|
GC_TUNE="-XX:-UseSuperWord \
|
|
-XX:NewRatio=3 \
|
|
-XX:SurvivorRatio=4 \
|
|
-XX:TargetSurvivorRatio=90 \
|
|
-XX:MaxTenuringThreshold=8 \
|
|
-XX:+UseConcMarkSweepGC \
|
|
-XX:+UseParNewGC \
|
|
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
|
|
-XX:+CMSScavengeBeforeRemark \
|
|
-XX:PretenureSizeThreshold=64m \
|
|
-XX:CMSFullGCsBeforeCompaction=1 \
|
|
-XX:+UseCMSInitiatingOccupancyOnly \
|
|
-XX:CMSInitiatingOccupancyFraction=50 \
|
|
-XX:CMSTriggerPermRatio=80 \
|
|
-XX:CMSMaxAbortablePrecleanTime=6000 \
|
|
-XX:+CMSParallelRemarkEnabled \
|
|
-XX:+ParallelRefProcEnabled \
|
|
-XX:+AggressiveOpts"
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I need to try these because they are recommended by the Solr project itself</li>
|
|
<li>Also, as always, I need to read <a href="https://wiki.apache.org/solr/ShawnHeisey">Shawn Heisey’s wiki page on Solr</a></li>
|
|
</ul>
|
|
|
|
<h2 id="2016-12-05">2016-12-05</h2>
|
|
|
|
<ul>
|
|
<li>I did some basic benchmarking on a local DSpace before and after the JVM settings above, but there wasn’t anything amazingly obvious</li>
|
|
<li>I want to make the changes on DSpace Test and monitor the JVM heap graphs for a few days to see if they change the JVM GC patterns or anything (munin graphs)</li>
|
|
<li>Spin up new CGSpace server on Linode</li>
|
|
<li>I did a few traceroutes from Jordan and Kenya and it seems that Linode’s Frankfurt datacenter is a few less hops and perhaps less packet loss than the London one, so I put the new server in Frankfurt</li>
|
|
<li>Do initial provisioning</li>
|
|
<li>Atmire responded about the MQM warnings in the DSpace logs</li>
|
|
<li>Apparently we need to change the batch edit consumers in <code>dspace/config/dspace.cfg</code>:</li>
|
|
</ul>
|
|
|
|
<pre><code>event.consumer.batchedit.filters = Community|Collection+Create
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I haven’t tested it yet, but I created a pull request: <a href="https://github.com/ilri/DSpace/pull/289">#289</a></li>
|
|
</ul>
|
|
|
|
<h2 id="2016-12-06">2016-12-06</h2>
|
|
|
|
<ul>
|
|
<li>Some author authority corrections and name standardizations for Peter:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
|
|
UPDATE 11
|
|
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
|
|
UPDATE 36
|
|
dspace=# update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
|
|
UPDATE 14
|
|
dspace=# update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
|
|
UPDATE 42
|
|
dspace=# update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
|
|
UPDATE 360
|
|
dspace=# update metadatavalue set text_value='Grace, Delia', authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
|
|
UPDATE 561
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Pay attention to the regex to prevent false positives in tricky cases with Dutch names!</li>
|
|
<li>I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week</li>
|
|
<li>More work on the KM4Dev Journal article</li>
|
|
<li>In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work</li>
|
|
<li>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</li>
|
|
<li>Paola from CCAFS mentioned she also has the “take task” bug on CGSpace</li>
|
|
<li>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</li>
|
|
<li>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</li>
|
|
<li>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn’t dedicated (also runs Solr, which can benefit from OS cache) so let’s try 1024MB</li>
|
|
<li>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</li>
|
|
</ul>
|
|
|
|
<pre><code>$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
|
|
Retrieving all data
|
|
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
|
|
Exception: null
|
|
java.lang.NullPointerException
|
|
at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
|
|
at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
|
|
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
|
|
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
|
|
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
|
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
|
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
|
|
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
|
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
|
at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
|
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
|
at java.lang.reflect.Method.invoke(Method.java:498)
|
|
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
|
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
|
|
|
real 8m39.913s
|
|
user 1m54.190s
|
|
sys 0m22.647s
|
|
</code></pre>
|
|
|
|
<h2 id="2016-12-07">2016-12-07</h2>
|
|
|
|
<ul>
|
|
<li>For what it’s worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li>
|
|
<li>I will have to test more</li>
|
|
<li>Anyways, I noticed that some of the authority values I set actually have versions of author names we don’t want, ie “Grace, D.”</li>
|
|
<li>For example, do a Solr query for “first_name:Grace” and look at the results</li>
|
|
<li>Querying that ID shows the fields that need to be changed:</li>
|
|
</ul>
|
|
|
|
<pre><code>{
|
|
"responseHeader": {
|
|
"status": 0,
|
|
"QTime": 1,
|
|
"params": {
|
|
"q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
|
|
"indent": "true",
|
|
"wt": "json",
|
|
"_": "1481102189244"
|
|
}
|
|
},
|
|
"response": {
|
|
"numFound": 1,
|
|
"start": 0,
|
|
"docs": [
|
|
{
|
|
"id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
|
|
"field": "dc_contributor_author",
|
|
"value": "Grace, D.",
|
|
"deleted": false,
|
|
"creation_date": "2016-11-10T15:13:40.318Z",
|
|
"last_modified_date": "2016-11-10T15:13:40.318Z",
|
|
"authority_type": "person",
|
|
"first_name": "D.",
|
|
"last_name": "Grace"
|
|
}
|
|
]
|
|
}
|
|
}
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields…</li>
|
|
<li>The update syntax should be something like this, but I’m getting errors from Solr:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
|
|
{
|
|
"responseHeader":{
|
|
"status":400,
|
|
"QTime":0},
|
|
"error":{
|
|
"msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
|
|
"code":400}}
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</li>
|
|
<li>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
|
|
UPDATE 561
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Then I’ll reindex discovery and authority and see how the authority Solr core looks</li>
|
|
<li>After this, now there are authorities for some of the “Grace, D.” and “Grace, Delia” text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</li>
|
|
</ul>
|
|
|
|
<pre><code>$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
|
|
{
|
|
"responseHeader":{
|
|
"status":0,
|
|
"QTime":0,
|
|
"params":{
|
|
"q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
|
|
"indent":"true",
|
|
"wt":"json"}},
|
|
"response":{"numFound":1,"start":0,"docs":[
|
|
{
|
|
"id":"18ea1525-2513-430a-8817-a834cd733fbc",
|
|
"field":"dc_contributor_author",
|
|
"value":"Grace, Delia",
|
|
"deleted":false,
|
|
"creation_date":"2016-12-07T10:54:34.356Z",
|
|
"last_modified_date":"2016-12-07T10:54:34.356Z",
|
|
"authority_type":"person",
|
|
"first_name":"Delia",
|
|
"last_name":"Grace"}]
|
|
}}
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</li>
|
|
<li>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</li>
|
|
<li>Better to use:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace#= update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</li>
|
|
<li>Perhaps another way is to just add our own UUID to the authority field for the text_value we like, then re-index authority so they get synced from PostgreSQL to Solr, then set the other text_values to use that authority ID</li>
|
|
<li>Deploy MQM WARN fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/289">#289</a>)</li>
|
|
<li>Deploy “take task” hack/fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/290">#290</a>)</li>
|
|
<li>I ran the following author corrections and then reindexed discovery:</li>
|
|
</ul>
|
|
|
|
<pre><code>update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
|
|
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
|
|
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%an der Hoek%' and text_value !~ '^.*W\.?$';
|
|
update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
|
|
update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
|
|
update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
|
|
</code></pre>
|
|
|
|
<h2 id="2016-12-08">2016-12-08</h2>
|
|
|
|
<ul>
|
|
<li>Something weird happened and Peter Thorne’s names all ended up as “Thorne”, I guess because the original authority had that as its name value:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne%';
|
|
text_value | authority | confidence
|
|
------------------+--------------------------------------+------------
|
|
Thorne, P.J. | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
|
|
Thorne | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
|
|
Thorne-Lyman, A. | 0781e13a-1dc8-4e3f-82e8-5c422b44a344 | -1
|
|
Thorne, M. D. | 54c52649-cefd-438d-893f-3bcef3702f07 | -1
|
|
Thorne, P.J | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
|
|
Thorne, P. | 18349f29-61b1-44d7-ac60-89e55546e812 | 600
|
|
(6 rows)
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I generated a new UUID using <code>uuidgen | tr [A-Z] [a-z]</code> and set it along with correct name variation for all records:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# update metadatavalue set authority='b2f7603d-2fb5-4018-923a-c4ec8d85b3bb', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812';
|
|
UPDATE 43
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Apparently we also need to normalize Phil Thornton’s names to <code>Thornton, Philip K.</code>:
|
|
<br /></li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
|
|
text_value | authority | confidence
|
|
---------------------+--------------------------------------+------------
|
|
Thornton, P | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
|
|
Thornton, P K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
|
|
Thornton, P K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
|
|
Thornton. P.K. | 3e1e6639-d4fb-449e-9fce-ce06b5b0f702 | -1
|
|
Thornton, P K . | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
|
|
Thornton, P.K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
|
|
Thornton, P.K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
|
|
Thornton, Philip K | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
|
|
Thornton, Philip K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
|
|
Thornton, P. K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca | 600
|
|
(10 rows)
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Seems his original authorities are using an incorrect version of the name so I need to generate another UUID and tie it to the correct name, then reindex:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
|
|
UPDATE 362
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>It seems that, when you are messing with authority and author text values in the database, it is better to run authority reindex first (postgres→solr authority core) and then Discovery reindex (postgres→solr Discovery core)</li>
|
|
<li>Everything looks ok after authority and discovery reindex</li>
|
|
<li>In other news, I think we should really be using more RAM for PostgreSQL’s <code>shared_buffers</code></li>
|
|
<li>The <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html">PostgreSQL documentation</a> recommends using 25% of the system’s RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache</li>
|
|
</ul>
|
|
|
|
<h2 id="2016-12-09">2016-12-09</h2>
|
|
|
|
<ul>
|
|
<li>More work on finishing rough draft of KM4Dev article</li>
|
|
<li>Set PostgreSQL’s <code>shared_buffers</code> on CGSpace to 10% of system RAM (1200MB)</li>
|
|
<li>Run the following author corrections on CGSpace:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# update metadatavalue set authority='34df639a-42d8-4867-a3f2-1892075fcb3f', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812' or authority='021cd183-946b-42bb-964e-522ebff02993';
|
|
dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>The authority IDs were different now than when I was looking a few days ago so I had to adjust them here</li>
|
|
</ul>
|
|
|
|
<h2 id="2016-12-11">2016-12-11</h2>
|
|
|
|
<ul>
|
|
<li>After enabling a sizable <code>shared_buffers</code> for CGSpace’s PostgreSQL configuration the number of connections to the database dropped significantly</li>
|
|
</ul>
|
|
|
|
<p><img src="/cgspace-notes/2016/12/postgres_bgwriter-week.png" alt="postgres_bgwriter-week" />
|
|
<img src="/cgspace-notes/2016/12/postgres_connections_ALL-week.png" alt="postgres_connections_ALL-week" /></p>
|
|
|
|
<ul>
|
|
<li>Looking at CIAT records from last week again, they have a lot of double authors like:</li>
|
|
</ul>
|
|
|
|
<pre><code>International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
|
|
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
|
|
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Some in the same <code>dc.contributor.author</code> field, and some in others like <code>dc.contributor.author[en_US]</code> etc</li>
|
|
<li>Removing the duplicates in OpenRefine and uploading a CSV to DSpace says “no changes detected”</li>
|
|
<li>Seems like the only way to sortof clean these up would be to start in SQL:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture';
|
|
text_value | authority | confidence
|
|
-----------------------------------------------+--------------------------------------+------------
|
|
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | -1
|
|
International Center for Tropical Agriculture | | 600
|
|
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 500
|
|
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 600
|
|
International Center for Tropical Agriculture | | -1
|
|
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 500
|
|
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 600
|
|
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | -1
|
|
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 0
|
|
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
|
|
UPDATE 1693
|
|
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%';
|
|
UPDATE 35
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Work on article for KM4Dev journal</li>
|
|
</ul>
|
|
|
|
<h2 id="2016-12-13">2016-12-13</h2>
|
|
|
|
<ul>
|
|
<li>Checking in on CGSpace postgres stats again, looks like the <code>shared_buffers</code> change from a few days ago really made a big impact:</li>
|
|
</ul>
|
|
|
|
<p><img src="/cgspace-notes/2016/12/postgres_bgwriter-week-2016-12-13.png" alt="postgres_bgwriter-week" />
|
|
<img src="/cgspace-notes/2016/12/postgres_connections_ALL-week-2016-12-13.png" alt="postgres_connections_ALL-week" /></p>
|
|
|
|
<ul>
|
|
<li>Looking at logs, it seems we need to evaluate which logs we keep and for how long</li>
|
|
<li>Basically the only ones we <em>need</em> are <code>dspace.log</code> because those are used for legacy statistics (need to keep for 1 month)</li>
|
|
<li>Other logs will be an issue because they don’t have date stamps</li>
|
|
<li>I will add date stamps to the logs we’re storing from the tomcat7 user’s cron jobs at least, using: <code>$(date --iso-8601)</code></li>
|
|
<li>Would probably be better to make custom logrotate files for them in the future</li>
|
|
<li>Clean up some unneeded log files from 2014 (they weren’t large, just don’t need them)</li>
|
|
<li>So basically, new cron jobs for logs should look something like this:</li>
|
|
<li>Find any file named <code>*.log*</code> that isn’t <code>dspace.log*</code>, isn’t already zipped, and is older than one day, and zip it:</li>
|
|
</ul>
|
|
|
|
<pre><code># find /home/dspacetest.cgiar.org/log -regextype posix-extended -iregex ".*\.log.*" ! -iregex ".*dspace\.log.*" ! -iregex ".*\.(gz|lrz|lzo|xz)" ! -newermt "Yesterday" -exec schedtool -B -e ionice -c2 -n7 xz {} \;
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Since there is <code>xzgrep</code> and <code>xzless</code> we can actually just zip them after one day, why not?!</li>
|
|
<li>We can keep the zipped ones for two weeks just in case we need to look for errors, etc, and delete them after that</li>
|
|
<li>I use <code>schedtool -B</code> and <code>ionice -c2 -n7</code> to set the CPU scheduling to <code>SCHED_BATCH</code> and the IO to best effort which should, in theory, impact important system processes like Tomcat and PostgreSQL less</li>
|
|
<li>When the tasks are running you can see that the policies do apply:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ schedtool $(ps aux | grep "xz /home" | grep -v grep | awk '{print $2}') && ionice -p $(ps aux | grep "xz /home" | grep -v grep | awk '{print $2}')
|
|
PID 17049: PRIO 0, POLICY B: SCHED_BATCH , NICE 0, AFFINITY 0xf
|
|
best-effort: prio 7
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>All in all this should free up a few gigs (we were at 9.3GB free when I started)</li>
|
|
<li>Next thing to look at is whether we need Tomcat’s access logs</li>
|
|
<li>I just looked and it seems that we saved 10GB by zipping these logs</li>
|
|
<li>Some users pointed out issues with the “most popular” stats on a community or collection</li>
|
|
<li>This error appears in the logs when you try to view them:</li>
|
|
</ul>
|
|
|
|
<pre><code>2016-12-13 21:17:37,486 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
|
|
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
|
|
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:972)
|
|
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
|
|
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
|
|
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
|
|
at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
|
|
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:111)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:274)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
|
|
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
|
|
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
|
|
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
|
|
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
|
|
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
|
|
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
|
|
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
|
|
at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
|
|
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
|
|
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
|
|
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:436)
|
|
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
|
|
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
|
|
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
|
|
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
|
|
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
|
|
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
|
|
at java.lang.Thread.run(Thread.java:745)
|
|
Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
|
|
at com.atmire.statistics.generator.TopNDSODatasetGenerator.toDatasetQuery(SourceFile:39)
|
|
at com.atmire.statistics.display.StatisticsDataVisitsMultidata.createDataset(SourceFile:108)
|
|
at org.dspace.statistics.content.StatisticsDisplay.createDataset(SourceFile:384)
|
|
at org.dspace.statistics.content.StatisticsDisplay.getDataset(SourceFile:404)
|
|
at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generateJsonData(SourceFile:170)
|
|
at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
|
|
at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>It happens on development and production, so I will have to ask Atmire</li>
|
|
<li>Most likely an issue with installation/configuration</li>
|
|
</ul>
|
|
|
|
<h2 id="2016-12-14">2016-12-14</h2>
|
|
|
|
<ul>
|
|
<li>Atmire sent a quick fix for the <code>last-update.txt</code> file not found error</li>
|
|
<li>After applying pull request <a href="https://github.com/ilri/DSpace/pull/291">#291</a> on DSpace Test I no longer see the error in the logs after the <code>UpdateSolrStorageReports</code> task runs</li>
|
|
<li>Also, I’m toying with the idea of moving the <code>tomcat7</code> user’s cron jobs to <code>/etc/cron.d</code> so we can manage them in Ansible</li>
|
|
<li>Made a pull request with a template for the cron jobs (<a href="https://github.com/ilri/rmg-ansible-public/pull/75">#75</a>)</li>
|
|
<li>Testing SMTP from the new CGSpace server and it’s not working, I’ll have to tell James</li>
|
|
</ul>
|
|
|
|
<h2 id="2016-12-15">2016-12-15</h2>
|
|
|
|
<ul>
|
|
<li>Start planning for server migration this weekend, letting users know</li>
|
|
<li>I am trying to figure out what the process is to <a href="http://handle.net/hnr_support.html">update the server’s IP in the Handle system</a>, and emailing the hdladmin account bounces(!)</li>
|
|
<li>I will contact the Jane Euler directly as I know I’ve corresponded with her in the past</li>
|
|
<li>She said that I should indeed just re-run the <code>[dspace]/bin/dspace make-handle-config</code> command and submit the new <code>sitebndl.zip</code> file to the CNRI website</li>
|
|
<li>Also I was troubleshooting some workflow issues from Bizuwork</li>
|
|
<li>I re-created the same scenario by adding a non-admin account and submitting an item, but I was able to successfully approve and commit it</li>
|
|
<li>So it turns out it’s not a bug, it’s just that Peter was added as a reviewer/admin AFTER the items were submitted</li>
|
|
<li>This is how DSpace works, and I need to ask if there is a way to override someone’s submission, as the other reviewer seems to not be paying attention, or has perhaps taken the item from the task pool?</li>
|
|
<li>Run a batch edit to add “RANGELANDS” ILRI subject to all items containing the word “RANGELANDS” in their metadata for Peter Ballantyne</li>
|
|
</ul>
|
|
|
|
<p><img src="/cgspace-notes/2016/12/batch-edit1.png" alt="Select all items with "rangelands" in metadata" />
|
|
<img src="/cgspace-notes/2016/12/batch-edit2.png" alt="Add RANGELANDS ILRI subject" /></p>
|
|
|
|
<h2 id="2016-12-18">2016-12-18</h2>
|
|
|
|
<ul>
|
|
<li>Add four new CRP subjects for 2017 and sort the input forms alphabetically (<a href="https://github.com/ilri/DSpace/pull/294">#294</a>)</li>
|
|
<li>Test the SMTP on the new server and it’s working</li>
|
|
<li>Last week, when we asked CGNET to update the DNS records this weekend, they misunderstood and did it immediately</li>
|
|
<li>We quickly told them to undo it, but I just realized they didn’t undo the IPv6 AAAA record!</li>
|
|
<li>None of our users in African institutes will have IPv6, but some Europeans might, so I need to check if any submissions have been added since then</li>
|
|
<li>Update some names and authorities in the database:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# update metadatavalue set authority='5ff35043-942e-4d0a-b377-4daed6e3c1a3', confidence=600, text_value='Duncan, Alan' where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*Duncan,? A.*';
|
|
UPDATE 204
|
|
dspace=# update metadatavalue set authority='46804b53-ea30-4a85-9ccf-b79a35816fa9', confidence=600, text_value='Mekonnen, Kindu' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Mekonnen, K%';
|
|
UPDATE 89
|
|
dspace=# update metadatavalue set authority='f840da02-26e7-4a74-b7ba-3e2b723f3684', confidence=600, text_value='Lukuyu, Ben A.' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Lukuyu, B%';
|
|
UPDATE 140
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Generated a new UUID for Ben using <code>uuidgen | tr [A-Z] [a-z]</code> as the one in Solr had his ORCID but the name format was incorrect</li>
|
|
<li>In theory DSpace should be able to check names from ORCID and update the records in the database, but I find that this doesn’t work (see Jira bug <a href="https://jira.duraspace.org/browse/DS-3302">DS-3302</a>)</li>
|
|
<li>I need to run these updates along with the other one for CIAT that I found last week</li>
|
|
<li>Enable OCSP stapling for hosts >= Ubuntu 16.04 in our Ansible playbooks (<a href="https://github.com/ilri/rmg-ansible-public/pull/76">#76</a>)</li>
|
|
<li>Working for DSpace Test on the second response:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
|
|
...
|
|
OCSP response: no response sent
|
|
$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
|
|
...
|
|
OCSP Response Data:
|
|
...
|
|
Cert Status: good
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Migrate CGSpace to new server, roughly following these steps:</li>
|
|
<li>On old server:</li>
|
|
</ul>
|
|
|
|
<pre><code># service tomcat7 stop
|
|
# /home/backup/scripts/postgres_backup.sh
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>On new server:</li>
|
|
</ul>
|
|
|
|
<pre><code># systemctl stop tomcat7
|
|
# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/assetstore/ /home/cgspace.cgiar.org/assetstore/
|
|
# rsync -4 -av --delete 178.79.187.182:/home/backup/ /home/backup/
|
|
# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/solr/ /home/cgspace.cgiar.org/solr
|
|
# su - postgres
|
|
$ dropdb cgspace
|
|
$ createdb -O cgspace --encoding=UNICODE cgspace
|
|
$ psql cgspace -c 'alter user cgspace createuser;'
|
|
$ pg_restore -O -U cgspace -d cgspace -W -h localhost /home/backup/postgres/cgspace_2016-12-18.backup
|
|
$ psql cgspace -c 'alter user cgspace nocreateuser;'
|
|
$ psql -U cgspace -f ~tomcat7/src/git/DSpace/dspace/etc/postgres/update-sequences.sql cgspace -h localhost
|
|
$ vacuumdb cgspace
|
|
$ psql cgspace
|
|
postgres=# \i /tmp/author-authority-updates-2016-12-11.sql
|
|
postgres=# \q
|
|
$ exit
|
|
# chown -R tomcat7:tomcat7 /home/cgspace.cgiar.org
|
|
# rsync -4 -av 178.79.187.182:/home/cgspace.cgiar.org/log/*.dat /home/cgspace.cgiar.org/log/
|
|
# rsync -4 -av 178.79.187.182:/home/cgspace.cgiar.org/log/dspace.log.2016-1[12]* /home/cgspace.cgiar.org/log/
|
|
# su - tomcat7
|
|
$ cd src/git/DSpace/dspace/target/dspace-installer
|
|
$ ant update clean_backups
|
|
$ exit
|
|
# systemctl start tomcat7
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>It took about twenty minutes and afterwards I had to check a few things, like:
|
|
|
|
<ul>
|
|
<li>check and enable systemd timer for let’s encrypt</li>
|
|
<li>enable root cron jobs</li>
|
|
<li>disable root cron jobs on old server after!</li>
|
|
<li>enable tomcat7 cron jobs</li>
|
|
<li>disable tomcat7 cron jobs on old server after!</li>
|
|
<li>regenerate <code>sitebndl.zip</code> with new IP for handle server and submit it to Handle.net</li>
|
|
</ul></li>
|
|
</ul>
|
|
|
|
<h2 id="2016-12-22">2016-12-22</h2>
|
|
|
|
<ul>
|
|
<li>Abenet wanted a CSV of the IITA community, but the web export doesn’t include the <code>dc.date.accessioned</code> field</li>
|
|
<li>I had to export it from the command line using the <code>-a</code> flag:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ [dspace]/bin/dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
|
|
</code></pre>
|
|
|
|
<h2 id="2016-12-28">2016-12-28</h2>
|
|
|
|
<ul>
|
|
<li>We’ve been getting two alerts per day about CPU usage on the new server from Linode</li>
|
|
<li>These are caused by the batch jobs for Solr etc that run in the early morning hours</li>
|
|
<li>The Linode default is to alert at 90% CPU usage for two hours, but I see the old server was at 150%, so maybe we just need to adjust it</li>
|
|
<li>Speaking of the old server (linode01), I think we can decommission it now</li>
|
|
<li>I checked the S3 logs on the new server (linode18) to make sure the backups have been running and everything looks good</li>
|
|
<li>In other news, I was looking at the Munin graphs for PostgreSQL on the new server and it looks slightly worrying:</li>
|
|
</ul>
|
|
|
|
<p><img src="/cgspace-notes/2016/12/postgres_size_ALL-week.png" alt="munin postgres stats" /></p>
|
|
|
|
<ul>
|
|
<li>I will have to check later why the size keeps increasing</li>
|
|
</ul>
|
|
|
|
|
|
|
|
|
|
|
|
</article>
|
|
|
|
|
|
|
|
</div> <!-- /.blog-main -->
|
|
|
|
<aside class="col-sm-3 ml-auto blog-sidebar">
|
|
|
|
|
|
|
|
<section class="sidebar-module">
|
|
<h4>Recent Posts</h4>
|
|
<ol class="list-unstyled">
|
|
|
|
|
|
<li><a href="/cgspace-notes/2018-08/">August, 2018</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2018-07/">July, 2018</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2018-06/">June, 2018</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2018-05/">May, 2018</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2018-04/">April, 2018</a></li>
|
|
|
|
</ol>
|
|
</section>
|
|
|
|
|
|
|
|
|
|
<section class="sidebar-module">
|
|
<h4>Links</h4>
|
|
<ol class="list-unstyled">
|
|
|
|
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
|
|
|
|
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
|
|
|
|
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
|
|
|
|
</ol>
|
|
</section>
|
|
|
|
</aside>
|
|
|
|
|
|
</div> <!-- /.row -->
|
|
</div> <!-- /.container -->
|
|
|
|
|
|
|
|
<footer class="blog-footer">
|
|
<p>
|
|
|
|
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
|
|
|
|
</p>
|
|
<p>
|
|
<a href="#">Back to top</a>
|
|
</p>
|
|
</footer>
|
|
|
|
|
|
</body>
|
|
|
|
</html>
|