<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="January, 2019" /> <meta property="og:description" content="2019-01-02 Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning I don’t see anything interesting in the web server logs around that time though: # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 92 40.77.167.4 99 210.7.29.100 120 38.126.157.45 177 35.237.175.180 177 40.77.167.32 216 66.249.75.219 225 18.203.76.93 261 46.101.86.248 357 207.46.13.1 903 54.70.40.11 " /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-01/" /><meta property="article:published_time" content="2019-01-02T09:48:30+02:00"/> <meta property="article:modified_time" content="2019-01-07T22:30:23+02:00"/> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="January, 2019"/> <meta name="twitter:description" content="2019-01-02 Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning I don’t see anything interesting in the web server logs around that time though: # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 92 40.77.167.4 99 210.7.29.100 120 38.126.157.45 177 35.237.175.180 177 40.77.167.32 216 66.249.75.219 225 18.203.76.93 261 46.101.86.248 357 207.46.13.1 903 54.70.40.11 "/> <meta name="generator" content="Hugo 0.53" /> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "January, 2019", "url": "https://alanorth.github.io/cgspace-notes/2019-01/", "wordCount": "1308", "datePublished": 
"2019-01-02T09:48:30+02:00", "dateModified": "2019-01-07T22:30:23+02:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2019-01/"> <title>January, 2019 | CGSpace Notes</title> <!-- combined, minified CSS --> <link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-6+EGfPoOzk/n2DVJSlglKT8TV1TgIMvVcKI73IZgBswLasPBn94KommV6ilJqCXE" crossorigin="anonymous"> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2019-01/">January, 2019</a></h2> <p class="blog-post-meta"><time datetime="2019-01-02T09:48:30+02:00">Wed Jan 02, 2019</time> by Alan Orth in <i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a> </p> </header> <h2 id="2019-01-02">2019-01-02</h2> <ul> <li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li> <li>I don’t see anything interesting in the web server logs around that time though:</li> </ul> <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 92 40.77.167.4 99 210.7.29.100 120 38.126.157.45 177 35.237.175.180 177 40.77.167.32 216 66.249.75.219 225 18.203.76.93 
261 46.101.86.248 357 207.46.13.1 903 54.70.40.11 </code></pre> <ul> <li>Analyzing the types of requests made by the top few IPs during that time:</li> </ul> <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 54.70.40.11 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c 30 bitstream 534 discover 352 handle # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 207.46.13.1 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c 194 bitstream 345 handle # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 46.101.86.248 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c 261 handle </code></pre> <ul> <li>It’s not clear to me what was causing the outbound traffic spike</li> <li>Oh nice! The once-per-year cron job for rotating the Solr statistics actually worked this time (for the first time ever!):</li> </ul> <pre><code>Moving: 81742 into core statistics-2010 Moving: 1837285 into core statistics-2011 Moving: 3764612 into core statistics-2012 Moving: 4557946 into core statistics-2013 Moving: 5483684 into core statistics-2014 Moving: 2941736 into core statistics-2015 Moving: 5926070 into core statistics-2016 Moving: 10562554 into core statistics-2017 Moving: 18497180 into core statistics-2018 </code></pre> <ul> <li>This could be why the outbound traffic rate was high, due to the S3 backup that runs at 3:30AM…</li> <li>Run all system updates on DSpace Test (linode19) and reboot the server</li> </ul> <h2 id="2019-01-03">2019-01-03</h2> <ul> <li>Update local Docker image for DSpace PostgreSQL, re-using the existing data volume:</li> </ul> <pre><code>$ sudo docker pull postgres:9.6-alpine $ sudo docker rm dspacedb $ sudo docker run --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine </code></pre> <ul> 
<li>Testing DSpace 5.9 with Tomcat 8.5.37 on my local machine and I see that Atmire’s Listings and Reports still doesn’t work <ul> <li>After logging in via XMLUI and clicking the Listings and Reports link from the sidebar it redirects me to a JSPUI login page</li> <li>If I log in again there the Listings and Reports work… hmm.</li> </ul></li> <li>The JSPUI application—which Listings and Reports depends upon—also does not load, though the error is perhaps unrelated:</li> </ul> <pre><code>2019-01-03 14:45:21,727 INFO org.dspace.browse.BrowseEngine @ anonymous:session_id=9471D72242DAA05BCC87734FE3C66EA6:ip_addr=127.0.0.1:browse_mini: 2019-01-03 14:45:21,971 INFO org.dspace.app.webui.discovery.DiscoverUtility @ facets for scope, null: 23 2019-01-03 14:45:22,115 WARN org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=9471D72242DAA05BCC87734FE3C66EA6:internal_error:-- URL Was: http://localhost:8080/jspui/internal-error -- Method: GET -- Parameters were: org.apache.jasper.JasperException: /home.jsp (line: [214], column: [1]) /discovery/static-tagcloud-facet.jsp (line: [57], column: [8]) No tag [tagcloud] defined in tag library imported with prefix [dspace] at org.apache.jasper.compiler.DefaultErrorHandler.jspError(DefaultErrorHandler.java:41) at org.apache.jasper.compiler.ErrorDispatcher.dispatch(ErrorDispatcher.java:291) at org.apache.jasper.compiler.ErrorDispatcher.jspError(ErrorDispatcher.java:97) at org.apache.jasper.compiler.Parser.processIncludeDirective(Parser.java:347) at org.apache.jasper.compiler.Parser.parseIncludeDirective(Parser.java:380) at org.apache.jasper.compiler.Parser.parseDirective(Parser.java:481) at org.apache.jasper.compiler.Parser.parseElements(Parser.java:1445) at org.apache.jasper.compiler.Parser.parseBody(Parser.java:1683) at org.apache.jasper.compiler.Parser.parseOptionalBody(Parser.java:1016) at org.apache.jasper.compiler.Parser.parseCustomTag(Parser.java:1291) at org.apache.jasper.compiler.Parser.parseElements(Parser.java:1470) 
at org.apache.jasper.compiler.Parser.parse(Parser.java:144) at org.apache.jasper.compiler.ParserController.doParse(ParserController.java:244) at org.apache.jasper.compiler.ParserController.parse(ParserController.java:105) at org.apache.jasper.compiler.Compiler.generateJava(Compiler.java:202) at org.apache.jasper.compiler.Compiler.compile(Compiler.java:373) at org.apache.jasper.compiler.Compiler.compile(Compiler.java:350) at org.apache.jasper.compiler.Compiler.compile(Compiler.java:334) at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:595) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:399) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:386) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:330) at javax.servlet.http.HttpServlet.service(HttpServlet.java:742) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:728) at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:470) at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:395) at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:316) at org.dspace.app.webui.util.JSPManager.showJSP(JSPManager.java:60) at org.apache.jsp.index_jsp._jspService(index_jsp.java:191) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:742) at 
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:476) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:386) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:330) at javax.servlet.http.HttpServlet.service(HttpServlet.java:742) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81) at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:234) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:650) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:800) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) at 
org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:806) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:748) </code></pre> <ul> <li>I notice that I get different JSESSIONID cookies for <code>/</code> (XMLUI) and <code>/jspui</code> (JSPUI) on Tomcat 8.5.37. I wondered whether it’s the same on Tomcat 7.0.92… and yes, it is.</li> <li>Hmm, on Tomcat 7.0.92 I see that I get a <code>dspace.current.user.id</code> session cookie after logging into XMLUI, and then when I browse to JSPUI I am still logged in… <ul> <li>I didn’t see that cookie being set on Tomcat 8.5.37</li> </ul></li> <li>I sent a message to the dspace-tech mailing list to ask about it</li> </ul> <h2 id="2019-01-04">2019-01-04</h2> <ul> <li>Linode sent a message last night that CGSpace (linode18) had high CPU usage, but I don’t see anything around that time in the web server logs:</li> </ul> <pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Jan/2019:1(7|8|9)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 189 207.46.13.192 217 31.6.77.23 340 66.249.70.29 349 40.77.167.86 417 34.218.226.147 630 207.46.13.173 710 35.237.175.180 790 40.77.167.87 1776 66.249.70.27 2099 54.70.40.11 </code></pre> <ul> <li>I’m thinking about trying to validate our <code>dc.subject</code> terms against <a href="http://aims.fao.org/agrovoc/webservices">AGROVOC webservices</a></li> <li>There seem to be a few APIs and the documentation is kinda confusing, but I found this REST endpoint that does work well, for example searching for <code>SOIL</code>:</li> </ul> <pre><code>$ 
http 'http://agrovoc.uniroma2.it/agrovoc/rest/v1/search?query=SOIL&lang=en' HTTP/1.1 200 OK Access-Control-Allow-Origin: * Connection: Keep-Alive Content-Length: 493 Content-Type: application/json; charset=utf-8 Date: Fri, 04 Jan 2019 13:44:27 GMT Keep-Alive: timeout=5, max=100 Server: Apache Strict-Transport-Security: max-age=63072000; includeSubdomains Vary: Accept X-Content-Type-Options: nosniff X-Frame-Options: ALLOW-FROM http://aims.fao.org { "@context": { "@language": "en", "altLabel": "skos:altLabel", "hiddenLabel": "skos:hiddenLabel", "isothes": "http://purl.org/iso25964/skos-thes#", "onki": "http://schema.onki.fi/onki#", "prefLabel": "skos:prefLabel", "results": { "@container": "@list", "@id": "onki:results" }, "skos": "http://www.w3.org/2004/02/skos/core#", "type": "@type", "uri": "@id" }, "results": [ { "lang": "en", "prefLabel": "soil", "type": [ "skos:Concept" ], "uri": "http://aims.fao.org/aos/agrovoc/c_7156", "vocab": "agrovoc" } ], "uri": "" } </code></pre> <ul> <li>The API does not appear to be case-sensitive (searches for <code>SOIL</code> and <code>soil</code> return the same thing)</li> <li>I’m a bit confused that there’s no obvious return code or status when a term is not found, for example <code>SOILS</code>:</li> </ul> <pre><code>HTTP/1.1 200 OK Access-Control-Allow-Origin: * Connection: Keep-Alive Content-Length: 367 Content-Type: application/json; charset=utf-8 Date: Fri, 04 Jan 2019 13:48:31 GMT Keep-Alive: timeout=5, max=100 Server: Apache Strict-Transport-Security: max-age=63072000; includeSubdomains Vary: Accept X-Content-Type-Options: nosniff X-Frame-Options: ALLOW-FROM http://aims.fao.org { "@context": { "@language": "en", "altLabel": "skos:altLabel", "hiddenLabel": "skos:hiddenLabel", "isothes": "http://purl.org/iso25964/skos-thes#", "onki": "http://schema.onki.fi/onki#", "prefLabel": "skos:prefLabel", "results": { "@container": "@list", "@id": "onki:results" }, "skos": "http://www.w3.org/2004/02/skos/core#", "type": "@type", "uri": 
"@id" }, "results": [], "uri": "" } </code></pre> <ul> <li>I guess the <code>results</code> array will just be empty…</li> <li>Another way would be to try with SPARQL, perhaps using the Python 2.7 <a href="https://pypi.org/project/sparql-client/">sparql-client</a>:</li> </ul> <pre><code>$ python2.7 -m virtualenv /tmp/sparql $ . /tmp/sparql/bin/activate $ pip install sparql-client ipython $ ipython In [10]: import sparql In [11]: s = sparql.Service("http://agrovoc.uniroma2.it:3030/agrovoc/sparql", "utf-8", "GET") In [12]: statement=('PREFIX skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; ' ...: 'SELECT ' ...: '?label ' ...: 'WHERE { ' ...: '{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . } ' ...: 'FILTER regex(str(?label), "^fish", "i") . ' ...: '} LIMIT 10') In [13]: result = s.query(statement) In [14]: for row in result.fetchone(): ...: print(row) ...: (&lt;Literal "fish catching"@en&gt;,) (&lt;Literal "fish harvesting"@en&gt;,) (&lt;Literal "fish meat"@en&gt;,) (&lt;Literal "fish roe"@en&gt;,) (&lt;Literal "fish conversion"@en&gt;,) (&lt;Literal "fisheries catches (composition)"@en&gt;,) (&lt;Literal "fishtail palm"@en&gt;,) (&lt;Literal "fishflies"@en&gt;,) (&lt;Literal "fishery biology"@en&gt;,) (&lt;Literal "fish production"@en&gt;,) </code></pre> <ul> <li>The SPARQL query comes from my notes in <a href="/cgspace-notes/2017-08/">2017-08</a></li> </ul> <h2 id="2019-01-06">2019-01-06</h2> <ul> <li>I built a clean DSpace 5.8 installation from the upstream <code>dspace-5.8</code> tag and the issue with the XMLUI/JSPUI login is still there with Tomcat 8.5.37 <ul> <li>If I log into XMLUI and then navigate to JSPUI I need to log in again</li> <li>XMLUI does not set the <code>dspace.current.user.id</code> session cookie in Tomcat 8.5.37 for some reason</li> <li>I sent an update to the dspace-tech mailing list to ask for more help troubleshooting</li> </ul></li> </ul> <h2 id="2019-01-07">2019-01-07</h2> <ul> <li>I built a clean DSpace 6.3 installation from the upstream <code>dspace-6.3</code> tag and
the issue with the XMLUI/JSPUI login is still there with Tomcat 8.5.37 <ul> <li>If I log into XMLUI and then navigate to JSPUI I need to log in again</li> <li>XMLUI does not set the <code>dspace.current.user.id</code> session cookie in Tomcat 8.5.37 for some reason</li> <li>I sent an update to the dspace-tech mailing list to ask for more help troubleshooting</li> </ul></li> </ul> <h2 id="2019-01-08">2019-01-08</h2> <ul> <li>Tim Donohue responded to my thread about the cookies on the dspace-tech mailing list <ul> <li>He suspects it’s a change of behavior in Tomcat 8.5, and indeed I see a mention of new cookie processing in the <a href="https://tomcat.apache.org/migration-85.html#Cookies">Tomcat 8.5 migration guide</a></li> <li>I tried to switch my XMLUI and JSPUI contexts to use the <code>LegacyCookieProcessor</code>, but it didn’t seem to help</li> <li>I <a href="https://jira.duraspace.org/browse/DS-4140">filed DS-4140 on the DSpace issue tracker</a></li> </ul></li> </ul> <!-- vim: set sw=2 ts=2: --> </article> </div> <!-- /.blog-main --> <aside class="col-sm-3 ml-auto blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2019-01/">January, 2019</a></li> <li><a href="/cgspace-notes/2018-12/">December, 2018</a></li> <li><a href="/cgspace-notes/2018-11/">November, 2018</a></li> <li><a href="/cgspace-notes/2018-10/">October, 2018</a></li> <li><a href="/cgspace-notes/2018-09/">September, 2018</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> <p> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a
href='https://twitter.com/mralanorth'>@mralanorth</a>. </p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>