--- title: "January, 2019" date: 2019-01-02T09:48:30+02:00 author: "Alan Orth" tags: ["Notes"] --- ## 2019-01-02 - Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning - I don't see anything interesting in the web server logs around that time though: ``` # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 92 40.77.167.4 99 210.7.29.100 120 38.126.157.45 177 35.237.175.180 177 40.77.167.32 216 66.249.75.219 225 18.203.76.93 261 46.101.86.248 357 207.46.13.1 903 54.70.40.11 ``` - Analyzing the types of requests made by the top few IPs during that time: ``` # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 54.70.40.11 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c 30 bitstream 534 discover 352 handle # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 207.46.13.1 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c 194 bitstream 345 handle # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 46.101.86.248 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c 261 handle ``` - It's not clear to me what was causing the outbound traffic spike - Oh nice! The once-per-year cron job for rotating the Solr statistics actually worked now (for the first time ever!): ``` Moving: 81742 into core statistics-2010 Moving: 1837285 into core statistics-2011 Moving: 3764612 into core statistics-2012 Moving: 4557946 into core statistics-2013 Moving: 5483684 into core statistics-2014 Moving: 2941736 into core statistics-2015 Moving: 5926070 into core statistics-2016 Moving: 10562554 into core statistics-2017 Moving: 18497180 into core statistics-2018 ``` - This could by why the outbound traffic rate was high, due to the S3 backup that run at 3:30AM... - Run all system updates on DSpace Test (linode19) and reboot the server ## 2019-01-03 - Update local Docker image for DSpace PostgreSQL, re-using the existing data volume: ``` $ sudo docker pull postgres:9.6-alpine $ sudo docker rm dspacedb $ sudo docker run --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine ``` - Testing DSpace 5.9 with Tomcat 8.5.37 on my local machine and I see that Atmire's Listings and Reports still doesn't work - After logging in via XMLUI and clicking the Listings and Reports link from the sidebar it redirects me to a JSPUI login page - If I log in again there the Listings and Reports work... hmm. - The JSPUI application—which Listings and Reports depends upon—also does not load, though the error is perhaps unrelated: ``` 2019-01-03 14:45:21,727 INFO org.dspace.browse.BrowseEngine @ anonymous:session_id=9471D72242DAA05BCC87734FE3C66EA6:ip_addr=127.0.0.1:browse_mini: 2019-01-03 14:45:21,971 INFO org.dspace.app.webui.discovery.DiscoverUtility @ facets for scope, null: 23 2019-01-03 14:45:22,115 WARN org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=9471D72242DAA05BCC87734FE3C66EA6:internal_error:-- URL Was: http://localhost:8080/jspui/internal-error -- Method: GET -- Parameters were: org.apache.jasper.JasperException: /home.jsp (line: [214], column: [1]) /discovery/static-tagcloud-facet.jsp (line: [57], column: [8]) No tag [tagcloud] defined in tag library imported with prefix [dspace] at org.apache.jasper.compiler.DefaultErrorHandler.jspError(DefaultErrorHandler.java:41) at org.apache.jasper.compiler.ErrorDispatcher.dispatch(ErrorDispatcher.java:291) at org.apache.jasper.compiler.ErrorDispatcher.jspError(ErrorDispatcher.java:97) at org.apache.jasper.compiler.Parser.processIncludeDirective(Parser.java:347) at org.apache.jasper.compiler.Parser.parseIncludeDirective(Parser.java:380) at org.apache.jasper.compiler.Parser.parseDirective(Parser.java:481) at org.apache.jasper.compiler.Parser.parseElements(Parser.java:1445) at org.apache.jasper.compiler.Parser.parseBody(Parser.java:1683) at org.apache.jasper.compiler.Parser.parseOptionalBody(Parser.java:1016) at org.apache.jasper.compiler.Parser.parseCustomTag(Parser.java:1291) at org.apache.jasper.compiler.Parser.parseElements(Parser.java:1470) at org.apache.jasper.compiler.Parser.parse(Parser.java:144) at org.apache.jasper.compiler.ParserController.doParse(ParserController.java:244) at org.apache.jasper.compiler.ParserController.parse(ParserController.java:105) at org.apache.jasper.compiler.Compiler.generateJava(Compiler.java:202) at org.apache.jasper.compiler.Compiler.compile(Compiler.java:373) at org.apache.jasper.compiler.Compiler.compile(Compiler.java:350) at org.apache.jasper.compiler.Compiler.compile(Compiler.java:334) at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:595) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:399) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:386) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:330) at javax.servlet.http.HttpServlet.service(HttpServlet.java:742) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:728) at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:470) at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:395) at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:316) at org.dspace.app.webui.util.JSPManager.showJSP(JSPManager.java:60) at org.apache.jsp.index_jsp._jspService(index_jsp.java:191) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:742) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:476) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:386) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:330) at javax.servlet.http.HttpServlet.service(HttpServlet.java:742) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81) at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:234) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:650) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:800) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:806) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:748) ``` - I notice that I get different JSESSIONID cookies for `/` (XMLUI) and `/jspui` (JSPUI) on Tomcat 8.5.37, I wonder if it's the same on Tomcat 7.0.92... yes I do. - Hmm, on Tomcat 7.0.92 I see that I get a `dspace.current.user.id` session cookie after logging into XMLUI, and then when I browse to JSPUI I am still logged in... - I didn't see that cookie being set on Tomcat 8.5.37 - I sent a message to the dspace-tech mailing list to ask ## 2019-01-04 - Linode sent a message last night that CGSpace (linode18) had high CPU usage, but I don't see anything around that time in the web server logs: ``` # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Jan/2019:1(7|8|9)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 189 207.46.13.192 217 31.6.77.23 340 66.249.70.29 349 40.77.167.86 417 34.218.226.147 630 207.46.13.173 710 35.237.175.180 790 40.77.167.87 1776 66.249.70.27 2099 54.70.40.11 ``` - I'm thinking about trying to validate our `dc.subject` terms against [AGROVOC webservices](http://aims.fao.org/agrovoc/webservices) - There seem to be a few APIs and the documentation is kinda confusing, but I found this REST endpoint that does work well, for example searching for `SOIL`: ``` $ http http://agrovoc.uniroma2.it/agrovoc/rest/v1/search?query=SOIL&lang=en HTTP/1.1 200 OK Access-Control-Allow-Origin: * Connection: Keep-Alive Content-Length: 493 Content-Type: application/json; charset=utf-8 Date: Fri, 04 Jan 2019 13:44:27 GMT Keep-Alive: timeout=5, max=100 Server: Apache Strict-Transport-Security: max-age=63072000; includeSubdomains Vary: Accept X-Content-Type-Options: nosniff X-Frame-Options: ALLOW-FROM http://aims.fao.org { "@context": { "@language": "en", "altLabel": "skos:altLabel", "hiddenLabel": "skos:hiddenLabel", "isothes": "http://purl.org/iso25964/skos-thes#", "onki": "http://schema.onki.fi/onki#", "prefLabel": "skos:prefLabel", "results": { "@container": "@list", "@id": "onki:results" }, "skos": "http://www.w3.org/2004/02/skos/core#", "type": "@type", "uri": "@id" }, "results": [ { "lang": "en", "prefLabel": "soil", "type": [ "skos:Concept" ], "uri": "http://aims.fao.org/aos/agrovoc/c_7156", "vocab": "agrovoc" } ], "uri": "" } ``` - The API does not appear to be case sensitive (searches for `SOIL` and `soil` return the same thing) - I'm a bit confused that there's no obvious return code or status when a term is not found, for example `SOILS`: ``` HTTP/1.1 200 OK Access-Control-Allow-Origin: * Connection: Keep-Alive Content-Length: 367 Content-Type: application/json; charset=utf-8 Date: Fri, 04 Jan 2019 13:48:31 GMT Keep-Alive: timeout=5, max=100 Server: Apache Strict-Transport-Security: max-age=63072000; includeSubdomains Vary: Accept X-Content-Type-Options: nosniff X-Frame-Options: ALLOW-FROM http://aims.fao.org { "@context": { "@language": "en", "altLabel": "skos:altLabel", "hiddenLabel": "skos:hiddenLabel", "isothes": "http://purl.org/iso25964/skos-thes#", "onki": "http://schema.onki.fi/onki#", "prefLabel": "skos:prefLabel", "results": { "@container": "@list", "@id": "onki:results" }, "skos": "http://www.w3.org/2004/02/skos/core#", "type": "@type", "uri": "@id" }, "results": [], "uri": "" } ``` - I guess the `results` object will just be empty... - Another way would be to try with SPARQL, perhaps using the Python 2.7 [sparql-client](https://pypi.org/project/sparql-client/): ``` $ python2.7 -m virtualenv /tmp/sparql $ . /tmp/sparql/bin/activate $ pip install sparql-client ipython $ ipython In [10]: import sparql In [11]: s = sparql.Service("http://agrovoc.uniroma2.it:3030/agrovoc/sparql", "utf-8", "GET") In [12]: statement=('PREFIX skos: ' ...: 'SELECT ' ...: '?label ' ...: 'WHERE { ' ...: '{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . } ' ...: 'FILTER regex(str(?label), "^fish", "i") . ' ...: '} LIMIT 10') In [13]: result = s.query(statement) In [14]: for row in result.fetchone(): ...: print(row) ...: (,) (,) (,) (,) (,) (,) (,) (,) (,) (,) ``` - The SPARQL query comes from my notes in [2017-08]({{< relref "2017-08.md" >}}) ## 2019-01-06 - I built a clean DSpace 5.8 installation from the upstream `dspace-5.8` tag and the issue with the XMLUI/JSPUI login is still there with Tomcat 8.5.37 - If I log into XMLUI and then nagivate to JSPUI I need to log in again - XMLUI does not set the `dspace.current.user.id` session cookie in Tomcat 8.5.37 for some reason - I sent an update to the dspace-tech mailing list to ask for more help troubleshooting ## 2019-01-07 - I built a clean DSpace 6.3 installation from the upstream `dspace-6.3` tag and the issue with the XMLUI/JSPUI login is still there with Tomcat 8.5.37 - If I log into XMLUI and then nagivate to JSPUI I need to log in again - XMLUI does not set the `dspace.current.user.id` session cookie in Tomcat 8.5.37 for some reason - I sent an update to the dspace-tech mailing list to ask for more help troubleshooting ## 2019-01-08 - Tim Donohue responded to my thread about the cookies on the dspace-tech mailing list - He suspects it's a change of behavior in Tomcat 8.5, and indeed I see a mention of new cookie processing in the [Tomcat 8.5 migration guide](https://tomcat.apache.org/migration-85.html#Cookies) - I tried to switch my XMLUI and JSPUI contexts to use the `LegacyCookieProcessor`, but it didn't seem to help - I [filed DS-4140 on the DSpace issue tracker](https://jira.duraspace.org/browse/DS-4140) ## 2019-01-11 - Tezira wrote to say she has stopped receiving the `DSpace Submission Approved and Archived` emails from CGSpace as of January 2nd - I told her that I haven't done anything to disable it lately, but that I would check - Bizu also says she hasn't received them lately ## 2019-01-14 - Day one of CGSpace AReS meeting in Amman ## 2019-01-15 - Day two of CGSpace AReS meeting in Amman - Discuss possibly extending the [dspace-statistics-api](https://github.com/ilri/dspace-statistics-api) to make community and collection statistics available - Discuss new "final" CG Core document and some changes that we'll need to do on CGSpace and other repositories - We agreed to try to stick to pure Dublin Core where possible, then use fields that exist in standard DSpace, and use "cg" namespace for everything else - Major changes are to move `dc.contributor.author` to `dc.creator` (which MELSpace and WorldFish are already using in their DSpace repositories) - I am testing the speed of the WorldFish DSpace repository's REST API and it's five to ten times faster than CGSpace as I tested in [2018-10]({{< relref "2018-10.md" >}}): ``` $ time http --print h 'https://digitalarchive.worldfishcenter.org/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0' 0.16s user 0.03s system 3% cpu 5.185 total 0.17s user 0.02s system 2% cpu 7.123 total 0.18s user 0.02s system 6% cpu 3.047 total ``` - In other news, Linode sent a mail last night that the CPU load on CGSpace (linode18) was high, here are the top IPs in the logs around those few hours: ``` # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "14/Jan/2019:(17|18|19|20)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 157 31.6.77.23 192 54.70.40.11 202 66.249.64.157 207 40.77.167.204 220 157.55.39.140 326 197.156.105.116 385 207.46.13.158 1211 35.237.175.180 1830 66.249.64.155 2482 45.5.186.2 ``` ## 2019-01-16 - Day three of CGSpace AReS meeting in Amman - We discussed CG Core 2.0 metadata and decided some action points - We discussed branding of AReS tool - Notes from our CG Core 2.0 metadata discussion: - Not Dublin Core: - dc.subtype - dc.peer-reviewed - Dublin Core, possible action for CGSpace: - dc.description: - We use dc.description.abstract, dc.description (Notes), dc.description.version (Peer review status), dc.description.sponsorship (Funder) - Maybe move abstract to dc.description - Maybe notes moves to cg.description.notes??? - Maybe move dc.description.version to cg.peer-reviewed or cg.peer-review-status??? - Move dc.description.sponsorship to cg.contributor.donor??? - dc.subject: - Wait for guidance, evaluate technical implications (Google indexing, OAI, etc) - Move dc.contributor.author to dc.creator - dc.contributor Project - Recommend against creating new fields for all projects - We use collections projects/themes/etc - dc.contributor Project Lead Center - MELSpace uses cg.contributor.project-lead-institute (institute is more generic than center) - Maybe we use? - dc.contributor Partner - Wait for guidance - MELSpace uses cg.contibutor.center (?) - dc.contributor Donor - Use cg.contributor.donor - dc.date - Wait for guidance, maybe move dc.date.issued? - dc.date.accessioned and dc.date.available are automatic in DSpace - dc.language - Move dc.language.iso to dc.language - dc.identifier - Move cg.identifier.url to dc.identifier - dc.identifier bibliographicCitation - dc.identifier.citation should move to dc.bibliographicCitation - dc.description.notes - Wait for guidance, maybe move to cg.description.notes ??? - dc.relation - Maybe move cg.link.reference - Perhaps consolodate cg.link.audio etc there...? - dc.relation.isPartOf - Move dc.relation.ispartofseries to dc.relation.isPartOf - dc.audience - Move cg.targetaudience to dc.audience - Something happened to the Solr usage statistics on CGSpace - I looked on the server and the Solr cores are there (56GB!), and I don't see any obvious errors in dmesg or anything - I see that the server hasn't been rebooted in 26 days so I rebooted it - After reboot the Solr stats are still messed up in the Atmire Usage Stats module, it only shows 2019-01! ![Solr stats fucked up](/cgspace-notes/2019/01/solr-stats-incorrect.png) - In the Solr admin UI I see the following error: ``` statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher ``` - Looking in the Solr log I see this: ``` 2019-01-16 13:37:55,395 ERROR org.apache.solr.core.CoreContainer @ Error creating core [statistics-2018]: Error opening new searcher org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.(SolrCore.java:873) at org.apache.solr.core.SolrCore.(SolrCore.java:646) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466) at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:575) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.dspace.solr.filters.LocalHostRestrictionFilter.doFilter(LocalHostRestrictionFilter.java:50) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:436) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677) at org.apache.solr.core.SolrCore.(SolrCore.java:845) ... 31 more Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:89) at org.apache.lucene.index.IndexWriter.(IndexWriter.java:753) at org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:77) at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64) at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279) at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528) ... 33 more 2019-01-16 13:37:55,401 ERROR org.apache.solr.core.SolrCore @ org.apache.solr.common.SolrException: Error CREATEing SolrCore 'statistics-2018': Unable to create core [statistics-2018] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:613) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.dspace.solr.filters.LocalHostRestrictionFilter.doFilter(LocalHostRestrictionFilter.java:50) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:436) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.solr.common.SolrException: Unable to create core [statistics-2018] at org.apache.solr.core.CoreContainer.create(CoreContainer.java:507) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466) at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:575) ... 27 more Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.(SolrCore.java:873) at org.apache.solr.core.SolrCore.(SolrCore.java:646) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491) ... 29 more Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677) at org.apache.solr.core.SolrCore.(SolrCore.java:845) ... 31 more Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:89) at org.apache.lucene.index.IndexWriter.(IndexWriter.java:753) at org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:77) at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64) at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279) at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528) ... 33 more ``` - I found some threads on StackOverflow etc discussing this and several suggested increasing the address space for the shell with ulimit - I added `ulimit -v unlimited` to the `/etc/default/tomcat7` and restarted Tomcat and now Solr is working again: ![Solr stats working](/cgspace-notes/2019/01/solr-stats-incorrect.png) - Some StackOverflow discussions related to this: - https://stackoverflow.com/questions/2895417/solrexception-internal-server-error/3035916#3035916 - https://stackoverflow.com/questions/11683850/how-much-memory-could-vm-use - https://stackoverflow.com/questions/8892143/error-when-opening-a-lucene-index-map-failed/8893684#8893684 - Abenet was asking if the Atmire Usage Stats are correct because they are over 2 million the last few months... - For 2019-01 alone the Usage Stats are already around 1.2 million - I tried to look in the nginx logs to see how many raw requests there are so far this month and it's about 1.4 million: ``` # time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019" 1442874 real 0m17.161s user 0m16.205s sys 0m2.396s ``` ## 2019-01-17 - Send reminder to Atmire about purchasing the [MQM module](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=657) - Trying to decide the solid action points for CGSpace on the CG Core 2.0 metadata... - It's difficult to decide some of these because the current CG Core 2.0 document does not provide guidance or rationale (yet)! - Also, there is not a good Dublin Core reference (or maybe I just don't understand?) - Several authoritative documents on Dublin Core appear to be: - [Dublin Core Metadata Element Set, Version 1.1: Reference Description](http://dublincore.org/documents/dces/) - [DCMI Metadata Terms](http://www.dublincore.org/documents/dcmi-terms/) - And what is the relationship between DC and DCTERMS? - DSpace uses DCTERMS in the metadata it embeds in XMLUI item views! - We really need to look at this more carefully and see the impacts that might be made from switching core fields like languages, abstract, authors, etc - We can check WorldFish and MELSpace repositories to see what effects these changes have had on theirs because they have already adopted some of these changes... - I think I understand the difference between DC and DCTERMS finally: DC is the original set of fifteen elements and DCTERMS is the newer version that was supposed to address much of the drawbacks of the original with regards to digital content - We might be able to use some proper fields for citation, abstract, etc that are part of DCTERMS - To make matters more confusing, there is also "qualified Dublin Core" that uses the original fifteen elements of legacy DC and qualifies them, like `dc.date.accessioned` - According to Wikipedia [Qualified Dublin Core was superseded by DCTERMS in 2008](https://en.wikipedia.org/wiki/Dublin_Core)! - So we should be trying to use DCTERMS where possible, unless it is some internal thing that might mess up DSpace (like dates) - "Elements 1.1" means legacy DC - Possible action list for CGSpace: - dc.description.abstract → dcterms.abstract - dc.description.version → cg.peer-reviewed (or cg.peer-review-status?) - dc.description.sponsorship → cg.contributor.donor - dc.contributor.author → dc.creator - dc.language.iso → dcterms.language - cg.identifier.url → dcterms.identifier - dc.identifier.citation → dcterms.bibliographicCitation - dc.relation.ispartofseries → dcterms.isPartOf - cg.targetaudience → dcterms.audience ## 2019-01-19 - There's no official set of Dublin Core qualifiers so I can't tell if things like `dc.contributor.author` that are used by DSpace are official - I found a great [presentation from 2015 by the Digital Repository of Ireland](https://www.dri.ie/sites/default/files/files/qualified-dublin-core-metadata-guidelines.pdf) that discusses using MARC Relator Terms with Dublin Core elements - It seems that `dc.contributor.author` would be a supported term according to this [Library of Congress list](https://memory.loc.gov/diglib/loc.terms/relators/dc-contributor.html) linked from the [Dublin Core website](http://dublincore.org/usage/documents/relators/) - The Library of Congress document specifically says: These terms conform with the DCMI Abstract Model and may be used in DCMI application profiles. DCMI endorses their use with Dublin Core elements as indicated. ## 2019-01-20 - That's weird, I logged into DSpace Test (linode19) and it says it has been up for 213 days: ``` # w 04:46:14 up 213 days, 7:25, 4 users, load average: 1.94, 1.50, 1.35 ``` - I've definitely rebooted it several times in the past few months... according to `journalctl -b` it was a few weeks ago on 2019-01-02 - I re-ran the Ansible DSpace tag, ran all system updates, and rebooted the host - After rebooting I notice that the Linode kernel went down from 4.19.8 to 4.18.16... - Atmire sent a quote on our [ticket about purchasing the Metadata Quality Module (MQM) for DSpace 5.8](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=657) - Abenet asked me for an [OpenSearch query that could generate and RSS feed for items in the Livestock CRP](https://cgspace.cgiar.org/open-search/discover?query=crpsubject:Livestock&sort_by=3&order=DESC) - According to my notes, `sort_by=3` is accession date (as configured in `dspace.cfg) - The query currently shows 3023 items, but a [Discovery search for Livestock CRP only returns 858 items](https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&filter_relational_operator_1=equals&filter_1=Livestock&submit_apply_filter=&query=) - That query seems to return items tagged with `Livestock and Fish` CRP as well... hmm. ## 2019-01-21 - Investigating running Tomcat 7 on Ubuntu 18.04 with the tarball and a custom systemd package instead of waiting for our DSpace to get compatible with Ubuntu 18.04's Tomcat 8.5 - I could either run with a simple `tomcat7.service` like this: ``` [Unit] Description=Apache Tomcat 7 Web Application Container After=network.target [Service] Type=forking ExecStart=/path/to/apache-tomcat-7.0.92/bin/startup.sh ExecStop=/path/to/apache-tomcat-7.0.92/bin/shutdown.sh User=aorth Group=aorth [Install] WantedBy=multi-user.target ``` - Or try to use adapt a real systemd service like Arch Linux's: ``` [Unit] Description=Tomcat 7 servlet container After=network.target [Service] Type=forking PIDFile=/var/run/tomcat7.pid Environment=CATALINA_PID=/var/run/tomcat7.pid Environment=TOMCAT_JAVA_HOME=/usr/lib/jvm/default-runtime Environment=CATALINA_HOME=/usr/share/tomcat7 Environment=CATALINA_BASE=/usr/share/tomcat7 Environment=CATALINA_OPTS= Environment=ERRFILE=SYSLOG Environment=OUTFILE=SYSLOG ExecStart=/usr/bin/jsvc \ -Dcatalina.home=${CATALINA_HOME} \ -Dcatalina.base=${CATALINA_BASE} \ -Djava.io.tmpdir=/var/tmp/tomcat7/temp \ -cp /usr/share/java/commons-daemon.jar:/usr/share/java/eclipse-ecj.jar:${CATALINA_HOME}/bin/bootstrap.jar:${CATALINA_HOME}/bin/tomcat-juli.jar \ -user tomcat7 \ -java-home ${TOMCAT_JAVA_HOME} \ -pidfile /var/run/tomcat7.pid \ -errfile ${ERRFILE} \ -outfile ${OUTFILE} \ $CATALINA_OPTS \ org.apache.catalina.startup.Bootstrap ExecStop=/usr/bin/jsvc \ -pidfile /var/run/tomcat7.pid \ -stop \ org.apache.catalina.startup.Bootstrap [Install] WantedBy=multi-user.target ``` - I see that `jsvc` and `libcommons-daemon-java` are both available on Ubuntu so that should be easy to port - We probably don't need Eclipse Java Bytecode Compiler (ecj) - I tested Tomcat 7.0.92 on Arch Linux using the `tomcat7.service` with `jsvc` and it works... nice! - I think I might manage this the same way I do the restic releases in the [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public), where I download a specific version and symlink to some generic location without the version number - I verified that there is indeed an issue with sharded Solr statistics cores on DSpace, which will cause inaccurate results in the dspace-statistics-api: ``` $ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view' | grep numFound $ http 'http://localhost:3000/solr/statistics-2018/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view' | grep numFound ``` - I opened an issue on the GitHub issue tracker ([#10](https://github.com/ilri/dspace-statistics-api/issues/10)) - I don't think the [SolrClient library](https://solrclient.readthedocs.io/en/latest/) we are currently using supports these type of queries so we might have to just do raw queries with requests - The [pysolr](https://github.com/django-haystack/pysolr) library says it supports multicore indexes, but I am not sure it does (or at least not with our setup): ``` import pysolr solr = pysolr.Solr('http://localhost:3000/solr/statistics') results = solr.search('type:2', **{'fq': 'isBot:false AND statistics_type:view', 'facet': 'true', 'facet.field': 'id', 'facet.mincount': 1, 'facet.limit': 10, 'facet.offset': 0, 'rows': 0}) print(results.facets['facet_fields']) {'id': ['77572', 646, '93185', 380, '92932', 375, '102499', 372, '101430', 337, '77632', 331, '102449', 289, '102485', 276, '100849', 270, '47080', 260]} ``` - If I double check one item from above, for example `77572`, it appears this is only working on the current statistics core and not the shards: ``` import pysolr solr = pysolr.Solr('http://localhost:3000/solr/statistics') results = solr.search('type:2 id:77572', **{'fq': 'isBot:false AND statistics_type:view'}) print(results.hits) 646 solr = pysolr.Solr('http://localhost:3000/solr/statistics-2018/') results = solr.search('type:2 id:77572', **{'fq': 'isBot:false AND statistics_type:view'}) print(results.hits) 595 ``` - So I guess I need to figure out how to use join queries and maybe even switch to using raw Python requests with JSON