January, 2019
2019-01-02
- Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning
- I don’t see anything interesting in the web server logs around that time though:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
- Analyzing the types of requests made by the top few IPs during that time:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 54.70.40.11 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
30 bitstream
534 discover
352 handle
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 207.46.13.1 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
194 bitstream
345 handle
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 46.101.86.248 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
261 handle
- It’s not clear to me what was causing the outbound traffic spike
- Oh nice! The once-per-year cron job for rotating the Solr statistics actually worked now (for the first time ever!):
Moving: 81742 into core statistics-2010
Moving: 1837285 into core statistics-2011
Moving: 3764612 into core statistics-2012
Moving: 4557946 into core statistics-2013
Moving: 5483684 into core statistics-2014
Moving: 2941736 into core statistics-2015
Moving: 5926070 into core statistics-2016
Moving: 10562554 into core statistics-2017
Moving: 18497180 into core statistics-2018
- This could by why the outbound traffic rate was high, due to the S3 backup that run at 3:30AM…
- Run all system updates on DSpace Test (linode19) and reboot the server
2019-01-03
- Update local Docker image for DSpace PostgreSQL, re-using the existing data volume:
$ sudo docker pull postgres:9.6-alpine
$ sudo docker rm dspacedb
$ sudo docker run --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
- Testing DSpace 5.9 with Tomcat 8.5.37 on my local machine and I see that Atmire’s Listings and Reports still doesn’t work
- After logging in via XMLUI and clicking the Listings and Reports link from the sidebar it redirects me to a JSPUI login page
- If I log in again there the Listings and Reports work… hmm.
- The JSPUI application—which Listings and Reports depends upon—also does not load, though the error is perhaps unrelated:
2019-01-03 14:45:21,727 INFO org.dspace.browse.BrowseEngine @ anonymous:session_id=9471D72242DAA05BCC87734FE3C66EA6:ip_addr=127.0.0.1:browse_mini:
2019-01-03 14:45:21,971 INFO org.dspace.app.webui.discovery.DiscoverUtility @ facets for scope, null: 23
2019-01-03 14:45:22,115 WARN org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=9471D72242DAA05BCC87734FE3C66EA6:internal_error:-- URL Was: http://localhost:8080/jspui/internal-error
-- Method: GET
-- Parameters were:
org.apache.jasper.JasperException: /home.jsp (line: [214], column: [1]) /discovery/static-tagcloud-facet.jsp (line: [57], column: [8]) No tag [tagcloud] defined in tag library imported with prefix [dspace]
at org.apache.jasper.compiler.DefaultErrorHandler.jspError(DefaultErrorHandler.java:41)
at org.apache.jasper.compiler.ErrorDispatcher.dispatch(ErrorDispatcher.java:291)
at org.apache.jasper.compiler.ErrorDispatcher.jspError(ErrorDispatcher.java:97)
at org.apache.jasper.compiler.Parser.processIncludeDirective(Parser.java:347)
at org.apache.jasper.compiler.Parser.parseIncludeDirective(Parser.java:380)
at org.apache.jasper.compiler.Parser.parseDirective(Parser.java:481)
at org.apache.jasper.compiler.Parser.parseElements(Parser.java:1445)
at org.apache.jasper.compiler.Parser.parseBody(Parser.java:1683)
at org.apache.jasper.compiler.Parser.parseOptionalBody(Parser.java:1016)
at org.apache.jasper.compiler.Parser.parseCustomTag(Parser.java:1291)
at org.apache.jasper.compiler.Parser.parseElements(Parser.java:1470)
at org.apache.jasper.compiler.Parser.parse(Parser.java:144)
at org.apache.jasper.compiler.ParserController.doParse(ParserController.java:244)
at org.apache.jasper.compiler.ParserController.parse(ParserController.java:105)
at org.apache.jasper.compiler.Compiler.generateJava(Compiler.java:202)
at org.apache.jasper.compiler.Compiler.compile(Compiler.java:373)
at org.apache.jasper.compiler.Compiler.compile(Compiler.java:350)
at org.apache.jasper.compiler.Compiler.compile(Compiler.java:334)
at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:595)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:399)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:386)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:330)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:728)
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:470)
at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:395)
at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:316)
at org.dspace.app.webui.util.JSPManager.showJSP(JSPManager.java:60)
at org.apache.jsp.index_jsp._jspService(index_jsp.java:191)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:476)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:386)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:330)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:234)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:650)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:800)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:806)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
- I notice that I get different JSESSIONID cookies for
/
(XMLUI) and/jspui
(JSPUI) on Tomcat 8.5.37, I wonder if it’s the same on Tomcat 7.0.92… yes I do. - Hmm, on Tomcat 7.0.92 I see that I get a
dspace.current.user.id
session cookie after logging into XMLUI, and then when I browse to JSPUI I am still logged in…- I didn’t see that cookie being set on Tomcat 8.5.37
- I sent a message to the dspace-tech mailing list to ask
2019-01-04
- Linode sent a message last night that CGSpace (linode18) had high CPU usage, but I don’t see anything around that time in the web server logs:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Jan/2019:1(7|8|9)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
189 207.46.13.192
217 31.6.77.23
340 66.249.70.29
349 40.77.167.86
417 34.218.226.147
630 207.46.13.173
710 35.237.175.180
790 40.77.167.87
1776 66.249.70.27
2099 54.70.40.11
- I’m thinking about trying to validate our
dc.subject
terms against AGROVOC webservices - There seem to be a few APIs and the documentation is kinda confusing, but I found this REST endpoint that does work well, for example searching for
SOIL
:
$ http http://agrovoc.uniroma2.it/agrovoc/rest/v1/search?query=SOIL&lang=en
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: Keep-Alive
Content-Length: 493
Content-Type: application/json; charset=utf-8
Date: Fri, 04 Jan 2019 13:44:27 GMT
Keep-Alive: timeout=5, max=100
Server: Apache
Strict-Transport-Security: max-age=63072000; includeSubdomains
Vary: Accept
X-Content-Type-Options: nosniff
X-Frame-Options: ALLOW-FROM http://aims.fao.org
{
"@context": {
"@language": "en",
"altLabel": "skos:altLabel",
"hiddenLabel": "skos:hiddenLabel",
"isothes": "http://purl.org/iso25964/skos-thes#",
"onki": "http://schema.onki.fi/onki#",
"prefLabel": "skos:prefLabel",
"results": {
"@container": "@list",
"@id": "onki:results"
},
"skos": "http://www.w3.org/2004/02/skos/core#",
"type": "@type",
"uri": "@id"
},
"results": [
{
"lang": "en",
"prefLabel": "soil",
"type": [
"skos:Concept"
],
"uri": "http://aims.fao.org/aos/agrovoc/c_7156",
"vocab": "agrovoc"
}
],
"uri": ""
}
- The API does not appear to be case sensitive (searches for
SOIL
andsoil
return the same thing) - I’m a bit confused that there’s no obvious return code or status when a term is not found, for example
SOILS
:
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: Keep-Alive
Content-Length: 367
Content-Type: application/json; charset=utf-8
Date: Fri, 04 Jan 2019 13:48:31 GMT
Keep-Alive: timeout=5, max=100
Server: Apache
Strict-Transport-Security: max-age=63072000; includeSubdomains
Vary: Accept
X-Content-Type-Options: nosniff
X-Frame-Options: ALLOW-FROM http://aims.fao.org
{
"@context": {
"@language": "en",
"altLabel": "skos:altLabel",
"hiddenLabel": "skos:hiddenLabel",
"isothes": "http://purl.org/iso25964/skos-thes#",
"onki": "http://schema.onki.fi/onki#",
"prefLabel": "skos:prefLabel",
"results": {
"@container": "@list",
"@id": "onki:results"
},
"skos": "http://www.w3.org/2004/02/skos/core#",
"type": "@type",
"uri": "@id"
},
"results": [],
"uri": ""
}
- I guess the
results
object will just be empty… - Another way would be to try with SPARQL, perhaps using the Python 2.7 sparql-client:
$ python2.7 -m virtualenv /tmp/sparql
$ . /tmp/sparql/bin/activate
$ pip install sparql-client ipython
$ ipython
In [10]: import sparql
In [11]: s = sparql.Service("http://agrovoc.uniroma2.it:3030/agrovoc/sparql", "utf-8", "GET")
In [12]: statement=('PREFIX skos: <http://www.w3.org/2004/02/skos/core#> '
...: 'SELECT '
...: '?label '
...: 'WHERE { '
...: '{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . } '
...: 'FILTER regex(str(?label), "^fish", "i") . '
...: '} LIMIT 10')
In [13]: result = s.query(statement)
In [14]: for row in result.fetchone():
...: print(row)
...:
(<Literal "fish catching"@en>,)
(<Literal "fish harvesting"@en>,)
(<Literal "fish meat"@en>,)
(<Literal "fish roe"@en>,)
(<Literal "fish conversion"@en>,)
(<Literal "fisheries catches (composition)"@en>,)
(<Literal "fishtail palm"@en>,)
(<Literal "fishflies"@en>,)
(<Literal "fishery biology"@en>,)
(<Literal "fish production"@en>,)
- The SPARQL query comes from my notes in 2017-08
2019-01-06
- I built a clean DSpace 5.8 installation from the upstream
dspace-5.8
tag and the issue with the XMLUI/JSPUI login is still there with Tomcat 8.5.37- If I log into XMLUI and then nagivate to JSPUI I need to log in again
- XMLUI does not set the
dspace.current.user.id
session cookie in Tomcat 8.5.37 for some reason - I sent an update to the dspace-tech mailing list to ask for more help troubleshooting
2019-01-07
- I built a clean DSpace 6.3 installation from the upstream
dspace-6.3
tag and the issue with the XMLUI/JSPUI login is still there with Tomcat 8.5.37- If I log into XMLUI and then nagivate to JSPUI I need to log in again
- XMLUI does not set the
dspace.current.user.id
session cookie in Tomcat 8.5.37 for some reason - I sent an update to the dspace-tech mailing list to ask for more help troubleshooting
2019-01-08
- Tim Donohue responded to my thread about the cookies on the dspace-tech mailing list
- He suspects it’s a change of behavior in Tomcat 8.5, and indeed I see a mention of new cookie processing in the Tomcat 8.5 migration guide
- I tried to switch my XMLUI and JSPUI contexts to use the
LegacyCookieProcessor
, but it didn’t seem to help - I filed DS-4140 on the DSpace issue tracker
2019-01-11
- Tezira wrote to say she has stopped receiving the
DSpace Submission Approved and Archived
emails from CGSpace as of January 2nd- I told her that I haven’t done anything to disable it lately, but that I would check
- Bizu also says she hasn’t received them lately
2019-01-14
- Day one of CGSpace AReS meeting in Amman
2019-01-15
- Day two of CGSpace AReS meeting in Amman
- Discuss possibly extending the dspace-statistics-api to make community and collection statistics available
- Discuss new “final” CG Core document and some changes that we’ll need to do on CGSpace and other repositories
- We agreed to try to stick to pure Dublin Core where possible, then use fields that exist in standard DSpace, and use “cg” namespace for everything else
- Major changes are to move
dc.contributor.author
todc.creator
(which MELSpace and WorldFish are already using in their DSpace repositories)
- I am testing the speed of the WorldFish DSpace repository’s REST API and it’s five to ten times faster than CGSpace as I tested in 2018-10:
$ time http --print h 'https://digitalarchive.worldfishcenter.org/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0'
0.16s user 0.03s system 3% cpu 5.185 total
0.17s user 0.02s system 2% cpu 7.123 total
0.18s user 0.02s system 6% cpu 3.047 total
- In other news, Linode sent a mail last night that the CPU load on CGSpace (linode18) was high, here are the top IPs in the logs around those few hours:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "14/Jan/2019:(17|18|19|20)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
157 31.6.77.23
192 54.70.40.11
202 66.249.64.157
207 40.77.167.204
220 157.55.39.140
326 197.156.105.116
385 207.46.13.158
1211 35.237.175.180
1830 66.249.64.155
2482 45.5.186.2
2019-01-16
- Day three of CGSpace AReS meeting in Amman
- We discussed CG Core 2.0 metadata and decided some action points
- We discussed branding of AReS tool
- Notes from our CG Core 2.0 metadata discussion:
- Not Dublin Core:
- dc.subtype
- dc.peer-reviewed
- Dublin Core, possible action for CGSpace:
- dc.description:
- We use dc.description.abstract, dc.description (Notes), dc.description.version (Peer review status), dc.description.sponsorship (Funder)
- Maybe move abstract to dc.description
- Maybe notes moves to cg.description.notes???
- Maybe move dc.description.version to cg.peer-reviewed or cg.peer-review-status???
- Move dc.description.sponsorship to cg.contributor.donor???
- dc.subject:
- Wait for guidance, evaluate technical implications (Google indexing, OAI, etc)
- Move dc.contributor.author to dc.creator
- dc.contributor Project
- Recommend against creating new fields for all projects
- We use collections projects/themes/etc
- dc.contributor Project Lead Center
- MELSpace uses cg.contributor.project-lead-institute (institute is more generic than center)
- Maybe we use?
- dc.contributor Partner
- Wait for guidance
- MELSpace uses cg.contibutor.center (?)
- dc.contributor Donor
- Use cg.contributor.donor
- dc.date
- Wait for guidance, maybe move dc.date.issued?
- dc.date.accessioned and dc.date.available are automatic in DSpace
- dc.language
- Move dc.language.iso to dc.language
- dc.identifier
- Move cg.identifier.url to dc.identifier
- dc.identifier bibliographicCitation
- dc.identifier.citation should move to dc.bibliographicCitation
- dc.description.notes
- Wait for guidance, maybe move to cg.description.notes ???
- dc.relation
- Maybe move cg.link.reference
- Perhaps consolodate cg.link.audio etc there…?
- dc.relation.isPartOf
- Move dc.relation.ispartofseries to dc.relation.isPartOf
- dc.audience
- Move cg.targetaudience to dc.audience
- dc.description:
- Not Dublin Core:
- Something happened to the Solr usage statistics on CGSpace
- I looked on the server and the Solr cores are there (56GB!), and I don’t see any obvious errors in dmesg or anything
- I see that the server hasn’t been rebooted in 26 days so I rebooted it
- After reboot the Solr stats are still messed up in the Atmire Usage Stats module, it only shows 2019-01!
- In the Solr admin UI I see the following error:
statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
- Looking in the Solr log I see this:
2019-01-16 13:37:55,395 ERROR org.apache.solr.core.CoreContainer @ Error creating core [statistics-2018]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466)
at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:575)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.solr.filters.LocalHostRestrictionFilter.doFilter(LocalHostRestrictionFilter.java:50)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:436)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
... 31 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:89)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:753)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
... 33 more
2019-01-16 13:37:55,401 ERROR org.apache.solr.core.SolrCore @ org.apache.solr.common.SolrException: Error CREATEing SolrCore 'statistics-2018': Unable to create core [statistics-2018] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock
at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:613)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.solr.filters.LocalHostRestrictionFilter.doFilter(LocalHostRestrictionFilter.java:50)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:180)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:436)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Unable to create core [statistics-2018]
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:507)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466)
at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:575)
... 27 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
... 29 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
... 31 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:89)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:753)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
... 33 more
- I found some threads on StackOverflow etc discussing this and several suggested increasing the address space for the shell with ulimit
- I added
ulimit -v unlimited
to the/etc/default/tomcat7
and restarted Tomcat and now Solr is working again:
- Some StackOverflow discussions related to this:
- Abenet was asking if the Atmire Usage Stats are correct because they are over 2 million the last few months…
- For 2019-01 alone the Usage Stats are already around 1.2 million
- I tried to look in the nginx logs to see how many raw requests there are so far this month and it’s about 1.4 million:
# time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
1442874
real 0m17.161s
user 0m16.205s
sys 0m2.396s
2019-01-17
- Send reminder to Atmire about purchasing the MQM module
- Trying to decide the solid action points for CGSpace on the CG Core 2.0 metadata…
- It’s difficult to decide some of these because the current CG Core 2.0 document does not provide guidance or rationale (yet)!
- Also, there is not a good Dublin Core reference (or maybe I just don’t understand?)
- Several authoritative documents on Dublin Core appear to be:
- And what is the relationship between DC and DCTERMS?
- DSpace uses DCTERMS in the metadata it embeds in XMLUI item views!
- We really need to look at this more carefully and see the impacts that might be made from switching core fields like languages, abstract, authors, etc
- We can check WorldFish and MELSpace repositories to see what effects these changes have had on theirs because they have already adopted some of these changes…
- I think I understand the difference between DC and DCTERMS finally: DC is the original set of fifteen elements and DCTERMS is the newer version that was supposed to address much of the drawbacks of the original with regards to digital content
- We might be able to use some proper fields for citation, abstract, etc that are part of DCTERMS
- To make matters more confusing, there is also “qualified Dublin Core” that uses the original fifteen elements of legacy DC and qualifies them, like
dc.date.accessioned
- According to Wikipedia Qualified Dublin Core was superseded by DCTERMS in 2008!
- So we should be trying to use DCTERMS where possible, unless it is some internal thing that might mess up DSpace (like dates)
- “Elements 1.1” means legacy DC
- Possible action list for CGSpace:
- dc.description.abstract → dcterms.abstract
- dc.description.version → cg.peer-reviewed (or cg.peer-review-status?)
- dc.description.sponsorship → cg.contributor.donor
- dc.contributor.author → dc.creator
- dc.language.iso → dcterms.language
- cg.identifier.url → dcterms.identifier
- dc.identifier.citation → dcterms.bibliographicCitation
- dc.relation.ispartofseries → dcterms.isPartOf
- cg.targetaudience → dcterms.audience
2019-01-19
-
There’s no official set of Dublin Core qualifiers so I can’t tell if things like
dc.contributor.author
that are used by DSpace are official -
I found a great presentation from 2015 by the Digital Repository of Ireland that discusses using MARC Relator Terms with Dublin Core elements
-
It seems that
dc.contributor.author
would be a supported term according to this Library of Congress list linked from the Dublin Core website -
The Library of Congress document specifically says:
These terms conform with the DCMI Abstract Model and may be used in DCMI application profiles. DCMI endorses their use with Dublin Core elements as indicated.
2019-01-20
- That’s weird, I logged into DSpace Test (linode19) and it says it has been up for 213 days:
# w
04:46:14 up 213 days, 7:25, 4 users, load average: 1.94, 1.50, 1.35
- I’ve definitely rebooted it several times in the past few months… according to
journalctl -b
it was a few weeks ago on 2019-01-02 - I re-ran the Ansible DSpace tag, ran all system updates, and rebooted the host
- After rebooting I notice that the Linode kernel went down from 4.19.8 to 4.18.16…
- Atmire sent a quote on our ticket about purchasing the Metadata Quality Module (MQM) for DSpace 5.8
- Abenet asked me for an OpenSearch query that could generate and RSS feed for items in the Livestock CRP
- According to my notes,
sort_by=3
is accession date (as configured indspace.cfg
) - The query currently shows 3023 items, but a Discovery search for Livestock CRP only returns 858 items
- That query seems to return items tagged with
Livestock and Fish
CRP as well… hmm.
2019-01-21
- Investigating running Tomcat 7 on Ubuntu 18.04 with the tarball and a custom systemd package instead of waiting for our DSpace to get compatible with Ubuntu 18.04’s Tomcat 8.5
- I could either run with a simple
tomcat7.service
like this:
[Unit]
Description=Apache Tomcat 7 Web Application Container
After=network.target
[Service]
Type=forking
ExecStart=/path/to/apache-tomcat-7.0.92/bin/startup.sh
ExecStop=/path/to/apache-tomcat-7.0.92/bin/shutdown.sh
User=aorth
Group=aorth
[Install]
WantedBy=multi-user.target
- Or try to use adapt a real systemd service like Arch Linux’s:
[Unit]
Description=Tomcat 7 servlet container
After=network.target
[Service]
Type=forking
PIDFile=/var/run/tomcat7.pid
Environment=CATALINA_PID=/var/run/tomcat7.pid
Environment=TOMCAT_JAVA_HOME=/usr/lib/jvm/default-runtime
Environment=CATALINA_HOME=/usr/share/tomcat7
Environment=CATALINA_BASE=/usr/share/tomcat7
Environment=CATALINA_OPTS=
Environment=ERRFILE=SYSLOG
Environment=OUTFILE=SYSLOG
ExecStart=/usr/bin/jsvc \
-Dcatalina.home=${CATALINA_HOME} \
-Dcatalina.base=${CATALINA_BASE} \
-Djava.io.tmpdir=/var/tmp/tomcat7/temp \
-cp /usr/share/java/commons-daemon.jar:/usr/share/java/eclipse-ecj.jar:${CATALINA_HOME}/bin/bootstrap.jar:${CATALINA_HOME}/bin/tomcat-juli.jar \
-user tomcat7 \
-java-home ${TOMCAT_JAVA_HOME} \
-pidfile /var/run/tomcat7.pid \
-errfile ${ERRFILE} \
-outfile ${OUTFILE} \
$CATALINA_OPTS \
org.apache.catalina.startup.Bootstrap
ExecStop=/usr/bin/jsvc \
-pidfile /var/run/tomcat7.pid \
-stop \
org.apache.catalina.startup.Bootstrap
[Install]
WantedBy=multi-user.target
- I see that
jsvc
andlibcommons-daemon-java
are both available on Ubuntu so that should be easy to port - We probably don’t need Eclipse Java Bytecode Compiler (ecj)
- I tested Tomcat 7.0.92 on Arch Linux using the
tomcat7.service
withjsvc
and it works… nice! - I think I might manage this the same way I do the restic releases in the Ansible infrastructure scripts, where I download a specific version and symlink to some generic location without the version number
- I verified that there is indeed an issue with sharded Solr statistics cores on DSpace, which will cause inaccurate results in the dspace-statistics-api:
$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view' | grep numFound
<result name="response" numFound="33" start="0">
$ http 'http://localhost:3000/solr/statistics-2018/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view' | grep numFound
<result name="response" numFound="241" start="0">
- I opened an issue on the GitHub issue tracker (#10)
- I don’t think the SolrClient library we are currently using supports these type of queries so we might have to just do raw queries with requests
- The pysolr library says it supports multicore indexes, but I am not sure it does (or at least not with our setup):
import pysolr
solr = pysolr.Solr('http://localhost:3000/solr/statistics')
results = solr.search('type:2', **{'fq': 'isBot:false AND statistics_type:view', 'facet': 'true', 'facet.field': 'id', 'facet.mincount': 1, 'facet.limit': 10, 'facet.offset': 0, 'rows': 0})
print(results.facets['facet_fields'])
{'id': ['77572', 646, '93185', 380, '92932', 375, '102499', 372, '101430', 337, '77632', 331, '102449', 289, '102485', 276, '100849', 270, '47080', 260]}
- If I double check one item from above, for example
77572
, it appears this is only working on the current statistics core and not the shards:
import pysolr
solr = pysolr.Solr('http://localhost:3000/solr/statistics')
results = solr.search('type:2 id:77572', **{'fq': 'isBot:false AND statistics_type:view'})
print(results.hits)
646
solr = pysolr.Solr('http://localhost:3000/solr/statistics-2018/')
results = solr.search('type:2 id:77572', **{'fq': 'isBot:false AND statistics_type:view'})
print(results.hits)
595
- So I guess I need to figure out how to use join queries and maybe even switch to using raw Python requests with JSON
- This enumerates the list of Solr cores and returns JSON format:
http://localhost:3000/solr/admin/cores?action=STATUS&wt=json
- I think I figured out how to search across shards, I needed to give the whole URL to each other core
- Now I get more results when I start adding the other statistics cores:
$ http 'http://localhost:3000/solr/statistics/select?&indent=on&rows=0&q=*:*' | grep numFound<result name="response" numFound="2061320" start="0">
$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018&indent=on&rows=0&q=*:*' | grep numFound
<result name="response" numFound="16280292" start="0" maxScore="1.0">
$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017&indent=on&rows=0&q=*:*' | grep numFound
<result name="response" numFound="25606142" start="0" maxScore="1.0">
$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017,localhost:8081/solr/statistics-2016&indent=on&rows=0&q=*:*' | grep numFound
<result name="response" numFound="31532212" start="0" maxScore="1.0">
- I should be able to modify the dspace-statistics-api to check the shards via the Solr core status, then add the
shards
parameter to each query to make the search distributed among the cores - I implemented a proof of concept to query the Solr STATUS for active cores and to add them with a
shards
query string - A few things I noticed:
- Solr doesn’t mind if you use an empty
shards
parameter - Solr doesn’t mind if you have an extra comma at the end of the
shards
parameter - If you are searching multiple cores, you need to include the base core in the
shards
parameter as well - For example, compare the following two queries, first including the base core and the shard in the
shards
parameter, and then only including the shard:
- Solr doesn’t mind if you use an empty
$ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view&shards=localhost:8081/solr/statistics,localhost:8081/solr/statistics-2018' | grep numFound
<result name="response" numFound="275" start="0" maxScore="12.205825">
$ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view&shards=localhost:8081/solr/statistics-2018' | grep numFound
<result name="response" numFound="241" start="0" maxScore="12.205825">
2019-01-22
- Release version 0.9.0 of the dspace-statistics-api to address the issue of querying multiple Solr statistics shards
- I deployed it on DSpace Test (linode19) and restarted the indexer and now it shows all the stats from 2018 as well (756 pages of views, intead of 6)
- I deployed it on CGSpace (linode18) and restarted the indexer as well
- Linode sent an alert that CGSpace (linode18) was using high CPU this afternoon, the top ten IPs during that time were:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "22/Jan/2019:1(4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
155 40.77.167.106
176 2003:d5:fbda:1c00:1106:c7a0:4b17:3af8
189 107.21.16.70
217 54.83.93.85
310 46.174.208.142
346 83.103.94.48
360 45.5.186.2
595 154.113.73.30
716 196.191.127.37
915 35.237.175.180
- 35.237.175.180 is known to us
- I don’t think we’ve seen 196.191.127.37 before. Its user agent is:
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/7.0.185.1002 Safari/537.36
- Interestingly this IP is located in Addis Ababa…
- Another interesting one is 154.113.73.30, which is apparently at IITA Nigeria and uses the user agent:
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
2019-01-23
- Peter noticed that some goo.gl links in our tweets from Feedburner are broken, for example this one from last week:
#ILRI research: Towards unlocking the potential of the hides and skins value chain in Somaliland https://t.co/EZH7ALW4dp
— ILRI.org (@ILRI) January 18, 2019
- The shortened link is goo.gl/fb/VRj9Gq and it shows a “Dynamic Link not found” error from Firebase:
-
Apparently Google announced last year that they plan to discontinue the shortner and transition to Firebase Dynamic Links in March, 2019, so maybe this is related…
-
Very interesting discussion of methods for running Tomcat under systemd
-
We can set the ulimit options that used to be in
/etc/default/tomcat7
with systemd’sLimitNOFILE
andLimitAS
(see thesystemd.exec
man page)- Note that we need to use
infinity
instead ofunlimited
for the address space
- Note that we need to use
-
Create accounts for Bosun from IITA and Valerio from ICARDA / CGMEL on DSpace Test
-
Maria Garruccio asked me for a list of author affiliations from all of their submitted items so she can clean them up
-
I got a list of their collections from the CGSpace XMLUI and then used an SQL query to dump the unique values to CSV:
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/35501', '10568/41728', '10568/49622', '10568/56589', '10568/56592', '10568/65064', '10568/65718', '10568/65719', '10568/67373', '10568/67731', '10568/68235', '10568/68546', '10568/69089', '10568/69160', '10568/69419', '10568/69556', '10568/70131', '10568/70252', '10568/70978'))) group by text_value order by count desc) to /tmp/bioversity-affiliations.csv with csv;
COPY 1109
- Send a mail to the dspace-tech mailing list about the OpenSearch issue we had with the Livestock CRP
- Linode sent an alert that CGSpace (linode18) had a high load this morning, here are the top ten IPs during that time:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "23/Jan/2019:0(4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
222 54.226.25.74
241 40.77.167.13
272 46.101.86.248
297 35.237.175.180
332 45.5.184.72
355 34.218.226.147
404 66.249.64.155
4637 205.186.128.185
4637 70.32.83.92
9265 45.5.186.2
-
I think it’s the usual IPs:
- 45.5.186.2 is CIAT
- 70.32.83.92 is CCAFS
- 205.186.128.185 is CCAFS or perhaps another Macaroni Bros harvester (new ILRI website?)
-
Following up on the thumbnail issue that we had in 2018-12
-
It looks like the two items with problematic PDFs both have thumbnails now:
-
Just to make sure these were not uploaded by the user or something, I manually forced the regeneration of these with DSpace’s
filter-media
:
$ schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace filter-media -v -f -i 10568/98390
$ schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace filter-media -v -f -i 10568/98391
- Both of these were successful, so there must have been an update to ImageMagick or Ghostscript in Ubuntu since early 2018-12
- Looking at the apt history logs I see that on 2018-12-07 a security update for Ghostscript was installed (version 9.26~dfsg+0-0ubuntu0.16.04.3)
- I think this Launchpad discussion is relevant: https://bugs.launchpad.net/ubuntu/+source/ghostscript/+bug/1806517
- As well as the original Ghostscript bug report: https://bugs.ghostscript.com/show_bug.cgi?id=699815
2019-01-24
- I noticed Ubuntu’s Ghostscript 9.26 works on some troublesome PDFs where Arch’s Ghostscript 9.26 doesn’t, so the fix for the first/last page crash is not the patch I found yesterday
- Ubuntu’s Ghostscript uses another patch from Ghostscript git (upstream bug report)
- I re-compiled Arch’s ghostscript with the patch and then I was able to generate a thumbnail from one of the troublesome PDFs
- Before and after:
$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\]
$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
Food safety Kenya fruits.pdf[0]=>Food safety Kenya fruits.pdf PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.000u 0:00.000
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1747.
- I reported it to the Arch Linux bug tracker (61513)
- I told Atmire to go ahead with the Metadata Quality Module addition based on our
5_x-dev
branch (657) - Linode sent alerts last night to say that CGSpace (linode18) was using high CPU last night, here are the top ten IPs from the nginx logs around that time:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "23/Jan/2019:(18|19|20)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
305 3.81.136.184
306 3.83.14.11
306 52.54.252.47
325 54.221.57.180
378 66.249.64.157
424 54.70.40.11
497 47.29.247.74
783 35.237.175.180
1108 66.249.64.155
2378 45.5.186.2
- 45.5.186.2 is CIAT and 66.249.64.155 is Google… hmmm.
- Linode sent another alert this morning, here are the top ten IPs active during that time:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "24/Jan/2019:0(4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
360 3.89.134.93
362 34.230.15.139
366 100.24.48.177
369 18.212.208.240
377 3.81.136.184
404 54.221.57.180
506 66.249.64.155
4642 70.32.83.92
4643 205.186.128.185
8593 45.5.186.2
- Just double checking what CIAT is doing, they are mainly hitting the REST API:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "24/Jan/2019:" | grep 45.5.186.2 | grep -Eo "GET /(handle|bitstream|rest|oai)/" | sort | uniq -c | sort -n
- CIAT’s community currently has 12,000 items in it so this is normal
- The issue with goo.gl links that we saw yesterday appears to be resolved, as links are working again…
- For example: https://goo.gl/fb/VRj9Gq
- The full list of MARC Relators on the Library of Congress website linked from the DMCI relators page is very confusing
- Looking at the default DSpace XMLUI crosswalk in xhtml-head-item.properties I see a very complete mapping of DSpace DC and QDC fields to DCTERMS
- This is good for standards-compliant web crawlers, but what about for those harvesting via REST or OAI APIs?
- I sent a message titled “DC, QDC, and DCTERMS: reviewing our metadata practices” to the dspace-tech mailing list to ask about some of this
2019-01-25
- A little bit more work on getting Tomcat to run from a tarball on our Ansible infrastructure playbooks
- I tested by doing a Tomcat 7.0.91 installation, then switching it to 7.0.92 and it worked… nice!
- I refined the tasks so much that I was confident enough to deploy them on DSpace Test and it went very well
- Basically I just stopped tomcat7, created a dspace user, removed tomcat7, chown’d everything to the dspace user, then ran the playbook
- So now DSpace Test (linode19) is running Tomcat 7.0.92… w00t
- Now we need to monitor it for a few weeks to see if there is anything we missed, and then I can change CGSpace (linode18) as well, and we’re ready for Ubuntu 18.04 too!
2019-01-27
- Linode sent an email that the server was using a lot of CPU this morning, and these were the top IPs in the web server logs at the time:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "27/Jan/2019:0(6|7|8)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
189 40.77.167.108
191 157.55.39.2
263 34.218.226.147
283 45.5.184.2
332 45.5.184.72
608 5.9.6.51
679 66.249.66.223
1116 66.249.66.219
4644 205.186.128.185
4644 70.32.83.92
- I think it’s the usual IPs:
- 70.32.83.92 is CCAFS
- 205.186.128.185 is CCAFS or perhaps another Macaroni Bros harvester (new ILRI website?)
2019-01-28
- Udana from WLE asked me about the interaction between their publication website and their items on CGSpace
- There is an item that is mapped into their collection from IWMI and is missing their
cg.identifier.wletheme
metadata - I told him that, as far as I remember, when WLE introduced Phase II research themes in 2017 we decided to infer theme ownership from the collection hierarchy and we created a WLE Phase II Research Themes subCommunity
- Perhaps they need to ask Macaroni Bros about the mapping
- There is an item that is mapped into their collection from IWMI and is missing their
- Linode alerted that CGSpace (linode18) was using too much CPU again this morning, here are the active IPs from the web server log at the time:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "28/Jan/2019:0(6|7|8)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
67 207.46.13.50
105 41.204.190.40
117 34.218.226.147
126 35.237.175.180
203 213.55.99.121
332 45.5.184.72
377 5.9.6.51
512 45.5.184.2
4644 205.186.128.185
4644 70.32.83.92
- There seems to be a pattern with
70.32.83.92
and205.186.128.185
lately! - Every morning at 8AM they are the top users… I should tell them to stagger their requests…
- I signed up for a VisualPing of the PostgreSQL JDBC driver download page to my CGIAR email address
- Hopefully this will one day alert me that a new driver is released!
- Last night Linode sent an alert that CGSpace (linode18) was using high CPU, here are the most active IPs in the hours just before, during, and after the alert:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "28/Jan/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
310 45.5.184.2
425 5.143.231.39
526 54.70.40.11
1003 199.47.87.141
1374 35.237.175.180
1455 5.9.6.51
1501 66.249.66.223
1771 66.249.66.219
2107 199.47.87.140
2540 45.5.186.2
- Of course there is CIAT’s
45.5.186.2
, but also45.5.184.2
appears to be CIAT… I wonder why they have two harvesters? 199.47.87.140
and199.47.87.141
is TurnItIn with the following user agent:
TurnitinBot (https://turnitin.com/robot/crawlerinfo.html)
2019-01-29
- Linode sent an alert about CGSpace (linode18) CPU usage this morning, here are the top IPs in the web server logs just before, during, and after the alert:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "29/Jan/2019:0(3|4|5|6|7)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
334 45.5.184.72
429 66.249.66.223
522 35.237.175.180
555 34.218.226.147
655 66.249.66.221
844 5.9.6.51
2507 66.249.66.219
4645 70.32.83.92
4646 205.186.128.185
9329 45.5.186.2
45.5.186.2
is CIAT as usual…70.32.83.92
and205.186.128.185
are CCAFS as usual…66.249.66.219
is Google…- I’m thinking it might finally be time to increase the threshold of the Linode CPU alerts
- I adjusted the alert threshold from 250% to 275%
2019-01-30
- Got another alert from Linode about CGSpace (linode18) this morning, here are the top IPs before, during, and after the alert:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "30/Jan/2019:0(5|6|7|8|9)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
273 46.101.86.248
301 35.237.175.180
334 45.5.184.72
387 5.9.6.51
527 2a01:4f8:13b:1296::2
1021 34.218.226.147
1448 66.249.66.219
4649 205.186.128.185
4649 70.32.83.92
5163 45.5.184.2
- I might need to adjust the threshold again, because the load average this morning was 296% and the activity looks pretty normal (as always recently)
2019-01-31
- Linode sent alerts about CGSpace (linode18) last night and this morning, here are the top IPs before, during, and after those times:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "30/Jan/2019:(16|17|18|19|20)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
436 18.196.196.108
460 157.55.39.168
460 207.46.13.96
500 197.156.105.116
728 54.70.40.11
1560 5.9.6.51
1562 35.237.175.180
1601 85.25.237.71
1894 66.249.66.219
2610 45.5.184.2
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "31/Jan/2019:0(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
318 207.46.13.242
334 45.5.184.72
486 35.237.175.180
609 34.218.226.147
620 66.249.66.219
1054 5.9.6.51
4391 70.32.83.92
4428 205.186.128.185
6758 85.25.237.71
9239 45.5.186.2
45.5.186.2
and45.5.184.2
are CIAT as always85.25.237.71
is some new server in Germany that I’ve never seen before with the user agent:
Linguee Bot (http://www.linguee.com/bot; bot@linguee.com)