<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don’t see anything interesting in the web server logs around that time though:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
357 207.46.13.1
903 54.70.40.11
</code></pre><ul>
<li>Analyzing the types of requests made by the top few IPs during that time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 54.70.40.11 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
30 bitstream
534 discover
352 handle
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 207.46.13.1 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
194 bitstream
345 handle
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | grep 46.101.86.248 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
261 handle
</code></pre><ul>
<li>It’s not clear to me what was causing the outbound traffic spike</li>
</ul>
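<ul>
<li>Since this zcat/awk/sort/uniq pattern recurs throughout these notes, here is a rough Python equivalent for counting requests per client IP in a time window (a hypothetical helper for illustration, not something CGSpace actually runs):</li>
</ul>
<pre tabindex="0"><code>import glob
import gzip
import re
from collections import Counter

def top_ips(hour_pattern, log_glob="/var/log/nginx/*.log*", limit=10):
    """Count requests per client IP for log lines matching a date/hour regex."""
    hour_re = re.compile(hour_pattern)
    counts = Counter()
    for path in glob.glob(log_glob):
        # rotated logs are gzipped, the current ones are plain text
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as log:
            for line in log:
                if hour_re.search(line):
                    counts[line.split()[0]] += 1  # first field is the client IP
    return counts.most_common(limit)

if __name__ == "__main__":
    for ip, hits in top_ips(r"02/Jan/2019:0(1|2|3)"):
        print(f"{hits:7d} {ip}")
</code></pre>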
<ul>
<li>Linode sent a message last night that CGSpace (linode18) had high CPU usage, but I don’t see anything around that time in the web server logs:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Jan/2019:1(7|8|9)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
189 207.46.13.192
217 31.6.77.23
340 66.249.70.29
</code></pre>
<pre tabindex="0"><code>X-Content-Type-Options: nosniff
X-Frame-Options: ALLOW-FROM http://aims.fao.org

{
    "@context": {
        "@language": "en",
        "altLabel": "skos:altLabel",
        "hiddenLabel": "skos:hiddenLabel",
        "isothes": "http://purl.org/iso25964/skos-thes#",
        "onki": "http://schema.onki.fi/onki#",
        "prefLabel": "skos:prefLabel",
        "results": {
            "@container": "@list",
            "@id": "onki:results"
        },
        "skos": "http://www.w3.org/2004/02/skos/core#",
        "type": "@type",
        "uri": "@id"
    },
    "results": [
        {
            "lang": "en",
            "prefLabel": "soil",
            "type": [
                "skos:Concept"
            ],
            "uri": "http://aims.fao.org/aos/agrovoc/c_7156",
            "vocab": "agrovoc"
        }
    ],
    "uri": ""
}
</code></pre><ul>
<li>The API does not appear to be case sensitive (searches for <code>SOIL</code> and <code>soil</code> return the same thing)</li>
</ul>
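<ul>
<li>For reference, a small sketch (a hypothetical helper, not part of any CGSpace tooling) that pulls the useful fields out of a response shaped like the one above; when <code>results</code> is empty, as in the next example, it simply returns an empty list:</li>
</ul>
<pre tabindex="0"><code>import json

def extract_concepts(response):
    """Return (prefLabel, uri) pairs from an AGROVOC-style search response."""
    return [
        (concept.get("prefLabel"), concept.get("uri"))
        for concept in response.get("results", [])
    ]

# sample mirrors the structure of the "soil" response shown above
sample = json.loads("""
{
  "results": [
    {
      "lang": "en",
      "prefLabel": "soil",
      "type": ["skos:Concept"],
      "uri": "http://aims.fao.org/aos/agrovoc/c_7156",
      "vocab": "agrovoc"
    }
  ],
  "uri": ""
}
""")

print(extract_concepts(sample))  # [('soil', 'http://aims.fao.org/aos/agrovoc/c_7156')]
</code></pre>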
<pre tabindex="0"><code>X-Content-Type-Options: nosniff
X-Frame-Options: ALLOW-FROM http://aims.fao.org

{
    "@context": {
        "@language": "en",
        "altLabel": "skos:altLabel",
        "hiddenLabel": "skos:hiddenLabel",
        "isothes": "http://purl.org/iso25964/skos-thes#",
        "onki": "http://schema.onki.fi/onki#",
        "prefLabel": "skos:prefLabel",
        "results": {
            "@container": "@list",
            "@id": "onki:results"
        },
        "skos": "http://www.w3.org/2004/02/skos/core#",
        "type": "@type",
        "uri": "@id"
    },
    "results": [],
    "uri": ""
}
</code></pre><ul>
<li>I guess the <code>results</code> object will just be empty…</li>
</ul>
<pre tabindex="0"><code>$ . /tmp/sparql/bin/activate
$ pip install sparql-client ipython
$ ipython
In [10]: import sparql
In [11]: s = sparql.Service("http://agrovoc.uniroma2.it:3030/agrovoc/sparql", "utf-8", "GET")
In [12]: statement=('PREFIX skos: <http://www.w3.org/2004/02/skos/core#> '
    ...: 'SELECT '
    ...: '?label '
    ...: 'WHERE { '
    ...: '{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . } '
    ...: 'FILTER regex(str(?label), "^fish", "i") . '
    ...: '} LIMIT 10')
In [13]: result = s.query(statement)
In [14]: for row in result.fetchone():
    ...:     print(row)
    ...:
(<Literal "fish catching"@en>,)
(<Literal "fish harvesting"@en>,)
(<Literal "fish meat"@en>,)
(<Literal "fish roe"@en>,)
(<Literal "fish conversion"@en>,)
(<Literal "fisheries catches (composition)"@en>,)
(<Literal "fishtail palm"@en>,)
(<Literal "fishflies"@en>,)
(<Literal "fishery biology"@en>,)
(<Literal "fish production"@en>,)
</code></pre><ul>
<li>The SPARQL query comes from my notes in <a href="/cgspace-notes/2017-08/">2017-08</a></li>
</ul>
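<ul>
<li>The same query can also be run with the more common SPARQLWrapper library instead of sparql-client; a minimal sketch, assuming <code>pip install sparqlwrapper</code> and that the Fuseki endpoint above is still reachable:</li>
</ul>
<pre tabindex="0"><code>from SPARQLWrapper import SPARQLWrapper, JSON

# Same label search as above, expressed with SPARQLWrapper
sparql = SPARQLWrapper("http://agrovoc.uniroma2.it:3030/agrovoc/sparql")
sparql.setQuery("""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?label WHERE {
  { ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . }
  FILTER regex(str(?label), "^fish", "i") .
} LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["label"]["value"])
</code></pre>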
<ul>
<li>I am testing the speed of the WorldFish DSpace repository’s REST API and it’s five to ten times faster than CGSpace as I tested in <a href="/cgspace-notes/2018-10/">2018-10</a>:</li>
</ul>
<pre tabindex="0"><code>$ time http --print h 'https://digitalarchive.worldfishcenter.org/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0'

0.16s user 0.03s system 3% cpu 5.185 total
0.17s user 0.02s system 2% cpu 7.123 total
</code></pre>
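<ul>
<li>A rough way to reproduce the same comparison from Python rather than with httpie and <code>time</code>; a sketch only, using the WorldFish URL from above (the helper name is just for illustration):</li>
</ul>
<pre tabindex="0"><code>import time
import requests

def time_rest_request(url):
    """Fetch a REST API page and report the elapsed wall-clock time."""
    start = time.perf_counter()
    response = requests.get(url, timeout=60)
    elapsed = time.perf_counter() - start
    print(f"{response.status_code} {len(response.content)} bytes in {elapsed:.3f}s")
    return elapsed

time_rest_request(
    "https://digitalarchive.worldfishcenter.org/rest/items"
    "?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0"
)
</code></pre>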
<ul>
<li>In other news, Linode sent a mail last night that the CPU load on CGSpace (linode18) was high; here are the top IPs in the logs around those few hours:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "14/Jan/2019:(17|18|19|20)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
157 31.6.77.23
192 54.70.40.11
202 66.249.64.157
</code></pre>
<pre tabindex="0"><code>at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
... 33 more
2019-01-16 13:37:55,401 ERROR org.apache.solr.core.SolrCore @ org.apache.solr.common.SolrException: Error CREATEing SolrCore 'statistics-2018': Unable to create core [statistics-2018] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock
at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:613)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
</code></pre>
<ul>
<li>For 2019-01 alone the Usage Stats are already around 1.2 million</li>
<li>I tried to look in the nginx logs to see how many raw requests there are so far this month and it’s about 1.4 million:</li>
</ul>
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
1442874

real 0m17.161s
</code></pre>
<ul>
<li>I think I might manage this the same way I do the restic releases in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a>, where I download a specific version and symlink to some generic location without the version number</li>
<li>I verified that there is indeed an issue with sharded Solr statistics cores on DSpace, which will cause inaccurate results in the dspace-statistics-api:</li>
</ul>
<pre tabindex="0"><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view' | grep numFound
<result name="response" numFound="33" start="0">
$ http 'http://localhost:3000/solr/statistics-2018/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view' | grep numFound
<result name="response" numFound="241" start="0">
</code></pre><ul>
<li>I opened an issue on the GitHub issue tracker (<a href="https://github.com/ilri/dspace-statistics-api/issues/10">#10</a>)</li>
<li>I don’t think the <a href="https://solrclient.readthedocs.io/en/latest/">SolrClient library</a> we are currently using supports these types of queries, so we might have to just do raw queries with requests</li>
<li>The <a href="https://github.com/django-haystack/pysolr">pysolr</a> library says it supports multicore indexes, but I am not sure it does (or at least not with our setup):</li>
</ul>
<pre tabindex="0"><code>import pysolr
solr = pysolr.Solr('http://localhost:3000/solr/statistics')
results = solr.search('type:2', **{'fq': 'isBot:false AND statistics_type:view', 'facet': 'true', 'facet.field': 'id', 'facet.mincount': 1, 'facet.limit': 10, 'facet.offset': 0, 'rows': 0})
print(results.facets['facet_fields'])
{'id': ['77572', 646, '93185', 380, '92932', 375, '102499', 372, '101430', 337, '77632', 331, '102449', 289, '102485', 276, '100849', 270, '47080', 260]}
</code></pre><ul>
<li>If I double check one item from above, for example <code>77572</code>, it appears this is only working on the current statistics core and not the shards:</li>
</ul>
<pre tabindex="0"><code>import pysolr
solr = pysolr.Solr('http://localhost:3000/solr/statistics')
results = solr.search('type:2 id:77572', **{'fq': 'isBot:false AND statistics_type:view'})
print(results.hits)
646
solr = pysolr.Solr('http://localhost:3000/solr/statistics-2018/')
results = solr.search('type:2 id:77572', **{'fq': 'isBot:false AND statistics_type:view'})
print(results.hits)
595
</code></pre><ul>
<li>I think I figured out how to search across shards; I needed to give the whole URL to each other core</li>
<li>Now I get more results when I start adding the other statistics cores:</li>
</ul>
<pre tabindex="0"><code>$ http 'http://localhost:3000/solr/statistics/select?&indent=on&rows=0&q=*:*' | grep numFound
<result name="response" numFound="2061320" start="0">
$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018&indent=on&rows=0&q=*:*' | grep numFound
<result name="response" numFound="16280292" start="0" maxScore="1.0">
$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017&indent=on&rows=0&q=*:*' | grep numFound
<result name="response" numFound="25606142" start="0" maxScore="1.0">
$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017,localhost:8081/solr/statistics-2016&indent=on&rows=0&q=*:*' | grep numFound
<result name="response" numFound="31532212" start="0" maxScore="1.0">
</code></pre><ul>
<li>I should be able to modify the dspace-statistics-api to check the shards via the Solr core status, then add the <code>shards</code> parameter to each query to make the search distributed among the cores</li>
<li>I implemented a proof of concept to query the Solr STATUS for active cores and to add them with a <code>shards</code> query string, as sketched below</li>
</ul>
</li>
</ul>
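<ul>
<li>A minimal sketch of that idea (hypothetical helper names, not the actual dspace-statistics-api code): ask Solr’s CoreAdmin API which statistics cores exist, then pass them as a <code>shards</code> parameter so the query spans all of them:</li>
</ul>
<pre tabindex="0"><code>import requests

SOLR = "http://localhost:8081/solr"

def get_statistics_shards(solr_url=SOLR):
    """Return a shards string like 'localhost:8081/solr/statistics,localhost:8081/solr/statistics-2018'."""
    status = requests.get(
        f"{solr_url}/admin/cores", params={"action": "STATUS", "wt": "json"}
    ).json()
    # include the main statistics core and all yearly statistics-YYYY shards
    cores = [name for name in status["status"] if name.startswith("statistics")]
    host = solr_url.split("//", 1)[1]  # e.g. localhost:8081/solr
    return ",".join(f"{host}/{core}" for core in cores)

def count_views(item_id, solr_url=SOLR):
    """Count item views across the main statistics core and all yearly shards."""
    params = {
        "q": f"type:2 id:{item_id}",
        "fq": ["isBot:false", "statistics_type:view"],
        "rows": 0,
        "wt": "json",
    }
    shards = get_statistics_shards(solr_url)
    if shards:
        params["shards"] = shards
    response = requests.get(f"{solr_url}/statistics/select", params=params).json()
    return response["response"]["numFound"]

print(count_views(11576))  # should now include hits from the yearly shards too
</code></pre>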
<pre tabindex="0"><code>$ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view&shards=localhost:8081/solr/statistics,localhost:8081/solr/statistics-2018' | grep numFound
<result name="response" numFound="275" start="0" maxScore="12.205825">
$ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view&shards=localhost:8081/solr/statistics-2018' | grep numFound
<result name="response" numFound="241" start="0" maxScore="12.205825">
</code></pre><h2 id="2019-01-22">2019-01-22</h2>
<ul>
<li>Release <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v0.9.0">version 0.9.0 of the dspace-statistics-api</a> to address the issue of querying multiple Solr statistics shards</li>
<li>I deployed it on CGSpace (linode18) and restarted the indexer as well</li>
<li>Linode sent an alert that CGSpace (linode18) was using high CPU this afternoon; the top ten IPs during that time were:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "22/Jan/2019:1(4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
155 40.77.167.106
176 2003:d5:fbda:1c00:1106:c7a0:4b17:3af8
189 107.21.16.70
</code></pre>
<ul>
<li>
<p>I got a list of their collections from the CGSpace XMLUI and then used an SQL query to dump the unique values to CSV:</p>
</li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/35501', '10568/41728', '10568/49622', '10568/56589', '10568/56592', '10568/65064', '10568/65718', '10568/65719', '10568/67373', '10568/67731', '10568/68235', '10568/68546', '10568/69089', '10568/69160', '10568/69419', '10568/69556', '10568/70131', '10568/70252', '10568/70978'))) group by text_value order by count desc) to /tmp/bioversity-affiliations.csv with csv;
COPY 1109
</code></pre>
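<ul>
<li>The same export can be scripted with psycopg2’s <code>copy_expert</code> instead of an interactive <code>\copy</code>; a sketch only, with placeholder connection parameters:</li>
</ul>
<pre tabindex="0"><code>import psycopg2

# Same SELECT as in the \copy above, just reformatted for readability
QUERY = """
SELECT DISTINCT text_value, count(*)
FROM metadatavalue
WHERE metadata_field_id = (
        SELECT metadata_field_id FROM metadatafieldregistry
        WHERE element = 'contributor' AND qualifier = 'affiliation')
  AND resource_type_id = 2
  AND resource_id IN (
        SELECT item_id FROM collection2item WHERE collection_id IN (
          SELECT resource_id FROM handle WHERE handle IN
            ('10568/35501', '10568/41728', '10568/49622', '10568/56589',
             '10568/56592', '10568/65064', '10568/65718', '10568/65719',
             '10568/67373', '10568/67731', '10568/68235', '10568/68546',
             '10568/69089', '10568/69160', '10568/69419', '10568/69556',
             '10568/70131', '10568/70252', '10568/70978')))
GROUP BY text_value
ORDER BY count DESC
"""

conn = psycopg2.connect("dbname=dspace user=dspace host=localhost")
with conn, conn.cursor() as cursor:
    with open("/tmp/bioversity-affiliations.csv", "w") as csvfile:
        # COPY (...) TO STDOUT streams the result set straight into the file
        cursor.copy_expert(f"COPY ({QUERY}) TO STDOUT WITH CSV", csvfile)
conn.close()
</code></pre>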
<ul>
<li>Send a mail to the dspace-tech mailing list about the OpenSearch issue we had with the Livestock CRP</li>
<li>Linode sent an alert that CGSpace (linode18) had a high load this morning; here are the top ten IPs during that time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "23/Jan/2019:0(4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
222 54.226.25.74
241 40.77.167.13
272 46.101.86.248
</code></pre>
<pre tabindex="0"><code>zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\]
$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
Food safety Kenya fruits.pdf[0]=>Food safety Kenya fruits.pdf PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.000u 0:00.000
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1747.
</code></pre><ul>
<li>I reported it to the Arch Linux bug tracker (<a href="https://bugs.archlinux.org/task/61513">61513</a>)</li>
<li>I told Atmire to go ahead with the Metadata Quality Module addition based on our <code>5_x-dev</code> branch (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=657">657</a>)</li>
<li>Linode sent alerts last night to say that CGSpace (linode18) was using high CPU; here are the top ten IPs from the nginx logs around that time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "23/Jan/2019:(18|19|20)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
305 3.81.136.184
306 3.83.14.11
306 52.54.252.47
</code></pre>
<li>45.5.186.2 is CIAT and 66.249.64.155 is Google… hmmm.</li>
<li>Linode sent another alert this morning; here are the top ten IPs active during that time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "24/Jan/2019:0(4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
360 3.89.134.93
362 34.230.15.139
366 100.24.48.177
</code></pre><ul>
<li>Just double checking what CIAT is doing; they are mainly hitting the REST API:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "24/Jan/2019:" | grep 45.5.186.2 | grep -Eo "GET /(handle|bitstream|rest|oai)/" | sort | uniq -c | sort -n
</code></pre><ul>
<li>CIAT’s community currently has 12,000 items in it so this is normal</li>
<li>The issue with goo.gl links that we saw yesterday appears to be resolved, as links are working again…</li>
</ul>
<ul>
<li>Linode sent an email that the server was using a lot of CPU this morning, and these were the top IPs in the web server logs at the time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "27/Jan/2019:0(6|7|8)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
189 40.77.167.108
191 157.55.39.2
263 34.218.226.147
</code></pre>
<ul>
<li>Linode alerted that CGSpace (linode18) was using too much CPU again this morning; here are the active IPs from the web server log at the time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "28/Jan/2019:0(6|7|8)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
67 207.46.13.50
105 41.204.190.40
117 34.218.226.147
</code></pre>
<ul>
<li>Last night Linode sent an alert that CGSpace (linode18) was using high CPU; here are the most active IPs in the hours just before, during, and after the alert:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "28/Jan/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
310 45.5.184.2
425 5.143.231.39
526 54.70.40.11
</code></pre>
<ul>
<li>Linode sent an alert about CGSpace (linode18) CPU usage this morning; here are the top IPs in the web server logs just before, during, and after the alert:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "29/Jan/2019:0(3|4|5|6|7)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
334 45.5.184.72
429 66.249.66.223
522 35.237.175.180
</code></pre>
<ul>
<li>Got another alert from Linode about CGSpace (linode18) this morning; here are the top IPs before, during, and after the alert:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "30/Jan/2019:0(5|6|7|8|9)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
273 46.101.86.248
301 35.237.175.180
334 45.5.184.72
</code></pre>
<ul>
<li>Linode sent alerts about CGSpace (linode18) last night and this morning; here are the top IPs before, during, and after those times:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "30/Jan/2019:(16|17|18|19|20)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
436 18.196.196.108
460 157.55.39.168
460 207.46.13.96
1601 85.25.237.71
1894 66.249.66.219
2610 45.5.184.2
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "31/Jan/2019:0(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
318 207.46.13.242
334 45.5.184.72
486 35.237.175.180
</code></pre>