Add notes for 2022-03-04

2022-03-04 15:30:06 +03:00
parent 7453499827
commit 27acbac859
115 changed files with 6550 additions and 6444 deletions

@ -12,7 +12,7 @@
Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning
I don’t see anything interesting in the web server logs around that time though:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
@ -38,7 +38,7 @@ I don’t see anything interesting in the web server logs around that time t
Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning
I don’t see anything interesting in the web server logs around that time though:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
@ -50,7 +50,7 @@ I don’t see anything interesting in the web server logs around that time t
357 207.46.13.1
903 54.70.40.11
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -141,7 +141,7 @@ I don&rsquo;t see anything interesting in the web server logs around that time t
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don&rsquo;t see anything interesting in the web server logs around that time though:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;02/Jan/2019:0(1|2|3)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
@ -155,14 +155,14 @@ I don&rsquo;t see anything interesting in the web server logs around that time t
</code></pre><ul>
<li>Analyzing the types of requests made by the top few IPs during that time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | grep 54.70.40.11 | grep -o -E &quot;(bitstream|discover|handle)&quot; | sort | uniq -c
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;02/Jan/2019:0(1|2|3)&#34; | grep 54.70.40.11 | grep -o -E &#34;(bitstream|discover|handle)&#34; | sort | uniq -c
30 bitstream
534 discover
352 handle
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | grep 207.46.13.1 | grep -o -E &quot;(bitstream|discover|handle)&quot; | sort | uniq -c
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;02/Jan/2019:0(1|2|3)&#34; | grep 207.46.13.1 | grep -o -E &#34;(bitstream|discover|handle)&#34; | sort | uniq -c
194 bitstream
345 handle
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | grep 46.101.86.248 | grep -o -E &quot;(bitstream|discover|handle)&quot; | sort | uniq -c
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;02/Jan/2019:0(1|2|3)&#34; | grep 46.101.86.248 | grep -o -E &#34;(bitstream|discover|handle)&#34; | sort | uniq -c
261 handle
</code></pre><ul>
<li>It&rsquo;s not clear to me what was causing the outbound traffic spike</li>
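The shell pipelines above tally requests per client IP for a time window. A minimal Python sketch of the same tally follows, assuming the log lines are already decompressed and that the client IP is the first whitespace-separated field, as in nginx's default combined format. The function name `top_client_ips` and the sample lines are hypothetical and only for illustration.

```python
import re
from collections import Counter


def top_client_ips(lines, timestamp_pattern, limit=10):
    """Count requests per client IP among lines matching timestamp_pattern."""
    matcher = re.compile(timestamp_pattern)
    counts = Counter()
    for line in lines:
        if matcher.search(line):
            # Equivalent of awk '{print $1}': take the first field
            fields = line.split()
            if fields:
                counts[fields[0]] += 1
    # most_common() sorts by descending count, the reverse of `sort -n | tail`
    return counts.most_common(limit)


# Hypothetical sample lines, not taken from the real logs
sample = [
    '54.70.40.11 - - [02/Jan/2019:01:12:03 +0000] "GET /discover HTTP/1.1" 200 512',
    '54.70.40.11 - - [02/Jan/2019:02:40:10 +0000] "GET /handle/1 HTTP/1.1" 200 900',
    '207.46.13.1 - - [02/Jan/2019:03:05:44 +0000] "GET /bitstream/1 HTTP/1.1" 200 100',
    '207.46.13.1 - - [02/Jan/2019:09:00:00 +0000] "GET /handle/2 HTTP/1.1" 200 100',
]

result = top_client_ips(sample, r"02/Jan/2019:0(1|2|3)")
print(result)
```

Unlike the pipeline, this returns the busiest IPs first, and the last sample line is excluded because it falls outside the 01:00 to 03:59 window.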
@ -283,7 +283,7 @@ org.apache.jasper.JasperException: /home.jsp (line: [214], column: [1]) /discove
<ul>
<li>Linode sent a message last night that CGSpace (linode18) had high CPU usage, but I don&rsquo;t see anything around that time in the web server logs:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;03/Jan/2019:1(7|8|9)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;03/Jan/2019:1(7|8|9)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
189 207.46.13.192
217 31.6.77.23
340 66.249.70.29
@ -313,33 +313,33 @@ X-Content-Type-Options: nosniff
X-Frame-Options: ALLOW-FROM http://aims.fao.org
{
&quot;@context&quot;: {
&quot;@language&quot;: &quot;en&quot;,
&quot;altLabel&quot;: &quot;skos:altLabel&quot;,
&quot;hiddenLabel&quot;: &quot;skos:hiddenLabel&quot;,
&quot;isothes&quot;: &quot;http://purl.org/iso25964/skos-thes#&quot;,
&quot;onki&quot;: &quot;http://schema.onki.fi/onki#&quot;,
&quot;prefLabel&quot;: &quot;skos:prefLabel&quot;,
&quot;results&quot;: {
&quot;@container&quot;: &quot;@list&quot;,
&quot;@id&quot;: &quot;onki:results&quot;
&#34;@context&#34;: {
&#34;@language&#34;: &#34;en&#34;,
&#34;altLabel&#34;: &#34;skos:altLabel&#34;,
&#34;hiddenLabel&#34;: &#34;skos:hiddenLabel&#34;,
&#34;isothes&#34;: &#34;http://purl.org/iso25964/skos-thes#&#34;,
&#34;onki&#34;: &#34;http://schema.onki.fi/onki#&#34;,
&#34;prefLabel&#34;: &#34;skos:prefLabel&#34;,
&#34;results&#34;: {
&#34;@container&#34;: &#34;@list&#34;,
&#34;@id&#34;: &#34;onki:results&#34;
},
&quot;skos&quot;: &quot;http://www.w3.org/2004/02/skos/core#&quot;,
&quot;type&quot;: &quot;@type&quot;,
&quot;uri&quot;: &quot;@id&quot;
&#34;skos&#34;: &#34;http://www.w3.org/2004/02/skos/core#&#34;,
&#34;type&#34;: &#34;@type&#34;,
&#34;uri&#34;: &#34;@id&#34;
},
&quot;results&quot;: [
&#34;results&#34;: [
{
&quot;lang&quot;: &quot;en&quot;,
&quot;prefLabel&quot;: &quot;soil&quot;,
&quot;type&quot;: [
&quot;skos:Concept&quot;
&#34;lang&#34;: &#34;en&#34;,
&#34;prefLabel&#34;: &#34;soil&#34;,
&#34;type&#34;: [
&#34;skos:Concept&#34;
],
&quot;uri&quot;: &quot;http://aims.fao.org/aos/agrovoc/c_7156&quot;,
&quot;vocab&quot;: &quot;agrovoc&quot;
&#34;uri&#34;: &#34;http://aims.fao.org/aos/agrovoc/c_7156&#34;,
&#34;vocab&#34;: &#34;agrovoc&#34;
}
],
&quot;uri&quot;: &quot;&quot;
&#34;uri&#34;: &#34;&#34;
}
</code></pre><ul>
<li>The API does not appear to be case sensitive (searches for <code>SOIL</code> and <code>soil</code> return the same thing)</li>
@ -359,23 +359,23 @@ X-Content-Type-Options: nosniff
X-Frame-Options: ALLOW-FROM http://aims.fao.org
{
&quot;@context&quot;: {
&quot;@language&quot;: &quot;en&quot;,
&quot;altLabel&quot;: &quot;skos:altLabel&quot;,
&quot;hiddenLabel&quot;: &quot;skos:hiddenLabel&quot;,
&quot;isothes&quot;: &quot;http://purl.org/iso25964/skos-thes#&quot;,
&quot;onki&quot;: &quot;http://schema.onki.fi/onki#&quot;,
&quot;prefLabel&quot;: &quot;skos:prefLabel&quot;,
&quot;results&quot;: {
&quot;@container&quot;: &quot;@list&quot;,
&quot;@id&quot;: &quot;onki:results&quot;
&#34;@context&#34;: {
&#34;@language&#34;: &#34;en&#34;,
&#34;altLabel&#34;: &#34;skos:altLabel&#34;,
&#34;hiddenLabel&#34;: &#34;skos:hiddenLabel&#34;,
&#34;isothes&#34;: &#34;http://purl.org/iso25964/skos-thes#&#34;,
&#34;onki&#34;: &#34;http://schema.onki.fi/onki#&#34;,
&#34;prefLabel&#34;: &#34;skos:prefLabel&#34;,
&#34;results&#34;: {
&#34;@container&#34;: &#34;@list&#34;,
&#34;@id&#34;: &#34;onki:results&#34;
},
&quot;skos&quot;: &quot;http://www.w3.org/2004/02/skos/core#&quot;,
&quot;type&quot;: &quot;@type&quot;,
&quot;uri&quot;: &quot;@id&quot;
&#34;skos&#34;: &#34;http://www.w3.org/2004/02/skos/core#&#34;,
&#34;type&#34;: &#34;@type&#34;,
&#34;uri&#34;: &#34;@id&#34;
},
&quot;results&quot;: [],
&quot;uri&quot;: &quot;&quot;
&#34;results&#34;: [],
&#34;uri&#34;: &#34;&#34;
}
</code></pre><ul>
<li>I guess the <code>results</code> object will just be empty&hellip;</li>
@ -386,28 +386,28 @@ $ . /tmp/sparql/bin/activate
$ pip install sparql-client ipython
$ ipython
In [10]: import sparql
In [11]: s = sparql.Service(&quot;http://agrovoc.uniroma2.it:3030/agrovoc/sparql&quot;, &quot;utf-8&quot;, &quot;GET&quot;)
In [12]: statement=('PREFIX skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; '
...: 'SELECT '
...: '?label '
...: 'WHERE { '
...: '{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . } '
...: 'FILTER regex(str(?label), &quot;^fish&quot;, &quot;i&quot;) . '
...: '} LIMIT 10')
In [11]: s = sparql.Service(&#34;http://agrovoc.uniroma2.it:3030/agrovoc/sparql&#34;, &#34;utf-8&#34;, &#34;GET&#34;)
In [12]: statement=(&#39;PREFIX skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; &#39;
...: &#39;SELECT &#39;
...: &#39;?label &#39;
...: &#39;WHERE { &#39;
...: &#39;{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . } &#39;
...: &#39;FILTER regex(str(?label), &#34;^fish&#34;, &#34;i&#34;) . &#39;
...: &#39;} LIMIT 10&#39;)
In [13]: result = s.query(statement)
In [14]: for row in result.fetchone():
...: print(row)
...:
(&lt;Literal &quot;fish catching&quot;@en&gt;,)
(&lt;Literal &quot;fish harvesting&quot;@en&gt;,)
(&lt;Literal &quot;fish meat&quot;@en&gt;,)
(&lt;Literal &quot;fish roe&quot;@en&gt;,)
(&lt;Literal &quot;fish conversion&quot;@en&gt;,)
(&lt;Literal &quot;fisheries catches (composition)&quot;@en&gt;,)
(&lt;Literal &quot;fishtail palm&quot;@en&gt;,)
(&lt;Literal &quot;fishflies&quot;@en&gt;,)
(&lt;Literal &quot;fishery biology&quot;@en&gt;,)
(&lt;Literal &quot;fish production&quot;@en&gt;,)
(&lt;Literal &#34;fish catching&#34;@en&gt;,)
(&lt;Literal &#34;fish harvesting&#34;@en&gt;,)
(&lt;Literal &#34;fish meat&#34;@en&gt;,)
(&lt;Literal &#34;fish roe&#34;@en&gt;,)
(&lt;Literal &#34;fish conversion&#34;@en&gt;,)
(&lt;Literal &#34;fisheries catches (composition)&#34;@en&gt;,)
(&lt;Literal &#34;fishtail palm&#34;@en&gt;,)
(&lt;Literal &#34;fishflies&#34;@en&gt;,)
(&lt;Literal &#34;fishery biology&#34;@en&gt;,)
(&lt;Literal &#34;fish production&#34;@en&gt;,)
</code></pre><ul>
<li>The SPARQL query comes from my notes in <a href="/cgspace-notes/2017-08/">2017-08</a></li>
</ul>
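The query in the session above hard-codes the search term `fish`. As a rough sketch, the same statement could be built from a variable prefix; the helper name `build_label_query` is hypothetical, and the example only constructs the query string without contacting the AGROVOC endpoint. It escapes quotes and backslashes for the SPARQL string literal but does not escape regex metacharacters in the prefix.

```python
def build_label_query(prefix, limit=10):
    """Return a SPARQL query matching altLabel or prefLabel values that start with prefix."""
    # Keep the prefix inside the SPARQL string literal
    escaped = prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#> "
        "SELECT ?label "
        "WHERE { "
        "{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . } "
        f'FILTER regex(str(?label), "^{escaped}", "i") . '
        f"}} LIMIT {int(limit)}"
    )


statement = build_label_query("fish")
print(statement)
```

The resulting string could then be passed to `s.query(statement)` as in the session above.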
@ -466,7 +466,7 @@ In [14]: for row in result.fetchone():
</li>
<li>I am testing the speed of the WorldFish DSpace repository&rsquo;s REST API and it&rsquo;s five to ten times faster than CGSpace as I tested in <a href="/cgspace-notes/2018-10/">2018-10</a>:</li>
</ul>
<pre tabindex="0"><code>$ time http --print h 'https://digitalarchive.worldfishcenter.org/rest/items?expand=metadata,bitstreams,parentCommunityList&amp;limit=100&amp;offset=0'
<pre tabindex="0"><code>$ time http --print h &#39;https://digitalarchive.worldfishcenter.org/rest/items?expand=metadata,bitstreams,parentCommunityList&amp;limit=100&amp;offset=0&#39;
0.16s user 0.03s system 3% cpu 5.185 total
0.17s user 0.02s system 2% cpu 7.123 total
@ -474,7 +474,7 @@ In [14]: for row in result.fetchone():
</code></pre><ul>
<li>In other news, Linode sent a mail last night that the CPU load on CGSpace (linode18) was high, here are the top IPs in the logs around those few hours:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;14/Jan/2019:(17|18|19|20)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;14/Jan/2019:(17|18|19|20)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
157 31.6.77.23
192 54.70.40.11
202 66.249.64.157
@ -651,7 +651,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
... 33 more
2019-01-16 13:37:55,401 ERROR org.apache.solr.core.SolrCore @ org.apache.solr.common.SolrException: Error CREATEing SolrCore 'statistics-2018': Unable to create core [statistics-2018] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock
2019-01-16 13:37:55,401 ERROR org.apache.solr.core.SolrCore @ org.apache.solr.common.SolrException: Error CREATEing SolrCore &#39;statistics-2018&#39;: Unable to create core [statistics-2018] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock
at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:613)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
@ -721,7 +721,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
<li>For 2019-01 alone the Usage Stats are already around 1.2 million</li>
<li>I tried to look in the nginx logs to see how many raw requests there are so far this month and it&rsquo;s about 1.4 million:</li>
</ul>
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE &#34;[0-9]{1,2}/Jan/2019&#34;
1442874
real 0m17.161s
@ -859,30 +859,30 @@ WantedBy=multi-user.target
<li>I think I might manage this the same way I do the restic releases in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a>, where I download a specific version and symlink to some generic location without the version number</li>
<li>I verified that there is indeed an issue with sharded Solr statistics cores on DSpace, which will cause inaccurate results in the dspace-statistics-api:</li>
</ul>
<pre tabindex="0"><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:2+id:11576&amp;fq=isBot:false&amp;fq=statistics_type:view' | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;33&quot; start=&quot;0&quot;&gt;
$ http 'http://localhost:3000/solr/statistics-2018/select?indent=on&amp;rows=0&amp;q=type:2+id:11576&amp;fq=isBot:false&amp;fq=statistics_type:view' | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;241&quot; start=&quot;0&quot;&gt;
<pre tabindex="0"><code>$ http &#39;http://localhost:3000/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:2+id:11576&amp;fq=isBot:false&amp;fq=statistics_type:view&#39; | grep numFound
&lt;result name=&#34;response&#34; numFound=&#34;33&#34; start=&#34;0&#34;&gt;
$ http &#39;http://localhost:3000/solr/statistics-2018/select?indent=on&amp;rows=0&amp;q=type:2+id:11576&amp;fq=isBot:false&amp;fq=statistics_type:view&#39; | grep numFound
&lt;result name=&#34;response&#34; numFound=&#34;241&#34; start=&#34;0&#34;&gt;
</code></pre><ul>
<li>I opened an issue on the GitHub issue tracker (<a href="https://github.com/ilri/dspace-statistics-api/issues/10">#10</a>)</li>
<li>I don&rsquo;t think the <a href="https://solrclient.readthedocs.io/en/latest/">SolrClient library</a> we are currently using supports this type of query, so we might have to just do raw queries with requests</li>
<li>The <a href="https://github.com/django-haystack/pysolr">pysolr</a> library says it supports multicore indexes, but I am not sure it does (or at least not with our setup):</li>
</ul>
<pre tabindex="0"><code>import pysolr
solr = pysolr.Solr('http://localhost:3000/solr/statistics')
results = solr.search('type:2', **{'fq': 'isBot:false AND statistics_type:view', 'facet': 'true', 'facet.field': 'id', 'facet.mincount': 1, 'facet.limit': 10, 'facet.offset': 0, 'rows': 0})
print(results.facets['facet_fields'])
{'id': ['77572', 646, '93185', 380, '92932', 375, '102499', 372, '101430', 337, '77632', 331, '102449', 289, '102485', 276, '100849', 270, '47080', 260]}
solr = pysolr.Solr(&#39;http://localhost:3000/solr/statistics&#39;)
results = solr.search(&#39;type:2&#39;, **{&#39;fq&#39;: &#39;isBot:false AND statistics_type:view&#39;, &#39;facet&#39;: &#39;true&#39;, &#39;facet.field&#39;: &#39;id&#39;, &#39;facet.mincount&#39;: 1, &#39;facet.limit&#39;: 10, &#39;facet.offset&#39;: 0, &#39;rows&#39;: 0})
print(results.facets[&#39;facet_fields&#39;])
{&#39;id&#39;: [&#39;77572&#39;, 646, &#39;93185&#39;, 380, &#39;92932&#39;, 375, &#39;102499&#39;, 372, &#39;101430&#39;, 337, &#39;77632&#39;, 331, &#39;102449&#39;, 289, &#39;102485&#39;, 276, &#39;100849&#39;, 270, &#39;47080&#39;, 260]}
</code></pre><ul>
<li>If I double check one item from above, for example <code>77572</code>, it appears this is only working on the current statistics core and not the shards:</li>
</ul>
<pre tabindex="0"><code>import pysolr
solr = pysolr.Solr('http://localhost:3000/solr/statistics')
results = solr.search('type:2 id:77572', **{'fq': 'isBot:false AND statistics_type:view'})
solr = pysolr.Solr(&#39;http://localhost:3000/solr/statistics&#39;)
results = solr.search(&#39;type:2 id:77572&#39;, **{&#39;fq&#39;: &#39;isBot:false AND statistics_type:view&#39;})
print(results.hits)
646
solr = pysolr.Solr('http://localhost:3000/solr/statistics-2018/')
results = solr.search('type:2 id:77572', **{'fq': 'isBot:false AND statistics_type:view'})
solr = pysolr.Solr(&#39;http://localhost:3000/solr/statistics-2018/&#39;)
results = solr.search(&#39;type:2 id:77572&#39;, **{&#39;fq&#39;: &#39;isBot:false AND statistics_type:view&#39;})
print(results.hits)
595
</code></pre><ul>
@ -894,13 +894,13 @@ print(results.hits)
<li>I think I figured out how to search across shards: I needed to give the whole URL of each of the other cores</li>
<li>Now I get more results when I start adding the other statistics cores:</li>
</ul>
<pre tabindex="0"><code>$ http 'http://localhost:3000/solr/statistics/select?&amp;indent=on&amp;rows=0&amp;q=*:*' | grep numFound&lt;result name=&quot;response&quot; numFound=&quot;2061320&quot; start=&quot;0&quot;&gt;
$ http 'http://localhost:3000/solr/statistics/select?&amp;shards=localhost:8081/solr/statistics-2018&amp;indent=on&amp;rows=0&amp;q=*:*' | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;16280292&quot; start=&quot;0&quot; maxScore=&quot;1.0&quot;&gt;
$ http 'http://localhost:3000/solr/statistics/select?&amp;shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017&amp;indent=on&amp;rows=0&amp;q=*:*' | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;25606142&quot; start=&quot;0&quot; maxScore=&quot;1.0&quot;&gt;
$ http 'http://localhost:3000/solr/statistics/select?&amp;shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017,localhost:8081/solr/statistics-2016&amp;indent=on&amp;rows=0&amp;q=*:*' | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;31532212&quot; start=&quot;0&quot; maxScore=&quot;1.0&quot;&gt;
<pre tabindex="0"><code>$ http &#39;http://localhost:3000/solr/statistics/select?&amp;indent=on&amp;rows=0&amp;q=*:*&#39; | grep numFound&lt;result name=&#34;response&#34; numFound=&#34;2061320&#34; start=&#34;0&#34;&gt;
$ http &#39;http://localhost:3000/solr/statistics/select?&amp;shards=localhost:8081/solr/statistics-2018&amp;indent=on&amp;rows=0&amp;q=*:*&#39; | grep numFound
&lt;result name=&#34;response&#34; numFound=&#34;16280292&#34; start=&#34;0&#34; maxScore=&#34;1.0&#34;&gt;
$ http &#39;http://localhost:3000/solr/statistics/select?&amp;shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017&amp;indent=on&amp;rows=0&amp;q=*:*&#39; | grep numFound
&lt;result name=&#34;response&#34; numFound=&#34;25606142&#34; start=&#34;0&#34; maxScore=&#34;1.0&#34;&gt;
$ http &#39;http://localhost:3000/solr/statistics/select?&amp;shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017,localhost:8081/solr/statistics-2016&amp;indent=on&amp;rows=0&amp;q=*:*&#39; | grep numFound
&lt;result name=&#34;response&#34; numFound=&#34;31532212&#34; start=&#34;0&#34; maxScore=&#34;1.0&#34;&gt;
</code></pre><ul>
<li>I should be able to modify the dspace-statistics-api to check the shards via the Solr core status, then add the <code>shards</code> parameter to each query to make the search distributed among the cores</li>
<li>I implemented a proof of concept to query the Solr STATUS for active cores and to add them with a <code>shards</code> query string</li>
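A minimal sketch of the `shards` idea described above, assuming the list of active core names has already been obtained from the Solr core status (that lookup is not shown here). The helper names `build_shards_param` and `add_shards` are hypothetical and are not the actual dspace-statistics-api code; the host value mirrors the `localhost:8081` used in the queries above.

```python
def build_shards_param(core_names, host="localhost:8081"):
    """Join core names into the comma-separated value Solr expects for `shards`."""
    return ",".join(f"{host}/solr/{name}" for name in core_names)


def add_shards(params, core_names, host="localhost:8081"):
    """Return a copy of params with a `shards` entry when there are cores to add."""
    merged = dict(params)
    if core_names:
        merged["shards"] = build_shards_param(core_names, host)
    return merged


# Assumed core names, matching the ones queried manually above
cores = ["statistics-2018", "statistics-2017"]
params = add_shards({"q": "*:*", "rows": 0}, cores)
print(params["shards"])
```

Passing an empty core list leaves the parameters untouched, so a repository without yearly shards would be queried as before.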
@ -913,10 +913,10 @@ $ http 'http://localhost:3000/solr/statistics/select?&amp;shards=localhost:8081/
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ http 'http://localhost:8081/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:2+id:11576&amp;fq=isBot:false&amp;fq=statistics_type:view&amp;shards=localhost:8081/solr/statistics,localhost:8081/solr/statistics-2018' | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;275&quot; start=&quot;0&quot; maxScore=&quot;12.205825&quot;&gt;
$ http 'http://localhost:8081/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:2+id:11576&amp;fq=isBot:false&amp;fq=statistics_type:view&amp;shards=localhost:8081/solr/statistics-2018' | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;241&quot; start=&quot;0&quot; maxScore=&quot;12.205825&quot;&gt;
<pre tabindex="0"><code>$ http &#39;http://localhost:8081/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:2+id:11576&amp;fq=isBot:false&amp;fq=statistics_type:view&amp;shards=localhost:8081/solr/statistics,localhost:8081/solr/statistics-2018&#39; | grep numFound
&lt;result name=&#34;response&#34; numFound=&#34;275&#34; start=&#34;0&#34; maxScore=&#34;12.205825&#34;&gt;
$ http &#39;http://localhost:8081/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:2+id:11576&amp;fq=isBot:false&amp;fq=statistics_type:view&amp;shards=localhost:8081/solr/statistics-2018&#39; | grep numFound
&lt;result name=&#34;response&#34; numFound=&#34;241&#34; start=&#34;0&#34; maxScore=&#34;12.205825&#34;&gt;
</code></pre><h2 id="2019-01-22">2019-01-22</h2>
<ul>
<li>Release <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v0.9.0">version 0.9.0 of the dspace-statistics-api</a> to address the issue of querying multiple Solr statistics shards</li>
@ -924,7 +924,7 @@ $ http 'http://localhost:8081/solr/statistics/select?indent=on&amp;rows=0&amp;q=
<li>I deployed it on CGSpace (linode18) and restarted the indexer as well</li>
<li>Linode sent an alert that CGSpace (linode18) was using high CPU this afternoon, the top ten IPs during that time were:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;22/Jan/2019:1(4|5|6)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;22/Jan/2019:1(4|5|6)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
155 40.77.167.106
176 2003:d5:fbda:1c00:1106:c7a0:4b17:3af8
189 107.21.16.70
@ -979,13 +979,13 @@ $ http 'http://localhost:8081/solr/statistics/select?indent=on&amp;rows=0&amp;q=
<p>I got a list of their collections from the CGSpace XMLUI and then used an SQL query to dump the unique values to CSV:</p>
</li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/35501', '10568/41728', '10568/49622', '10568/56589', '10568/56592', '10568/65064', '10568/65718', '10568/65719', '10568/67373', '10568/67731', '10568/68235', '10568/68546', '10568/69089', '10568/69160', '10568/69419', '10568/69556', '10568/70131', '10568/70252', '10568/70978'))) group by text_value order by count desc) to /tmp/bioversity-affiliations.csv with csv;
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = &#39;contributor&#39; and qualifier = &#39;affiliation&#39;) AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in (&#39;10568/35501&#39;, &#39;10568/41728&#39;, &#39;10568/49622&#39;, &#39;10568/56589&#39;, &#39;10568/56592&#39;, &#39;10568/65064&#39;, &#39;10568/65718&#39;, &#39;10568/65719&#39;, &#39;10568/67373&#39;, &#39;10568/67731&#39;, &#39;10568/68235&#39;, &#39;10568/68546&#39;, &#39;10568/69089&#39;, &#39;10568/69160&#39;, &#39;10568/69419&#39;, &#39;10568/69556&#39;, &#39;10568/70131&#39;, &#39;10568/70252&#39;, &#39;10568/70978&#39;))) group by text_value order by count desc) to /tmp/bioversity-affiliations.csv with csv;
COPY 1109
</code></pre><ul>
<li>Send a mail to the dspace-tech mailing list about the OpenSearch issue we had with the Livestock CRP</li>
<li>Linode sent an alert that CGSpace (linode18) had a high load this morning, here are the top ten IPs during that time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;23/Jan/2019:0(4|5|6)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;23/Jan/2019:0(4|5|6)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
222 54.226.25.74
241 40.77.167.13
272 46.101.86.248
@ -1038,13 +1038,13 @@ $ schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace fi
zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\]
$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
Food safety Kenya fruits.pdf[0]=&gt;Food safety Kenya fruits.pdf PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.000u 0:00.000
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1747.
identify: CorruptImageProfile `xmp&#39; @ warning/profile.c/SetImageProfileInternal/1747.
</code></pre><ul>
<li>I reported it to the Arch Linux bug tracker (<a href="https://bugs.archlinux.org/task/61513">61513</a>)</li>
<li>I told Atmire to go ahead with the Metadata Quality Module addition based on our <code>5_x-dev</code> branch (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=657">657</a>)</li>
<li>Linode sent alerts last night saying that CGSpace (linode18) was using high CPU; here are the top ten IPs from the nginx logs around that time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;23/Jan/2019:(18|19|20)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;23/Jan/2019:(18|19|20)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
305 3.81.136.184
306 3.83.14.11
306 52.54.252.47
@ -1059,7 +1059,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
<li>45.5.186.2 is CIAT and 66.249.64.155 is Google&hellip; hmmm.</li>
<li>Linode sent another alert this morning, here are the top ten IPs active during that time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;24/Jan/2019:0(4|5|6)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;24/Jan/2019:0(4|5|6)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
360 3.89.134.93
362 34.230.15.139
366 100.24.48.177
@ -1073,7 +1073,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
</code></pre><ul>
<li>Just double checking what CIAT is doing, they are mainly hitting the REST API:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;24/Jan/2019:&quot; | grep 45.5.186.2 | grep -Eo &quot;GET /(handle|bitstream|rest|oai)/&quot; | sort | uniq -c | sort -n
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;24/Jan/2019:&#34; | grep 45.5.186.2 | grep -Eo &#34;GET /(handle|bitstream|rest|oai)/&#34; | sort | uniq -c | sort -n
</code></pre><ul>
<li>CIAT&rsquo;s community currently has 12,000 items in it so this is normal</li>
<li>The issue with goo.gl links that we saw yesterday appears to be resolved, as links are working again&hellip;</li>
@ -1102,7 +1102,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
<ul>
<li>Linode sent an email that the server was using a lot of CPU this morning, and these were the top IPs in the web server logs at the time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;27/Jan/2019:0(6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;27/Jan/2019:0(6|7|8)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
189 40.77.167.108
191 157.55.39.2
263 34.218.226.147
@ -1132,7 +1132,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
</li>
<li>Linode alerted that CGSpace (linode18) was using too much CPU again this morning, here are the active IPs from the web server log at the time:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;28/Jan/2019:0(6|7|8)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;28/Jan/2019:0(6|7|8)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
67 207.46.13.50
105 41.204.190.40
117 34.218.226.147
@ -1153,7 +1153,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
</li>
<li>Last night Linode sent an alert that CGSpace (linode18) was using high CPU, here are the most active IPs in the hours just before, during, and after the alert:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;28/Jan/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;28/Jan/2019:(17|18|19|20|21)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
310 45.5.184.2
425 5.143.231.39
526 54.70.40.11
@ -1173,7 +1173,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
<ul>
<li>Linode sent an alert about CGSpace (linode18) CPU usage this morning, here are the top IPs in the web server logs just before, during, and after the alert:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;29/Jan/2019:0(3|4|5|6|7)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;29/Jan/2019:0(3|4|5|6|7)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
334 45.5.184.72
429 66.249.66.223
522 35.237.175.180
@ -1198,7 +1198,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
<ul>
<li>Got another alert from Linode about CGSpace (linode18) this morning, here are the top IPs before, during, and after the alert:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;30/Jan/2019:0(5|6|7|8|9)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;30/Jan/2019:0(5|6|7|8|9)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
273 46.101.86.248
301 35.237.175.180
334 45.5.184.72
@ -1216,7 +1216,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
<ul>
<li>Linode sent alerts about CGSpace (linode18) last night and this morning, here are the top IPs before, during, and after those times:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;30/Jan/2019:(16|17|18|19|20)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;30/Jan/2019:(16|17|18|19|20)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
436 18.196.196.108
460 157.55.39.168
460 207.46.13.96
@ -1227,7 +1227,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
1601 85.25.237.71
1894 66.249.66.219
2610 45.5.184.2
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;31/Jan/2019:0(2|3|4|5|6)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;31/Jan/2019:0(2|3|4|5|6)&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
318 207.46.13.242
334 45.5.184.72
486 35.237.175.180