<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
</ul>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243
real 0m19.873s
user 0m22.203s
sys 0m1.979s
</code></pre>
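<ul>
<li>A rough sketch of how that monthly total could be broken down by day, to see whether the traffic is spread evenly or concentrated in a few spikes; <code>daily_counts</code> is a hypothetical helper name, not part of any existing tooling:</li>
</ul>
<pre><code># Hypothetical helper: count log lines per day for a month pattern such as "Jan/2019".
# Reads log lines on stdin. Caveat: grep -o prints every match, so a line that
# happens to contain the date twice would be counted twice.
daily_counts() {
    grep -oE "[0-9]{1,2}/$1" | sort | uniq -c
}

# Possible usage, assuming the same log location as above:
#   zcat --force /var/log/nginx/* | daily_counts "Jan/2019"
</code></pre>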
<ul>
<li>Normally I’d say this was very high, but <a href="/cgspace-notes/2018-02/">about this time last year</a> I remember thinking the same thing when we had 3.1 million…</li>
<li>I will have to keep an eye on this to see if there is some error in Solr…</li>
<li>Atmire sent their <a href="https://github.com/ilri/DSpace/pull/407">pull request to re-enable the Metadata Quality Module (MQM) on our <code>5_x-dev</code> branch</a> today
<ul>
<li>I will test it next week and send them feedback</li>
<li>Another alert from Linode about CGSpace (linode18) this morning, here are the top IPs in the web server logs before, during, and after that time:</li>
<li><code>45.5.184.2</code> is CIAT and <code>85.25.237.71</code> is the new Linguee bot that I first noticed a few days ago</li>
<li>I will increase the Linode alert threshold from 275% to 300% because this is becoming too much!</li>
<li>I tested the Atmire Metadata Quality Module (MQM)’s duplicate checker on some <a href="https://dspacetest.cgiar.org/handle/10568/81268">WLE items</a> that I helped Udana with a few months ago on DSpace Test (linode19) and indeed it found many duplicates!</li>
<li><code>45.5.184.2</code> is CIAT, <code>70.32.83.92</code> and <code>205.186.128.185</code> are Macaroni Bros harvesters for CCAFS I think</li>
<li><code>195.201.104.240</code> is a new IP address in Germany with the following user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
</code></pre>
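<ul>
<li>A minimal sketch of how the top IPs and the per-minute request rate of a single client like this one might be pulled from the logs, assuming nginx’s default <code>combined</code> log format (client IP in the first field, timestamp in the fourth); <code>top_ips</code> and <code>requests_per_minute</code> are hypothetical helper names:</li>
</ul>
<pre><code># Hypothetical helper: print the ten busiest client IPs from log lines on stdin.
top_ips() {
    awk '{print $1}' | sort | uniq -c | sort -rn | head -n 10
}

# Hypothetical helper: count requests per minute for one client IP (first argument).
# Field 4 looks like "[DD/Mon/YYYY:HH:MM:SS", so characters 2 to 18 are the minute.
requests_per_minute() {
    awk -v ip="$1" '$1 == ip {print substr($4, 2, 17)}' | sort | uniq -c
}

# Possible usage; the date and hour pattern is a placeholder to substitute:
#   zcat --force /var/log/nginx/* | grep -E "DD/Mon/YYYY:HH" | top_ips
#   zcat --force /var/log/nginx/* | requests_per_minute 195.201.104.240
</code></pre>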
<ul>
<li>This user was making 20–60 requests per minute this morning… seems like I should try to block this type of behavior heuristically, regardless of user agent!</li>
<li>This user was making requests to <code>/browse</code>, which is not currently under the existing rate limiting of dynamic pages in our nginx config
<ul>
<li>I <a href="https://github.com/ilri/rmg-ansible-public/commit/36dfb072d6724fb5cdc81ef79cab08ed9ce427ad">extended the existing <code>dynamicpages</code> (12/m) rate limit to <code>/browse</code> and <code>/discover</code></a> with an allowance for bursting of up to five requests for “real” users</li>