- Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!
- The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase
- There were just over 3 million accesses in the nginx logs last month:
```
# time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243
real 0m19.873s
user 0m22.203s
sys 0m1.979s
```
<!--more-->
- Normally I'd say this was very high, but [about this time last year]({{< relref "2018-02.md" >}}) I remember thinking the same thing when we had 3.1 million...
- I will have to keep an eye on this to see if there is some error in Solr...
- Atmire sent their [pull request to re-enable the Metadata Quality Module (MQM) on our `5_x-dev` branch](https://github.com/ilri/DSpace/pull/407) today
-`45.5.184.2` is CIAT and `85.25.237.71` is the new Linguee bot that I first noticed a few days ago
- I will increase the Linode alert threshold from 275 to 300% because this is becoming too much!
- I tested the Atmire Metadata Quality Module (MQM)'s duplicate checked on the some [WLE items](https://dspacetest.cgiar.org/handle/10568/81268) that I helped Udana with a few months ago on DSpace Test (linode19) and indeed it found many duplicates!
-`45.5.184.2` is CIAT, `70.32.83.92` and `205.186.128.185` are Macaroni Bros harvesters for CCAFS I think
-`195.201.104.240` is a new IP address in Germany with the following user agent:
```
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
```
- This user was making 20–60 requests per minute this morning... seems like I should try to block this type of behavior heuristically, regardless of user agent!
- This user was making requests to `/browse`, which is not currently under the existing rate limiting of dynamic pages in our nginx config
- I [extended the existing `dynamicpages` (12/m) rate limit to `/browse` and `/discover`](https://github.com/ilri/rmg-ansible-public/commit/36dfb072d6724fb5cdc81ef79cab08ed9ce427ad) with an allowance for bursting of up to five requests for "real" users
- Generate a list of CTA subjects from CGSpace for Peter:
```
dspace=# \copy (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=124 GROUP BY text_value ORDER BY COUNT DESC) to /tmp/cta-subjects.csv with csv header;