diff --git a/content/posts/2019-02.md b/content/posts/2019-02.md
index e3ce1cee1..9bd937e2f 100644
--- a/content/posts/2019-02.md
+++ b/content/posts/2019-02.md
@@ -66,4 +66,66 @@ sys 0m1.979s
 - I will increase the Linode alert threshold from 275 to 300% because this is becoming too much!
 - I tested the Atmire Metadata Quality Module (MQM)'s duplicate checked on the some [WLE items](https://dspacetest.cgiar.org/handle/10568/81268) that I helped Udana with a few months ago on DSpace Test (linode19) and indeed it found many duplicates!
+## 2019-02-03
+
+- This is seriously getting annoying, Linode sent another alert this morning that CGSpace (linode18) load was 377%!
+- Here are the top IPs before, during, and after that time:
+
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Feb/2019:0(5|6|7|8|9)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+ 325 85.25.237.71
+ 340 45.5.184.72
+ 431 5.143.231.8
+ 756 5.9.6.51
+ 1048 34.218.226.147
+ 1203 66.249.66.219
+ 1496 195.201.104.240
+ 4658 205.186.128.185
+ 4658 70.32.83.92
+ 4852 45.5.184.2
+```
+
+- `45.5.184.2` is CIAT, `70.32.83.92` and `205.186.128.185` are Macaroni Bros harvesters for CCAFS I think
+- `195.201.104.240` is a new IP address in Germany with the following user agent:
+
+```
+Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
+```
+
+- This user was making 20–60 requests per minute this morning... seems like I should try to block this type of behavior heuristically, regardless of user agent!
+
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Feb/2019" | grep 195.201.104.240 | grep -o -E '03/Feb/2019:0[0-9]:[0-9][0-9]' | uniq -c | sort -n | tail -n 20
+ 19 03/Feb/2019:07:42
+ 20 03/Feb/2019:07:12
+ 21 03/Feb/2019:07:27
+ 21 03/Feb/2019:07:28
+ 25 03/Feb/2019:07:23
+ 25 03/Feb/2019:07:29
+ 26 03/Feb/2019:07:33
+ 28 03/Feb/2019:07:38
+ 30 03/Feb/2019:07:31
+ 33 03/Feb/2019:07:35
+ 33 03/Feb/2019:07:37
+ 38 03/Feb/2019:07:40
+ 43 03/Feb/2019:07:24
+ 43 03/Feb/2019:07:32
+ 46 03/Feb/2019:07:36
+ 47 03/Feb/2019:07:34
+ 47 03/Feb/2019:07:39
+ 47 03/Feb/2019:07:41
+ 51 03/Feb/2019:07:26
+ 59 03/Feb/2019:07:25
+```
+
+- At least they re-used their Tomcat session!
+
+```
+$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=195.201.104.240' dspace.log.2019-02-03 | sort | uniq | wc -l
+1
+```
+
+- This user was making requests to `/browse`, which is not currently under the existing rate limiting of dynamic pages in our nginx config
+  - I [extended the existing `dynamicpages` (12/m) rate limit to `/browse` and `/discover`](https://github.com/ilri/rmg-ansible-public/commit/36dfb072d6724fb5cdc81ef79cab08ed9ce427ad) with an allowance for bursting of up to five requests for "real" users
+
diff --git a/docs/2019-02/index.html b/docs/2019-02/index.html
index 0de08d8d5..5ec2bcb81 100644
--- a/docs/2019-02/index.html
+++ b/docs/2019-02/index.html
@@ -89,7 +89,7 @@ sys 0m1.979s
 "@type": "BlogPosting",
 "headline": "February, 2019",
 "url": "https://alanorth.github.io/cgspace-notes/2019-02/",
- "wordCount": "367",
+ "wordCount": "640",
 "datePublished": "2019-02-01T21:37:30+02:00",
 "dateModified": "2019-02-02T11:36:24+02:00",
 "author": {
@@ -228,6 +228,77 @@ sys 0m1.979s
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Feb/2019:0(5|6|7|8|9)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+ 325 85.25.237.71
+ 340 45.5.184.72
+ 431 5.143.231.8
+ 756 5.9.6.51
+ 1048 34.218.226.147
+ 1203 66.249.66.219
+ 1496 195.201.104.240
+ 4658 205.186.128.185
+ 4658 70.32.83.92
+ 4852 45.5.184.2
+
+
+`45.5.184.2` is CIAT, `70.32.83.92` and `205.186.128.185` are Macaroni Bros harvesters for CCAFS I think
+`195.201.104.240` is a new IP address in Germany with the following user agent:
+Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
+
+
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Feb/2019" | grep 195.201.104.240 | grep -o -E '03/Feb/2019:0[0-9]:[0-9][0-9]' | uniq -c | sort -n | tail -n 20
+ 19 03/Feb/2019:07:42
+ 20 03/Feb/2019:07:12
+ 21 03/Feb/2019:07:27
+ 21 03/Feb/2019:07:28
+ 25 03/Feb/2019:07:23
+ 25 03/Feb/2019:07:29
+ 26 03/Feb/2019:07:33
+ 28 03/Feb/2019:07:38
+ 30 03/Feb/2019:07:31
+ 33 03/Feb/2019:07:35
+ 33 03/Feb/2019:07:37
+ 38 03/Feb/2019:07:40
+ 43 03/Feb/2019:07:24
+ 43 03/Feb/2019:07:32
+ 46 03/Feb/2019:07:36
+ 47 03/Feb/2019:07:34
+ 47 03/Feb/2019:07:39
+ 47 03/Feb/2019:07:41
+ 51 03/Feb/2019:07:26
+ 59 03/Feb/2019:07:25
+
+
+$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=195.201.104.240' dspace.log.2019-02-03 | sort | uniq | wc -l
+1
+
+
+This user was making requests to `/browse`, which is not currently under the existing rate limiting of dynamic pages in our nginx config
+
+I extended the existing `dynamicpages` (12/m) rate limit to `/browse` and `/discover` with an allowance for bursting of up to five requests for “real” users
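
To get a feel for what a 12/m limit with a burst of five means for a client like `195.201.104.240`, here is a rough simulation. It is only a sketch: a simplified leaky bucket in the style of nginx's `limit_req`, using integer "milli-request" arithmetic. The names `LeakyBucket` and `count_allowed` are hypothetical, the model ignores the delaying nginx applies to burst requests when `nodelay` is not set, and it is not taken from the actual config in the linked commit.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LeakyBucket:
    """Simplified leaky-bucket limiter (assumption: modelled loosely on nginx limit_req)."""

    rate_per_minute: int = 12      # from the notes: dynamicpages is 12/m
    burst: int = 5                 # from the notes: bursts of up to five requests
    excess_milli: int = 0          # backlog above the rate, in thousandths of a request
    last_ms: Optional[int] = None  # time of the last accepted request, in milliseconds

    def allow(self, now_ms: int) -> bool:
        """Return True if a request arriving at now_ms is accepted."""
        if self.last_ms is None:
            # The first request from a client is always accepted.
            self.last_ms = now_ms
            return True
        elapsed_ms = max(0, now_ms - self.last_ms)
        # Rate in thousandths of a request per second (12/m -> 200).
        rate_milli = self.rate_per_minute * 1000 // 60
        drained = rate_milli * elapsed_ms // 1000
        excess = max(0, self.excess_milli - drained + 1000)
        if excess > self.burst * 1000:
            # Over the burst allowance: rejected, and the state is left unchanged.
            return False
        self.excess_milli = excess
        self.last_ms = now_ms
        return True


def count_allowed(timestamps_ms: Iterable[int],
                  rate_per_minute: int = 12,
                  burst: int = 5) -> int:
    """Count how many requests from a single client would be accepted."""
    bucket = LeakyBucket(rate_per_minute=rate_per_minute, burst=burst)
    return sum(1 for t in timestamps_ms if bucket.allow(t))


if __name__ == "__main__":
    # A client sending one request per second for a minute, similar to the
    # busiest minutes in the log excerpt above.
    one_per_second = [s * 1000 for s in range(60)]
    print(count_allowed(one_per_second), "of", len(one_per_second), "accepted")
```

Under this model a client that sends one request every five seconds is never limited, while a sustained one-request-per-second client is cut back to roughly the configured rate once its burst allowance is used up.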