mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2019-02-18
This commit is contained in:
@ -758,4 +758,119 @@ UPDATE 1
|
||||
- I merged the Atmire Metadata Quality Module (MQM) changes to the `5_x-prod` branch and deployed it on CGSpace ([#407](https://github.com/ilri/DSpace/pull/407))
|
||||
- Then I ran all system updates on CGSpace server and rebooted it
|
||||
|
||||
## 2019-02-18
|
||||
|
||||
- Jesus fucking Christ, Linode sent an alert that CGSpace (linode18) was using 421% CPU for a few hours this afternoon (server time):
|
||||
- There seems to have been a lot of activity in XMLUI:
|
||||
|
||||
```
|
||||
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
1236 18.212.208.240
|
||||
1276 54.164.83.99
|
||||
1277 3.83.14.11
|
||||
1282 3.80.196.188
|
||||
1296 3.84.172.18
|
||||
1299 100.24.48.177
|
||||
1299 34.230.15.139
|
||||
1327 52.54.252.47
|
||||
1477 5.9.6.51
|
||||
1861 94.71.244.172
|
||||
# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
8 42.112.238.64
|
||||
9 121.52.152.3
|
||||
9 157.55.39.50
|
||||
10 110.54.151.102
|
||||
10 194.246.119.6
|
||||
10 66.249.66.221
|
||||
15 190.56.193.94
|
||||
28 66.249.66.219
|
||||
43 34.209.213.122
|
||||
178 50.116.102.77
|
||||
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq | wc -l
|
||||
2727
|
||||
# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq | wc -l
|
||||
186
|
||||
```
|
||||
|
||||
- 94.71.244.172 is in Greece and uses the user agent "Indy Library"
|
||||
- At least they are re-using their Tomcat session:
|
||||
|
||||
```
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=94.71.244.172' dspace.log.2019-02-18 | sort | uniq | wc -l
|
||||
```
|
||||
|
||||
- The following IPs were all hitting the server hard simultaneously and are located on Amazon and use the user agent "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0":
|
||||
- 52.54.252.47
|
||||
- 34.230.15.139
|
||||
- 100.24.48.177
|
||||
- 3.84.172.18
|
||||
- 3.80.196.188
|
||||
- 3.83.14.11
|
||||
- 54.164.83.99
|
||||
- 18.212.208.240
|
||||
|
||||
- Actually, even up to the top 30 IPs are almost all on Amazon and use the same user agent!
|
||||
- For reference most of these IPs hitting the XMLUI this afternoon are on Amazon:
|
||||
|
||||
```
|
||||
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 30
|
||||
1173 52.91.249.23
|
||||
1176 107.22.118.106
|
||||
1178 3.88.173.152
|
||||
1179 3.81.136.184
|
||||
1183 34.201.220.164
|
||||
1183 3.89.134.93
|
||||
1184 54.162.66.53
|
||||
1187 3.84.62.209
|
||||
1188 3.87.4.140
|
||||
1189 54.158.27.198
|
||||
1190 54.209.39.13
|
||||
1192 54.82.238.223
|
||||
1208 3.82.232.144
|
||||
1209 3.80.128.247
|
||||
1214 54.167.64.164
|
||||
1219 3.91.17.126
|
||||
1220 34.201.108.226
|
||||
1221 3.84.223.134
|
||||
1222 18.206.155.14
|
||||
1231 54.210.125.13
|
||||
1236 18.212.208.240
|
||||
1276 54.164.83.99
|
||||
1277 3.83.14.11
|
||||
1282 3.80.196.188
|
||||
1296 3.84.172.18
|
||||
1299 100.24.48.177
|
||||
1299 34.230.15.139
|
||||
1327 52.54.252.47
|
||||
1477 5.9.6.51
|
||||
1861 94.71.244.172
|
||||
```
|
||||
|
||||
- In the case of 52.54.252.47 they are only making about 10 requests per minute during this time (albeit from dozens of concurrent IPs):
|
||||
|
||||
```
|
||||
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep 52.54.252.47 | grep -o -E '18/Feb/2019:1[0-9]:[0-9][0-9]' | uniq -c | sort -n | tail -n 10
|
||||
10 18/Feb/2019:17:20
|
||||
10 18/Feb/2019:17:22
|
||||
10 18/Feb/2019:17:31
|
||||
11 18/Feb/2019:13:21
|
||||
11 18/Feb/2019:15:18
|
||||
11 18/Feb/2019:16:43
|
||||
11 18/Feb/2019:16:57
|
||||
11 18/Feb/2019:16:58
|
||||
11 18/Feb/2019:18:34
|
||||
12 18/Feb/2019:14:37
|
||||
```
|
||||
|
||||
- As this user agent is not recognized as a bot by DSpace this will definitely fuck up the usage statistics
|
||||
- There were 92,000 requests from these IPs alone today!
|
||||
|
||||
```
|
||||
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -c 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'
|
||||
92756
|
||||
```
|
||||
|
||||
- I will add this user agent to the ["badbots" rate limiting in our nginx configuration](https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/nginx/default.conf.j2)
|
||||
- I realized that I had effectively only been applying the "badbots" rate limiting to requests at the root, so I added it to the other blocks that match Discovery, Browse, etc as well
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
Reference in New Issue
Block a user