Update notes for 2017-11-07

This commit is contained in:
2017-11-07 17:03:49 +02:00
parent 1169510b5e
commit 950b0d3a24
3 changed files with 203 additions and 10 deletions

View File

@ -253,7 +253,7 @@ $ grep -c 207.46.13.36 /var/log/nginx/access.log.1
- I think I will end up blocking Baidu as well...
- Next is for me to look and see what was happening specifically at 3AM and 7AM when the server crashed
- I should look in nginx access.log, rest.log, oai.log, and DSpace's dspace.log.2017-11-07
- Here are the top IPs during 210 AM:
- Here are the top IPs making requests to XMLUI from 28 AM:
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
@ -270,3 +270,97 @@ $ grep -c 207.46.13.36 /var/log/nginx/access.log.1
```
- Of those, most are Google, Bing, Yahoo, etc, except 63.143.42.244 and 63.143.42.242 which are Uptime Robot
- Here are the top IPs making requests to REST from 28 AM:
```
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
8 207.241.229.237
10 66.249.66.90
16 104.196.152.243
25 41.60.238.61
26 157.55.39.161
27 207.46.13.103
27 207.46.13.80
31 207.46.13.36
1498 50.116.102.77
```
- The OAI requests during that same time period are nothing to worry about:
```
# cat /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
1 66.249.66.92
4 66.249.66.90
6 68.180.229.254
```
- The top IPs from dspace.log during the 28 AM period:
```
$ grep -E '2017-11-07 0[2-8]' dspace.log.2017-11-07 | grep -o -E 'ip_addr=[0-9.]+' | sort -n | uniq -c | sort -h | tail
143 ip_addr=213.55.99.121
181 ip_addr=66.249.66.91
223 ip_addr=157.55.39.161
248 ip_addr=207.46.13.80
251 ip_addr=207.46.13.103
291 ip_addr=207.46.13.36
297 ip_addr=197.210.168.174
312 ip_addr=65.49.68.199
462 ip_addr=104.196.152.243
488 ip_addr=66.249.66.90
```
- These aren't actually very interesting, as the top few are Google, CIAT, Bingbot, and a few other unknown scrapers
- The number of requests isn't even that high to be honest
- As I was looking at these logs I noticed another heavy user (124.17.34.59) that was not active during this time period, but made many requests today alone:
```
# zgrep -c 124.17.34.59 /var/log/nginx/access.log*
/var/log/nginx/access.log:22581
/var/log/nginx/access.log.1:0
/var/log/nginx/access.log.2.gz:14
/var/log/nginx/access.log.3.gz:0
/var/log/nginx/access.log.4.gz:0
/var/log/nginx/access.log.5.gz:3
/var/log/nginx/access.log.6.gz:0
/var/log/nginx/access.log.7.gz:0
/var/log/nginx/access.log.8.gz:0
/var/log/nginx/access.log.9.gz:1
```
- The whois data shows the IP is from China, but the user agent doesn't really give any clues:
```
# grep 124.17.34.59 /var/log/nginx/access.log | awk -F'" ' '{print $3}' | sort | uniq -c | sort -h
210 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
22610 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)"
```
- A Google search for "LCTE bot" doesn't return anything interesting, but this [Stack Overflow discussion](https://stackoverflow.com/questions/42500881/what-is-lcte-in-user-agent) references the lack of information
- So basically after a few hours of looking at the log files I am not closer to understanding what is going on!
- I do know that we want to block Baidu, though, as it does not respect `robots.txt`
- And as we speak Linode alerted that the outbound traffic rate is very high for the past two hours (about 1214 hours)
- At least for now it seems to be that new Chinese IP (124.17.34.59):
```
# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
198 207.46.13.103
203 207.46.13.80
205 207.46.13.36
218 157.55.39.161
249 45.5.184.221
258 45.5.187.130
386 66.249.66.90
410 197.210.168.174
1896 104.196.152.243
11005 124.17.34.59
```
- Seems 124.17.34.59 are really downloading all our PDFs, compared to the next top active IPs during this time!
```
# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 124.17.34.59 | grep -c pdf
5948
# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 104.196.152.243 | grep -c pdf
0
```