diff --git a/content/post/2017-11.md b/content/post/2017-11.md
index 471e32a00..571e92e63 100644
--- a/content/post/2017-11.md
+++ b/content/post/2017-11.md
@@ -253,7 +253,7 @@ $ grep -c 207.46.13.36 /var/log/nginx/access.log.1
 - I think I will end up blocking Baidu as well...
 - Next is for me to look and see what was happening specifically at 3AM and 7AM when the server crashed
 - I should look in nginx access.log, rest.log, oai.log, and DSpace's dspace.log.2017-11-07
-- Here are the top IPs during 2–10 AM:
+- Here are the top IPs making requests to XMLUI from 2–8 AM:
 
 ```
 # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
@@ -270,3 +270,97 @@ $ grep -c 207.46.13.36 /var/log/nginx/access.log.1
 ```
 
 - Of those, most are Google, Bing, Yahoo, etc, except 63.143.42.244 and 63.143.42.242 which are Uptime Robot
+- Here are the top IPs making requests to REST from 2–8 AM:
+
+```
+# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+      8 207.241.229.237
+     10 66.249.66.90
+     16 104.196.152.243
+     25 41.60.238.61
+     26 157.55.39.161
+     27 207.46.13.103
+     27 207.46.13.80
+     31 207.46.13.36
+   1498 50.116.102.77
+```
+
+- The OAI requests during that same time period are nothing to worry about:
+
+```
+# cat /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+      1 66.249.66.92
+      4 66.249.66.90
+      6 68.180.229.254
+```
+
+- The top IPs from dspace.log during the 2–8 AM period:
+
+```
+$ grep -E '2017-11-07 0[2-8]' dspace.log.2017-11-07 | grep -o -E 'ip_addr=[0-9.]+' | sort -n | uniq -c | sort -h | tail
+    143 ip_addr=213.55.99.121
+    181 ip_addr=66.249.66.91
+    223 ip_addr=157.55.39.161
+    248 ip_addr=207.46.13.80
+    251 ip_addr=207.46.13.103
+    291 ip_addr=207.46.13.36
+    297 ip_addr=197.210.168.174
+    312 ip_addr=65.49.68.199
+    462 ip_addr=104.196.152.243
+    488 ip_addr=66.249.66.90
+```
+
+- These aren't actually very interesting, as the top few are Google, CIAT, Bingbot, and a few other unknown scrapers
+- The number of requests isn't even that high, to be honest
+- As I was looking at these logs I noticed another heavy user (124.17.34.59) that was not active during this time period but has made many requests today alone:
+
+```
+# zgrep -c 124.17.34.59 /var/log/nginx/access.log*
+/var/log/nginx/access.log:22581
+/var/log/nginx/access.log.1:0
+/var/log/nginx/access.log.2.gz:14
+/var/log/nginx/access.log.3.gz:0
+/var/log/nginx/access.log.4.gz:0
+/var/log/nginx/access.log.5.gz:3
+/var/log/nginx/access.log.6.gz:0
+/var/log/nginx/access.log.7.gz:0
+/var/log/nginx/access.log.8.gz:0
+/var/log/nginx/access.log.9.gz:1
+```
+
+- The whois data shows the IP is from China, but the user agent doesn't really give any clues:
+
+```
+# grep 124.17.34.59 /var/log/nginx/access.log | awk -F'" ' '{print $3}' | sort | uniq -c | sort -h
+    210 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
+  22610 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)"
+```
+
+- A Google search for "LCTE bot" doesn't return anything interesting, and this [Stack Overflow discussion](https://stackoverflow.com/questions/42500881/what-is-lcte-in-user-agent) notes the same lack of information
+- So basically, after a few hours of looking at the log files, I am no closer to understanding what is going on!
+- I do know that we want to block Baidu, though, as it does not respect `robots.txt`
+- And as we speak, Linode has alerted that the outbound traffic rate has been very high for the past two hours (from about 12:00 to 14:00)
+- At least for now it seems to be that new Chinese IP (124.17.34.59):
+
+```
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+    198 207.46.13.103
+    203 207.46.13.80
+    205 207.46.13.36
+    218 157.55.39.161
+    249 45.5.184.221
+    258 45.5.187.130
+    386 66.249.66.90
+    410 197.210.168.174
+   1896 104.196.152.243
+  11005 124.17.34.59
+```
+
+- It seems 124.17.34.59 really is downloading all our PDFs, compared to the next most active IP during this time!
+
+```
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 124.17.34.59 | grep -c pdf
+5948
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 104.196.152.243 | grep -c pdf
+0
+```
diff --git a/public/2017-11/index.html b/public/2017-11/index.html
index 01d98b360..900f97e96 100644
--- a/public/2017-11/index.html
+++ b/public/2017-11/index.html
@@ -38,7 +38,7 @@ COPY 54701
-
+
@@ -86,9 +86,9 @@ COPY 54701
 "@type": "BlogPosting",
 "headline": "November, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-11/",
-"wordCount": "1445",
+"wordCount": "1905",
 "datePublished": "2017-11-02T09:37:54+02:00",
-"dateModified": "2017-11-05T15:53:35+02:00",
+"dateModified": "2017-11-07T14:50:01+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -433,7 +433,7 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
  • I think I will end up blocking Baidu as well…
  • Next is for me to look and see what was happening specifically at 3AM and 7AM when the server crashed
  • I should look in nginx access.log, rest.log, oai.log, and DSpace’s dspace.log.2017-11-07
-  • Here are the top IPs during 2–10 AM:
+  • Here are the top IPs making requests to XMLUI from 2–8 AM:
  • # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
    @@ -451,8 +451,107 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
     
     
     
    +
    # cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
    +      8 207.241.229.237
    +     10 66.249.66.90
    +     16 104.196.152.243
    +     25 41.60.238.61
    +     26 157.55.39.161
    +     27 207.46.13.103
    +     27 207.46.13.80
    +     31 207.46.13.36
    +   1498 50.116.102.77
    +
    # cat /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
    +      1 66.249.66.92
    +      4 66.249.66.90
    +      6 68.180.229.254
    +
    $ grep -E '2017-11-07 0[2-8]' dspace.log.2017-11-07 | grep -o -E 'ip_addr=[0-9.]+' | sort -n | uniq -c | sort -h | tail
    +    143 ip_addr=213.55.99.121
    +    181 ip_addr=66.249.66.91
    +    223 ip_addr=157.55.39.161
    +    248 ip_addr=207.46.13.80
    +    251 ip_addr=207.46.13.103
    +    291 ip_addr=207.46.13.36
    +    297 ip_addr=197.210.168.174
    +    312 ip_addr=65.49.68.199
    +    462 ip_addr=104.196.152.243
    +    488 ip_addr=66.249.66.90
    +
    # zgrep -c 124.17.34.59 /var/log/nginx/access.log*
    +/var/log/nginx/access.log:22581
    +/var/log/nginx/access.log.1:0
    +/var/log/nginx/access.log.2.gz:14
    +/var/log/nginx/access.log.3.gz:0
    +/var/log/nginx/access.log.4.gz:0
    +/var/log/nginx/access.log.5.gz:3
    +/var/log/nginx/access.log.6.gz:0
    +/var/log/nginx/access.log.7.gz:0
    +/var/log/nginx/access.log.8.gz:0
    +/var/log/nginx/access.log.9.gz:1
    +
    # grep 124.17.34.59 /var/log/nginx/access.log | awk -F'" ' '{print $3}' | sort | uniq -c | sort -h
    +    210 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
    +  22610 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)"
    +
    # grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
    +    198 207.46.13.103
    +    203 207.46.13.80
    +    205 207.46.13.36
    +    218 157.55.39.161
    +    249 45.5.184.221
    +    258 45.5.187.130
    +    386 66.249.66.90
    +    410 197.210.168.174
    +   1896 104.196.152.243
    +  11005 124.17.34.59
    +
    # grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 124.17.34.59 | grep -c pdf
    +5948
    +# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 104.196.152.243 | grep -c pdf
    +0
    +
+
diff --git a/public/sitemap.xml b/public/sitemap.xml
index 07d86cd23..ce470f6f2 100644
--- a/public/sitemap.xml
+++ b/public/sitemap.xml
@@ -4,7 +4,7 @@
  https://alanorth.github.io/cgspace-notes/2017-11/
- 2017-11-05T15:53:35+02:00
+ 2017-11-07T14:50:01+02:00
@@ -134,7 +134,7 @@
  https://alanorth.github.io/cgspace-notes/
- 2017-11-05T15:53:35+02:00
+ 2017-11-07T14:50:01+02:00
  0
@@ -145,7 +145,7 @@
  https://alanorth.github.io/cgspace-notes/tags/notes/
- 2017-11-05T15:53:35+02:00
+ 2017-11-07T14:50:01+02:00
  0
@@ -157,13 +157,13 @@
  https://alanorth.github.io/cgspace-notes/post/
- 2017-11-05T15:53:35+02:00
+ 2017-11-07T14:50:01+02:00
  0
  https://alanorth.github.io/cgspace-notes/tags/
- 2017-11-05T15:53:35+02:00
+ 2017-11-07T14:50:01+02:00
  0
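The log analysis in these notes repeats one pipeline per log file: filter a time window, take the client IP from the first field, count, and sort. Below is a minimal sketch of two reusable shell functions for that. It assumes GNU gzip, so `zcat -f` passes plain files through unchanged and decompresses rotated `.gz` files, and it assumes nginx's default combined log format, where the client IP is the first field. The names `top_ips` and `pdf_requests` are hypothetical helpers and do not come from the original notes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original notes) wrapping the ad-hoc
# nginx log pipelines. Assumes GNU gzip and the nginx "combined" log format,
# where the client IP is the first whitespace-separated field.

# top_ips PATTERN FILE...
# Print the ten busiest client IPs among lines matching PATTERN (an ERE).
top_ips() {
    local pattern="$1"
    shift
    # zcat -f decompresses .gz files and copies plain files through unchanged
    zcat -f "$@" \
        | grep -E "$pattern" \
        | awk '{print $1}' \
        | sort \
        | uniq -c \
        | sort -n \
        | tail
}

# pdf_requests PATTERN IP FILE...
# Count lines matching PATTERN, sent by exactly IP, that mention "pdf".
pdf_requests() {
    local pattern="$1" ip="$2"
    shift 2
    # Compare the first field exactly, so the IP is not also counted when it
    # only appears inside a referrer or request URL
    zcat -f "$@" \
        | grep -E "$pattern" \
        | awk -v ip="$ip" '$1 == ip' \
        | grep -c pdf
}
```

For example, `top_ips '07/Nov/2017:1[234]:' /var/log/nginx/access.log` should give the same afternoon ranking as the `grep | awk | sort | uniq -c` pipeline above. `pdf_requests '07/Nov/2017:1[234]:' 124.17.34.59 /var/log/nginx/access.log` should give roughly the same PDF count, minus any lines where the IP appears outside the first field. Like the original `grep -c pdf`, it counts every line containing "pdf" anywhere, not only bitstream downloads.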