diff --git a/content/post/2017-11.md b/content/post/2017-11.md
index 471e32a00..571e92e63 100644
--- a/content/post/2017-11.md
+++ b/content/post/2017-11.md
@@ -253,7 +253,7 @@ $ grep -c 207.46.13.36 /var/log/nginx/access.log.1
 - I think I will end up blocking Baidu as well...
 - Next is for me to look and see what was happening specifically at 3AM and 7AM when the server crashed
 - I should look in nginx access.log, rest.log, oai.log, and DSpace's dspace.log.2017-11-07
-- Here are the top IPs during 2–10 AM:
+- Here are the top IPs making requests to XMLUI from 2–8 AM:
 
 ```
 # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
@@ -270,3 +270,97 @@ $ grep -c 207.46.13.36 /var/log/nginx/access.log.1
 ```
 
 - Of those, most are Google, Bing, Yahoo, etc, except 63.143.42.244 and 63.143.42.242 which are Uptime Robot
+- Here are the top IPs making requests to REST from 2–8 AM:
+
+```
+# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+ 8 207.241.229.237
+ 10 66.249.66.90
+ 16 104.196.152.243
+ 25 41.60.238.61
+ 26 157.55.39.161
+ 27 207.46.13.103
+ 27 207.46.13.80
+ 31 207.46.13.36
+ 1498 50.116.102.77
+```
+
+- The OAI requests during that same time period are nothing to worry about:
+
+```
+# cat /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+ 1 66.249.66.92
+ 4 66.249.66.90
+ 6 68.180.229.254
+```
+
+- The top IPs from dspace.log during the 2–8 AM period:
+
+```
+$ grep -E '2017-11-07 0[2-8]' dspace.log.2017-11-07 | grep -o -E 'ip_addr=[0-9.]+' | sort -n | uniq -c | sort -h | tail
+ 143 ip_addr=213.55.99.121
+ 181 ip_addr=66.249.66.91
+ 223 ip_addr=157.55.39.161
+ 248 ip_addr=207.46.13.80
+ 251 ip_addr=207.46.13.103
+ 291 ip_addr=207.46.13.36
+ 297 ip_addr=197.210.168.174
+ 312 ip_addr=65.49.68.199
+ 462 ip_addr=104.196.152.243
+ 488 ip_addr=66.249.66.90
+```
+
+- These aren't actually very interesting, as the top few are Google, CIAT, Bingbot, and a few other unknown scrapers
+- The number of requests isn't even that high to be honest
+- As I was looking at these logs I noticed another heavy user (124.17.34.59) that was not active during this time period, but made many requests today alone:
+
+```
+# zgrep -c 124.17.34.59 /var/log/nginx/access.log*
+/var/log/nginx/access.log:22581
+/var/log/nginx/access.log.1:0
+/var/log/nginx/access.log.2.gz:14
+/var/log/nginx/access.log.3.gz:0
+/var/log/nginx/access.log.4.gz:0
+/var/log/nginx/access.log.5.gz:3
+/var/log/nginx/access.log.6.gz:0
+/var/log/nginx/access.log.7.gz:0
+/var/log/nginx/access.log.8.gz:0
+/var/log/nginx/access.log.9.gz:1
+```
+
+- The whois data shows the IP is from China, but the user agent doesn't really give any clues:
+
+```
+# grep 124.17.34.59 /var/log/nginx/access.log | awk -F'" ' '{print $3}' | sort | uniq -c | sort -h
+ 210 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
+ 22610 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)"
+```
+
+- A Google search for "LCTE bot" doesn't return anything interesting, but this [Stack Overflow discussion](https://stackoverflow.com/questions/42500881/what-is-lcte-in-user-agent) references the lack of information
+- So basically after a few hours of looking at the log files I am no closer to understanding what is going on!
+- I do know that we want to block Baidu, though, as it does not respect `robots.txt`
+- And as we speak, Linode alerted that the outbound traffic rate has been very high for the past two hours (from about 12:00 to 14:00)
+- At least for now it seems to be that new Chinese IP (124.17.34.59):
+
+```
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+ 198 207.46.13.103
+ 203 207.46.13.80
+ 205 207.46.13.36
+ 218 157.55.39.161
+ 249 45.5.184.221
+ 258 45.5.187.130
+ 386 66.249.66.90
+ 410 197.210.168.174
+ 1896 104.196.152.243
+ 11005 124.17.34.59
+```
+
+- It seems 124.17.34.59 is really downloading all our PDFs, unlike the next most active IP (104.196.152.243) during this time!
+
+```
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 124.17.34.59 | grep -c pdf
+5948
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 104.196.152.243 | grep -c pdf
+0
+```
diff --git a/public/2017-11/index.html b/public/2017-11/index.html
index 01d98b360..900f97e96 100644
--- a/public/2017-11/index.html
+++ b/public/2017-11/index.html
@@ -38,7 +38,7 @@ COPY 54701
-
+
@@ -86,9 +86,9 @@ COPY 54701
 "@type": "BlogPosting",
 "headline": "November, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-11/",
- "wordCount": "1445",
+ "wordCount": "1905",
 "datePublished": "2017-11-02T09:37:54+02:00",
- "dateModified": "2017-11-05T15:53:35+02:00",
+ "dateModified": "2017-11-07T14:50:01+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -433,7 +433,7 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
@@ -451,8 +451,107 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
- Of those, most are Google, Bing, Yahoo, etc, except 63.143.42.244 and 63.143.42.242 which are Uptime Robot
+- Here are the top IPs making requests to REST from 2–8 AM:
+# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+ 8 207.241.229.237
+ 10 66.249.66.90
+ 16 104.196.152.243
+ 25 41.60.238.61
+ 26 157.55.39.161
+ 27 207.46.13.103
+ 27 207.46.13.80
+ 31 207.46.13.36
+ 1498 50.116.102.77
+
+
+
+- The OAI requests during that same time period are nothing to worry about:
+
+
+# cat /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+ 1 66.249.66.92
+ 4 66.249.66.90
+ 6 68.180.229.254
+
+
+
+- The top IPs from dspace.log during the 2–8 AM period:
+
+
+$ grep -E '2017-11-07 0[2-8]' dspace.log.2017-11-07 | grep -o -E 'ip_addr=[0-9.]+' | sort -n | uniq -c | sort -h | tail
+ 143 ip_addr=213.55.99.121
+ 181 ip_addr=66.249.66.91
+ 223 ip_addr=157.55.39.161
+ 248 ip_addr=207.46.13.80
+ 251 ip_addr=207.46.13.103
+ 291 ip_addr=207.46.13.36
+ 297 ip_addr=197.210.168.174
+ 312 ip_addr=65.49.68.199
+ 462 ip_addr=104.196.152.243
+ 488 ip_addr=66.249.66.90
+
+
+
+- These aren’t actually very interesting, as the top few are Google, CIAT, Bingbot, and a few other unknown scrapers
+- The number of requests isn’t even that high to be honest
+- As I was looking at these logs I noticed another heavy user (124.17.34.59) that was not active during this time period, but made many requests today alone:
+
+
+# zgrep -c 124.17.34.59 /var/log/nginx/access.log*
+/var/log/nginx/access.log:22581
+/var/log/nginx/access.log.1:0
+/var/log/nginx/access.log.2.gz:14
+/var/log/nginx/access.log.3.gz:0
+/var/log/nginx/access.log.4.gz:0
+/var/log/nginx/access.log.5.gz:3
+/var/log/nginx/access.log.6.gz:0
+/var/log/nginx/access.log.7.gz:0
+/var/log/nginx/access.log.8.gz:0
+/var/log/nginx/access.log.9.gz:1
+
+
+
+- The whois data shows the IP is from China, but the user agent doesn’t really give any clues:
+
+
+# grep 124.17.34.59 /var/log/nginx/access.log | awk -F'" ' '{print $3}' | sort | uniq -c | sort -h
+ 210 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
+ 22610 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)"
+
+
+
+- A Google search for “LCTE bot” doesn’t return anything interesting, but this Stack Overflow discussion references the lack of information
+- So basically after a few hours of looking at the log files I am no closer to understanding what is going on!
+- I do know that we want to block Baidu, though, as it does not respect robots.txt
+- And as we speak, Linode alerted that the outbound traffic rate has been very high for the past two hours (from about 12:00 to 14:00)
+- At least for now it seems to be that new Chinese IP (124.17.34.59):
+
+
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+ 198 207.46.13.103
+ 203 207.46.13.80
+ 205 207.46.13.36
+ 218 157.55.39.161
+ 249 45.5.184.221
+ 258 45.5.187.130
+ 386 66.249.66.90
+ 410 197.210.168.174
+ 1896 104.196.152.243
+ 11005 124.17.34.59
+
+
+
+- It seems 124.17.34.59 is really downloading all our PDFs, unlike the next most active IP (104.196.152.243) during this time!
+
+
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 124.17.34.59 | grep -c pdf
+5948
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 104.196.152.243 | grep -c pdf
+0
+
+
diff --git a/public/sitemap.xml b/public/sitemap.xml
index 07d86cd23..ce470f6f2 100644
--- a/public/sitemap.xml
+++ b/public/sitemap.xml
@@ -4,7 +4,7 @@
https://alanorth.github.io/cgspace-notes/2017-11/
- 2017-11-05T15:53:35+02:00
+ 2017-11-07T14:50:01+02:00
@@ -134,7 +134,7 @@
https://alanorth.github.io/cgspace-notes/
- 2017-11-05T15:53:35+02:00
+ 2017-11-07T14:50:01+02:00
0
@@ -145,7 +145,7 @@
https://alanorth.github.io/cgspace-notes/tags/notes/
- 2017-11-05T15:53:35+02:00
+ 2017-11-07T14:50:01+02:00
0
@@ -157,13 +157,13 @@
https://alanorth.github.io/cgspace-notes/post/
- 2017-11-05T15:53:35+02:00
+ 2017-11-07T14:50:01+02:00
0
https://alanorth.github.io/cgspace-notes/tags/
- 2017-11-05T15:53:35+02:00
+ 2017-11-07T14:50:01+02:00
0
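
The notes added in this patch repeat the same `grep | awk | sort | uniq -c` pipeline for each log file and time window. A minimal sketch of how that pattern could be wrapped in two helpers follows. The function names `top_ips` and `count_pdf` are hypothetical and do not appear in the original notes. Unlike the plain `grep 124.17.34.59` used above, `count_pdf` compares the first field exactly, so the dots in the address are not treated as regex wildcards. It assumes the client IP is the first whitespace-separated field, as in the nginx logs shown above.

```shell
# Hypothetical helper: print the most active client IPs among the lines that
# match a timestamp pattern.
# Usage: top_ips '07/Nov/2017:1[234]:' /var/log/nginx/access.log
top_ips() {
    pattern=$1
    shift
    # Keep lines in the time window, extract the first field (client IP),
    # then count occurrences and show the ten highest counts last.
    cat "$@" | grep -E "$pattern" | awk '{print $1}' | sort | uniq -c | sort -n | tail
}

# Hypothetical helper: count requests from one IP that mention "pdf" among the
# lines that match a timestamp pattern.
# Usage: count_pdf '07/Nov/2017:1[234]:' 124.17.34.59 /var/log/nginx/access.log
count_pdf() {
    pattern=$1
    ip=$2
    shift 2
    # Match the IP exactly against the first field rather than as a regex,
    # and print 0 (not an empty line) when nothing matches.
    cat "$@" | grep -E "$pattern" \
        | awk -v ip="$ip" '$1 == ip && /pdf/ { n++ } END { print n + 0 }'
}
```

Both helpers accept several log files, so the current and rotated uncompressed logs can be passed together, as in the `cat access.log access.log.1` commands above. Compressed `.gz` rotations would still need `zgrep` or `zcat`.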