mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-26 08:28:18 +01:00
Update notes for 2017-11-07
parent 1169510b5e
commit 950b0d3a24
@@ -253,7 +253,7 @@ $ grep -c 207.46.13.36 /var/log/nginx/access.log.1
 - I think I will end up blocking Baidu as well...
 - Next is for me to look and see what was happening specifically at 3AM and 7AM when the server crashed
 - I should look in nginx access.log, rest.log, oai.log, and DSpace's dspace.log.2017-11-07
-- Here are the top IPs during 2–10 AM:
+- Here are the top IPs making requests to XMLUI from 2–8 AM:
 
 ```
 # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
@@ -270,3 +270,97 @@ $ grep -c 207.46.13.36 /var/log/nginx/access.log.1
 ```
 
 - Of those, most are Google, Bing, Yahoo, etc, except 63.143.42.244 and 63.143.42.242 which are Uptime Robot
+- Here are the top IPs making requests to REST from 2–8 AM:
+
+```
+# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+8 207.241.229.237
+10 66.249.66.90
+16 104.196.152.243
+25 41.60.238.61
+26 157.55.39.161
+27 207.46.13.103
+27 207.46.13.80
+31 207.46.13.36
+1498 50.116.102.77
+```
+
+- The OAI requests during that same time period are nothing to worry about:
+
+```
+# cat /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+1 66.249.66.92
+4 66.249.66.90
+6 68.180.229.254
+```
+
+- The top IPs from dspace.log during the 2–8 AM period:
+
+```
+$ grep -E '2017-11-07 0[2-8]' dspace.log.2017-11-07 | grep -o -E 'ip_addr=[0-9.]+' | sort -n | uniq -c | sort -h | tail
+143 ip_addr=213.55.99.121
+181 ip_addr=66.249.66.91
+223 ip_addr=157.55.39.161
+248 ip_addr=207.46.13.80
+251 ip_addr=207.46.13.103
+291 ip_addr=207.46.13.36
+297 ip_addr=197.210.168.174
+312 ip_addr=65.49.68.199
+462 ip_addr=104.196.152.243
+488 ip_addr=66.249.66.90
+```
+
+- These aren't actually very interesting, as the top few are Google, CIAT, Bingbot, and a few other unknown scrapers
+- The number of requests isn't even that high to be honest
+- As I was looking at these logs I noticed another heavy user (124.17.34.59) that was not active during this time period, but made many requests today alone:
+
+```
+# zgrep -c 124.17.34.59 /var/log/nginx/access.log*
+/var/log/nginx/access.log:22581
+/var/log/nginx/access.log.1:0
+/var/log/nginx/access.log.2.gz:14
+/var/log/nginx/access.log.3.gz:0
+/var/log/nginx/access.log.4.gz:0
+/var/log/nginx/access.log.5.gz:3
+/var/log/nginx/access.log.6.gz:0
+/var/log/nginx/access.log.7.gz:0
+/var/log/nginx/access.log.8.gz:0
+/var/log/nginx/access.log.9.gz:1
+```
+
+- The whois data shows the IP is from China, but the user agent doesn't really give any clues:
+
+```
+# grep 124.17.34.59 /var/log/nginx/access.log | awk -F'" ' '{print $3}' | sort | uniq -c | sort -h
+210 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
+22610 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)"
+```
+
+- A Google search for "LCTE bot" doesn't return anything interesting, but this [Stack Overflow discussion](https://stackoverflow.com/questions/42500881/what-is-lcte-in-user-agent) references the lack of information
+- So basically after a few hours of looking at the log files I am not closer to understanding what is going on!
+- I do know that we want to block Baidu, though, as it does not respect `robots.txt`
+- And as we speak Linode alerted that the outbound traffic rate is very high for the past two hours (about 12–14 hours)
+- At least for now it seems to be that new Chinese IP (124.17.34.59):
+
+```
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+198 207.46.13.103
+203 207.46.13.80
+205 207.46.13.36
+218 157.55.39.161
+249 45.5.184.221
+258 45.5.187.130
+386 66.249.66.90
+410 197.210.168.174
+1896 104.196.152.243
+11005 124.17.34.59
+```
+
+- Seems 124.17.34.59 are really downloading all our PDFs, compared to the next top active IPs during this time!
+
+```
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 124.17.34.59 | grep -c pdf
+5948
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 104.196.152.243 | grep -c pdf
+0
+```
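
The per-IP counts in these notes all reuse one pipeline, so it could be wrapped in a small shell function. The following is only a sketch and is not part of this commit: the names `top_ips` and `count_pdf` are hypothetical, and it assumes the client IP is the first whitespace-separated field of each log line, as in the nginx commands shown in the notes.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not from the original notes.

# top_ips PATTERN FILE...
# Count requests per client IP on lines matching the extended regex PATTERN
# and print up to ten of the busiest IPs, busiest last.
# Assumes the IP is the first whitespace-separated field of each line.
top_ips() {
    pattern=$1
    shift
    grep -h -E -- "$pattern" "$@" | awk '{print $1}' | sort | uniq -c | sort -n | tail
}

# count_pdf PATTERN IP FILE...
# Count lines matching PATTERN that contain IP (as a fixed string)
# and also mention "pdf" anywhere on the line.
count_pdf() {
    pattern=$1
    ip=$2
    shift 2
    grep -h -E -- "$pattern" "$@" | grep -F -- "$ip" | grep -c pdf
}
```

With these defined, `top_ips '07/Nov/2017:1[234]:' /var/log/nginx/access.log` should give a listing like the 12:00–14:59 one in the notes, and `count_pdf '07/Nov/2017:1[234]:' 124.17.34.59 /var/log/nginx/access.log` mirrors the PDF count. Note that the fixed-string match on the IP is a substring match, so it would also count a longer address that happens to contain it.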
@@ -38,7 +38,7 @@ COPY 54701
 
 <meta property="article:published_time" content="2017-11-02T09:37:54+02:00"/>
 
-<meta property="article:modified_time" content="2017-11-05T15:53:35+02:00"/>
+<meta property="article:modified_time" content="2017-11-07T14:50:01+02:00"/>
 
 
 
@@ -86,9 +86,9 @@ COPY 54701
 "@type": "BlogPosting",
 "headline": "November, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-11/",
-"wordCount": "1445",
+"wordCount": "1905",
 "datePublished": "2017-11-02T09:37:54+02:00",
-"dateModified": "2017-11-05T15:53:35+02:00",
+"dateModified": "2017-11-07T14:50:01+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -433,7 +433,7 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
 <li>I think I will end up blocking Baidu as well…</li>
 <li>Next is for me to look and see what was happening specifically at 3AM and 7AM when the server crashed</li>
 <li>I should look in nginx access.log, rest.log, oai.log, and DSpace’s dspace.log.2017-11-07</li>
-<li>Here are the top IPs during 2–10 AM:</li>
+<li>Here are the top IPs making requests to XMLUI from 2–8 AM:</li>
 </ul>
 
 <pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
@@ -451,8 +451,107 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
 
 <ul>
 <li>Of those, most are Google, Bing, Yahoo, etc, except 63.143.42.244 and 63.143.42.242 which are Uptime Robot</li>
+<li>Here are the top IPs making requests to REST from 2–8 AM:</li>
 </ul>
 
+<pre><code># cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+8 207.241.229.237
+10 66.249.66.90
+16 104.196.152.243
+25 41.60.238.61
+26 157.55.39.161
+27 207.46.13.103
+27 207.46.13.80
+31 207.46.13.36
+1498 50.116.102.77
+</code></pre>
+
+<ul>
+<li>The OAI requests during that same time period are nothing to worry about:</li>
+</ul>
+
+<pre><code># cat /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+1 66.249.66.92
+4 66.249.66.90
+6 68.180.229.254
+</code></pre>
+
+<ul>
+<li>The top IPs from dspace.log during the 2–8 AM period:</li>
+</ul>
+
+<pre><code>$ grep -E '2017-11-07 0[2-8]' dspace.log.2017-11-07 | grep -o -E 'ip_addr=[0-9.]+' | sort -n | uniq -c | sort -h | tail
+143 ip_addr=213.55.99.121
+181 ip_addr=66.249.66.91
+223 ip_addr=157.55.39.161
+248 ip_addr=207.46.13.80
+251 ip_addr=207.46.13.103
+291 ip_addr=207.46.13.36
+297 ip_addr=197.210.168.174
+312 ip_addr=65.49.68.199
+462 ip_addr=104.196.152.243
+488 ip_addr=66.249.66.90
+</code></pre>
+
+<ul>
+<li>These aren’t actually very interesting, as the top few are Google, CIAT, Bingbot, and a few other unknown scrapers</li>
+<li>The number of requests isn’t even that high to be honest</li>
+<li>As I was looking at these logs I noticed another heavy user (124.17.34.59) that was not active during this time period, but made many requests today alone:</li>
+</ul>
+
+<pre><code># zgrep -c 124.17.34.59 /var/log/nginx/access.log*
+/var/log/nginx/access.log:22581
+/var/log/nginx/access.log.1:0
+/var/log/nginx/access.log.2.gz:14
+/var/log/nginx/access.log.3.gz:0
+/var/log/nginx/access.log.4.gz:0
+/var/log/nginx/access.log.5.gz:3
+/var/log/nginx/access.log.6.gz:0
+/var/log/nginx/access.log.7.gz:0
+/var/log/nginx/access.log.8.gz:0
+/var/log/nginx/access.log.9.gz:1
+</code></pre>
+
+<ul>
+<li>The whois data shows the IP is from China, but the user agent doesn’t really give any clues:</li>
+</ul>
+
+<pre><code># grep 124.17.34.59 /var/log/nginx/access.log | awk -F'" ' '{print $3}' | sort | uniq -c | sort -h
+210 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
+22610 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)"
+</code></pre>
+
+<ul>
+<li>A Google search for “LCTE bot” doesn’t return anything interesting, but this <a href="https://stackoverflow.com/questions/42500881/what-is-lcte-in-user-agent">Stack Overflow discussion</a> references the lack of information</li>
+<li>So basically after a few hours of looking at the log files I am not closer to understanding what is going on!</li>
+<li>I do know that we want to block Baidu, though, as it does not respect <code>robots.txt</code></li>
+<li>And as we speak Linode alerted that the outbound traffic rate is very high for the past two hours (about 12–14 hours)</li>
+<li>At least for now it seems to be that new Chinese IP (124.17.34.59):</li>
+</ul>
+
+<pre><code># grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+198 207.46.13.103
+203 207.46.13.80
+205 207.46.13.36
+218 157.55.39.161
+249 45.5.184.221
+258 45.5.187.130
+386 66.249.66.90
+410 197.210.168.174
+1896 104.196.152.243
+11005 124.17.34.59
+</code></pre>
+
+<ul>
+<li>Seems 124.17.34.59 are really downloading all our PDFs, compared to the next top active IPs during this time!</li>
+</ul>
+
+<pre><code># grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 124.17.34.59 | grep -c pdf
+5948
+# grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 104.196.152.243 | grep -c pdf
+0
+</code></pre>
+
 
 
 
@@ -4,7 +4,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-11/</loc>
-<lastmod>2017-11-05T15:53:35+02:00</lastmod>
+<lastmod>2017-11-07T14:50:01+02:00</lastmod>
 </url>
 
 <url>
@@ -134,7 +134,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2017-11-05T15:53:35+02:00</lastmod>
+<lastmod>2017-11-07T14:50:01+02:00</lastmod>
 <priority>0</priority>
 </url>
 
@@ -145,7 +145,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2017-11-05T15:53:35+02:00</lastmod>
+<lastmod>2017-11-07T14:50:01+02:00</lastmod>
 <priority>0</priority>
 </url>
 
@@ -157,13 +157,13 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/post/</loc>
-<lastmod>2017-11-05T15:53:35+02:00</lastmod>
+<lastmod>2017-11-07T14:50:01+02:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2017-11-05T15:53:35+02:00</lastmod>
+<lastmod>2017-11-07T14:50:01+02:00</lastmod>
 <priority>0</priority>
 </url>
 