Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-25 16:08:19 +01:00)

Add notes for 2019-03-26

parent b8af480098
commit 41adbab750

@@ -800,4 +800,59 @@ $ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|ds
- I need to watch this carefully, though, because I've read in some places that Tomcat's DBCP doesn't track statements and might create memory leaks if an application doesn't close statements before a connection gets returned to the pool
- According to Uptime Robot, the server was up and down a few more times over the next hour, so I restarted Tomcat again

## 2019-03-26

- UptimeRobot says CGSpace went down again and I see the load is again at 14.0!
- Here are the top IPs in nginx logs in the last hour:

```
# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "26/Mar/2019:(06|07)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
      3 35.174.184.209
      3 66.249.66.81
      4 104.198.9.108
      4 154.77.98.122
      4 2.50.152.13
     10 196.188.12.245
     14 66.249.66.80
    414 45.5.184.72
    535 45.5.186.2
   2014 205.186.128.185
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "26/Mar/2019:(06|07)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    157 41.204.190.40
    160 18.194.46.84
    160 54.70.40.11
    168 31.6.77.23
    188 66.249.66.81
    284 3.91.79.74
    405 2a01:4f8:140:3192::2
    471 66.249.66.80
    712 35.174.184.209
    784 2a01:4f8:13b:1296::2
```

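- A minimal sketch of the pipeline above as a reusable Bash function, so the same count can be rerun for another hour or another set of logs. The name `top_ips` is hypothetical, and it assumes nginx's default "combined" log format, where the client IP is the first whitespace-separated field (which is what `awk '{print $1}'` relies on). `zcat --force` copies uncompressed files through unchanged, which is why one command can read both the current and the rotated (`.1`) logs:

```bash
# Hypothetical helper wrapping the "top IPs" pipeline used above.
# Assumes nginx's default "combined" log format, where the client IP
# is the first whitespace-separated field of each line.
#
# Usage: top_ips REGEX LOGFILE...
#   REGEX    extended regex matched against whole log lines, e.g. a date and hour
#   LOGFILE  one or more log files, plain or gzip-compressed
top_ips() {
  local pattern="$1"
  shift
  # --force makes zcat pass uncompressed files through unchanged
  zcat --force "$@" \
    | grep -E "$pattern" \
    | awk '{print $1}' \
    | sort | uniq -c | sort -n | tail -n 10
}
```

- For example, the second command above would become `top_ips '26/Mar/2019:(06|07)' /var/log/nginx/{access,error,library-access}.log{,.1}`
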
- The two IPv6 addresses belong to something called BLEXBot, which seems to check the robots.txt file and then completely ignore it by making thousands of requests to dynamic pages like Browse and Discovery
- Then `35.174.184.209` is MauiBot, which does the same thing
- `3.91.79.74` does too; it appears to be CCBot
- I will add these three to the "bad bot" rate limiting that I originally used for Baidu
- Going further, these are the IPs making requests to Discovery and Browse pages so far today:

```
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "(discover|browse)" | grep -E "26/Mar/2019:" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    120 34.207.146.166
    128 3.91.79.74
    132 108.179.57.67
    143 34.228.42.25
    185 216.244.66.198
    430 54.70.40.11
   1033 93.179.69.74
   1206 2a01:4f8:140:3192::2
   2678 2a01:4f8:13b:1296::2
   3790 35.174.184.209
```

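- One caveat about the command above: `grep -E "(discover|browse)"` matches anywhere in the log line, so a normal user who clicks through to an item *from* a Discovery page is also counted via the referrer. A sketch of a stricter variant that only looks at the request line, assuming the same combined log format (splitting a line on double quotes puts `IP - user [time] ` in field 1 and the request in field 2); `dynamic_page_ips` is a hypothetical name:

```bash
# Hypothetical variant of the Discovery/Browse count above that only
# matches the request line, not the referrer or user agent.
# Assumes nginx's default "combined" log format: splitting a line on
# double quotes puts "IP - user [time] " in $1 and the request in $2.
#
# Usage: dynamic_page_ips DATE_REGEX LOGFILE...
dynamic_page_ips() {
  local date_pattern="$1"
  shift
  zcat --force "$@" \
    | awk -F'"' -v d="$date_pattern" '
        # keep lines from the wanted date whose request mentions discover/browse
        $1 ~ d && $2 ~ /(discover|browse)/ {
          split($1, f, " ")   # first space-separated token of $1 is the client IP
          print f[1]
        }' \
    | sort | uniq -c | sort -n | tail -n 10
}
```

- Usage would be `dynamic_page_ips '26/Mar/2019:' /var/log/nginx/{access,error,library-access}.log{,.1}`; every line it counts is also matched by the two `grep` calls, so per-IP counts can only be equal to or lower than in the original command
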
- `54.70.40.11` is SemanticScholarBot
- `216.244.66.198` is DotBot
- `93.179.69.74` is some IP in Ukraine, which I will add to the list of bot IPs in nginx
- I can only hope that this helps the load go down because all this traffic is disrupting the service for normal users and well-behaved bots (and interrupting my dinner and breakfast)

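- To check who is behind an address like `93.179.69.74`, and later whether the rate limiting is actually biting, here is a sketch of two hypothetical helpers, again assuming nginx's combined log format: `ua_for_ip` tallies the user-agent strings an IP sends (the sixth field when a line is split on double quotes), and `status_codes_for_ip` tallies the HTTP status codes it receives (the ninth whitespace-separated field). If the "bad bot" rate limiting is done with nginx's `limit_req`, rejected requests should show up as 503, the default `limit_req_status`:

```bash
# Hypothetical helpers for inspecting a single client IP in nginx logs.
# Both assume nginx's default "combined" log format.

# Usage: ua_for_ip IP LOGFILE...
# Tally the user-agent strings sent by IP (most frequent first).
ua_for_ip() {
  local ip="$1"
  shift
  # Compare the first field as a plain string, so the dots in IPv4 and the
  # colons in IPv6 addresses need no regex escaping.
  zcat --force "$@" \
    | awk -v ip="$ip" '$1 == ip' \
    | awk -F'"' '{print $6}' \
    | sort | uniq -c | sort -rn
}

# Usage: status_codes_for_ip IP LOGFILE...
# Tally the HTTP status codes returned to IP (most frequent first).
status_codes_for_ip() {
  local ip="$1"
  shift
  # In the combined format the status code is the ninth space-separated field.
  zcat --force "$@" \
    | awk -v ip="$ip" '$1 == ip { print $9 }' \
    | sort | uniq -c | sort -rn
}
```

- For example, `status_codes_for_ip 35.174.184.209 /var/log/nginx/access.log{,.1}` run after the rate limiting is in place would show how many of that bot's requests were rejected
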
<!-- vim: set sw=2 ts=2: -->

@@ -25,7 +25,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-03/" />
<meta property="article:published_time" content="2019-03-01T12:16:30+01:00"/>
<meta property="article:modified_time" content="2019-03-25T12:59:24+02:00"/>
<meta property="article:modified_time" content="2019-03-25T23:47:00+02:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="March, 2019"/>
@@ -55,9 +55,9 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
"@type": "BlogPosting",
"headline": "March, 2019",
"url": "https://alanorth.github.io/cgspace-notes/2019-03/",
"wordCount": "5067",
"wordCount": "5371",
"datePublished": "2019-03-01T12:16:30+01:00",
"dateModified": "2019-03-25T12:59:24+02:00",
"dateModified": "2019-03-25T23:47:00+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -1076,6 +1076,65 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-03-22 | sort -u | wc -l
<li>According to Uptime Robot, the server was up and down a few more times over the next hour, so I restarted Tomcat again</li>
</ul>

<h2 id="2019-03-26">2019-03-26</h2>

<ul>
<li>UptimeRobot says CGSpace went down again and I see the load is again at 14.0!</li>
<li>Here are the top IPs in nginx logs in the last hour:</li>
</ul>

<pre><code># zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "26/Mar/2019:(06|07)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
      3 35.174.184.209
      3 66.249.66.81
      4 104.198.9.108
      4 154.77.98.122
      4 2.50.152.13
     10 196.188.12.245
     14 66.249.66.80
    414 45.5.184.72
    535 45.5.186.2
   2014 205.186.128.185
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "26/Mar/2019:(06|07)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    157 41.204.190.40
    160 18.194.46.84
    160 54.70.40.11
    168 31.6.77.23
    188 66.249.66.81
    284 3.91.79.74
    405 2a01:4f8:140:3192::2
    471 66.249.66.80
    712 35.174.184.209
    784 2a01:4f8:13b:1296::2
</code></pre>

<ul>
<li>The two IPv6 addresses belong to something called BLEXBot, which seems to check the robots.txt file and then completely ignore it by making thousands of requests to dynamic pages like Browse and Discovery</li>
<li>Then <code>35.174.184.209</code> is MauiBot, which does the same thing</li>
<li><code>3.91.79.74</code> does too; it appears to be CCBot</li>
<li>I will add these three to the “bad bot” rate limiting that I originally used for Baidu</li>
<li>Going further, these are the IPs making requests to Discovery and Browse pages so far today:</li>
</ul>

<pre><code># zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "(discover|browse)" | grep -E "26/Mar/2019:" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    120 34.207.146.166
    128 3.91.79.74
    132 108.179.57.67
    143 34.228.42.25
    185 216.244.66.198
    430 54.70.40.11
   1033 93.179.69.74
   1206 2a01:4f8:140:3192::2
   2678 2a01:4f8:13b:1296::2
   3790 35.174.184.209
</code></pre>

<ul>
<li><code>54.70.40.11</code> is SemanticScholarBot</li>
<li><code>216.244.66.198</code> is DotBot</li>
<li><code>93.179.69.74</code> is some IP in Ukraine, which I will add to the list of bot IPs in nginx</li>
<li>I can only hope that this helps the load go down because all this traffic is disrupting the service for normal users and well-behaved bots (and interrupting my dinner and breakfast)</li>
</ul>

<!-- vim: set sw=2 ts=2: -->

@@ -45,7 +45,7 @@ Disallow: /cgspace-notes/2015-12/
Disallow: /cgspace-notes/2015-11/
Disallow: /cgspace-notes/
Disallow: /cgspace-notes/categories/
Disallow: /cgspace-notes/tags/notes/
Disallow: /cgspace-notes/categories/notes/
Disallow: /cgspace-notes/tags/notes/
Disallow: /cgspace-notes/posts/
Disallow: /cgspace-notes/tags/

@@ -4,7 +4,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-03/</loc>
<lastmod>2019-03-25T12:59:24+02:00</lastmod>
<lastmod>2019-03-25T23:47:00+02:00</lastmod>
</url>

<url>
@@ -214,7 +214,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-03-25T12:59:24+02:00</lastmod>
<lastmod>2019-03-25T23:47:00+02:00</lastmod>
<priority>0</priority>
</url>

@@ -223,27 +223,27 @@
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-03-25T12:59:24+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2018-03-09T22:10:33+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-03-25T23:47:00+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-03-25T12:59:24+02:00</lastmod>
<lastmod>2019-03-25T23:47:00+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-03-25T12:59:24+02:00</lastmod>
<lastmod>2019-03-25T23:47:00+02:00</lastmod>
<priority>0</priority>
</url>