mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-25 08:00:18 +01:00)
Update notes for 2018-11-04
parent ed623594e9
commit 6f561ce4b5
@@ -191,5 +191,46 @@ facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
 ```

 - I will add it to the Tomcat Crawler Session Manager valve
+- Later in the evening... ok, this Facebook bot is getting super annoying:
+
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "04/Nov/2018" | grep "2a03:2880:11ff:" | awk '{print $1}' | sort | uniq -c | sort -n
+1871 2a03:2880:11ff:3::face:b00c
+1885 2a03:2880:11ff:b::face:b00c
+1941 2a03:2880:11ff:8::face:b00c
+1942 2a03:2880:11ff:e::face:b00c
+1987 2a03:2880:11ff:1::face:b00c
+2023 2a03:2880:11ff:2::face:b00c
+2027 2a03:2880:11ff:4::face:b00c
+2032 2a03:2880:11ff:9::face:b00c
+2034 2a03:2880:11ff:10::face:b00c
+2050 2a03:2880:11ff:5::face:b00c
+2061 2a03:2880:11ff:c::face:b00c
+2076 2a03:2880:11ff:6::face:b00c
+2093 2a03:2880:11ff:7::face:b00c
+2107 2a03:2880:11ff::face:b00c
+2118 2a03:2880:11ff:d::face:b00c
+2164 2a03:2880:11ff:a::face:b00c
+2178 2a03:2880:11ff:f::face:b00c
+```
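To put the per-address counts above in perspective, it can help to total them. The following is a minimal Python sketch that parses the `uniq -c` output shown above (copied verbatim from the notes) and sums it; the names `UNIQ_OUTPUT` and `parse_counts` are illustrative and not part of the original notes.

```python
# Output of the zcat | grep | awk | sort | uniq -c pipeline, copied from the notes.
UNIQ_OUTPUT = """\
1871 2a03:2880:11ff:3::face:b00c
1885 2a03:2880:11ff:b::face:b00c
1941 2a03:2880:11ff:8::face:b00c
1942 2a03:2880:11ff:e::face:b00c
1987 2a03:2880:11ff:1::face:b00c
2023 2a03:2880:11ff:2::face:b00c
2027 2a03:2880:11ff:4::face:b00c
2032 2a03:2880:11ff:9::face:b00c
2034 2a03:2880:11ff:10::face:b00c
2050 2a03:2880:11ff:5::face:b00c
2061 2a03:2880:11ff:c::face:b00c
2076 2a03:2880:11ff:6::face:b00c
2093 2a03:2880:11ff:7::face:b00c
2107 2a03:2880:11ff::face:b00c
2118 2a03:2880:11ff:d::face:b00c
2164 2a03:2880:11ff:a::face:b00c
2178 2a03:2880:11ff:f::face:b00c
"""


def parse_counts(text):
    """Parse `uniq -c` style lines ("<count> <address>") into a dict."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        count, address = line.split()
        counts[address] = int(count)
    return counts


counts = parse_counts(UNIQ_OUTPUT)
total = sum(counts.values())
print(f"{len(counts)} addresses, {total} requests")  # 17 addresses, 34589 requests
```

In other words, the `2a03:2880:11ff:` prefix accounts for 34,589 requests on 04/Nov/2018 in these logs, spread fairly evenly over 17 addresses.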
+
+- And still making shit tons of Tomcat sessions:
+
+```
+$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04 | sort | uniq
+28470
+```
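One caveat when reading that number: `grep -c` prints the count of matching lines, so the trailing `sort | uniq` has nothing left to deduplicate, and 28470 is a count of log lines rather than necessarily of distinct sessions. The following is a minimal Python sketch of the difference, reusing the pattern from the command above; the sample log lines and the `count_sessions` helper are made up for illustration and do not come from the real `dspace.log`.

```python
import re

# Same pattern as the grep command above, with the session ID captured.
SESSION_RE = re.compile(r"session_id=([A-Z0-9]{32}):ip_addr=2a03:2880:11ff")


def count_sessions(lines):
    """Return (matching_lines, distinct_session_ids) for an iterable of log lines."""
    matching = 0
    sessions = set()
    for line in lines:
        match = SESSION_RE.search(line)
        if match:
            matching += 1
            sessions.add(match.group(1))
    return matching, len(sessions)


# Hypothetical log lines: two requests in session A..., one in session B...,
# and one from an unrelated address that should not match.
sample = [
    "INFO session_id=" + "A" * 32 + ":ip_addr=2a03:2880:11ff:3::face:b00c:view_item",
    "INFO session_id=" + "A" * 32 + ":ip_addr=2a03:2880:11ff:3::face:b00c:view_item",
    "INFO session_id=" + "B" * 32 + ":ip_addr=2a03:2880:11ff:b::face:b00c:view_item",
    "INFO session_id=" + "C" * 32 + ":ip_addr=192.0.2.1:view_item",
]

matching, distinct = count_sessions(sample)
print(matching, distinct)  # 3 2
```

The first number corresponds to what `grep -c` reports; the second is the number of distinct Tomcat sessions among those lines.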
+
+- And that's even using the Tomcat Crawler Session Manager valve!
+- Maybe we need to limit more dynamic pages, like the "most popular" country, item, and author pages
+- It seems these are popular too, and there is no fucking way Facebook needs that information, yet they are requesting thousands of them!
+
+```
+# grep 'face:b00c' /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -c 'most-popular/'
+7033
+```
+
+- I added the "most-popular" pages to the list that return `X-Robots-Tag: none` to try to inform bots not to index or follow those pages
+- Also, I implemented an nginx rate limit of twelve requests per minute on all dynamic pages... I figure a human user might legitimately request one every five seconds
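The notes do not show the nginx configuration itself, so the following is only a simplified Python model of what "twelve requests per minute" means for a single client: at most one accepted request per five-second interval, with no bursting allowed. That no-burst behaviour is an assumption made for illustration; nginx's `limit_req` module does its own accounting and has an optional `burst` setting, and the `allowed_requests` helper is hypothetical.

```python
def allowed_requests(timestamps, per_minute=12):
    """Simplified rate-limit model (not nginx's actual algorithm).

    Accept a request only if at least 60 / per_minute seconds have passed
    since the last accepted request. `timestamps` are seconds, ascending.
    """
    interval = 60.0 / per_minute  # 12 per minute -> one every 5 seconds
    accepted = []
    last = None
    for t in timestamps:
        if last is None or t - last >= interval:
            accepted.append(t)
            last = t
    return accepted


# A client hammering a dynamic page: only requests spaced >= 5 s apart get through.
print(allowed_requests([0, 1, 2, 5, 6, 10, 30]))  # [0, 5, 10, 30]
```

Under this model a person clicking through pages every five seconds or more is never limited, while a crawler issuing several requests per second has most of them rejected.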

 <!-- vim: set sw=2 ts=2: -->
@@ -23,7 +23,7 @@ Today these are the top 10 IPs:
 " />
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-11/" /><meta property="article:published_time" content="2018-11-01T16:41:30+02:00"/>
-<meta property="article:modified_time" content="2018-11-04T01:02:29+02:00"/>
+<meta property="article:modified_time" content="2018-11-04T12:18:52+02:00"/>

 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="November, 2018"/>
@@ -52,9 +52,9 @@ Today these are the top 10 IPs:
 "@type": "BlogPosting",
 "headline": "November, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-11/",
-"wordCount": "791",
+"wordCount": "992",
 "datePublished": "2018-11-01T16:41:30+02:00",
-"dateModified": "2018-11-04T01:02:29+02:00",
+"dateModified": "2018-11-04T12:18:52+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -324,6 +324,50 @@ Today these are the top 10 IPs:

 <ul>
 <li>I will add it to the Tomcat Crawler Session Manager valve</li>
+<li>Later in the evening… ok, this Facebook bot is getting super annoying:</li>
+</ul>
+
+<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "04/Nov/2018" | grep "2a03:2880:11ff:" | awk '{print $1}' | sort | uniq -c | sort -n
+1871 2a03:2880:11ff:3::face:b00c
+1885 2a03:2880:11ff:b::face:b00c
+1941 2a03:2880:11ff:8::face:b00c
+1942 2a03:2880:11ff:e::face:b00c
+1987 2a03:2880:11ff:1::face:b00c
+2023 2a03:2880:11ff:2::face:b00c
+2027 2a03:2880:11ff:4::face:b00c
+2032 2a03:2880:11ff:9::face:b00c
+2034 2a03:2880:11ff:10::face:b00c
+2050 2a03:2880:11ff:5::face:b00c
+2061 2a03:2880:11ff:c::face:b00c
+2076 2a03:2880:11ff:6::face:b00c
+2093 2a03:2880:11ff:7::face:b00c
+2107 2a03:2880:11ff::face:b00c
+2118 2a03:2880:11ff:d::face:b00c
+2164 2a03:2880:11ff:a::face:b00c
+2178 2a03:2880:11ff:f::face:b00c
+</code></pre>
+
+<ul>
+<li>And still making shit tons of Tomcat sessions:</li>
+</ul>
+
+<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04 | sort | uniq
+28470
+</code></pre>
+
+<ul>
+<li>And that’s even using the Tomcat Crawler Session Manager valve!</li>
+<li>Maybe we need to limit more dynamic pages, like the “most popular” country, item, and author pages</li>
+<li>It seems these are popular too, and there is no fucking way Facebook needs that information, yet they are requesting thousands of them!</li>
+</ul>
+
+<pre><code># grep 'face:b00c' /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -c 'most-popular/'
+7033
+</code></pre>
+
+<ul>
+<li>I added the “most-popular” pages to the list that return <code>X-Robots-Tag: none</code> to try to inform bots not to index or follow those pages</li>
+<li>Also, I implemented an nginx rate limit of twelve requests per minute on all dynamic pages… I figure a human user might legitimately request one every five seconds</li>
 </ul>

 <!-- vim: set sw=2 ts=2: -->
@@ -4,7 +4,7 @@

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2018-11/</loc>
-<lastmod>2018-11-04T01:02:29+02:00</lastmod>
+<lastmod>2018-11-04T12:18:52+02:00</lastmod>
 </url>

 <url>
@@ -194,7 +194,7 @@

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2018-11-04T01:02:29+02:00</lastmod>
+<lastmod>2018-11-04T12:18:52+02:00</lastmod>
 <priority>0</priority>
 </url>

@@ -205,7 +205,7 @@

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2018-11-04T01:02:29+02:00</lastmod>
+<lastmod>2018-11-04T12:18:52+02:00</lastmod>
 <priority>0</priority>
 </url>

@@ -217,13 +217,13 @@

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2018-11-04T01:02:29+02:00</lastmod>
+<lastmod>2018-11-04T12:18:52+02:00</lastmod>
 <priority>0</priority>
 </url>

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2018-11-04T01:02:29+02:00</lastmod>
+<lastmod>2018-11-04T12:18:52+02:00</lastmod>
 <priority>0</priority>
 </url>
