mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2018-11-04
@@ -191,5 +191,46 @@ facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
```
- I will add it to the Tomcat Crawler Session Manager valve
- Later in the evening... ok, this Facebook bot is getting super annoying:
```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "04/Nov/2018" | grep "2a03:2880:11ff:" | awk '{print $1}' | sort | uniq -c | sort -n
1871 2a03:2880:11ff:3::face:b00c
1885 2a03:2880:11ff:b::face:b00c
1941 2a03:2880:11ff:8::face:b00c
1942 2a03:2880:11ff:e::face:b00c
1987 2a03:2880:11ff:1::face:b00c
2023 2a03:2880:11ff:2::face:b00c
2027 2a03:2880:11ff:4::face:b00c
2032 2a03:2880:11ff:9::face:b00c
2034 2a03:2880:11ff:10::face:b00c
2050 2a03:2880:11ff:5::face:b00c
2061 2a03:2880:11ff:c::face:b00c
2076 2a03:2880:11ff:6::face:b00c
2093 2a03:2880:11ff:7::face:b00c
2107 2a03:2880:11ff::face:b00c
2118 2a03:2880:11ff:d::face:b00c
2164 2a03:2880:11ff:a::face:b00c
2178 2a03:2880:11ff:f::face:b00c
```
- And still making shit tons of Tomcat sessions:
```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04 | sort | uniq
28470
```
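- Note that `grep -c` prints the number of matching lines, so the `sort | uniq` stage above has nothing left to deduplicate. A minimal sketch of counting distinct session IDs instead, using made-up sample lines (the log layout and the `log` and `pattern` variable names are assumptions for illustration, not the real DSpace log):

```shell
# Hypothetical sample lines standing in for dspace.log.2018-11-04
log=$(mktemp)
cat > "$log" <<'EOF'
2018-11-04 10:00:01 INFO session_id=0123456789ABCDEF0123456789ABCDEF:ip_addr=2a03:2880:11ff:3::face:b00c view_item
2018-11-04 10:00:02 INFO session_id=0123456789ABCDEF0123456789ABCDEF:ip_addr=2a03:2880:11ff:3::face:b00c view_item
2018-11-04 10:00:03 INFO session_id=FEDCBA9876543210FEDCBA9876543210:ip_addr=2a03:2880:11ff:b::face:b00c view_item
2018-11-04 10:00:04 INFO session_id=00112233445566778899AABBCCDDEEFF:ip_addr=192.0.2.10 view_item
EOF

pattern='session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff'

# -c counts every matching line, including repeats of the same session
matching_lines=$(grep -c -E "$pattern" "$log")

# -o prints only the matched text, so sort -u can collapse repeated sessions
unique_sessions=$(grep -o -E "$pattern" "$log" | sort -u | wc -l | tr -d ' ')

echo "matching lines: $matching_lines"
echo "unique sessions: $unique_sessions"
rm -f "$log"
```

- With the sample data this reports three matching lines but only two distinct sessions.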
- And that's even using the Tomcat Crawler Session Manager valve!
- Maybe we need to limit more dynamic pages, like the "most popular" country, item, and author pages
- It seems these are popular too, and there is no fucking way Facebook needs that information, yet they are requesting thousands of them!
```
# grep 'face:b00c' /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -c 'most-popular/'
7033
```
- I added the "most-popular" pages to the list that return `X-Robots-Tag: none` to try to inform bots not to index or follow those pages
- Also, I implemented an nginx rate limit of twelve requests per minute on all dynamic pages... I figure a human user might legitimately request one every five seconds
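- For reference, a rough sketch of what the two changes above might look like in nginx. The zone name and size, the `burst` value, the location pattern, and the upstream name are all assumptions for illustration; the actual configuration is not shown in these notes:

```nginx
# In the http {} context: one shared-memory zone keyed by client address,
# refilled at twelve requests per minute (one request every five seconds).
# Zone name and size are assumed values.
limit_req_zone $binary_remote_addr zone=dynamicpages:16m rate=12r/m;

server {
    # Hypothetical pattern for the "dynamic" pages
    location ~ /(discover|search-filter|browse|most-popular) {
        # Ask well-behaved bots not to index or follow these pages
        add_header X-Robots-Tag "none";

        # Apply the rate limit; burst is an assumed value
        limit_req zone=dynamicpages burst=5;

        # Hypothetical upstream name
        proxy_pass http://tomcat_http;
    }
}
```

- Since the zone in this sketch is keyed on the full client address, each of the crawler's many IPv6 addresses would get its own twelve-per-minute budget.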
<!-- vim: set sw=2 ts=2: -->