Update notes for 2018-11-04

This commit is contained in:
2018-11-04 12:18:52 +02:00
parent 6bf2bdae09
commit ed623594e9
3 changed files with 147 additions and 10 deletions

View File

@ -125,6 +125,71 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' /home/cgspace.cgiar.o
- If they want to download all our metadata and PDFs they should use an API rather than scraping the XMLUI
- I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later
- Also, this is the third (?) time a mysterious IP in Hetzner has done this... who is this?
- Also, this is the third (?) time a mysterious IP on Hetzner has done this... who is this?
## 2018-11-04
- Forward Peter's information about CGSpace financials to Modi from ICRISAT
- Linode emailed about the CPU load and outgoing bandwidth on CGSpace (linode18) again
- Here are the top ten IPs active so far this morning:
```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "04/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1083 2a03:2880:11ff:2::face:b00c
1105 2a03:2880:11ff:d::face:b00c
1111 2a03:2880:11ff:f::face:b00c
1134 84.38.130.177
1893 50.116.102.77
2040 66.249.64.63
4210 66.249.64.61
4534 70.32.83.92
13036 78.46.89.18
20407 66.249.64.59
```
- `78.46.89.18` is back... and still making tons of Tomcat sessions:
```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-04 | sort | uniq
8765
```
- Also, now we have a ton of Facebook crawlers:
```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "04/Nov/2018" | grep "2a03:2880:11ff:" | awk '{print $1}' | sort | uniq -c | sort -n
905 2a03:2880:11ff:b::face:b00c
955 2a03:2880:11ff:5::face:b00c
965 2a03:2880:11ff:e::face:b00c
984 2a03:2880:11ff:8::face:b00c
993 2a03:2880:11ff:3::face:b00c
994 2a03:2880:11ff:7::face:b00c
1006 2a03:2880:11ff:10::face:b00c
1011 2a03:2880:11ff:4::face:b00c
1023 2a03:2880:11ff:6::face:b00c
1026 2a03:2880:11ff:9::face:b00c
1039 2a03:2880:11ff:1::face:b00c
1043 2a03:2880:11ff:c::face:b00c
1070 2a03:2880:11ff::face:b00c
1075 2a03:2880:11ff:a::face:b00c
1093 2a03:2880:11ff:2::face:b00c
1107 2a03:2880:11ff:d::face:b00c
1116 2a03:2880:11ff:f::face:b00c
```
- They are really making shit tons of Tomcat sessions:
```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04 | sort | uniq
14368
```
- Their user agent is:
```
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
```
- I will add it to the Tomcat Crawler Session Manager valve
<!-- vim: set sw=2 ts=2: -->