Update notes for 2017-11-12

This commit is contained in:
2017-11-12 10:41:44 +02:00
parent acc7c84022
commit f2ef00d1e9
3 changed files with 92 additions and 8 deletions

View File

@ -514,3 +514,44 @@ $ grep 104.196.152.243 dspace.log.2017-11-07 | grep -o -E 'session_id=[A-Z0-9]{3
## 2017-11-12
- Update the [Ansible infrastructure templates](https://github.com/ilri/rmg-ansible-public) to be a little more modular and flexible
- Looking at the top client IPs on CGSpace so far this morning, even though it's only been eight hours:
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "12/Nov/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
243 5.83.120.111
335 40.77.167.103
424 66.249.66.91
529 207.46.13.36
554 40.77.167.129
604 207.46.13.53
754 104.196.152.243
883 66.249.66.90
1150 95.108.181.88
1381 5.9.6.51
```
- 5.9.6.51 seems to be a Russian bot:
```
# grep 5.9.6.51 /var/log/nginx/access.log | tail -n 1
5.9.6.51 - - [12/Nov/2017:08:13:13 +0000] "GET /handle/10568/16515/recent-submissions HTTP/1.1" 200 5097 "-" "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"
```
- What's amazing is that it seems to reuse its Java session across all requests:
```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' /home/cgspace.cgiar.org/log/dspace.log.2017-11-12
1558
$ grep 5.9.6.51 /home/cgspace.cgiar.org/log/dspace.log.2017-11-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
1
```
- Bravo to MegaIndex.ru!
- The same cannot be said for 95.108.181.88, which appears to be YandexBot, even though Tomcat's Crawler Session Manager valve regex should match 'YandexBot':
```
# grep 95.108.181.88 /var/log/nginx/access.log | tail -n 1
95.108.181.88 - - [12/Nov/2017:08:33:17 +0000] "GET /bitstream/handle/10568/57004/GenebankColombia_23Feb2015.pdf HTTP/1.1" 200 972019 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' /home/cgspace.cgiar.org/log/dspace.log.2017-11-12
991
```