Update notes for 2018-09-10

2018-09-11 00:37:38 +03:00
parent c3a3af4e9f
commit 6dd5e7850b
5 changed files with 140 additions and 18 deletions

@@ -138,5 +138,64 @@ UPDATE 15
- Resume work on adding metadata for access and usage rights, which we started earlier in 2018 (and also in 2017)
- The current `cg.identifier.status` field will become "Access rights" and `dc.rights` will become "Usage rights"
- I have some work in progress on the [`5_x-rights` branch](https://github.com/alanorth/DSpace/tree/5_x-rights)
- Linode said that CGSpace (linode18) had a high CPU load earlier today
- When I looked, I saw that it's the same Russian IP that I noticed last month:
```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Sep/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1459 157.55.39.202
1579 95.108.181.88
1615 157.55.39.147
1714 66.249.64.91
1924 50.116.102.77
3696 157.55.39.106
3763 157.55.39.148
4470 70.32.83.92
4724 35.237.175.180
14132 5.9.6.51
```
- And this bot is still creating more Tomcat sessions than Nginx requests (WTF?):
```
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-09-10
14133
```
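- Note that `grep -c` counts matching log lines rather than distinct sessions. A minimal sketch of counting unique session IDs instead, assuming the `session_id=...:ip_addr=...` layout seen in the grep above (`count_sessions` is a hypothetical helper name, not something from DSpace):

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct session IDs logged for one IP address.
# Assumes log lines contain "session_id=<32 chars>:ip_addr=<IP>".
count_sessions() {
  local ip="$1" logfile="$2"
  # Escape the dots so they match literally instead of "any character"
  local ip_re="${ip//./\\.}"
  # Require a non-digit (or end of line) after the IP so 5.9.6.51 does not
  # also match 5.9.6.510, then keep only the session_id part and deduplicate
  grep -oE "session_id=[A-Z0-9]{32}:ip_addr=${ip_re}([^0-9]|\$)" "$logfile" \
    | cut -d: -f1 | sort -u | wc -l
}

# Example usage (log file name taken from the notes above):
# count_sessions 5.9.6.51 dspace.log.2018-09-10
```

- If the unique count is much lower than the line count, the bot is reusing sessions and the "more sessions than requests" number is mostly repeated log lines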
- The user agent is still the same:
```
Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
```
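- One way to pull the user agents for a single IP out of the Nginx logs, as a rough sketch: it assumes Nginx's default `combined` log format (user agent is the last double-quoted field), and `agents_for_ip` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the user agents sent by one IP with request counts.
# Assumes the "combined" log format, where splitting a line on double quotes
# leaves the client IP at the start of field 1 and the user agent in field 6.
agents_for_ip() {
  local ip="$1"; shift
  zcat --force "$@" \
    | awk -F'"' -v ip="$ip" '{ split($1, a, " "); if (a[1] == ip) print $6 }' \
    | sort | uniq -c | sort -n
}

# Example usage (log paths as in the notes above):
# agents_for_ip 5.9.6.51 /var/log/nginx/*.log /var/log/nginx/*.log.1
```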
- I added `.*crawl.*` to the Tomcat Crawler Session Manager Valve, so I'm not sure why the bot is creating so many sessions...
- I just tested that user agent on CGSpace and it *does not* create a new session (there is no `Set-Cookie: JSESSIONID` header in the response):
```
$ http --print Hh https://cgspace.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)'
GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)

HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Mon, 10 Sep 2018 20:43:04 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
```
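- To repeat this check for other user agents without reading the headers by eye, here is a small sketch that looks for Tomcat's default `JSESSIONID` session cookie in the response headers (`creates_session` is a hypothetical helper name, and it assumes the cookie name has not been changed from the Tomcat default):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers on stdin and report whether
# the server issued a new session cookie (Set-Cookie: JSESSIONID=...).
creates_session() {
  if grep -qiE '^Set-Cookie:[[:space:]]*JSESSIONID='; then
    echo "new session"
  else
    echo "no new session"
  fi
}

# Example usage (requires network access; -D - dumps the response headers to
# stdout, -o /dev/null discards the body, -A sets the user agent):
# curl -s -o /dev/null -D - \
#   -A 'Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)' \
#   https://cgspace.cgiar.org | creates_session
```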
- I will have to keep an eye on it and perhaps add it to the list of "bad bots" that get rate limited
<!-- vim: set sw=2 ts=2: -->