Update notes for 2017-11-08

This commit is contained in:
2017-11-08 14:17:04 +02:00
parent 4147b78b76
commit f775c480d8
3 changed files with 68 additions and 14 deletions

View File

@ -407,11 +407,11 @@ $ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-
- Linode sent several alerts last night about CPU usage and outbound traffic rate at 6:13PM
- Linode sent another alert about CPU usage in the morning at 6:12AM
- Jesus, the new Chinese IP (124.17.34.59) has downloaded 22,000 PDFs in the last 24 hours:
- Jesus, the new Chinese IP (124.17.34.59) has downloaded 24,000 PDFs in the last 24 hours:
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "0[78]/Nov/2017:" | grep 124.17.34.59 | grep -c pdf
22510
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "0[78]/Nov/2017:" | grep 124.17.34.59 | grep -v pdf.jpg | grep -c pdf
24981
```
- This is about 20,000 Tomcat sessions:
@ -424,3 +424,29 @@ $ cat dspace.log.2017-11-07 dspace.log.2017-11-08 | grep -Io -E 'session_id=[A-Z
- I'm getting really sick of this
- Sisay re-uploaded the CIAT records that I had already corrected earlier this week, erasing all my corrections
- I had to re-correct all the publishers, places, names, dates, etc and apply the changes on DSpace Test
- Run system updates on DSpace Test and reboot the server
- Magdalena had written to say that two of their Phase II project tags were missing on CGSpace, so I added them ([#346](https://github.com/ilri/DSpace/pull/346))
- I figured out a way to use nginx's map function to assign a "bot" user agent to misbehaving clients who don't define a user agent
- Most bots are automatically lumped into one generic session by [Tomcat's Crawler Session Manager Valve](https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve) but this only works if their user agent matches a pre-defined regular expression like `.*[bB]ot.*`
- Some clients send thousands of requests without a user agent which ends up creating thousands of Tomcat sessions, wasting precious memory, CPU, and database resources in the process
- Basically, we modify the nginx config to add a mapping with a modified user agent `$ua`:
```
map $remote_addr $ua {
# 2017-11-08 Random Chinese host grabbing 20,000 PDFs
124.17.34.59 'ChineseBot';
default $http_user_agent;
}
```
- If the client's address matches then the user agent is set, otherwise the default `$http_user_agent` variable is used
- Then, in the server's `/` block we pass this header to Tomcat:
```
proxy_pass http://tomcat_http;
proxy_set_header User-Agent $ua;
```
- Note to self: the `$ua` variable won't show up in nginx access logs because the default `combined` log format doesn't show it, so don't run around pulling your hair out wondering with the modified user agents aren't showing in the logs!
- If a client matching one of these IPs connects without a session, it will be assigned one by the Crawler Session Manager Valve
- You can verify by cross referencing nginx's `access.log` and DSpace's `dspace.log.2017-11-08`, for example