Update notes for 2019-03-26

```
$ ./fix-metadata-values.py -i /tmp/2019-03-26-AGROVOC-89-corrections.csv -db dspace
$ ./delete-metadata-values.py -i /tmp/2019-03-26-AGROVOC-79-deletions.csv -db dspace -u dspace -p 'fuuu' -m 57 -f dc.subject -d -n
```
- UptimeRobot says CGSpace is down again, but it just seems to be slow: the load is over 10.0
- Looking at the nginx logs I don't see anything terribly abusive, but SemrushBot has made ~3,000 requests to Discovery and Browse pages today:
```
# grep SemrushBot /var/log/nginx/access.log | grep -E "26/Mar/2019" | grep -E '(discover|browse)' | wc -l
2931
```
- So I'm adding it to the bad bot rate limiting in nginx. Actually, I'm half tempted to just block all user agents with "bot" in the name for a few days to see if things calm down... but maybe not just yet
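- As a sketch of that idea (the zone name, rate, and variable names are illustrative, not the actual CGSpace config), rate limiting one bad bot by user agent in nginx could look like:

```nginx
# Hypothetical sketch: map bot-ish user agents to a rate-limit key.
# An empty key means the request is not limited at all.
map $http_user_agent $bad_bot {
    default          "";
    ~*SemrushBot     $binary_remote_addr;
}

# Bots get 1 request per second per IP, with a small burst allowance
limit_req_zone $bad_bot zone=badbots:10m rate=1r/s;

server {
    location / {
        limit_req zone=badbots burst=5 nodelay;
        # ... proxy_pass to Tomcat, etc.
    }
}
```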
- Otherwise, these are the top users in the web and API logs over the last two hours (18:00–19:59):
```
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "26/Mar/2019:(18|19)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
54 41.216.228.158
65 199.47.87.140
75 157.55.39.238
77 157.55.39.237
89 157.55.39.236
100 18.196.196.108
128 18.195.78.144
277 2a01:4f8:13b:1296::2
291 66.249.66.80
328 35.174.184.209
# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "26/Mar/2019:(18|19)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
2 2409:4066:211:2caf:3c31:3fae:2212:19cc
2 35.10.204.140
2 45.251.231.45
2 95.108.181.88
2 95.137.190.2
3 104.198.9.108
3 107.167.109.88
6 66.249.66.80
13 41.89.230.156
1860 45.5.184.2
```
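- The pipeline above (extract the client IP from each log line, count occurrences, sort ascending) can be sanity-checked on synthetic input; the IPs below are documentation addresses, not real clients:

```shell
# Same top-clients pipeline as above, run on three fabricated log lines
printf '192.0.2.1 - -\n192.0.2.2 - -\n192.0.2.1 - -\n' \
  | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 1
#       2 192.0.2.1
```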
- For the XMLUI I see `18.195.78.144` and `18.196.196.108` requesting only CTA items, with no user agent
- They are responsible for almost 1,000 XMLUI sessions today:
```
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=(18.195.78.144|18.196.196.108)' dspace.log.2019-03-26 | sort | uniq | wc -l
937
```
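- The session-counting regex above can be verified against a fabricated log line (the 32-character session ID here is made up):

```shell
# Count distinct session_id/ip_addr pairs for the two suspect IPs
printf 'session_id=ABCDEF0123456789ABCDEF0123456789:ip_addr=18.195.78.144\n' \
  | grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=(18.195.78.144|18.196.196.108)' \
  | sort | uniq | wc -l
# 1
```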
- I will add their IPs to the list of bot IPs in nginx so I can tag them as bots, letting Tomcat's Crawler Session Manager Valve force them to re-use their sessions
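- A minimal sketch of that tagging (the `map` variable name is mine; the IPs are the two from the logs above): overwrite the User-Agent header for those IPs in nginx so that Tomcat's `CrawlerSessionManagerValve`, whose default `crawlerUserAgents` pattern matches agents containing "bot", funnels all their requests into a single session:

```nginx
# Sketch: make the Tomcat backend see these IPs as a "bot" user agent
map $remote_addr $ua {
    default         $http_user_agent;
    18.195.78.144   "bot";
    18.196.196.108  "bot";
}

server {
    location / {
        proxy_set_header User-Agent $ua;
        # ... proxy_pass to Tomcat ...
    }
}
```

The valve itself is enabled in Tomcat's `server.xml` with `<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve" />`.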
<!-- vim: set sw=2 ts=2: -->