mirror of https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2019-03-26
@@ -872,4 +872,50 @@ $ ./fix-metadata-values.py -i /tmp/2019-03-26-AGROVOC-89-corrections.csv -db dsp
$ ./delete-metadata-values.py -i /tmp/2019-03-26-AGROVOC-79-deletions.csv -db dspace -u dspace -p 'fuuu' -m 57 -f dc.subject -d -n
```
- UptimeRobot says CGSpace is down again, but it seems to just be slow, as the load is over 10.0
- Looking at the nginx logs, I don't see anything terribly abusive, but SemrushBot has made ~3,000 requests to Discovery and Browse pages today:
```
# grep SemrushBot /var/log/nginx/access.log | grep -E "26/Mar/2019" | grep -E '(discover|browse)' | wc -l
2931
```
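- A minimal sketch that parameterizes the count above so it can be rerun for other bots or days. `count_ua_requests` is a hypothetical helper name. Like the one-liner, it matches each pattern anywhere in the log line (user agent, request, or referrer), so treat the result as an upper bound:

```shell
#!/usr/bin/env bash
# Count one user agent's requests to Discovery/Browse pages on a given day.
# Hypothetical helper: like the pipeline above, each pattern may match anywhere
# in the log line (user agent, request, or referrer), not in a specific field.
count_ua_requests() {
  local ua="$1" day="$2"   # e.g. SemrushBot and 26/Mar/2019
  shift 2                  # remaining arguments: nginx log files
  grep -h -F -- "$ua" "$@" \
    | grep -F -- "$day" \
    | grep -c -E '(discover|browse)'
}
```

- For example, `count_ua_requests SemrushBot 26/Mar/2019 /var/log/nginx/access.log` runs the same three filters as the one-liner above.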
- So I'm adding it to the badbot rate limiting in nginx. Actually, I'm tempted to block all user agents with "bot" in their name for a few days to see if things calm down... but maybe not just yet
- Otherwise, these are the top users in the web and API logs for the last hour or so (18:00 to 19:59):
```
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "26/Mar/2019:(18|19)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
     54 41.216.228.158
     65 199.47.87.140
     75 157.55.39.238
     77 157.55.39.237
     89 157.55.39.236
    100 18.196.196.108
    128 18.195.78.144
    277 2a01:4f8:13b:1296::2
    291 66.249.66.80
    328 35.174.184.209
# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "26/Mar/2019:(18|19)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
      2 2409:4066:211:2caf:3c31:3fae:2212:19cc
      2 35.10.204.140
      2 45.251.231.45
      2 95.108.181.88
      2 95.137.190.2
      3 104.198.9.108
      3 107.167.109.88
      6 66.249.66.80
     13 41.89.230.156
   1860 45.5.184.2
```
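- Both queries above repeat the same tally, so here is a sketch of it as a reusable function. `top_clients` is a hypothetical helper. It assumes the client address is the first whitespace-separated field, as the `awk '{print $1}'` above does. It also counts with an awk associative array instead of `sort | uniq -c`, so the counts come out unpadded:

```shell
#!/usr/bin/env bash
# Print the N busiest client IPs for a date/hour pattern across nginx logs.
# Hypothetical helper; assumes the client IP is the first whitespace-separated
# field, as in the awk '{print $1}' pipelines above.
top_clients() {
  local when="$1" n="$2"   # e.g. '26/Mar/2019:(18|19)' and 10
  shift 2                  # remaining arguments: log files, plain or gzipped
  zcat --force "$@" \
    | grep -E -- "$when" \
    | awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' \
    | sort -n \
    | tail -n "$n"
}
```

- For example, `top_clients '26/Mar/2019:(18|19)' 10 /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1` should list the same API clients as the second query above.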
- For the XMLUI, I see `18.195.78.144` and `18.196.196.108` requesting only CTA items, with no user agent
- They are responsible for almost 1,000 XMLUI sessions today:
```
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=(18.195.78.144|18.196.196.108)' dspace.log.2019-03-26 | sort | uniq | wc -l
937
```
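- To see how those sessions split between the two addresses, here is a sketch that counts distinct session IDs per IP. `sessions_per_ip` is a hypothetical helper that assumes the same `session_id=<ID>:ip_addr=<IP>` layout as the grep above. It only captures IPv4 addresses, because `[0-9.]+` stops at the `:` after the address:

```shell
#!/usr/bin/env bash
# Count distinct DSpace session IDs per client IP in dspace.log files.
# Hypothetical helper; assumes lines contain "session_id=<32 chars>:ip_addr=<IPv4>"
# as matched by the grep above. IPv6 addresses are not captured.
sessions_per_ip() {
  grep -h -o -E 'session_id=[A-Z0-9]{32}:ip_addr=[0-9.]+' "$@" \
    | sort -u \
    | awk -F 'ip_addr=' '{ n[$2]++ } END { for (ip in n) print n[ip], ip }' \
    | sort -n
}
```

- For example, `sessions_per_ip dspace.log.2019-03-26 | grep -E ' 18\.(195\.78\.144|196\.196\.108)$'` should show two per-IP counts that add up to the 937 above, since both commands count unique session/IP pairs.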
- I will add their IPs to the list of bot IPs in nginx so I can tag them as bots and let Tomcat's Crawler Session Manager Valve force them to re-use their session
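- To spot other candidates for that bot IP list, here is a sketch that tallies requests with an empty user agent. `no_ua_clients` is a hypothetical helper. It assumes nginx's default `combined` log format, where splitting a line on double quotes puts the user agent in the sixth field and a missing header is logged as `"-"`:

```shell
#!/usr/bin/env bash
# List client IPs that sent no User-Agent header, with request counts.
# Hypothetical helper; assumes nginx's default "combined" log format, in which
# the user agent is the sixth field when splitting on double quotes and an
# empty header is logged as "-".
no_ua_clients() {
  zcat --force "$@" \
    | awk -F '"' '$6 == "-" { split($1, f, " "); hits[f[1]]++ }
                  END { for (ip in hits) print hits[ip], ip }' \
    | sort -n
}
```

- For example, `no_ua_clients /var/log/nginx/access.log | tail -n 10` lists the ten busiest clients that sent no user agent.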
<!-- vim: set sw=2 ts=2: -->