Update notes for 2018-10-20

2018-10-21 08:06:40 +03:00
parent 3a58db7091
commit e74be8ab0a
3 changed files with 94 additions and 8 deletions

@ -446,5 +446,47 @@ ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics]
- Apparently a bunch of field types were removed in [Solr 5](https://issues.apache.org/jira/browse/SOLR-5936)
- So for now it's actually a huge pain in the ass to run the tests for my dspace-statistics-api
- Linode sent a message that the CPU usage was high on CGSpace (linode18) last night
- According to the nginx logs around that time it was 5.9.6.51 (MegaIndex) again:
```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "20/Oct/2018:(14|15|16)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
249 207.46.13.179
250 157.55.39.173
301 54.166.207.223
303 157.55.39.213
310 66.249.64.95
362 34.218.226.147
381 66.249.64.93
415 35.237.175.180
1205 66.249.64.91
1227 5.9.6.51
```
- This bot is only using the XMLUI and it does *not* seem to be re-using its sessions:
```
# grep -c 5.9.6.51 /var/log/nginx/*.log
/var/log/nginx/access.log:9323
/var/log/nginx/error.log:0
/var/log/nginx/library-access.log:0
/var/log/nginx/oai.log:0
/var/log/nginx/rest.log:0
/var/log/nginx/statistics.log:0
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-10-20 | sort | uniq
8915
```
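- Note that `grep -c` prints the number of matching lines, so piping it through `sort | uniq` changes nothing and 8915 is a count of log lines rather than of distinct sessions. A minimal sketch of counting distinct session IDs instead, assuming the same `session_id=...:ip_addr=...` log format as above (`count_unique_sessions` is a hypothetical helper name, not something shipped with DSpace):
```shell
# Count distinct session IDs seen for one client IP in a DSpace log.
# Assumption: log lines contain "session_id=<32 chars>:ip_addr=<ip>",
# as in the grep pattern used above.
# Usage: count_unique_sessions 5.9.6.51 dspace.log.2018-10-20
count_unique_sessions() {
  # Escape the dots so they match literally instead of "any character"
  ip_regex=$(printf '%s' "$1" | sed 's/\./\\./g')
  # -o prints only the matched part, so sort -u collapses repeated sessions
  grep -o -E "session_id=[A-Z0-9]{32}:ip_addr=${ip_regex}" "$2" | sort -u | wc -l
}
```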
- Last month I added "crawl" to the Tomcat Crawler Session Manager Valve's regular expression matching, and it seems to be working for MegaIndex's user agent:
```
$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/1' User-Agent:'"Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"'
```
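- For a quick offline sanity check of the pattern itself, a rough sketch: the valve applies its `crawlerUserAgents` regular expression to the whole User-Agent string, so this assumes the added alternative looks like `.*crawl.*` (the exact expression configured is not shown in these notes) and uses `grep -E -x` to approximate a full-string match (`ua_matches` is a hypothetical helper):
```shell
# Return success if a User-Agent string fully matches a regular expression.
# grep -x requires the whole line to match, which approximates Java's
# Matcher.matches(); ERE and Java regex syntax differ in some corner cases.
ua_matches() {
  printf '%s\n' "$1" | grep -q -E -x -- "$2"
}

# Assumed pattern; the real crawlerUserAgents value is not shown in these notes.
pattern='.*[bB]ot.*|.*crawl.*'
megaindex_ua='Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)'

if ua_matches "$megaindex_ua" "$pattern"; then
  echo "treated as crawler"
else
  echo "treated as normal client"
fi
```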
- So I'm not sure why this bot uses so many sessions. Is it because it requests very slowly?
## 2018-10-21
<!-- vim: set sw=2 ts=2: -->