mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2019-05-06
This commit is contained in:
@ -118,5 +118,105 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
|
||||
- Also, I tried to increase the `writeLockTimeout` in `solrconfig.xml` from the default of 1000ms to 10000ms
|
||||
- Eventually the Atmire stats started working, despite errors about "Error opening new searcher" in the Solr Admin UI
|
||||
- I wrote to the dspace-tech mailing list again on the thread from March, 2019
|
||||
- There were a few alerts from UptimeRobot about CGSpace going up and down this morning, along with an alert from Linode about 596% load
|
||||
- Looking at the Munin stats I see an exponential rise in DSpace XMLUI sessions, firewall activity, and PostgreSQL connections this morning:
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
- The number of unique sessions today is *ridiculously* high compared to the last few days considering it's only 12:30PM right now:
|
||||
|
||||
```
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-06 | sort | uniq | wc -l
|
||||
101108
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-05 | sort | uniq | wc -l
|
||||
14618
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-04 | sort | uniq | wc -l
|
||||
14946
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-03 | sort | uniq | wc -l
|
||||
6410
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-02 | sort | uniq | wc -l
|
||||
7758
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-01 | sort | uniq | wc -l
|
||||
20528
|
||||
```
|
||||
|
||||
- The number of unique IP addresses from 2 to 6 AM this morning is already several times higher than the average for that time of the morning this past week:
|
||||
|
||||
```
|
||||
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
7127
|
||||
# zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '05/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
1231
|
||||
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E '04/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
1255
|
||||
# zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz | grep -E '03/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
1736
|
||||
# zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E '02/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
1573
|
||||
# zcat --force /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.6.gz | grep -E '01/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
1410
|
||||
```
|
||||
|
||||
- Just this morning between the hours of 2 and 6 the number of unique sessions was *very* high compared to previous mornings:
|
||||
|
||||
```
|
||||
$ cat dspace.log.2019-05-06 | grep -E '2019-05-06 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
83650
|
||||
$ cat dspace.log.2019-05-05 | grep -E '2019-05-05 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2547
|
||||
$ cat dspace.log.2019-05-04 | grep -E '2019-05-04 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2574
|
||||
$ cat dspace.log.2019-05-03 | grep -E '2019-05-03 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2911
|
||||
$ cat dspace.log.2019-05-02 | grep -E '2019-05-02 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2704
|
||||
$ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
3699
|
||||
```
|
||||
|
||||
- Most of the requests were GETs:
|
||||
|
||||
```
|
||||
# cat /var/log/nginx/{access,library-access}.log /var/log/nginx/{access,library-access}.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E "(GET|HEAD|POST|PUT)" | sort | uniq -c | sort -n
|
||||
1 PUT
|
||||
98 POST
|
||||
2845 HEAD
|
||||
98121 GET
|
||||
```
|
||||
|
||||
- I'm not exactly sure what happened this morning, but it looks like some legitimate user traffic—perhaps someone launched a new publication and it got a bunch of hits?
|
||||
- Looking again, I see 84,000 requests to `/handle` this morning (not including logs for library.cgiar.org because those get HTTP 301 redirect to CGSpace and appear here in `access.log`):
|
||||
|
||||
```
|
||||
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -c -o -E " /handle/[0-9]+/[0-9]+"
|
||||
84350
|
||||
```
|
||||
|
||||
- But it would be difficult to find a pattern for those requests because they cover 78,000 *unique* Handles (ie direct browsing of items, collections, or communities) and only 2,492 discover/browse (total, not unique):
|
||||
|
||||
```
|
||||
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+ HTTP" | sort | uniq | wc -l
|
||||
78104
|
||||
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+/(discover|browse)" | wc -l
|
||||
2492
|
||||
```
|
||||
|
||||
- In other news, I see some IP is making several requests per second to the exact same REST API endpoints, for example:
|
||||
|
||||
```
|
||||
# grep /rest/handle/10568/3703?expand=all rest.log | awk '{print $1}' | sort | uniq -c
|
||||
3 2a01:7e00::f03c:91ff:fe0a:d645
|
||||
113 63.32.242.35
|
||||
```
|
||||
|
||||
- According to [viewdns.info](https://viewdns.info/reverseip/?host=63.32.242.35&t=1) that server belongs to Macaroni Brothers'
|
||||
- The user agent of their non-REST API requests from the same IP is Drupal
|
||||
- This is one very good reason to limit REST API requests, and perhaps to enable caching via nginx
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
Reference in New Issue
Block a user