- Also, I tried to increase the `writeLockTimeout` in `solrconfig.xml` from the default of 1000ms to 10000ms
- Eventually the Atmire stats started working, despite errors about "Error opening new searcher" in the Solr Admin UI
- I wrote to the dspace-tech mailing list again on the thread from March, 2019
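
A quick way to double-check the `writeLockTimeout` change above is to grep for the setting across the Solr cores; the path here is only an example and should point at the local DSpace Solr directory:

```
# path is an example; point this at the local [dspace]/solr directory
grep -r --include=solrconfig.xml writeLockTimeout /opt/dspace/solr/
```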
- There were a few alerts from UptimeRobot about CGSpace going up and down this morning, along with an alert from Linode about 596% load
  - Looking at the Munin stats I see an exponential rise in DSpace XMLUI sessions, firewall activity, and PostgreSQL connections this morning:

![CGSpace XMLUI sessions day](/cgspace-notes/2019/05/2019-05-06-jmx_dspace_sessions-day.png)

![linode18 firewall connections day](/cgspace-notes/2019/05/2019-05-06-fw_conntrack-day.png)

![linode18 postgres connections day](/cgspace-notes/2019/05/2019-05-06-postgres_connections_db-day.png)

![linode18 CPU day](/cgspace-notes/2019/05/2019-05-06-cpu-day.png)
- The number of unique sessions today is *ridiculously* high compared to the last few days considering it's only 12:30PM right now:

```
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-06 | sort | uniq | wc -l
101108
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-05 | sort | uniq | wc -l
14618
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-04 | sort | uniq | wc -l
14946
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-03 | sort | uniq | wc -l
6410
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-02 | sort | uniq | wc -l
7758
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-01 | sort | uniq | wc -l
20528
```
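
Rather than repeating that grep by hand for each day, a small loop over the same `dspace.log.YYYY-MM-DD` files produces the per-day counts in one pass (a rough sketch, run from the DSpace log directory):

```
# unique XMLUI sessions per day for 1–6 May
for log in dspace.log.2019-05-0{1..6}; do
    printf '%s: ' "$log"
    grep -o -E 'session_id=[A-Z0-9]{32}' "$log" | sort -u | wc -l
done
```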
- The number of unique IP addresses from 2 to 6 AM this morning is already several times higher than the average for that time of the morning this past week:

```
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
7127
# zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '05/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1231
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E '04/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1255
# zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz | grep -E '03/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1736
# zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E '02/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1573
# zcat --force /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.6.gz | grep -E '01/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1410
```
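
To narrow down when the spike started, the same unique-IP count can be broken out hour by hour (a sketch over the same two log files and date pattern as above; run as root, like the commands above, since the nginx logs are not world readable):

```
# unique client IPs per hour on the morning of 6 May
for hour in 02 03 04 05 06; do
    printf '%s:00 ' "$hour"
    zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 \
        | grep "06/May/2019:${hour}" | awk '{print $1}' | sort -u | wc -l
done
```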
- Just this morning, between the hours of 2 and 6, the number of unique sessions was *very* high compared to previous mornings:

```
$ cat dspace.log.2019-05-06 | grep -E '2019-05-06 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
83650
$ cat dspace.log.2019-05-05 | grep -E '2019-05-05 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2547
$ cat dspace.log.2019-05-04 | grep -E '2019-05-04 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2574
$ cat dspace.log.2019-05-03 | grep -E '2019-05-03 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2911
$ cat dspace.log.2019-05-02 | grep -E '2019-05-02 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2704
$ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
3699
```
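
If a few clients are behind most of that activity it would point at a bot that doesn't re-use its session cookie, so it's worth seeing which IPs dominate the DSpace log in that window; a sketch, assuming the `dspace.log` lines carry an `ip_addr=` field as stock DSpace XMLUI logging does:

```
# requests per client IP in dspace.log during the 02:00–06:59 window
grep -E '2019-05-06 (02|03|04|05|06):' dspace.log.2019-05-06 \
    | grep -o -E 'ip_addr=[0-9a-f.:]+' | sort | uniq -c | sort -rn | head
```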
- Most of the requests were GETs:

```
# cat /var/log/nginx/{access,library-access}.log /var/log/nginx/{access,library-access}.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E "(GET|HEAD|POST|PUT)" | sort | uniq -c | sort -n
      1 PUT
     98 POST
   2845 HEAD
  98121 GET
```
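
The user agents behind that traffic are another quick clue; this assumes the nginx logs use the standard "combined" format, where the user agent is the last double-quoted field:

```
# top user agents during the same window (run as root)
cat /var/log/nginx/{access,library-access}.log /var/log/nginx/{access,library-access}.log.1 \
    | grep -E '06/May/2019:(02|03|04|05|06)' \
    | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn | head
```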
- I'm not exactly sure what happened this morning, but it looks like some legitimate user traffic—perhaps someone launched a new publication and it got a bunch of hits?
- Looking again, I see 84,000 requests to `/handle` this morning (not including logs for library.cgiar.org because those get an HTTP 301 redirect to CGSpace and appear here in `access.log`):

```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -c -o -E " /handle/[0-9]+/[0-9]+"
84350
```
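
One way to see whether the traffic was concentrated on a few items is to tally the most-requested Handles in that window, using the same logs as above:

```
# most frequently requested Handles between 02:00 and 06:59
cat /var/log/nginx/access.log /var/log/nginx/access.log.1 \
    | grep -E '06/May/2019:(02|03|04|05|06)' \
    | grep -o -E " /handle/[0-9]+/[0-9]+ HTTP" \
    | sort | uniq -c | sort -rn | head
```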
- But it would be difficult to find a pattern for those requests because they cover 78,000 *unique* Handles (i.e., direct browsing of items, collections, or communities) and only 2,492 discover/browse (total, not unique):

```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+ HTTP" | sort | uniq | wc -l
78104
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+/(discover|browse)" | wc -l
2492
```
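
It would also help to know whether those Handle requests came from many clients or a few heavy ones; counting requests per client IP in the same window is a quick check:

```
# requests per client IP for /handle URLs between 02:00 and 06:59
cat /var/log/nginx/access.log /var/log/nginx/access.log.1 \
    | grep -E '06/May/2019:(02|03|04|05|06)' \
    | grep -E " /handle/[0-9]+/[0-9]+ HTTP" \
    | awk '{print $1}' | sort | uniq -c | sort -rn | head
```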
- In other news, I see some IP address is making several requests per second to the exact same REST API endpoints, for example:

```
# grep /rest/handle/10568/3703?expand=all rest.log | awk '{print $1}' | sort | uniq -c
      3 2a01:7e00::f03c:91ff:fe0a:d645
    113 63.32.242.35
```
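
To put a number on "several requests per second", count that IP's requests per log timestamp; this assumes `rest.log` uses the same nginx access log format as the other logs, with the timestamp in the fourth field:

```
# requests per second from 63.32.242.35 against the REST API
grep 63.32.242.35 rest.log | awk '{print $4}' | sort | uniq -c | sort -rn | head
```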
- According to [viewdns.info](https://viewdns.info/reverseip/?host=63.32.242.35&t=1) that server belongs to Macaroni Brothers'
  - The user agent of their non-REST API requests from the same IP is Drupal
  - This is one very good reason to limit REST API requests, and perhaps to enable caching via nginx
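
Before fronting the REST API with an nginx cache it would be worth checking what cache-related headers it sends right now; a sketch, with the CGSpace hostname assumed and the Handle taken from the `rest.log` example above:

```
# check for cacheable response headers on a REST API item request (hostname assumed)
curl -s -o /dev/null -D - 'https://cgspace.cgiar.org/rest/handle/10568/3703?expand=all' \
    | grep -i -E '^(cache-control|expires|etag|last-modified)'
```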