diff --git a/content/posts/2021-06.md b/content/posts/2021-06.md
index f2ba33f27..6cbe0886f 100644
--- a/content/posts/2021-06.md
+++ b/content/posts/2021-06.md
@@ -369,4 +369,72 @@ $ redis-cli KEYS "bull:plugins:*" \
 - I thought of using `redis-cli --pipe` but then you have to construct the commands in the redis protocol format with the number of args and length of each command
 - There is clearly something wrong with the new DSpace health check plugin, as it creates WAY too many jobs every time we run the plugins
 
+## 2021-06-27
+
+- Looking into the spike in PostgreSQL connections last week
+  - I see the same things that I always see (large number of connections waiting for lock, large number of threads, high CPU usage, etc), but I also see almost 10,000 DSpace sessions on 2021-06-25
+
+![DSpace sessions](/cgspace-notes/2021/06/dspace-sessions-week.png)
+
+- Looking at the DSpace log I see there was definitely a higher number of sessions that day, perhaps twice the normal:
+
+```console
+$ for file in dspace.log.2021-06-[12]*; do echo "$file"; grep -oE 'session_id=[A-Z0-9]{32}' "$file" | sort | uniq | wc -l; done
+dspace.log.2021-06-10
+19072
+dspace.log.2021-06-11
+19224
+dspace.log.2021-06-12
+19215
+dspace.log.2021-06-13
+16721
+dspace.log.2021-06-14
+17880
+dspace.log.2021-06-15
+12103
+dspace.log.2021-06-16
+4651
+dspace.log.2021-06-17
+22785
+dspace.log.2021-06-18
+21406
+dspace.log.2021-06-19
+25967
+dspace.log.2021-06-20
+20850
+dspace.log.2021-06-21
+6388
+dspace.log.2021-06-22
+5945
+dspace.log.2021-06-23
+46371
+dspace.log.2021-06-24
+9024
+dspace.log.2021-06-25
+12521
+dspace.log.2021-06-26
+16163
+dspace.log.2021-06-27
+5886
+```
+
+- I see 15,000 unique IPs in the XMLUI logs alone on that day:
+
+```console
+# zcat /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.4.gz | grep '23/Jun/2021' | awk '{print $1}' | sort | uniq | wc -l
+15835
+```
+
+- Annoyingly I found 37,000 more hits from Bing using `dns:*msnbot* AND dns:*.msn.com.` as a Solr filter
+  - WTF, they are using a normal user agent: `Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko`
+  - I will purge the IPs and add this user agent to the nginx config so that we can rate limit it
+- I signed up for Bing Webmaster Tools and verified cgspace.cgiar.org with the BingSiteAuth.xml file
+  - Also I adjusted the nginx config to explicitly allow access to `robots.txt` even when bots are rate limited
+  - Also I found that Bing was auto discovering all our RSS and Atom feeds as "sitemaps" so I deleted 750 of them and submitted the real sitemap
+  - I need to see if I can adjust the nginx config further to map the `bot` user agent to DNS like msnbot...
+- Review Abdullah's filter on click pull request
+  - I rebased his code on the latest master branch and tested adding filter on click to the map and list components, and it works fine
+  - There seems to be a bug that breaks scrolling on the page though...
+  - Abdullah fixed the bug in the filter on click branch
+
diff --git a/docs/2021-06/index.html b/docs/2021-06/index.html
index de2d35fdf..eb5dd369d 100644
--- a/docs/2021-06/index.html
+++ b/docs/2021-06/index.html
@@ -20,7 +20,7 @@ I simply started it and AReS was running again:
-
+
@@ -46,9 +46,9 @@ I simply started it and AReS was running again:
   "@type": "BlogPosting",
   "headline": "June, 2021",
   "url": "https://alanorth.github.io/cgspace-notes/2021-06/",
-  "wordCount": "2651",
+  "wordCount": "2993",
   "datePublished": "2021-06-01T10:51:07+03:00",
-  "dateModified": "2021-06-25T09:34:29+03:00",
+  "dateModified": "2021-06-25T21:32:18+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -532,6 +532,82 @@ hash
  • There is clearly something wrong with the new DSpace health check plugin, as it creates WAY too many jobs every time we run the plugins
  • +

    2021-06-27

    + +

    DSpace sessions

    + +
    $ for file in dspace.log.2021-06-[12]*; do echo "$file"; grep -oE 'session_id=[A-Z0-9]{32}' "$file" | sort | uniq | wc -l; done
    +dspace.log.2021-06-10
    +19072
    +dspace.log.2021-06-11
    +19224
    +dspace.log.2021-06-12
    +19215
    +dspace.log.2021-06-13
    +16721
    +dspace.log.2021-06-14
    +17880
    +dspace.log.2021-06-15
    +12103
    +dspace.log.2021-06-16
    +4651
    +dspace.log.2021-06-17
    +22785
    +dspace.log.2021-06-18
    +21406
    +dspace.log.2021-06-19
    +25967
    +dspace.log.2021-06-20
    +20850
    +dspace.log.2021-06-21
    +6388
    +dspace.log.2021-06-22
    +5945
    +dspace.log.2021-06-23
    +46371
    +dspace.log.2021-06-24
    +9024
    +dspace.log.2021-06-25
    +12521
    +dspace.log.2021-06-26
    +16163
    +dspace.log.2021-06-27
    +5886
    +
    +
    # zcat /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.4.gz | grep '23/Jun/2021' | awk '{print $1}' | sort | uniq | wc -l
    +15835
    +
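The per-file session count in the 2021-06-27 entry can be wrapped in a small function so that the file name and the count land on one line, which makes the result easier to sort when looking for the busiest day. This is only a sketch: the function name `count_unique_sessions` is hypothetical and not part of the notes, and it assumes the same `session_id=` pattern of 32 uppercase alphanumeric characters used in the original loop.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the original notes): print
# "<file> <unique session count>" for each DSpace log file given.
count_unique_sessions() {
    local file count
    for file in "$@"; do
        # -o prints only the matching part, -h drops the file name prefix;
        # sort -u is equivalent to the original `sort | uniq`.
        # tr strips the padding that some wc implementations add.
        count=$(grep -ohE 'session_id=[A-Z0-9]{32}' "$file" | sort -u | wc -l | tr -d ' ')
        printf '%s %s\n' "$file" "$count"
    done
}
```

For example, `count_unique_sessions dspace.log.2021-06-[12]* | sort -k2,2nr | head -n 3` would list the three days with the most unique sessions, assuming it is run in the directory that holds the logs.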
diff --git a/docs/categories/index.html b/docs/categories/index.html
index cb1942966..ee2a0ca74 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index fb7fe4d7a..190348ac8 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 696ef8c2d..a3ed5f38b 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 9c5e03155..5faca9f46 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 314b2cc78..6518472be 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 5ad0bf4a0..163eecac0 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index d46b63d80..c06df0420 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 6e814e654..7088cf806 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index a885ed817..7695acdad 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 43336ba91..6547fbd52 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 0663fc271..278184ce6 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 1cbfa98e2..ac5b4d8bc 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 14251d6d5..283b2c5f6 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 1fa0c8374..88c580ebf 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 2bc37f157..ddc3bf8ff 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 4e02eb292..322e7c690 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 129693125..c3c324349 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 8489a09d7..304e08bf7 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index e9a7fa48c..5932a457d 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index d479918ea..baa80fb76 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index c66429598..09a5b4570 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 84ddc22cf..a18f769ff 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index c58100387..86a82e8ab 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml">
 https://alanorth.github.io/cgspace-notes/categories/
- 2021-06-25T09:34:29+03:00
+ 2021-06-25T21:32:18+03:00
 https://alanorth.github.io/cgspace-notes/
- 2021-06-25T09:34:29+03:00
+ 2021-06-25T21:32:18+03:00
 https://alanorth.github.io/cgspace-notes/2021-06/
- 2021-06-25T09:34:29+03:00
+ 2021-06-25T21:32:18+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2021-06-25T09:34:29+03:00
+ 2021-06-25T21:32:18+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2021-06-25T09:34:29+03:00
+ 2021-06-25T21:32:18+03:00
 https://alanorth.github.io/cgspace-notes/2021-05/
 2021-05-30T22:09:06+03:00