diff --git a/content/posts/2021-06.md b/content/posts/2021-06.md
index f2ba33f27..6cbe0886f 100644
--- a/content/posts/2021-06.md
+++ b/content/posts/2021-06.md
@@ -369,4 +369,72 @@ $ redis-cli KEYS "bull:plugins:*" \
- I thought of using `redis-cli --pipe` but then you have to construct the commands in the redis protocol format with the number of args and length of each command
- There is clearly something wrong with the new DSpace health check plugin, as it creates WAY too many jobs every time we run the plugins
+## 2021-06-27
+
+- Looking into the spike in PostgreSQL connections last week
+ - I see the same things that I always see (large number of connections waiting for lock, large number of threads, high CPU usage, etc), but I also see almost 10,000 DSpace sessions on 2021-06-25
+
+![DSpace sessions](/cgspace-notes/2021/06/dspace-sessions-week.png)
+
+- Looking at the DSpace log I see there was definitely a higher number of sessions that day, perhaps twice the normal:
+
+```console
+$ for file in dspace.log.2021-06-[12]*; do echo "$file"; grep -oE 'session_id=[A-Z0-9]{32}' "$file" | sort | uniq | wc -l; done
+dspace.log.2021-06-10
+19072
+dspace.log.2021-06-11
+19224
+dspace.log.2021-06-12
+19215
+dspace.log.2021-06-13
+16721
+dspace.log.2021-06-14
+17880
+dspace.log.2021-06-15
+12103
+dspace.log.2021-06-16
+4651
+dspace.log.2021-06-17
+22785
+dspace.log.2021-06-18
+21406
+dspace.log.2021-06-19
+25967
+dspace.log.2021-06-20
+20850
+dspace.log.2021-06-21
+6388
+dspace.log.2021-06-22
+5945
+dspace.log.2021-06-23
+46371
+dspace.log.2021-06-24
+9024
+dspace.log.2021-06-25
+12521
+dspace.log.2021-06-26
+16163
+dspace.log.2021-06-27
+5886
+```
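
The shell loop above can also be sketched in Python, collecting the IDs in a set instead of piping through `sort` and `uniq`. This is only an illustration: the function name `count_unique_sessions` is hypothetical, and the regular expression is copied from the `grep` command.

```python
import re
from pathlib import Path

# Same pattern as the grep above: "session_id=" followed by 32 uppercase
# letters or digits.
SESSION_RE = re.compile(r"session_id=[A-Z0-9]{32}")


def count_unique_sessions(log_path):
    """Return the number of distinct session IDs found in one DSpace log file."""
    seen = set()
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            seen.update(SESSION_RE.findall(line))
    return len(seen)


if __name__ == "__main__":
    # Mirrors the glob used in the shell loop; run from the DSpace log directory.
    for path in sorted(Path(".").glob("dspace.log.2021-06-[12]*")):
        print(path.name)
        print(count_unique_sessions(path))
```

Reading line by line keeps memory use proportional to the number of distinct sessions rather than the size of the log.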
+
+- I see more than 15,000 unique IPs in the XMLUI logs alone on that day:
+
+```console
+# zcat /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.4.gz | grep '23/Jun/2021' | awk '{print $1}' | sort | uniq | wc -l
+15835
+```
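
For repeated checks, the same count could be scripted. The sketch below is a rough Python equivalent of the `zcat | grep | awk | sort | uniq` pipeline; the function name `unique_ips_for_date` is hypothetical, and like the `awk` step it assumes the client address is the first whitespace-separated field of each line.

```python
import gzip


def unique_ips_for_date(log_paths, date_str):
    """Return the set of client IPs on lines that contain date_str.

    Assumes gzip-compressed nginx access logs whose first field is the
    client address, as in the awk '{print $1}' step above.
    """
    ips = set()
    for path in log_paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if date_str in line:
                    fields = line.split()
                    if fields:
                        ips.add(fields[0])
    return ips


if __name__ == "__main__":
    # Same files and date string as the command above.
    logs = ["/var/log/nginx/access.log.5.gz", "/var/log/nginx/access.log.4.gz"]
    print(len(unique_ips_for_date(logs, "23/Jun/2021")))
```

Returning the set rather than only its size makes it possible to feed the addresses into a later step, such as a DNS lookup.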
+
+- Annoyingly I found 37,000 more hits from Bing using `dns:*msnbot* AND dns:*.msn.com.` as a Solr filter
+ - WTF, they are using a normal user agent: `Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko`
+ - I will purge the IPs and add this user agent to the nginx config so that we can rate limit it
+- I signed up for Bing Webmaster Tools and verified cgspace.cgiar.org with the BingSiteAuth.xml file
+ - Also I adjusted the nginx config to explicitly allow access to `robots.txt` even when bots are rate limited
+ - Also I found that Bing was auto discovering all our RSS and Atom feeds as "sitemaps" so I deleted 750 of them and submitted the real sitemap
+ - I need to see if I can adjust the nginx config further to map the `bot` user agent to DNS like msnbot...
+- Review Abdullah's filter on click pull request
+ - I rebased his code on the latest master branch and tested adding filter on click to the map and list components, and it works fine
+ - There seems to be a bug that breaks scrolling on the page though...
+ - Abdullah fixed the bug in the filter on click branch
+
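
Since the Bing requests above use an ordinary browser user agent, one possible way to tie them back to DNS is a forward-confirmed reverse DNS check run outside nginx. The sketch below is an assumption, not something taken from the nginx config: the function names are hypothetical, and the hostname test simply mirrors the Solr filter `dns:*msnbot* AND dns:*.msn.com.` used earlier.

```python
import socket


def looks_like_msnbot(hostname):
    """Mirror the Solr filter: name contains "msnbot" and ends in ".msn.com"."""
    name = hostname.rstrip(".").lower()
    return "msnbot" in name and name.endswith(".msn.com")


def is_verified_msnbot(ip, reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for one IPv4 address.

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to match the msnbot pattern.
    3. Resolve that hostname forward and require the original IP in the result.

    The resolver functions are parameters so the logic can be exercised
    without network access.
    """
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return False
    if not looks_like_msnbot(hostname):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

The forward lookup matters because anyone who controls the reverse zone for an address can publish an arbitrary PTR record; only a matching forward record confirms the name. Note that `socket.gethostbyname_ex` handles IPv4 only.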
diff --git a/docs/2021-06/index.html b/docs/2021-06/index.html
index de2d35fdf..eb5dd369d 100644
--- a/docs/2021-06/index.html
+++ b/docs/2021-06/index.html
@@ -20,7 +20,7 @@ I simply started it and AReS was running again:
-
+
@@ -46,9 +46,9 @@ I simply started it and AReS was running again:
"@type": "BlogPosting",
"headline": "June, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-06/",
- "wordCount": "2651",
+ "wordCount": "2993",
"datePublished": "2021-06-01T10:51:07+03:00",
- "dateModified": "2021-06-25T09:34:29+03:00",
+ "dateModified": "2021-06-25T21:32:18+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -532,6 +532,82 @@ hash
There is clearly something wrong with the new DSpace health check plugin, as it creates WAY too many jobs every time we run the plugins
+2021-06-27
+
+- Looking into the spike in PostgreSQL connections last week
+
+- I see the same things that I always see (large number of connections waiting for lock, large number of threads, high CPU usage, etc), but I also see almost 10,000 DSpace sessions on 2021-06-25
+
+
+
+
+
+- Looking at the DSpace log I see there was definitely a higher number of sessions that day, perhaps twice the normal:
+
+$ for file in dspace.log.2021-06-[12]*; do echo "$file"; grep -oE 'session_id=[A-Z0-9]{32}' "$file" | sort | uniq | wc -l; done
+dspace.log.2021-06-10
+19072
+dspace.log.2021-06-11
+19224
+dspace.log.2021-06-12
+19215
+dspace.log.2021-06-13
+16721
+dspace.log.2021-06-14
+17880
+dspace.log.2021-06-15
+12103
+dspace.log.2021-06-16
+4651
+dspace.log.2021-06-17
+22785
+dspace.log.2021-06-18
+21406
+dspace.log.2021-06-19
+25967
+dspace.log.2021-06-20
+20850
+dspace.log.2021-06-21
+6388
+dspace.log.2021-06-22
+5945
+dspace.log.2021-06-23
+46371
+dspace.log.2021-06-24
+9024
+dspace.log.2021-06-25
+12521
+dspace.log.2021-06-26
+16163
+dspace.log.2021-06-27
+5886
+
+- I see more than 15,000 unique IPs in the XMLUI logs alone on that day:
+
+# zcat /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.4.gz | grep '23/Jun/2021' | awk '{print $1}' | sort | uniq | wc -l
+15835
+
+- Annoyingly I found 37,000 more hits from Bing using `dns:*msnbot* AND dns:*.msn.com.` as a Solr filter
+
+- WTF, they are using a normal user agent: `Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko`
+- I will purge the IPs and add this user agent to the nginx config so that we can rate limit it
+
+
+- I signed up for Bing Webmaster Tools and verified cgspace.cgiar.org with the BingSiteAuth.xml file
+
+- Also I adjusted the nginx config to explicitly allow access to `robots.txt` even when bots are rate limited
+- Also I found that Bing was auto discovering all our RSS and Atom feeds as “sitemaps” so I deleted 750 of them and submitted the real sitemap
+- I need to see if I can adjust the nginx config further to map the `bot` user agent to DNS like msnbot…
+
+
+- Review Abdullah’s filter on click pull request
+
+- I rebased his code on the latest master branch and tested adding filter on click to the map and list components, and it works fine
+- There seems to be a bug that breaks scrolling on the page though…
+- Abdullah fixed the bug in the filter on click branch
+
+
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index cb1942966..ee2a0ca74 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index fb7fe4d7a..190348ac8 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 696ef8c2d..a3ed5f38b 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 9c5e03155..5faca9f46 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 314b2cc78..6518472be 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 5ad0bf4a0..163eecac0 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index d46b63d80..c06df0420 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 6e814e654..7088cf806 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index a885ed817..7695acdad 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 43336ba91..6547fbd52 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 0663fc271..278184ce6 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 1cbfa98e2..ac5b4d8bc 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 14251d6d5..283b2c5f6 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 1fa0c8374..88c580ebf 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 2bc37f157..ddc3bf8ff 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 4e02eb292..322e7c690 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 129693125..c3c324349 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 8489a09d7..304e08bf7 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index e9a7fa48c..5932a457d 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index d479918ea..baa80fb76 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index c66429598..09a5b4570 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 84ddc22cf..a18f769ff 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index c58100387..86a82e8ab 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2021-06-25T09:34:29+03:00
+ 2021-06-25T21:32:18+03:00
https://alanorth.github.io/cgspace-notes/
- 2021-06-25T09:34:29+03:00
+ 2021-06-25T21:32:18+03:00
https://alanorth.github.io/cgspace-notes/2021-06/
- 2021-06-25T09:34:29+03:00
+ 2021-06-25T21:32:18+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2021-06-25T09:34:29+03:00
+ 2021-06-25T21:32:18+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2021-06-25T09:34:29+03:00
+ 2021-06-25T21:32:18+03:00
https://alanorth.github.io/cgspace-notes/2021-05/
2021-05-30T22:09:06+03:00