diff --git a/content/posts/2020-08.md b/content/posts/2020-08.md
index 6da0b0541..bb04d3d6e 100644
--- a/content/posts/2020-08.md
+++ b/content/posts/2020-08.md
@@ -120,5 +120,65 @@ $ ./fix-metadata-values.py -i 2020-08-04-PB-new-countries.csv -db dspace -u dspa
- Seems that something happened yesterday afternoon at around 5PM...
- For now I will just run all updates on the server and reboot it, as I have no idea what causes this issue
- I had to restart Tomcat 7 three times after the server came back up before all Solr statistics cores came up properly
+- I checked the nginx logs around 5PM yesterday to see who was accessing the server:
+
+```
+# cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E '04/Aug/2020:(17|18)' | goaccess --log-format=COMBINED -
+```
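+
+- A rough equivalent without goaccess is to rank client IPs by request count for the same window. Below is a minimal sketch. It assumes nginx's default `combined` log format, where the client address is the first field, and `top_ips` is a hypothetical helper name:
+
+```shell
+#!/usr/bin/env bash
+# Hypothetical helper: rank client IPs by number of requests whose line
+# matches an extended regex such as '04/Aug/2020:(17|18)'.
+# Reads nginx "combined" format lines on stdin; the client IP is field 1.
+# Usage: cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | top_ips '04/Aug/2020:(17|18)' 20
+top_ips() {
+  local pattern="$1" count="${2:-10}"
+  grep -E "$pattern" \
+    | awk '{ print $1 }' \
+    | sort \
+    | uniq -c \
+    | sort -rn \
+    | head -n "$count"
+}
+```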
+
+- I see the Macaroni Bros are using their new user agent for harvesting: `RTB website BOT`
+ - But that user agent doesn't match the nginx bot list or Tomcat's Crawler Session Manager Valve because we only check for `[Bb]ot`!
+ - So they have created thousands of Tomcat sessions:
+
+```
+$ cat dspace.log.2020-08-04 | grep -E "(63.32.242.35|64.62.202.71)" | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
+5693
+```
+
+- DSpace itself uses a case-sensitive regex for user agents, so there are no hits from those IPs in Solr, but I need to tweak the nginx and Tomcat regexes so these bots don't waste server resources
+ - Perhaps `[Bb][Oo][Tt]`...
+- I see another IP, 104.198.96.245, which is also using the "RTB website BOT" user agent, but there are 70,000 hits in Solr from earlier this year, before they started using that user agent
+ - I purged all the hits from Solr, including a few thousand from 64.62.202.71
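+- Here is a minimal sketch of purging one IP's hits with a Solr delete-by-query. It assumes the statistics core is at `http://localhost:8081/solr/statistics`, which is an assumption, so override `SOLR_STATISTICS` as needed. The helper names are hypothetical:
+
+```shell
+#!/usr/bin/env bash
+# Assumed location of the DSpace Solr statistics core (override via env)
+SOLR_STATISTICS="${SOLR_STATISTICS:-http://localhost:8081/solr/statistics}"
+
+# Hypothetical helper: build the XML body for a delete-by-query on the "ip" field
+delete_query_for_ip() {
+  printf '<delete><query>ip:%s</query></delete>' "$1"
+}
+
+# Hypothetical helper: POST the delete to the update handler and commit it
+# Usage: purge_ip_hits 104.198.96.245
+purge_ip_hits() {
+  curl -s "$SOLR_STATISTICS/update?commit=true" \
+    -H 'Content-Type: text/xml' \
+    --data-binary "$(delete_query_for_ip "$1")"
+}
+```
+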
+- A few more IPs causing lots of Tomcat sessions yesterday:
+
+```
+$ cat dspace.log.2020-08-04 | grep "38.128.66.10" | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
+1585
+$ cat dspace.log.2020-08-04 | grep "64.62.202.71" | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
+5691
+```
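+
+- The sketch below ranks every IP by the number of distinct Tomcat session IDs it created, rather than by matching log lines. It assumes DSpace's usual log header, which contains `session_id=<ID>:ip_addr=<IP>`. The `sessions_per_ip` helper is hypothetical and only handles IPv4 addresses:
+
+```shell
+#!/usr/bin/env bash
+# Hypothetical helper: count distinct Tomcat session IDs per client IP.
+# Assumes DSpace log lines contain "session_id=<ID>:ip_addr=<IPv4>".
+# Usage: sessions_per_ip < dspace.log.2020-08-04 | head -n 20
+sessions_per_ip() {
+  # Extract each session/IP pair, keep unique pairs, then count pairs per IP
+  grep -oE 'session_id=[A-Z0-9]{32}:ip_addr=[0-9.]+' \
+    | sort -u \
+    | awk -F'[=:]' '{ print $4 }' \
+    | sort \
+    | uniq -c \
+    | sort -rn
+}
+```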
+
+- 38.128.66.10 isn't creating any Solr statistics thanks to our DSpace agents pattern, but it is creating lots of sessions, so perhaps I need to force it to use one session in Tomcat. Its user agent is:
+
+```
+Mozilla/5.0 (Windows NT 5.1) brokenlinkcheck.com/1.2
+```
+
+- 64.62.202.71 is using a user agent I've never seen before:
+
+```
+Mozilla/5.0 (compatible; +centuryb.o.t9[at]gmail.com)
+```
+
+- So now our "bot" regex can't even match that...
+ - Unless we change it to `[Bb]\.?[Oo]\.?[Tt]\.?`... which seems to match all variations of "bot" I can think of right now, according to [regexr.com](https://regexr.com/59lpt):
+
+```
+RTB website BOT
+Altmetribot
+Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
+Mozilla/5.0 (compatible; +centuryb.o.t9[at]gmail.com)
+Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
+```
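+
+- This quick check compares the current `[Bb]ot` pattern with the proposed one against the user agents above using `grep -E`. The patterns only use character classes, `\.` and `?`, so they should behave the same way in nginx's PCRE:
+
+```shell
+#!/usr/bin/env bash
+# User agents from above, one per line
+agents='RTB website BOT
+Altmetribot
+Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
+Mozilla/5.0 (compatible; +centuryb.o.t9[at]gmail.com)
+Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)'
+
+# Count how many agents each pattern matches (grep -c counts matching lines)
+for pattern in '[Bb]ot' '[Bb]\.?[Oo]\.?[Tt]\.?'; do
+  matches=$(printf '%s\n' "$agents" | grep -cE "$pattern")
+  printf '%s matches %s of 5\n' "$pattern" "$matches"
+done
+
+# Expected output:
+#   [Bb]ot matches 3 of 5
+#   [Bb]\.?[Oo]\.?[Tt]\.? matches 5 of 5
+```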
+
+- And another IP belonging to Turnitin (using Turnitinbot's alternate user agent) created thousands of sessions:
+
+```
+$ cat dspace.log.2020-08-04 | grep "199.47.87.145" | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
+2777
+```
+
+- I will add `Turnitin` to the Tomcat Crawler Session Manager Valve regex as well...
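+
+- Tomcat's Crawler Session Manager Valve matches `crawlerUserAgents` against the *entire* User-Agent header, which is why its default value is `.*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*`. A candidate value can be sanity checked with `grep -xE`, which also requires a whole-line match, as sketched below. The candidate (the loose "bot" pattern plus `Turnitin` and `brokenlinkcheck`) and the Firefox user agent used for contrast are assumptions, not the deployed config:
+
+```shell
+#!/usr/bin/env bash
+# Candidate crawlerUserAgents value (a sketch, not the deployed config).
+# Tomcat matches it against the whole User-Agent header, so each
+# alternative is wrapped in .* like Tomcat's default value.
+candidate='.*[Bb]\.?[Oo]\.?[Tt]\.?.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*|.*Turnitin.*|.*brokenlinkcheck.*'
+
+# User agents from these notes; the last one is a typical Firefox agent for contrast
+agents='RTB website BOT
+Mozilla/5.0 (compatible; +centuryb.o.t9[at]gmail.com)
+Mozilla/5.0 (Windows NT 5.1) brokenlinkcheck.com/1.2
+Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0'
+
+# grep -x requires the pattern to match the whole line, like Java's Matcher.matches()
+printf '%s\n' "$agents" | grep -xE "$candidate"
+```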
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 4f8d39e3b..600183605 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index b30881833..ae303d571 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 8bd86d7a1..84bdb03d4 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index e2aa7f405..45204d66d 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index e7649cd42..9ff85015a 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 7225b99cd..b004790b8 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 8b583663b..a82515a14 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 4f1e34760..46996d5fd 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 08796aec3..6911371d2 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 08110ca7d..2927c2bcc 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 5b159049b..9608e7177 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 83b0a73a5..818d1e419 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index b519c5c0c..ab05ed905 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 01915cb62..39532ac5f 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 1e77f1a94..e7da2351c 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 5d62d246c..a1db89c20 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 4b2785e4e..2a06cb3b5 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 1119d7954..39aff2545 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
https://alanorth.github.io/cgspace-notes/2020-07/
- 2020-08-03T16:27:51+03:00
+ 2020-08-05T15:00:06+03:00
https://alanorth.github.io/cgspace-notes/categories/
- 2020-08-03T16:27:51+03:00
+ 2020-08-05T15:00:06+03:00
https://alanorth.github.io/cgspace-notes/
- 2020-08-03T16:27:51+03:00
+ 2020-08-05T15:00:06+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2020-08-03T16:27:51+03:00
+ 2020-08-05T15:00:06+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2020-08-03T16:27:51+03:00
+ 2020-08-05T15:00:06+03:00