diff --git a/content/posts/2022-07.md b/content/posts/2022-07.md
index 14e7568eb..f6803327f 100644
--- a/content/posts/2022-07.md
+++ b/content/posts/2022-07.md
@@ -32,4 +32,53 @@ Time: 399.751 ms
- Start a harvest on AReS
+## 2022-07-04
+
+- Linode told me that CGSpace had high load yesterday
+ - I also got some up and down notices from UptimeRobot
+ - Looking now, I see there was a very high CPU and database pool load, but a mostly normal DSpace session count
+
+![CPU load day](/cgspace-notes/2022/07/cpu-day.png)
+![JDBC pool day](/cgspace-notes/2022/07/jmx_tomcat_dbpools-day.png)
+
+- Seems we have some old database transactions since 2022-06-27:
+
+![PostgreSQL locks week](/cgspace-notes/2022/07/postgres_locks_ALL-week.png)
+![PostgreSQL query length week](/cgspace-notes/2022/07/postgres_querylength_ALL-week.png)
+
+- Looking at the top connections to nginx yesterday:
+
+```console
+# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log.1 | sort | uniq -c | sort -h | tail
+ 1132 64.124.8.34
+ 1146 2a01:4f8:1c17:5550::1
+ 1380 137.184.159.211
+ 1533 64.124.8.59
+ 4013 80.248.237.167
+ 4776 54.195.118.125
+ 10482 45.5.186.2
+ 11177 172.104.229.92
+ 15855 2a01:7e00::f03c:91ff:fe9a:3a37
+ 22179 64.39.98.251
+```
+
+- And the total number of unique IPs:
+
+```console
+# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log.1 | sort -u | wc -l
+6952
+```
+
+- This seems low, so the load must have come from the request patterns of certain visitors
+ - 64.39.98.251 is Qualys, and I'm debating blocking [all their IPs](https://pci.qualys.com/static/help/merchant/getting_started/check_scanner_ip_addresses.htm) using a geo block in nginx (need to test)
+ - The top few are known ILRI and other CGIAR scrapers, but 80.248.237.167 is on InternetVikings in Sweden, using a normal user agent and scraping Discover
+ - 64.124.8.59 is making requests with a normal user agent and belongs to Castle Global or Zayo
+- I ran all system updates and rebooted the server (could have just restarted PostgreSQL but I thought I might as well do everything)
+- I implemented a geo mapping for the user agent mapping AND the nginx `limit_req_zone` by extracting the networks into an external file and including it in two different geo mapping blocks
+ - This is clever and relies on the fact that we can use defaults in both cases
+ - First, we map the user agent of requests from these networks to "bot" so that Tomcat and Solr handle them accordingly
+ - Second, we use this as a key in a `limit_req_zone`, which relies on a default mapping of '' (nginx does not account requests whose key is empty)
+- I noticed that CIP uploaded a number of Georgian presentations with `dcterms.language` set to English and Other so I changed them to "ka"
+ - Perhaps we need to update our list of languages to include all of them instead of only the most common ones
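+
+A minimal sketch of the geo mapping idea described above, not the production configuration: the file name `bot-networks.conf`, the variable names `$bot_network` and `$ua`, the zone name and size, the rate, and the `192.0.2.0/24` documentation network are all placeholders. It is a variant that uses a single `geo` block feeding a `map` instead of two `geo` blocks, because `map` values can reference variables such as `$http_user_agent`:
+
+```nginx
+# All directives below belong in the http context.
+# /etc/nginx/bot-networks.conf (hypothetical) holds one network per line, e.g.:
+#   192.0.2.0/24 'bot';
+
+# Tag requests from the listed networks; everyone else gets an empty string
+geo $bot_network {
+    default '';
+    include /etc/nginx/bot-networks.conf;
+}
+
+# Rewrite the user agent for tagged networks, pass it through otherwise
+map $bot_network $ua {
+    default $http_user_agent;
+    'bot'   'bot';
+}
+
+# Requests with an empty key are not accounted, so only tagged networks are limited
+limit_req_zone $bot_network zone=bot_networks:1m rate=1r/s;
+```
+
+The `$ua` variable would then be passed upstream (for example with `proxy_set_header User-Agent $ua;`) and the zone applied with `limit_req zone=bot_networks;` in the relevant `location` blocks. Note that in this sketch every tagged network shares the single key `bot`, so they also share one rate limit bucket.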
+
diff --git a/docs/2022-06/index.html b/docs/2022-06/index.html
index ebdc6e2d2..de34877ec 100644
--- a/docs/2022-06/index.html
+++ b/docs/2022-06/index.html
@@ -26,7 +26,7 @@ There seem to be many more of these:
-
+
@@ -60,7 +60,7 @@ There seem to be many more of these:
"url": "https://alanorth.github.io/cgspace-notes/2022-06/",
"wordCount": "1786",
"datePublished": "2022-06-06T09:01:36+03:00",
- "dateModified": "2022-06-30T16:48:03+03:00",
+ "dateModified": "2022-07-04T09:25:14+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
diff --git a/docs/2022-07/index.html b/docs/2022-07/index.html
index 80e7262e7..b19f9153d 100644
--- a/docs/2022-07/index.html
+++ b/docs/2022-07/index.html
@@ -19,7 +19,7 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
-
+
@@ -44,9 +44,9 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
"@type": "BlogPosting",
"headline": "July, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-07/",
- "wordCount": "164",
+ "wordCount": "507",
"datePublished": "2022-07-02T14:07:36+03:00",
- "dateModified": "2022-07-02T14:07:36+03:00",
+ "dateModified": "2022-07-04T09:25:14+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -147,6 +147,63 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
+2022-07-04
+
+- Linode told me that CGSpace had high load yesterday
+
+- I also got some up and down notices from UptimeRobot
+- Looking now, I see there was a very high CPU and database pool load, but a mostly normal DSpace session count
+
+
+
+
+
+
+- Seems we have some old database transactions since 2022-06-27:
+
+
+
+
+- Looking at the top connections to nginx yesterday:
+
+# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log.1 | sort | uniq -c | sort -h | tail
+ 1132 64.124.8.34
+ 1146 2a01:4f8:1c17:5550::1
+ 1380 137.184.159.211
+ 1533 64.124.8.59
+ 4013 80.248.237.167
+ 4776 54.195.118.125
+ 10482 45.5.186.2
+ 11177 172.104.229.92
+ 15855 2a01:7e00::f03c:91ff:fe9a:3a37
+ 22179 64.39.98.251
+
+- And the total number of unique IPs:
+
+# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log.1 | sort -u | wc -l
+6952
+
+- This seems low, so the load must have come from the request patterns of certain visitors
+
+- 64.39.98.251 is Qualys, and I’m debating blocking all their IPs using a geo block in nginx (need to test)
+- The top few are known ILRI and other CGIAR scrapers, but 80.248.237.167 is on InternetVikings in Sweden, using a normal user agent and scraping Discover
+- 64.124.8.59 is making requests with a normal user agent and belongs to Castle Global or Zayo
+
+
+- I ran all system updates and rebooted the server (could have just restarted PostgreSQL but I thought I might as well do everything)
+- I implemented a geo mapping for the user agent mapping AND the nginx limit_req_zone by extracting the networks into an external file and including it in two different geo mapping blocks
+
+- This is clever and relies on the fact that we can use defaults in both cases
+- First, we map the user agent of requests from these networks to “bot” so that Tomcat and Solr handle them accordingly
+- Second, we use this as a key in a limit_req_zone, which relies on a default mapping of ’’ (nginx does not account requests whose key is empty)
+
+
+- I noticed that CIP uploaded a number of Georgian presentations with dcterms.language set to English and Other so I changed them to “ka”
+
+- Perhaps we need to update our list of languages to include all of them instead of only the most common ones
+
+
+
diff --git a/docs/2022/07/cpu-day.png b/docs/2022/07/cpu-day.png
new file mode 100644
index 000000000..09c6ea9f8
Binary files /dev/null and b/docs/2022/07/cpu-day.png differ
diff --git a/docs/2022/07/jmx_tomcat_dbpools-day.png b/docs/2022/07/jmx_tomcat_dbpools-day.png
new file mode 100644
index 000000000..e1d251489
Binary files /dev/null and b/docs/2022/07/jmx_tomcat_dbpools-day.png differ
diff --git a/docs/2022/07/postgres_locks_ALL-week.png b/docs/2022/07/postgres_locks_ALL-week.png
new file mode 100644
index 000000000..9079e8f19
Binary files /dev/null and b/docs/2022/07/postgres_locks_ALL-week.png differ
diff --git a/docs/2022/07/postgres_querylength_ALL-week.png b/docs/2022/07/postgres_querylength_ALL-week.png
new file mode 100644
index 000000000..5cb704313
Binary files /dev/null and b/docs/2022/07/postgres_querylength_ALL-week.png differ
diff --git a/docs/categories/index.html b/docs/categories/index.html
index bec37df3f..964e5efbc 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index b5cb216a6..e3e5c7ef5 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 03ec0a28e..70bff6411 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index f9604243b..bb084bde7 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index eaf0201da..b8e801f24 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index ec41ab99a..5122a29bd 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index 371701750..2ab773135 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index 625bc4856..f639854c9 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 1ada86612..7c6ce7763 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 1379df2db..46d9584ce 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 2a3d5d815..6feecf006 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 7df44be06..756b85335 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index aec3dc29e..db3b1b94e 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 6b7062bae..a290bc42e 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 414151a35..d219c86d6 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 11add4848..28a425edc 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index 5fcda1e0b..669afee0f 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 0ef2bd156..ab51cddd6 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 181b2e39f..81bc3d3c8 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 0d684fe07..98155ba0d 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index c7d92dcd6..5d3a4f3b8 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 6068f0792..5156ed3a1 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 3762b19c9..f6be904f0 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 7d05bf0be..27d177e48 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index b5f85c9ad..554e4d9f3 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index e73e57d5a..951e8d624 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index d5a2ee0d8..7dd053305 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,22 +3,22 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2022-07-02T14:07:36+03:00
+ 2022-07-04T09:25:14+03:00
https://alanorth.github.io/cgspace-notes/
- 2022-07-02T14:07:36+03:00
+ 2022-07-04T09:25:14+03:00
https://alanorth.github.io/cgspace-notes/2022-07/
- 2022-07-02T14:07:36+03:00
+ 2022-07-04T09:25:14+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-07-02T14:07:36+03:00
+ 2022-07-04T09:25:14+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2022-07-02T14:07:36+03:00
+ 2022-07-04T09:25:14+03:00
https://alanorth.github.io/cgspace-notes/2022-06/
- 2022-06-30T16:48:03+03:00
+ 2022-07-04T09:25:14+03:00
https://alanorth.github.io/cgspace-notes/2022-05/
2022-05-30T16:00:02+03:00
diff --git a/static/2022/07/cpu-day.png b/static/2022/07/cpu-day.png
new file mode 100644
index 000000000..09c6ea9f8
Binary files /dev/null and b/static/2022/07/cpu-day.png differ
diff --git a/static/2022/07/jmx_tomcat_dbpools-day.png b/static/2022/07/jmx_tomcat_dbpools-day.png
new file mode 100644
index 000000000..e1d251489
Binary files /dev/null and b/static/2022/07/jmx_tomcat_dbpools-day.png differ
diff --git a/static/2022/07/postgres_locks_ALL-week.png b/static/2022/07/postgres_locks_ALL-week.png
new file mode 100644
index 000000000..9079e8f19
Binary files /dev/null and b/static/2022/07/postgres_locks_ALL-week.png differ
diff --git a/static/2022/07/postgres_querylength_ALL-week.png b/static/2022/07/postgres_querylength_ALL-week.png
new file mode 100644
index 000000000..5cb704313
Binary files /dev/null and b/static/2022/07/postgres_querylength_ALL-week.png differ