diff --git a/content/posts/2022-04.md b/content/posts/2022-04.md
index 1ecf11186..67a0d6e1c 100644
--- a/content/posts/2022-04.md
+++ b/content/posts/2022-04.md
@@ -59,4 +59,115 @@ $ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid =
- Start harvest on AReS
+## 2022-04-18
+
+- I woke up to several notices from UptimeRobot that CGSpace had gone down and up in the night (of course I'm on holiday out of the country for Easter)
+ - I see there are many locks in use from the XMLUI:
+
+```console
+$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi)' | sort | uniq -c
+ 8932 dspaceWeb
+```
+
+- Looking at the top IPs making requests, they seem to be Yandex, bingbot, and Googlebot:
+
+```console
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | awk '{print $1}' | sort | uniq -c | sort -h
+ 752 69.162.124.231
+ 759 66.249.64.213
+ 864 66.249.66.222
+ 905 2a01:4f8:221:f::2
+ 1013 84.33.2.97
+ 1201 157.55.39.159
+ 1204 157.55.39.144
+ 1209 157.55.39.102
+ 1217 157.55.39.161
+ 1252 207.46.13.177
+ 1274 157.55.39.162
+ 2553 66.249.66.221
+ 2941 95.108.213.28
+```
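
The same tally can be sketched in Python. This is a minimal sketch, not what was run on the server: it assumes the client address is the first whitespace-separated field of each log line (as in the `awk '{print $1}'` call above). The function name `top_client_ips` and the sample lines, which use documentation-range addresses, are hypothetical.

```python
from collections import Counter


def top_client_ips(log_lines, limit=10):
    """Return the most frequent client addresses as (address, count) pairs.

    Assumes the client address is the first whitespace-separated field.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    # most_common() orders by descending request count
    return counts.most_common(limit)


# Hypothetical sample lines using documentation-range addresses
sample_lines = [
    '192.0.2.10 - - [18/Apr/2022:00:20:38 +0200] "GET / HTTP/1.1" 200 512 "-" "ExampleBot"',
    '198.51.100.7 - - [18/Apr/2022:00:20:39 +0200] "GET / HTTP/1.1" 200 512 "-" "ExampleBot"',
    '192.0.2.10 - - [18/Apr/2022:00:20:40 +0200] "GET / HTTP/1.1" 200 512 "-" "ExampleBot"',
]

for address, count in top_client_ips(sample_lines):
    print(count, address)
```

Unlike the shell pipeline, this prints the busiest address first and returns only the top `limit` entries.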
+
+- One IP is using a strange user agent though:
+
+```console
+84.33.2.97 - - [18/Apr/2022:00:20:38 +0200] "GET /bitstream/handle/10568/109581/Banana_Blomme%20_2020.pdf.jpg HTTP/1.1" 404 10890 "-" "SomeRandomText"
+```
+
+- Overall, it seems we had 17,000 unique IPs connecting in the last nine hours (currently 9:14AM and log file rolled over at 00:00):
+
+```console
+# cat /var/log/nginx/access.log | awk '{print $1}' | sort | uniq | wc -l
+17314
+```
+
+- That's a lot of unique IPs, and I see some patterns of IPs in China making ten to twenty requests each
+ - The ISPs I've seen so far are ChinaNet and China Unicom
+- I extracted all the IPs from today and resolved them:
+
+```console
+# cat /var/log/nginx/access.log | awk '{print $1}' | sort | uniq > /tmp/2022-04-18-ips.txt
+$ ./ilri/resolve-addresses-geoip2.py -i /tmp/2022-04-18-ips.txt -o /tmp/2022-04-18-ips.csv
+```
+
+- The top ASNs by IP are:
+
+```console
+$ csvcut -c 2 /tmp/2022-04-18-ips.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
+ 102 GOOGLE
+ 139 Maxihost LTDA
+ 165 AMAZON-02
+ 393 "China Mobile Communications Group Co., Ltd."
+ 473 AMAZON-AES
+ 616 China Mobile communications corporation
+ 642 M247 Ltd
+ 2336 HostRoyale Technologies Pvt Ltd
+ 4556 Chinanet
+ 5527 CHINA UNICOM China169 Backbone
+$ csvcut -c 4 /tmp/2022-04-18-ips.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
+ 139 262287
+ 165 16509
+ 180 204287
+ 393 9808
+ 473 14618
+ 615 56041
+ 642 9009
+ 2156 203020
+ 4556 4134
+ 5527 4837
+```
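
The per-ASN counts above can also be reproduced without `csvcut`. This is a minimal sketch, assuming the CSV has a header row and that the organization and ASN sit in the second and fourth columns (matching `-c 2` and `-c 4` above). The header names and rows in the sample are hypothetical, since the real file's header is not shown.

```python
import csv
import io
from collections import Counter


def count_column(csv_text, column_index, limit=10):
    """Count the values in one CSV column (0-based index), skipping the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, like `sed 1d`
    counts = Counter(
        row[column_index] for row in reader if len(row) > column_index
    )
    return counts.most_common(limit)


# Hypothetical sample; header names and addresses are assumptions
sample_csv = """ip,org,country,asn
192.0.2.1,CHINA UNICOM China169 Backbone,CN,4837
192.0.2.2,CHINA UNICOM China169 Backbone,CN,4837
198.51.100.1,"China Mobile Communications Group Co., Ltd.",CN,9808
"""

print(count_column(sample_csv, 1))  # by organization name
print(count_column(sample_csv, 3))  # by ASN
```

The `csv` module handles quoted fields containing commas, such as the China Mobile organization name, so they are counted as a single value.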
+
+- I spot-checked a few IPs from each of these and they are definitely just making bullshit requests to Discovery, the HTML sitemap, etc.
+- I will download the IP blocks for each ASN except Google and Amazon and ban them:
+
+```console
+$ wget https://asn.ipinfo.app/api/text/nginx/AS4837 https://asn.ipinfo.app/api/text/nginx/AS4134 https://asn.ipinfo.app/api/text/nginx/AS203020 https://asn.ipinfo.app/api/text/nginx/AS9009 https://asn.ipinfo.app/api/text/nginx/AS56041 https://asn.ipinfo.app/api/text/nginx/AS9808
+$ cat AS* | sed -e '/^$/d' -e '/^#/d' -e '/^{/d' -e 's/deny //' -e 's/;//' | sort | uniq | wc -l
+20296
+```
+
+- I extracted the IPv4 and IPv6 networks:
+
+```console
+$ cat AS* | sed -e '/^$/d' -e '/^#/d' -e '/^{/d' -e 's/deny //' -e 's/;//' | grep ":" | sort > /tmp/ipv6-networks.txt
+$ cat AS* | sed -e '/^$/d' -e '/^#/d' -e '/^{/d' -e 's/deny //' -e 's/;//' | grep -v ":" | sort > /tmp/ipv4-networks.txt
+```
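
The `sed` and `grep` split above can be sketched with Python's standard `ipaddress` module, which also validates each entry. This is a minimal sketch, assuming the downloaded files contain nginx-style `deny <network>;` lines plus comments and blank lines, as the `sed` expressions suggest. The function name `split_networks` and the sample input are hypothetical.

```python
import ipaddress


def split_networks(lines):
    """Parse nginx-style deny lines into sorted, de-duplicated IPv4 and IPv6 networks."""
    v4, v6 = set(), set()
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and lines starting with "{", like the sed filters
        if not line or line.startswith("#") or line.startswith("{"):
            continue
        if line.startswith("deny "):
            line = line[len("deny "):]
        cidr = line.rstrip(";").strip()
        # strict=False tolerates entries with host bits set
        network = ipaddress.ip_network(cidr, strict=False)
        (v4 if network.version == 4 else v6).add(network)
    return sorted(v4), sorted(v6)


# Hypothetical sample input using documentation-range networks
sample = ["# AS64496", "", "deny 192.0.2.0/24;", "deny 2001:db8::/32;", "deny 192.0.2.0/24;"]
ipv4, ipv6 = split_networks(sample)
print([str(n) for n in ipv4])
print([str(n) for n in ipv6])
```

Checking `network.version` rather than searching for a `:` gives the same split, but a malformed entry raises `ValueError` instead of passing through silently.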
+
+- I suspect we need to aggregate these networks since there are so many of them and nftables doesn't like it when they overlap:
+
+```console
+$ wc -l /tmp/ipv4-networks.txt
+15464 /tmp/ipv4-networks.txt
+$ aggregate6 /tmp/ipv4-networks.txt | wc -l
+2781
+$ wc -l /tmp/ipv6-networks.txt
+4833 /tmp/ipv6-networks.txt
+$ aggregate6 /tmp/ipv6-networks.txt | wc -l
+338
+```
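
To illustrate what aggregation does, here is a minimal sketch using `ipaddress.collapse_addresses` from the Python standard library. It is swapped in for `aggregate6` purely for illustration; `aggregate6` was the tool actually used above. The function name `aggregate` and the sample prefixes are hypothetical.

```python
import ipaddress


def aggregate(cidrs):
    """Merge overlapping and adjacent networks into the smallest covering list."""
    networks = [
        ipaddress.ip_network(cidr.strip(), strict=False)
        for cidr in cidrs
        if cidr.strip()
    ]
    # collapse_addresses() rejects mixed IP versions, so handle each separately
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    collapsed = list(ipaddress.collapse_addresses(v4))
    collapsed += list(ipaddress.collapse_addresses(v6))
    return [str(n) for n in collapsed]


# Two adjacent /25s and the /24 that covers them collapse into a single /24
print(aggregate(["192.0.2.0/25", "192.0.2.128/25", "192.0.2.0/24", "198.51.100.0/24"]))
```

In this example the first three prefixes describe the same address space, so they collapse into one network, which removes the overlaps and shrinks the list.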
+
+- I deployed these lists on CGSpace, ran all updates, and rebooted the server
+ - This list is SURELY too broad because we will block legitimate users in China... but right now how can I tell them apart?
+ - Also, I need to purge the hits from these 14,000 IPs in Solr when I get time
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index efcd9efba..cd28ecdec 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index df8443b2b..70aad965c 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index e1b4ee146..96c24db5a 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 98a50eaca..ed5f7cd9c 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index fe8445825..3df912579 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index fe293b67e..c2bfddd18 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index 5b57afe9a..12eb35e61 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index df8f4712f..0557d1c84 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index e7c07e40b..269f98cd7 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 88ee58052..501f03f50 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 06d2efacc..5480b1326 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index e4ae38bc4..5e80c6830 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index e243e5984..d90e3c5d2 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index afac6a279..3851243b7 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index c7bd82c41..528ce76db 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index 372c2ece4..dd35a37f7 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index b3ea26d9d..b973521be 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 4347feb4e..cde889089 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 34e979e65..981ae405b 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index d0b152f7b..0daac6eb4 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 35dfd7d43..b00a39e51 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 723080c6e..5c8c57e95 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 3c67f66e2..d47f3132a 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 8c646fa95..d295db04b 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 7aa0ff354..0fd50414d 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 7baed02e6..6b9a5c4ff 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,22 +3,22 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2022-04-13T16:52:34+03:00
+ 2022-04-16T22:41:45+03:00
https://alanorth.github.io/cgspace-notes/
- 2022-04-13T16:52:34+03:00
+ 2022-04-16T22:41:45+03:00
https://alanorth.github.io/cgspace-notes/2022-03/
2022-04-04T19:15:58+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-04-13T16:52:34+03:00
+ 2022-04-16T22:41:45+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2022-04-13T16:52:34+03:00
+ 2022-04-16T22:41:45+03:00
https://alanorth.github.io/cgspace-notes/2022-03/
- 2022-04-13T16:52:34+03:00
+ 2022-04-16T22:41:45+03:00
https://alanorth.github.io/cgspace-notes/2022-02/
2022-03-01T17:17:27+03:00