diff --git a/content/posts/2022-04.md b/content/posts/2022-04.md
index 1ecf11186..67a0d6e1c 100644
--- a/content/posts/2022-04.md
+++ b/content/posts/2022-04.md
@@ -59,4 +59,115 @@ $ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid =
 - Start harvest on AReS
+## 2022-04-18
+
+- I woke up to several notices from UptimeRobot that CGSpace had gone down and up in the night (of course I'm on holiday out of the country for Easter)
+  - I see there are many locks in use from the XMLUI:
+
+```console
+$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi)' | sort | uniq -c
+   8932 dspaceWeb
+```
+
+- Looking at the top IPs making requests it seems they are Yandex, bingbot, and Googlebot:
+
+```console
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | awk '{print $1}' | sort | uniq -c | sort -h
+    752 69.162.124.231
+    759 66.249.64.213
+    864 66.249.66.222
+    905 2a01:4f8:221:f::2
+   1013 84.33.2.97
+   1201 157.55.39.159
+   1204 157.55.39.144
+   1209 157.55.39.102
+   1217 157.55.39.161
+   1252 207.46.13.177
+   1274 157.55.39.162
+   2553 66.249.66.221
+   2941 95.108.213.28
+```
+
+- One IP is using a strange user agent though:
+
+```console
+84.33.2.97 - - [18/Apr/2022:00:20:38 +0200] "GET /bitstream/handle/10568/109581/Banana_Blomme%20_2020.pdf.jpg HTTP/1.1" 404 10890 "-" "SomeRandomText"
+```
+
+- Overall, it seems we had 17,000 unique IPs connecting in the last nine hours (currently 9:14AM and log file rolled over at 00:00):
+
+```console
+# cat /var/log/nginx/access.log | awk '{print $1}' | sort | uniq | wc -l
+17314
+```
+
+- That's a lot of unique IPs, and I see some patterns of IPs in China making ten to twenty requests each
+  - The ISPs I've seen so far are ChinaNet and China Unicom
+- I extracted all the IPs from today and resolved them:
+
+```console
+# cat /var/log/nginx/access.log | awk '{print $1}' | sort | uniq > /tmp/2022-04-18-ips.txt
+$ ./ilri/resolve-addresses-geoip2.py -i /tmp/2022-04-18-ips.txt -o /tmp/2022-04-18-ips.csv
+```
+
+- The top ASNs by IP are:
+
+```console
+$ csvcut -c 2 /tmp/2022-04-18-ips.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
+    102 GOOGLE
+    139 Maxihost LTDA
+    165 AMAZON-02
+    393 "China Mobile Communications Group Co., Ltd."
+    473 AMAZON-AES
+    616 China Mobile communications corporation
+    642 M247 Ltd
+   2336 HostRoyale Technologies Pvt Ltd
+   4556 Chinanet
+   5527 CHINA UNICOM China169 Backbone
+$ csvcut -c 4 /tmp/2022-04-18-ips.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
+    139 262287
+    165 16509
+    180 204287
+    393 9808
+    473 14618
+    615 56041
+    642 9009
+   2156 203020
+   4556 4134
+   5527 4837
+```
+
+- I spot checked a few IPs from each of these and they are definitely just making bullshit requests to Discovery and HTML sitemap etc
+- I will download the IP blocks for each ASN except Google and Amazon and ban them
+
+```console
+$ wget https://asn.ipinfo.app/api/text/nginx/AS4837 https://asn.ipinfo.app/api/text/nginx/AS4134 https://asn.ipinfo.app/api/text/nginx/AS203020 https://asn.ipinfo.app/api/text/nginx/AS9009 https://asn.ipinfo.app/api/text/nginx/AS56041 https://asn.ipinfo.app/api/text/nginx/AS9808
+$ cat AS* | sed -e '/^$/d' -e '/^#/d' -e '/^{/d' -e 's/deny //' -e 's/;//' | sort | uniq | wc -l
+20296
+```
+
+- I extracted the IPv4 and IPv6 networks:
+
+```console
+$ cat AS* | sed -e '/^$/d' -e '/^#/d' -e '/^{/d' -e 's/deny //' -e 's/;//' | grep ":" | sort > /tmp/ipv6-networks.txt
+$ cat AS* | sed -e '/^$/d' -e '/^#/d' -e '/^{/d' -e 's/deny //' -e 's/;//' | grep -v ":" | sort > /tmp/ipv4-networks.txt
+```
+
+- I suspect we need to aggregate these networks since they are so many and nftables doesn't like it when they overlap:
+
+```console
+$ wc -l /tmp/ipv4-networks.txt
+15464 /tmp/ipv4-networks.txt
+$ aggregate6 /tmp/ipv4-networks.txt | wc -l
+2781
+$ wc -l /tmp/ipv6-networks.txt
+4833 /tmp/ipv6-networks.txt
+$ aggregate6 /tmp/ipv6-networks.txt | wc -l
+338
+```
+
+- I deployed these lists on CGSpace, ran all updates, and rebooted the server
+  - This list is SURELY too broad because we will block legitimate users in China... but right now how can I discern?
+  - Also, I need to purge the hits from these 14,000 IPs in Solr when I get time
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index efcd9efba..cd28ecdec 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index df8443b2b..70aad965c 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index e1b4ee146..96c24db5a 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 98a50eaca..ed5f7cd9c 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index fe8445825..3df912579 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index fe293b67e..c2bfddd18 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index 5b57afe9a..12eb35e61 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index df8f4712f..0557d1c84 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index e7c07e40b..269f98cd7 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 88ee58052..501f03f50 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 06d2efacc..5480b1326 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index e4ae38bc4..5e80c6830 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index e243e5984..d90e3c5d2 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index afac6a279..3851243b7 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index c7bd82c41..528ce76db 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index 372c2ece4..dd35a37f7 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index b3ea26d9d..b973521be 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 4347feb4e..cde889089 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 34e979e65..981ae405b 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index d0b152f7b..0daac6eb4 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 35dfd7d43..b00a39e51 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 723080c6e..5c8c57e95 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 3c67f66e2..d47f3132a 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 8c646fa95..d295db04b 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 7aa0ff354..0fd50414d 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 7baed02e6..6b9a5c4ff 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,22 +3,22 @@
 xmlns:xhtml="http://www.w3.org/1999/xhtml">
 https://alanorth.github.io/cgspace-notes/categories/
-2022-04-13T16:52:34+03:00
+2022-04-16T22:41:45+03:00
 https://alanorth.github.io/cgspace-notes/
-2022-04-13T16:52:34+03:00
+2022-04-16T22:41:45+03:00
 https://alanorth.github.io/cgspace-notes/2022-03/
 2022-04-04T19:15:58+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
-2022-04-13T16:52:34+03:00
+2022-04-16T22:41:45+03:00
 https://alanorth.github.io/cgspace-notes/posts/
-2022-04-13T16:52:34+03:00
+2022-04-16T22:41:45+03:00
 https://alanorth.github.io/cgspace-notes/2022-03/
-2022-04-13T16:52:34+03:00
+2022-04-16T22:41:45+03:00
 https://alanorth.github.io/cgspace-notes/2022-02/
 2022-03-01T17:17:27+03:00