From 4d35572e9299d78385cc5951e4b3a296799d8a65 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Wed, 29 Dec 2021 16:29:37 +0200
Subject: [PATCH] Add notes for 2021-12-29

---
 content/posts/2021-12.md                | 104 +++++++++++++++++++++
 docs/2021-12/index.html                 | 116 +++++++++++++++++++++++-
 docs/categories/index.html              |   2 +-
 docs/categories/notes/index.html        |   2 +-
 docs/categories/notes/page/2/index.html |   2 +-
 docs/categories/notes/page/3/index.html |   2 +-
 docs/categories/notes/page/4/index.html |   2 +-
 docs/categories/notes/page/5/index.html |   2 +-
 docs/categories/notes/page/6/index.html |   2 +-
 docs/index.html                         |   2 +-
 docs/page/2/index.html                  |   2 +-
 docs/page/3/index.html                  |   2 +-
 docs/page/4/index.html                  |   2 +-
 docs/page/5/index.html                  |   2 +-
 docs/page/6/index.html                  |   2 +-
 docs/page/7/index.html                  |   2 +-
 docs/page/8/index.html                  |   2 +-
 docs/posts/index.html                   |   2 +-
 docs/posts/page/2/index.html            |   2 +-
 docs/posts/page/3/index.html            |   2 +-
 docs/posts/page/4/index.html            |   2 +-
 docs/posts/page/5/index.html            |   2 +-
 docs/posts/page/6/index.html            |   2 +-
 docs/posts/page/7/index.html            |   2 +-
 docs/posts/page/8/index.html            |   2 +-
 docs/sitemap.xml                        |  10 +-
 26 files changed, 244 insertions(+), 32 deletions(-)

diff --git a/content/posts/2021-12.md b/content/posts/2021-12.md
index 51fea66f1..2ac6878df 100644
--- a/content/posts/2021-12.md
+++ b/content/posts/2021-12.md
@@ -289,4 +289,108 @@ $ dspace user -a -m rafael-approve@cgiar.org -g Rafael -s Rodriguez -p 'fuuuuuu'
 
 - Start a fresh harvest on AReS
 
+## 2021-12-29
+
+- Looking at the top IPs and user agents in CGSpace's Solr statistics, I see a strange user agent:
+
+```console
+Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.{random.randint(0, 9999)} Safari/537.{random.randint(0, 99)}
+```
+
+- I found two IPs using user agents with the "randint" bug:
+  - 47.252.80.214 (AliCloud in the US)
+  - 61.143.40.50 (ChinaNet in China)
+- I wonder what other requests have been made from those hosts where the randint spoofer was working... ugh.
+- I found some IPs from the Russian Selectel network making thousands of requests with SQL injection attempts...
+  - 45.134.26.171
+  - 45.146.166.173
+- 3.225.28.105 is on Amazon and making thousands of requests for the same URL:
+
+```console
+/rest/collections/1118/items?expand=all&limit=1
+```
+
+- Most of the time it has a real-looking user agent, but sometimes it uses `Apache-HttpClient/4.3.4 (java 1.5)`
+- Another IP, 82.65.26.228, is making SQL injection attempts from France
+- 216.213.28.138 is some scrape-as-a-service bot from Sprious
+- I used my `resolve-addresses-geoip2.py` script to get the ASNs for all the IPs in Solr stats this month, then extracted the ASNs that were responsible for more than one IP:
+
+```console
+$ ./ilri/resolve-addresses-geoip2.py -i /tmp/ips.txt -o /tmp/2021-12-29-ips.csv
+$ csvcut -c asn /tmp/2021-12-29-ips.csv | sed 1d | sort | uniq -c | sort -h | awk '$1 > 1'
+      2 10620
+      2 265696
+      2 6147
+      2 9299
+      3 3269
+      5 16509
+      5 49505
+      9 24757
+      9 24940
+      9 64267
+```
+
+- AS 64267 is Sprious, and it has used these IPs this month:
+  - 216.213.28.136
+  - 207.182.27.191
+  - 216.41.235.187
+  - 216.41.232.169
+  - 216.41.235.186
+  - 52.124.19.190
+  - 216.213.28.138
+  - 216.41.234.163
+- To be honest, I want to ban all their networks, but I'm afraid it's too many IPs... hmmm
+- AS 24940 is Hetzner, but I don't feel like going through all the IPs to see... they always pretend to be normal users and make semi-sane requests, so it might be a proxy or something
+- AS 24757 is Ethiopian Telecom
+- I'm going to purge all the Sprious IPs for sure, as they are a scraping-as-a-service company and don't use proper user agents or request robots.txt
+- AS 49505 is the Russian Selectel network, and it has used these IPs this month:
+  - 45.146.166.173
+  - 45.134.26.171
+  - 45.146.164.123
+  - 45.155.205.231
+  - 195.54.167.122
+- I will purge them all too because they are up to no good, as I already saw earlier today (SQL injections)
+- AS 16509 is Amazon, and it has used these IPs this month:
+  - 18.135.23.223 (made requests using the `Mozilla/5.0 (compatible; U; Koha checkurl)` user agent, so I will purge it, add it to our DSpace user agent override, and [submit it to COUNTER-Robots](https://github.com/atmire/COUNTER-Robots/pull/51))
+  - 54.76.137.83 (made hundreds of requests to "/" with a normal user agent)
+  - 34.253.119.85 (made hundreds of requests to "/" with a normal user agent)
+  - 34.216.201.131 (made hundreds of requests to "/" with a normal user agent)
+  - 54.203.193.46 (made hundreds of requests to "/" with a normal user agent)
+- I ran the script to purge spider agents with the latest updates:
+
+```console
+$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
+Purging 2530 hits from HeadlessChrome in statistics
+Purging 10676 hits from randint in statistics
+Purging 3579 hits from Koha in statistics
+
+Total number of bot hits purged: 16785
+```
+
+- Then the IPs:
+
+```console
+$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips-to-purge.txt -p
+Purging 1190 hits from 216.213.28.136 in statistics
+Purging 1128 hits from 207.182.27.191 in statistics
+Purging 1095 hits from 216.41.235.187 in statistics
+Purging 1087 hits from 216.41.232.169 in statistics
+Purging 1011 hits from 216.41.235.186 in statistics
+Purging 945 hits from 52.124.19.190 in statistics
+Purging 933 hits from 216.213.28.138 in statistics
+Purging 930 hits from 216.41.234.163 in statistics
+Purging 4410 hits from 45.146.166.173 in statistics
+Purging 2688 hits from 45.134.26.171 in statistics
+Purging 1130 hits from 45.146.164.123 in statistics
+Purging 536 hits from 45.155.205.231 in statistics
+Purging 10676 hits from 195.54.167.122 in statistics
+Purging 1350 hits from 54.76.137.83 in statistics
+Purging 1240 hits from 34.253.119.85 in statistics
+Purging 2879 hits from 34.216.201.131 in statistics
+Purging 2909 hits from 54.203.193.46 in statistics
+Purging 1822 hits from 2605\:b100\:316\:7f74\:8d67\:5860\:a9f3\:d87c in statistics
+
+Total number of bot hits purged: 37959
+```
+
diff --git a/docs/2021-12/index.html b/docs/2021-12/index.html
index acf23bf21..28096f857 100644
--- a/docs/2021-12/index.html
+++ b/docs/2021-12/index.html
@@ -22,7 +22,7 @@ Total number of bot hits purged: 3679
-
+
@@ -50,9 +50,9 @@ Total number of bot hits purged: 3679
     "@type": "BlogPosting",
     "headline": "December, 2021",
     "url": "https://alanorth.github.io/cgspace-notes/2021-12/",
-    "wordCount": "2055",
+    "wordCount": "2686",
     "datePublished": "2021-12-01T16:07:07+02:00",
-    "dateModified": "2021-12-19T22:03:42+02:00",
+    "dateModified": "2021-12-28T13:24:23+02:00",
     "author": {
       "@type": "Person",
       "name": "Alan Orth"
@@ -448,7 +448,115 @@ BUG_REPORT_URL="https://bugs.debian.org/"
-
+

2021-12-29

+ +
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.{random.randint(0, 9999)} Safari/537.{random.randint(0, 99)}
+
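The literal `{random.randint(...)}` text in that user agent looks like a Python f-string whose `f` prefix was forgotten, so the placeholders were sent verbatim instead of being filled in with random version numbers. That is only an assumption about the spoofer's code. The minimal sketch below reproduces the symptom; `BROKEN` and `fixed_user_agent` are hypothetical names.

```python
import random

# Hypothetical reconstruction of the scraper's template: without the f prefix,
# the braces and the randint() calls are emitted as literal text.
BROKEN = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/79.0.3945.{random.randint(0, 9999)} "
    "Safari/537.{random.randint(0, 99)}"
)


def fixed_user_agent() -> str:
    """Same template with the f prefix, so each call gets random version numbers."""
    return (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        f"(KHTML, like Gecko) Chrome/79.0.3945.{random.randint(0, 9999)} "
        f"Safari/537.{random.randint(0, 99)}"
    )


if __name__ == "__main__":
    print(BROKEN)              # literal placeholders, like the hits in Solr
    print(fixed_user_agent())  # real numbers, different on each call
```

Because the broken template always contains the text `randint`, a plain substring pattern catches every hit from it, which is presumably why a `randint` entry in the spider agents file is enough.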
+
/rest/collections/1118/items?expand=all&limit=1
+
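One way to spot hosts like 3.225.28.105 that hammer a single URL is to count requests per (IP, URL) pair and flag any IP whose most-requested URL crosses a threshold. This is a minimal sketch: `repeated_url_hosts` and the default `min_hits=1000` are hypothetical choices, and it assumes the (IP, URL) pairs have already been parsed from an access log.

```python
from collections import Counter, defaultdict
from typing import Iterable


def repeated_url_hosts(
    requests: Iterable[tuple[str, str]], min_hits: int = 1000
) -> dict[str, tuple[str, int]]:
    """Map each IP to its most-requested URL if it requested that URL at least min_hits times."""
    per_ip: defaultdict[str, Counter] = defaultdict(Counter)
    for ip, url in requests:
        per_ip[ip][url] += 1

    flagged = {}
    for ip, urls in per_ip.items():
        url, hits = urls.most_common(1)[0]  # the single most frequent URL for this IP
        if hits >= min_hits:
            flagged[ip] = (url, hits)
    return flagged
```

Because it keys on the URL rather than the user agent, it would still catch the host on the requests where it sends a real-looking browser user agent instead of `Apache-HttpClient/4.3.4 (java 1.5)`.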
+
$ ./ilri/resolve-addresses-geoip2.py -i /tmp/ips.txt -o /tmp/2021-12-29-ips.csv
+$ csvcut -c asn /tmp/2021-12-29-ips.csv | sed 1d | sort | uniq -c | sort -h | awk '$1 > 1'
+      2 10620
+      2 265696
+      2 6147
+      2 9299
+      3 3269
+      5 16509
+      5 49505
+      9 24757
+      9 24940
+      9 64267
+
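The `csvcut | sed 1d | sort | uniq -c | sort -h | awk '$1 > 1'` pipeline drops the CSV header, counts how many rows share each ASN, and keeps only ASNs that appear more than once. A rough Python equivalent, assuming the resolver's CSV has an `asn` column (as the `csvcut -c asn` call implies) and one row per IP; the function name is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def asns_with_multiple_ips(rows: Iterable[str]) -> list[tuple[int, str]]:
    """Return (count, asn) pairs for ASNs on more than one row.

    Roughly: csvcut -c asn FILE | sed 1d | sort | uniq -c | sort -h | awk '$1 > 1'
    DictReader consumes the header row, which is what `sed 1d` does in the shell.
    """
    counts = Counter(row["asn"] for row in csv.DictReader(rows))
    # Sort by count, then by the ASN as text, like sort's whole-line tie-break.
    return sorted((n, asn) for asn, n in counts.items() if n > 1)
```

For example, `with open("/tmp/2021-12-29-ips.csv", newline="") as f: print(asns_with_multiple_ips(f))`. As in the shell output, ties on the count are ordered by comparing the ASN as text, so `10620` sorts before `6147`.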
+
$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
+Purging 2530 hits from HeadlessChrome in statistics
+Purging 10676 hits from randint in statistics
+Purging 3579 hits from Koha in statistics
+
+Total number of bot hits purged: 16785
+
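Purging by agent comes down to testing each user agent in the statistics against the patterns in the `ilri` agents file. The Solr queries that `check-spider-hits.sh` runs are not reproduced here; this is only a minimal Python sketch of the matching step, assuming one regular expression per line and that blank lines and `#` comments are ignored. The function names are hypothetical.

```python
import re
from typing import Iterable


def load_patterns(lines: Iterable[str]) -> list[re.Pattern]:
    """Compile one regex per line, skipping blank lines and '#' comments (assumed format)."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def matching_patterns(user_agent: str, patterns: list[re.Pattern]) -> list[str]:
    """Return the source text of every pattern found anywhere in the user agent."""
    return [p.pattern for p in patterns if p.search(user_agent)]
```

With the three patterns that matched this time (`HeadlessChrome`, `randint`, and `Koha`), the Koha checkurl agent from 18.135.23.223 matches only `Koha`, while `Apache-HttpClient/4.3.4 (java 1.5)` matches none of them.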
+
$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips-to-purge.txt -p
+Purging 1190 hits from 216.213.28.136 in statistics
+Purging 1128 hits from 207.182.27.191 in statistics
+Purging 1095 hits from 216.41.235.187 in statistics
+Purging 1087 hits from 216.41.232.169 in statistics
+Purging 1011 hits from 216.41.235.186 in statistics
+Purging 945 hits from 52.124.19.190 in statistics
+Purging 933 hits from 216.213.28.138 in statistics
+Purging 930 hits from 216.41.234.163 in statistics
+Purging 4410 hits from 45.146.166.173 in statistics
+Purging 2688 hits from 45.134.26.171 in statistics
+Purging 1130 hits from 45.146.164.123 in statistics
+Purging 536 hits from 45.155.205.231 in statistics
+Purging 10676 hits from 195.54.167.122 in statistics
+Purging 1350 hits from 54.76.137.83 in statistics
+Purging 1240 hits from 34.253.119.85 in statistics
+Purging 2879 hits from 34.216.201.131 in statistics
+Purging 2909 hits from 54.203.193.46 in statistics
+Purging 1822 hits from 2605\:b100\:316\:7f74\:8d67\:5860\:a9f3\:d87c in statistics
+
+Total number of bot hits purged: 37959
+
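Grouping the per-IP counts from that output by the networks identified earlier shows where the purged hits came from. The numbers are copied from the output; the IPv6 address is not attributed to a network in these notes, so it is left as `unattributed`. The output backslash-escapes the colons in the IPv6 address, presumably because `:` is special in Solr query syntax; the sketch uses the plain address. Names are hypothetical.

```python
from collections import Counter

# Hits per IP, copied from the check-spider-ip-hits.sh output above.
HITS = {
    "216.213.28.136": 1190, "207.182.27.191": 1128, "216.41.235.187": 1095,
    "216.41.232.169": 1087, "216.41.235.186": 1011, "52.124.19.190": 945,
    "216.213.28.138": 933, "216.41.234.163": 930,
    "45.146.166.173": 4410, "45.134.26.171": 2688, "45.146.164.123": 1130,
    "45.155.205.231": 536, "195.54.167.122": 10676,
    "54.76.137.83": 1350, "34.253.119.85": 1240, "34.216.201.131": 2879,
    "54.203.193.46": 2909,
    "2605:b100:316:7f74:8d67:5860:a9f3:d87c": 1822,
}

# IPs per network, as listed in the notes above.
SPRIOUS = ["216.213.28.136", "207.182.27.191", "216.41.235.187", "216.41.232.169",
           "216.41.235.186", "52.124.19.190", "216.213.28.138", "216.41.234.163"]
SELECTEL = ["45.146.166.173", "45.134.26.171", "45.146.164.123", "45.155.205.231", "195.54.167.122"]
AMAZON = ["54.76.137.83", "34.253.119.85", "34.216.201.131", "54.203.193.46"]

NETWORK = {
    **{ip: "AS 64267 (Sprious)" for ip in SPRIOUS},
    **{ip: "AS 49505 (Selectel)" for ip in SELECTEL},
    **{ip: "AS 16509 (Amazon)" for ip in AMAZON},
}


def hits_by_network(hits: dict[str, int], network: dict[str, str]) -> Counter:
    """Sum hits per network label; IPs without a known network go under 'unattributed'."""
    totals: Counter = Counter()
    for ip, n in hits.items():
        totals[network.get(ip, "unattributed")] += n
    return totals


if __name__ == "__main__":
    totals = hits_by_network(HITS, NETWORK)
    for name, n in totals.most_common():
        print(f"{n:6d}  {name}")
    print(f"{sum(totals.values()):6d}  total")
```

Run as-is, it puts Selectel (AS 49505) at 19440 of the 37959 hits, just over half, with Amazon (8378) and Sprious (8319) close behind.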
diff --git a/docs/categories/index.html b/docs/categories/index.html
index fe980c438..5376be84b 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index c8b6bea68..d146d388a 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 8db30fd62..e70301df3 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 7a8a70af9..f4b283136 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index a96ef5978..fdce1a95f 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 6ce2f51c5..2eb01b6d2 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index 6c837cdbe..93f8c4cea 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 8b05d603d..c77723e00 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index f25d2ee2b..721eb3396 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 6c88498db..7c052928d 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index d61d0df28..2af87084d 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 14e1a99d8..b3a349fd5 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 4d6d19ac9..446a9be36 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 7a996fcf3..30796e5ed 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 7d8eeacf7..43b8f9d8c 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 5011a3b1b..33eb24128 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 7c65a7ec2..237f25c7f 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index b1572d1f7..f00297b4a 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 43d71cf3c..621900d03 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 85ef6a473..623b7c3f7 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 27c22f00f..4a0b6f561 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 995f85b66..278b16ec4 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index ed601691f..d38d98da3 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index e5265fe28..ca27ace5d 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml">
   https://alanorth.github.io/cgspace-notes/categories/
-  2021-12-19T22:03:42+02:00
+  2021-12-28T13:24:23+02:00
   https://alanorth.github.io/cgspace-notes/
-  2021-12-19T22:03:42+02:00
+  2021-12-28T13:24:23+02:00
   https://alanorth.github.io/cgspace-notes/2021-12/
-  2021-12-19T22:03:42+02:00
+  2021-12-28T13:24:23+02:00
   https://alanorth.github.io/cgspace-notes/categories/notes/
-  2021-12-19T22:03:42+02:00
+  2021-12-28T13:24:23+02:00
   https://alanorth.github.io/cgspace-notes/posts/
-  2021-12-19T22:03:42+02:00
+  2021-12-28T13:24:23+02:00
   https://alanorth.github.io/cgspace-notes/2021-11/
   2021-11-30T16:44:30+02:00