mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-12-29
@@ -289,4 +289,108 @@ $ dspace user -a -m rafael-approve@cgiar.org -g Rafael -s Rodriguez -p 'fuuuuuu'
- Start a fresh harvest on AReS

## 2021-12-29

- Looking at the top IPs and user agents on CGSpace's Solr statistics I see a strange user agent:

```console
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.{random.randint(0, 9999)} Safari/537.{random.randint(0, 99)}
```

- I found two IPs using user agents with the "randint" bug:
  - 47.252.80.214 (AliCloud in the US)
  - 61.143.40.50 (ChinaNet in China)
  - I wonder what other requests have been made from those hosts where the randint spoofer was working... ugh.
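- The literal `{random.randint(0, 9999)}` in that user agent looks like an unevaluated Python f-string placeholder. A minimal sketch of how this could happen, assuming the bot is written in Python and its author forgot the `f` prefix (this is only a guess, and `has_randint_bug` is a hypothetical helper, not part of any DSpace tooling):

```python
import random
import re

# Missing the "f" prefix: Python keeps the braces as literal text, so the
# placeholders would be sent verbatim in the User-Agent header.
broken = "Chrome/79.0.3945.{random.randint(0, 9999)} Safari/537.{random.randint(0, 99)}"

# With the "f" prefix the expressions are evaluated (assumed to be the intent).
working = f"Chrome/79.0.3945.{random.randint(0, 9999)} Safari/537.{random.randint(0, 99)}"

# Hypothetical pattern for user agents that still carry an unevaluated placeholder.
UNEVALUATED = re.compile(r"\{random\.randint\(")


def has_randint_bug(user_agent: str) -> bool:
    """Return True if the user agent contains a literal randint placeholder."""
    return UNEVALUATED.search(user_agent) is not None
```

- Matching on the literal `randint` substring is enough to catch these requests, since a correctly evaluated string would only contain digits in those positions.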
- I found some IPs from the Russian SELECTEL network making thousands of requests with SQL injection attempts...
  - 45.134.26.171
  - 45.146.166.173
- 3.225.28.105 is on Amazon and making thousands of requests for the same URL:

```console
/rest/collections/1118/items?expand=all&limit=1
```

- Most of the time it has a real-looking user agent, but sometimes it uses `Apache-HttpClient/4.3.4 (java 1.5)`
- Another IP, 82.65.26.228, is making SQL injection attempts from France
- 216.213.28.138 is some scrape-as-a-service bot from Sprious
- I used my `resolve-addresses-geoip2.py` script to get the ASNs for all the IPs in Solr stats this month, then extracted the ASNs that were responsible for more than one IP:

```console
$ ./ilri/resolve-addresses-geoip2.py -i /tmp/ips.txt -o /tmp/2021-12-29-ips.csv
$ csvcut -c asn /tmp/2021-12-29-ips.csv | sed 1d | sort | uniq -c | sort -h | awk '$1 > 1'
2 10620
2 265696
2 6147
2 9299
3 3269
5 16509
5 49505
9 24757
9 24940
9 64267
```
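- The same count can be sketched in Python for cases where `csvcut` is not available. This assumes only that the CSV has an `asn` column, as the `csvcut -c asn` call above implies; the function name and the `minimum` parameter are made up for illustration:

```python
import csv
import io
from collections import Counter


def asns_with_multiple_ips(csv_text: str, minimum: int = 2) -> list:
    """Return (asn, count) pairs for ASNs seen on at least `minimum` rows."""
    # DictReader consumes the header row, like `sed 1d` in the shell pipeline.
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["asn"] for row in reader)
    repeated = [(asn, n) for asn, n in counts.items() if n >= minimum]
    # Order by count, then by ASN string, roughly like `sort -h` on `uniq -c` output.
    return sorted(repeated, key=lambda item: (item[1], item[0]))
```

- For example, `asns_with_multiple_ips(open('/tmp/2021-12-29-ips.csv').read())` would return `(asn, count)` pairs ordered by count.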

- AS 64267 is Sprious, and it has used these IPs this month:
  - 216.213.28.136
  - 207.182.27.191
  - 216.41.235.187
  - 216.41.232.169
  - 216.41.235.186
  - 52.124.19.190
  - 216.213.28.138
  - 216.41.234.163
  - To be honest I want to ban all their networks but I'm afraid it's too many IPs... hmmm
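- To get a feel for how many networks a ban would involve, the IPs can be grouped by prefix. A rough sketch using Python's `ipaddress` module; the /24 prefix length is an arbitrary assumption (the real Sprious allocations may be larger or smaller), and `group_by_network` is a hypothetical helper:

```python
import ipaddress
from collections import defaultdict

# The Sprious (AS 64267) addresses listed in the notes.
SPRIOUS_IPS = [
    "216.213.28.136",
    "207.182.27.191",
    "216.41.235.187",
    "216.41.232.169",
    "216.41.235.186",
    "52.124.19.190",
    "216.213.28.138",
    "216.41.234.163",
]


def group_by_network(ips, prefix=24):
    """Group addresses by the network they fall in at the given prefix length."""
    networks = defaultdict(list)
    for ip in ips:
        # strict=False lets a host address stand in for its enclosing network.
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        networks[str(network)].append(ip)
    return dict(networks)


for network, members in group_by_network(SPRIOUS_IPS).items():
    print(network, len(members))
```

- With a /24 prefix the eight addresses collapse into six networks, but the prefixes actually announced by AS 64267 would need to be checked before banning anything.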
- AS 24940 is Hetzner, but I don't feel like going through all the IPs to see... they always pretend to be normal users and make semi-sane requests so it might be a proxy or something
- AS 24757 is Ethiopian Telecom
- I'm going to purge all the Sprious IPs for sure, as they are a scraping-as-a-service company and don't use proper user agents or request robots.txt
- AS 49505 is the Russian Selectel, and it has used these IPs this month:
  - 45.146.166.173
  - 45.134.26.171
  - 45.146.164.123
  - 45.155.205.231
  - 195.54.167.122
  - I will purge them all too because they are up to no good, as I already saw earlier today (SQL injections)
- AS 16509 is Amazon, and it has used these IPs this month:
  - 18.135.23.223 (made requests using the `Mozilla/5.0 (compatible; U; Koha checkurl)` user agent, so I will purge it and add it to our DSpace user agent override and [submit to COUNTER-Robots](https://github.com/atmire/COUNTER-Robots/pull/51))
  - 54.76.137.83 (made hundreds of requests to "/" with a normal user agent)
  - 34.253.119.85 (made hundreds of requests to "/" with a normal user agent)
  - 34.216.201.131 (made hundreds of requests to "/" with a normal user agent)
  - 54.203.193.46 (made hundreds of requests to "/" with a normal user agent)
- I ran the script to purge spider agents with the latest updates:

```console
$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
Purging 2530 hits from HeadlessChrome in statistics
Purging 10676 hits from randint in statistics
Purging 3579 hits from Koha in statistics

Total number of bot hits purged: 16785
```

- Then the IPs:

```console
$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips-to-purge.txt -p
Purging 1190 hits from 216.213.28.136 in statistics
Purging 1128 hits from 207.182.27.191 in statistics
Purging 1095 hits from 216.41.235.187 in statistics
Purging 1087 hits from 216.41.232.169 in statistics
Purging 1011 hits from 216.41.235.186 in statistics
Purging 945 hits from 52.124.19.190 in statistics
Purging 933 hits from 216.213.28.138 in statistics
Purging 930 hits from 216.41.234.163 in statistics
Purging 4410 hits from 45.146.166.173 in statistics
Purging 2688 hits from 45.134.26.171 in statistics
Purging 1130 hits from 45.146.164.123 in statistics
Purging 536 hits from 45.155.205.231 in statistics
Purging 10676 hits from 195.54.167.122 in statistics
Purging 1350 hits from 54.76.137.83 in statistics
Purging 1240 hits from 34.253.119.85 in statistics
Purging 2879 hits from 34.216.201.131 in statistics
Purging 2909 hits from 54.203.193.46 in statistics
Purging 1822 hits from 2605\:b100\:316\:7f74\:8d67\:5860\:a9f3\:d87c in statistics

Total number of bot hits purged: 37959
```

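- As a sanity check, the totals reported by the purge scripts can be recomputed from the `Purging N hits from ...` lines. A small sketch, assuming the output format shown in the console blocks; `total_purged` is a hypothetical helper and the sample is the spider agent output from the first purge:

```python
import re

# Matches lines such as "Purging 2530 hits from HeadlessChrome in statistics".
PURGE_LINE = re.compile(r"^Purging (\d+) hits from (.+) in statistics$")


def total_purged(log_text: str) -> int:
    """Sum the hit counts from all 'Purging N hits from X' lines."""
    total = 0
    for line in log_text.splitlines():
        match = PURGE_LINE.match(line.strip())
        if match:
            total += int(match.group(1))
    return total


agent_log = """Purging 2530 hits from HeadlessChrome in statistics
Purging 10676 hits from randint in statistics
Purging 3579 hits from Koha in statistics

Total number of bot hits purged: 16785"""

print(total_purged(agent_log))  # 16785
```

- The result matches the total printed by the script, and the per-IP output can be checked the same way.
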
<!-- vim: set sw=2 ts=2: -->