diff --git a/content/posts/2021-07.md b/content/posts/2021-07.md
index ba9d4bbcc..4f2c2e33c 100644
--- a/content/posts/2021-07.md
+++ b/content/posts/2021-07.md
@@ -1,6 +1,6 @@
 ---
 title: "July, 2021"
-date: 2021-06-01T08:53:07+03:00
+date: 2021-07-01T08:53:07+03:00
 author: "Alan Orth"
 categories: ["Notes"]
 ---
@@ -446,5 +446,24 @@ $ wc -l /tmp/all-ips-to-block.txt
 - Then I added them to the normal ipset we are already using with firewalld
 - I will check again in a few hours and ban more
+- I decided to extract the networks from the GeoIP database with `resolve-addresses-geoip2.py` so I can block them more efficiently than using the 5,000 IPs in an ipset:
+
+```console
+$ csvgrep -c asn -r '^(206485|35624|36352|46844|49453|62282)$' /tmp/all-ips-out.csv | csvcut -c network | sed 1d | sort | uniq > /tmp/all-networks-to-block.txt
+$ grep deny roles/dspace/templates/nginx/abusive-networks.conf.j2 | sort | uniq | wc -l
+2354
+```
+
+- Combined with the previous networks this brings about 200 more for a total of 2,354 networks
+  - I think I need to re-work the ipset stuff in my common Ansible role so that I can add such abusive networks as an iptables ipset / nftables set, and have a cron job to update them daily (from [Spamhaus's DROP and EDROP lists](https://www.spamhaus.org/drop/), for example)
+- Then I got a list of all the 5,095 IPs from above and used `check-spider-ip-hits.sh` to purge them from Solr:
+
+```console
+$ ilri/check-spider-ip-hits.sh -f /tmp/all-ips-to-block.txt -p
+...
+Total number of bot hits purged: 197116
+```
+
+- I started a harvest on AReS and it finished in a few hours now that the load on CGSpace is back to a normal level
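
The `csvgrep | csvcut | sort | uniq` step in the notes can also be sketched in Python. This is only an illustration, not the method used in the post: it assumes the CSV has `asn` and `network` columns (as the `csvgrep -c asn` and `csvcut -c network` calls suggest), the `ip` column and the sample rows are made up, and the `deny <network>;` output format is a guess based on the `grep deny` check against the nginx template. The function names `networks_for_asns` and `nginx_deny_lines` are hypothetical.

```python
import csv
import io
import ipaddress

# ASNs copied from the csvgrep pattern in the notes
ABUSIVE_ASNS = {"206485", "35624", "36352", "46844", "49453", "62282"}


def networks_for_asns(csv_file, asns=ABUSIVE_ASNS):
    """Return the unique networks whose `asn` column is in `asns`, sorted."""
    networks = set()
    for row in csv.DictReader(csv_file):
        network = row["network"].strip()
        if row["asn"].strip() in asns and network:
            # strict=False tolerates CIDR strings that have host bits set
            networks.add(ipaddress.ip_network(network, strict=False))
    # IPv4 and IPv6 networks cannot be compared directly, so sort on a
    # (version, address, prefix length) key instead
    return sorted(
        networks,
        key=lambda n: (n.version, int(n.network_address), n.prefixlen),
    )


def nginx_deny_lines(networks):
    """Format networks as nginx `deny` directives (assumed template format)."""
    return [f"deny {network};" for network in networks]


if __name__ == "__main__":
    # Made-up sample using documentation address ranges; the real input
    # would be a file like /tmp/all-ips-out.csv
    sample = io.StringIO(
        "ip,asn,network\n"
        "192.0.2.10,206485,192.0.2.0/24\n"
        "192.0.2.11,206485,192.0.2.0/24\n"
        "198.51.100.7,64500,198.51.100.0/24\n"
        "203.0.113.5,35624,203.0.113.0/24\n"
    )
    for line in nginx_deny_lines(networks_for_asns(sample)):
        print(line)
```

Parsing each entry with `ipaddress.ip_network` deduplicates on the normalized network rather than on the raw string, which is slightly stricter than `sort | uniq`.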