--- title: "November, 2024" date: 2024-11-11T09:47:00+03:00 author: "Alan Orth" categories: ["Notes"] --- ## 2024-11-11 - Some IP in India is making tons of requests this morning with a normal user agent: ```console # awk '{print $1}' /var/log/nginx/api-access.log | sort | uniq -c | sort -h | tail -n 40 ... 513743 49.207.196.249 ``` - They are using this user agent: ``` Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.3 ``` ## 2024-11-16 - I switched CGSpace to Node.js v20 since I've been using it in dev and test for months ## 2024-11-18 - I see a bot (188.34.177.10) on Hetzner has made 35,000 requests this morning and is pretending to be Googlebot, GoogleOther, etc - Google publishes their range of IPs also: https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot - Our nginx config doesn't rate limit the API but perhaps that needs to change... - In DSpace 4/5/6 the API was separate from the user interface so we didn't need to enforce rate limits there because we encouraged using that over scraping the UI - In DSpace 7 the API is used by the frontend and perhaps should have the same IP- and UA-based rate limiting ## 2024-11-19 - I notice 10,000 requests by a new bot yesterday: ``` 20.38.174.208 - - [18/Nov/2024:07:02:50 +0100] "GET /server/oai/request?verb=ListRecords&resumptionToken=oai_dc%2F2024-10-18T13%3A00%3A49Z%2F%2F%2F400 HTTP/1.1" 503 190 "-" "Laminas_Http_Client" ``` - Seems to be some kind of PHP framework library - Yesterday one IP in Argentina made nearly 1,000,000 requests using a normal user agent: 181.4.143.40 - 188.34.177.10 ended up making 700,000 requests using various Googlebot, GoogleOther, and even normal Chrome user agents