mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-22 06:35:03 +01:00
51 lines
1.8 KiB
Markdown
---
title: "November, 2024"
date: 2024-11-11T09:47:00+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2024-11-11

- Some IP in India is making tons of requests this morning with a normal user agent:

```console
# awk '{print $1}' /var/log/nginx/api-access.log | sort | uniq -c | sort -h | tail -n 40
...
513743 49.207.196.249
```

<!--more-->

- They are using this user agent:

```
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.3
```
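
- A rough Python sketch of the same tally, grouped by user agent as well as IP. It assumes nginx's default `combined` log format, and `count_clients` and `print_top` are names made up for this note:

```python
import re
from collections import Counter

# Assumes nginx's default "combined" log format; adjust the pattern if the
# server uses a custom log_format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_clients(lines):
    """Count requests per (IP, user agent) pair, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match["ip"], match["agent"])] += 1
    return counts


def print_top(path, limit=40):
    """Print the busiest (IP, user agent) pairs found in the log at path."""
    with open(path, errors="replace") as log:
        for (ip, agent), total in count_clients(log).most_common(limit):
            print(f"{total:>8} {ip} {agent}")
```

- Calling `print_top("/var/log/nginx/api-access.log")` would list the 40 busiest IP and user agent pairs, most active first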

## 2024-11-16

- I switched CGSpace to Node.js v20 since I've been using it in dev and test for months

## 2024-11-18

- I see a bot (188.34.177.10) on Hetzner has made 35,000 requests this morning and is pretending to be Googlebot, GoogleOther, etc
  - Google publishes their range of IPs also: https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot
- Our nginx config doesn't rate limit the API but perhaps that needs to change...
  - In DSpace 4/5/6 the API was separate from the user interface so we didn't need to enforce rate limits there because we encouraged using that over scraping the UI
  - In DSpace 7 the API is used by the frontend and perhaps should have the same IP- and UA-based rate limiting
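
- One way to check a claimed Googlebot IP against published ranges, sketched in Python. `is_in_ranges` is a made-up helper, and the prefixes in the example are reserved documentation ranges standing in for the real list, which would have to come from the lists Google links on that page:

```python
import ipaddress


def is_in_ranges(ip, prefixes):
    """Return True if ip falls inside any of the given CIDR prefixes."""
    address = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so
    # mixed lists of prefixes are safe to pass in.
    return any(
        address in ipaddress.ip_network(prefix, strict=False)
        for prefix in prefixes
    )


# Placeholders only: these are reserved documentation ranges, not Google
# prefixes. Replace them with the prefixes Google actually publishes.
PLACEHOLDER_PREFIXES = ["192.0.2.0/24", "2001:db8::/32"]
```

- An IP that sends a Googlebot user agent but fails this check against the real list could then be rate limited like any other bot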

## 2024-11-19

- I notice 10,000 requests by a new bot yesterday:

```
20.38.174.208 - - [18/Nov/2024:07:02:50 +0100] "GET /server/oai/request?verb=ListRecords&resumptionToken=oai_dc%2F2024-10-18T13%3A00%3A49Z%2F%2F%2F400 HTTP/1.1" 503 190 "-" "Laminas_Http_Client"
```

- Seems to be the default user agent of the HTTP client in Laminas, a PHP framework
- Yesterday one IP in Argentina made nearly 1,000,000 requests using a normal user agent: 181.4.143.40
- 188.34.177.10 ended up making 700,000 requests using various Googlebot, GoogleOther, and even normal Chrome user agents
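
- Decoding the `resumptionToken` from the `Laminas_Http_Client` request above shows what the harvester was asking for. A small Python sketch, where `decode_resumption_token` is a made-up name:

```python
from urllib.parse import parse_qs, urlsplit


def decode_resumption_token(request_target):
    """Return the percent-decoded resumptionToken, split on '/'."""
    query = parse_qs(urlsplit(request_target).query)
    return query["resumptionToken"][0].split("/")


# Request target copied from the log line above
target = (
    "/server/oai/request?verb=ListRecords"
    "&resumptionToken=oai_dc%2F2024-10-18T13%3A00%3A49Z%2F%2F%2F400"
)
print(decode_resumption_token(target))
# ['oai_dc', '2024-10-18T13:00:49Z', '', '', '400']
```

- Reading the fields as metadata prefix, from date, until date, set, and offset is my assumption about the token layout, but if that is right the client was paging through `ListRecords` in `oai_dc` format at offset 400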

<!-- vim: set sw=2 ts=2: -->