mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes
@ -48,4 +48,35 @@ $ csvcut -c 'id,dc.title[en_US],dcterms.abstract[en_US],cg.identifier.doi[en_US]
- The default scrape interval is 60 seconds, so if we scrape it more often than that the metrics will be stale
- From what I've seen this returns in less than one second so it should be safe to reduce the scrape interval

## 2024-10-19

- Heavy load on CGSpace today
- There is a noted increase just before 4PM local time
- I extracted a list of IPs:

```console
# grep -E '19/Oct/2024:1[567]' /var/log/nginx/api-access.log | awk '{print $1}' | sort -u > /tmp/ips.txt
```
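
Before looking up each address by hand, it can help to see which networks contribute the most addresses. This is a minimal sketch, assuming `/tmp/ips.txt` holds one IPv4 address per line; the function name `count_by_network` and the choice of a /24 prefix are my own assumptions for illustration, and a /24 is only a rough stand-in for a real ASN lookup:

```python
import ipaddress
from collections import Counter


def count_by_network(ips, prefix_len=24):
    """Count distinct IPv4 addresses per network of the given prefix length.

    Hypothetical helper: grouping by /24 is an assumption, not an ASN lookup.
    """
    counts = Counter()
    for ip in set(ips):  # de-duplicate first so each address counts once
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            continue  # this sketch only groups IPv4 addresses
        # strict=False lets us pass a host address and get its enclosing network
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    # Largest groups first
    return counts.most_common()


# Usage (reads the file produced by the grep above):
#
#   with open("/tmp/ips.txt") as f:
#       ips = [line.strip() for line in f if line.strip()]
#   for network, count in count_by_network(ips)[:20]:
#       print(count, network)
```

Networks that show up with many distinct addresses are reasonable first candidates for a whois or ASN lookup.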

- I looked them up and found some data center networks that were using normal user agents across hundreds of IPs, for example:
  - 154.47.29.168 # 212238 (CDNEXT - Datacamp Limited, GB)
  - 91.210.64.12 # 29802 (HVC-AS, US) - HIVELOCITY, Inc.
  - 103.221.57.120 # 132817 (DZCRD-AS-AP DZCRD Networks Ltd, BD)
  - 109.107.150.136 # 201341 (CENTURION-INTERNET-SERVICES - trafficforce, UAB, LT) - Code200
  - 185.210.207.1 # 209709 (CODE200-ISP1 - UAB code200, LT)
  - 185.162.119.101 # 207223 (GLOBALCON - Global Connections Network LLC, US)
  - 173.244.35.101 # 64286 (LOGICWEB, US) - Tesonet
  - 139.28.160.141 # 396319 (US-INTERNET-396319, US) - OxyLabs
  - 104.143.89.112 # 62874 (WEB2OBJECTS, US) - Web2Objects LLC
- I added some network blocks to the nginx conf
- Interestingly, I see so many IPs using the same user agent today:

```console
# grep "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.3" /var/log/nginx/api-access.log | awk '{print $1}' | sort -u | wc -l
767
```

- For reference, the current Chrome version is 129 or so...
- This is definitely worth looking into because it seems like one massive botnet
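
To look for other user agents shared across many addresses, the same count can be generalized to every user agent in the log. This is a rough sketch, assuming the log uses nginx's standard `combined` format (the actual `log_format` of `api-access.log` is not shown in these notes); the function name `distinct_ips_per_agent` is hypothetical:

```python
import re
from collections import defaultdict

# Assumes nginx "combined" format:
#   $remote_addr - $remote_user [$time_local] "$request" $status
#   $body_bytes_sent "$http_referer" "$http_user_agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def distinct_ips_per_agent(lines):
    """Return (distinct_ip_count, user_agent) pairs, largest count first."""
    ips_by_agent = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines that do not look like combined format
            ips_by_agent[match.group("agent")].add(match.group("ip"))
    return sorted(
        ((len(ips), agent) for agent, ips in ips_by_agent.items()),
        reverse=True,
    )


# Usage:
#
#   with open("/var/log/nginx/api-access.log", errors="replace") as f:
#       for count, agent in distinct_ips_per_agent(f)[:20]:
#           print(count, agent)
```

An outdated browser version near the top of that ranking, like the Chrome 104 string above, is the kind of pattern this is meant to surface.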

<!-- vim: set sw=2 ts=2: -->
50
content/posts/2024-11.md
Normal file
@ -0,0 +1,50 @@
---
title: "November, 2024"
date: 2024-11-11T09:47:00+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2024-11-11

- Some IP in India is making tons of requests this morning with a normal user agent:

```console
# awk '{print $1}' /var/log/nginx/api-access.log | sort | uniq -c | sort -h | tail -n 40
...
513743 49.207.196.249
```

<!--more-->

- They are using this user agent:

```
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.3
```

## 2024-11-16

- I switched CGSpace to Node.js v20 since I've been using it in dev and test for months

## 2024-11-18

- I see a bot (188.34.177.10) on Hetzner has made 35,000 requests this morning and is pretending to be Googlebot, GoogleOther, etc.
- Google also publishes its IP ranges: https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot
- Our nginx config doesn't rate limit the API but perhaps that needs to change...
- In DSpace 4/5/6 the API was separate from the user interface, so we didn't enforce rate limits there because we encouraged people to use it instead of scraping the UI
- In DSpace 7 the API is used by the frontend and perhaps should have the same IP- and UA-based rate limiting
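
Checking whether a self-declared Googlebot actually comes from Google's address space can be sketched with Python's `ipaddress` module. This is only an illustration: the function name `ip_in_ranges` is hypothetical, and the ranges below are documentation placeholders (RFC 5737 and RFC 3849), not Google's real ranges, which would have to be taken from the list Google publishes:

```python
import ipaddress

# PLACEHOLDER ranges for illustration only. These are reserved documentation
# networks, NOT Google's ranges. Substitute the ranges Google publishes.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_ranges(ip, cidr_ranges):
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    for cidr in cidr_ranges:
        network = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version
        if addr.version == network.version and addr in network:
            return True
    return False


# Usage: a client claiming to be Googlebot whose address is outside the
# published ranges is an impostor.
#
#   ip_in_ranges("188.34.177.10", published_ranges)
```

A client that sends a Googlebot user agent but fails this check could be treated like any other unverified bot when deciding on rate limits.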

## 2024-11-19

- I notice 10,000 requests by a new bot yesterday:

```
20.38.174.208 - - [18/Nov/2024:07:02:50 +0100] "GET /server/oai/request?verb=ListRecords&resumptionToken=oai_dc%2F2024-10-18T13%3A00%3A49Z%2F%2F%2F400 HTTP/1.1" 503 190 "-" "Laminas_Http_Client"
```

- Seems to be some kind of PHP framework library
- Yesterday one IP in Argentina made nearly 1,000,000 requests using a normal user agent: 181.4.143.40
- 188.34.177.10 ended up making 700,000 requests using various Googlebot, GoogleOther, and even normal Chrome user agents
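
To see at a glance which clients made the most requests and which user agents each of them used, the log can be summarized per IP. This is a minimal sketch, assuming log lines shaped like the `Laminas_Http_Client` entry above (nginx's `combined` format); `summarize_clients` is a hypothetical name:

```python
import re
from collections import Counter, defaultdict

# Assumes nginx "combined" format, as in the sample line above
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_clients(lines, top=10):
    """Return [(ip, request_count, sorted_user_agents)] for the busiest IPs."""
    requests = Counter()
    agents = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not match the assumed format
        requests[match["ip"]] += 1
        agents[match["ip"]].add(match["agent"])
    return [
        (ip, count, sorted(agents[ip]))
        for ip, count in requests.most_common(top)
    ]


# Usage:
#
#   with open("/var/log/nginx/api-access.log", errors="replace") as f:
#       for ip, count, user_agents in summarize_clients(f):
#           print(count, ip, len(user_agents), "user agents")
```

A single IP with a very high request count and several unrelated user agents, as with 188.34.177.10, stands out immediately in this kind of summary.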

<!-- vim: set sw=2 ts=2: -->