mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-16 11:57:03 +01:00
Add notes for 2022-03-22
parent dcd2a9b7e5
commit 9fc0935448
@ -146,5 +146,47 @@ $ csvjoin -c id /tmp/2022-03-22-tac-duplicates.csv /tmp/tac-filenames.csv > /tmp
```

- I sent the resulting 76 items to Gaia to check
- UptimeRobot said that CGSpace was down
- I looked and found many locks belonging to the REST API application:

```console
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi)' | sort | uniq -c | sort -n
    301 dspaceWeb
   2390 dspaceApi
```
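
- As a sketch, the same tally could be computed in SQL by grouping on `application_name`. This assumes (as the grep above implies) that `dspaceWeb` and `dspaceApi` are the values in that column of `pg_stat_activity`, and that `psql` can connect with its default settings; the function name `locks_by_app` is hypothetical:

```shell
# Hypothetical helper: count locks per application_name in SQL instead of
# grepping the full join output. Assumes psql connects with default settings.
locks_by_app() {
  psql -c "SELECT psa.application_name, COUNT(*) AS locks
           FROM pg_locks pl
           LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
           GROUP BY psa.application_name
           ORDER BY locks;"
}
```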

- Looking at nginx's logs, I found the top addresses making requests today:

```console
# awk '{print $1}' /var/log/nginx/rest.log | sort | uniq -c | sort -h
   1977 45.5.184.2
   3167 70.32.90.172
   4754 54.195.118.125
   5411 205.186.128.185
   6826 137.184.159.211
```
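
- A possible refinement, sketched here under the assumption that `rest.log` uses nginx's standard "combined" format, is to tally each address together with its user agent so that clients like the ones above can be identified in one pass; the function name `top_clients` is hypothetical:

```shell
# Hypothetical helper: tally requests per (client IP, user agent) pair.
# Assumes nginx's "combined" log format, where splitting a line on double
# quotes puts the request in field 2 and the user agent in field 6.
top_clients() {
  awk -F'"' '{ split($1, head, " "); print head[1] "\t" $6 }' "$1" \
    | sort | uniq -c | sort -n
}
```

- For example, `top_clients /var/log/nginx/rest.log | tail -n 5` would list the five busiest address and user agent pairs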

- 137.184.159.211 is on DigitalOcean using this user agent: `GuzzleHttp/6.3.3 curl/7.81.0 PHP/7.4.28`
- I blocked this IP in nginx and the load went down immediately
- 205.186.128.185 is on Media Temple, but it's OK because it's the CCAFS publications importer bot
- 54.195.118.125 is on Amazon, but is also a CCAFS publications importer bot apparently (perhaps a test server)
- 70.32.90.172 is on Media Temple and has no user agent
- What is surprising to me is that we already have an nginx rule to return HTTP 403 for requests without a user agent
- I verified it works as expected with an empty user agent:

```console
$ curl -H User-Agent:'' 'https://dspacetest.cgiar.org/rest/handle/10568/34799?expand=all'
Due to abuse we no longer permit requests without a user agent. Please specify a descriptive user agent, for example containing the word 'bot', if you are accessing the site programmatically. For more information see here: https://dspacetest.cgiar.org/page/about.
```

- I note that the nginx log shows `-` for a request with an empty user agent, which is indistinguishable from a request that sends a literal `-`; for example, these requests were successful:

```console
70.32.90.172 - - [22/Mar/2022:11:59:10 +0100] "GET /rest/handle/10568/34374?expand=all HTTP/1.0" 200 10671 "-" "-"
70.32.90.172 - - [22/Mar/2022:11:59:14 +0100] "GET /rest/handle/10568/34795?expand=all HTTP/1.0" 200 11394 "-" "-"
```

- I can only assume that these requests used a literal `-`, so I will have to add an nginx rule to block those too
- Otherwise, I see from my notes that 70.32.90.172 is the wle.cgiar.org REST API harvester... I should ask Macaroni Bros about that
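- A sketch for gauging how many requests logged with a `-` user agent were still answered with HTTP 200, per address. It assumes nginx's standard "combined" log format and, like the log itself, cannot tell an empty user agent from a literal `-`; the function name `dash_agent_ok` is hypothetical:

```shell
# Hypothetical helper: count, per client IP, the requests that were logged
# with a "-" user agent and still got an HTTP 200 response.
# Assumes nginx's "combined" log format: when splitting on double quotes,
# field 3 holds " status bytes " and field 6 holds the user agent.
dash_agent_ok() {
  awk -F'"' '
    $6 == "-" {
      split($3, st, " ")          # st[1] is the status code
      if (st[1] == "200") {
        split($1, head, " ")      # head[1] is the client IP
        print head[1]
      }
    }' "$1" | sort | uniq -c | sort -n
}
```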

<!-- vim: set sw=2 ts=2: -->