mirror of https://github.com/alanorth/cgspace-notes.git synced 2024-12-27 15:34:30 +01:00

Add notes for 2022-03-22

This commit is contained in:
Alan Orth 2022-03-22 22:03:45 +03:00
parent dcd2a9b7e5
commit 9fc0935448
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9


@ -146,5 +146,47 @@ $ csvjoin -c id /tmp/2022-03-22-tac-duplicates.csv /tmp/tac-filenames.csv > /tmp
```
- I sent the resulting 76 items to Gaia to check
- UptimeRobot said that CGSpace was down
- I looked and found many PostgreSQL locks held by the REST API application (`dspaceApi`):
```console
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi)' | sort | uniq -c | sort -n
301 dspaceWeb
2390 dspaceApi
```
- Looking at nginx's REST API log, I found the top addresses making requests today:
```console
# awk '{print $1}' /var/log/nginx/rest.log | sort | uniq -c | sort -h
1977 45.5.184.2
3167 70.32.90.172
4754 54.195.118.125
5411 205.186.128.185
6826 137.184.159.211
```
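- A minimal sketch for checking which user agents a given address sent. `agents_for_ip` is a hypothetical helper, and it assumes the log uses nginx's `combined` format, where the user agent is the sixth field when a line is split on double quotes:

```shell
# Hypothetical helper: list the user agents sent by one address, with counts,
# least frequent first.
# Assumption: nginx "combined" log format, so splitting a line on double
# quotes puts the user agent in field 6.
agents_for_ip() {
  ip="$1"
  log="$2"
  # index(...) == 1 only matches lines that start with the address followed
  # by a space, so 192.0.2.1 does not also match 192.0.2.10
  awk -F'"' -v ip="$ip" 'index($0, ip " ") == 1 { print $6 }' "$log" \
    | sort | uniq -c | sort -n
}
```

- For example: `agents_for_ip 137.184.159.211 /var/log/nginx/rest.log`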
- 137.184.159.211 is on DigitalOcean using this user agent: `GuzzleHttp/6.3.3 curl/7.81.0 PHP/7.4.28`
- I blocked this IP in nginx and the load went down immediately
- 205.186.128.185 is on Media Temple, but it's OK because it's the CCAFS publications importer bot
- 54.195.118.125 is on Amazon, but is apparently also a CCAFS publications importer bot (perhaps a test server)
- 70.32.90.172 is on Media Temple and has no user agent
- What is surprising to me is that we already have an nginx rule to return HTTP 403 for requests without a user agent
- I verified it works as expected with an empty user agent:
```console
$ curl -H User-Agent:'' 'https://dspacetest.cgiar.org/rest/handle/10568/34799?expand=all'
Due to abuse we no longer permit requests without a user agent. Please specify a descriptive user agent, for example containing the word 'bot', if you are accessing the site programmatically. For more information see here: https://dspacetest.cgiar.org/page/about.
```
- I note that nginx logs '-' for a request with an empty user agent, which makes it indistinguishable in the log from a request that sent a literal '-' as its user agent. For example, these requests were successful:
```console
70.32.90.172 - - [22/Mar/2022:11:59:10 +0100] "GET /rest/handle/10568/34374?expand=all HTTP/1.0" 200 10671 "-" "-"
70.32.90.172 - - [22/Mar/2022:11:59:14 +0100] "GET /rest/handle/10568/34795?expand=all HTTP/1.0" 200 11394 "-" "-"
```
- I can only assume that these requests sent a literal '-' as the user agent, so I will have to add an nginx rule to block those too
- Otherwise, I see from my notes that 70.32.90.172 is the wle.cgiar.org REST API harvester... I should ask Macaroni Bros about that
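- A minimal sketch for finding which addresses are getting successful responses while being logged with a '-' user agent. `dash_agent_ips` is a hypothetical helper, and it assumes nginx's `combined` log format (status in the third quote-delimited field, user agent in the sixth):

```shell
# Hypothetical helper: count successful (2xx) requests per client address
# where the logged user agent is "-".
# Assumption: nginx "combined" log format. Splitting on double quotes gives
# the client address at the start of field 1, the status code at the start
# of field 3, and the user agent in field 6.
dash_agent_ips() {
  awk -F'"' '$6 == "-" {
    split($1, head, " ")    # head[1] is the client address
    split($3, status, " ")  # status[1] is the HTTP status code
    if (status[1] ~ /^2/) print head[1]
  }' "$1" | sort | uniq -c | sort -n
}
```

- For example: `dash_agent_ips /var/log/nginx/rest.log`. Note that this cannot tell an empty user agent from a literal '-', since both are logged the same way.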
<!-- vim: set sw=2 ts=2: -->