mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-07-04
- Start a harvest on AReS
## 2022-07-04
- Linode told me that CGSpace had high load yesterday
- I also got some up and down notices from UptimeRobot
- Looking now, I see there was a very high CPU and database pool load, but a mostly normal DSpace session count
- Seems we have some old database transactions lingering since 2022-06-27
- Looking at the top connections to nginx yesterday:
```console
# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log.1 | sort | uniq -c | sort -h | tail
1132 64.124.8.34
1146 2a01:4f8:1c17:5550::1
1380 137.184.159.211
1533 64.124.8.59
4013 80.248.237.167
4776 54.195.118.125
10482 45.5.186.2
11177 172.104.229.92
15855 2a01:7e00::f03c:91ff:fe9a:3a37
22179 64.39.98.251
```
- And the total number of unique IPs:
```console
# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log.1 | sort -u | wc -l
6952
```
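The two pipelines above can be approximated in Python when the analysis needs to go further than a one-liner. This is a minimal sketch, assuming nginx's default `combined` log format (client address as the first whitespace-separated field) and reusing the log paths from the commands above; the function names are hypothetical, and tie-breaking at the cut-off may differ slightly from `sort -h | tail`.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable

# Same rotated logs as the awk commands above (yesterday's traffic)
LOG_FILES = [
    f"/var/log/nginx/{name}.log.1"
    for name in ("access", "library-access", "oai", "rest")
]


def client_ips(lines: Iterable[str]) -> Iterable[str]:
    """Yield the first field of each non-empty line, roughly awk '{print $1}'."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0]


def top_clients(lines: Iterable[str], n: int = 10) -> list[tuple[int, str]]:
    """Return the n busiest clients as (count, ip), smallest first like `tail`."""
    counts = Counter(client_ips(lines))
    return sorted((count, ip) for ip, count in counts.most_common(n))


def unique_clients(lines: Iterable[str]) -> int:
    """Count distinct client addresses, like `sort -u | wc -l`."""
    return len(set(client_ips(lines)))


def read_logs(paths: list[str]) -> list[str]:
    """Read all lines from the given files, silently skipping missing ones."""
    lines: list[str] = []
    for path in paths:
        p = Path(path)
        if p.exists():
            lines.extend(p.read_text(errors="replace").splitlines())
    return lines


if __name__ == "__main__":
    lines = read_logs(LOG_FILES)
    for count, ip in top_clients(lines):
        print(f"{count:>7} {ip}")
    print(unique_clients(lines))
```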
- This seems low, so the load must have come from the request patterns of certain visitors rather than from the overall number of clients
- 64.39.98.251 is Qualys, and I'm debating blocking [all their IPs](https://pci.qualys.com/static/help/merchant/getting_started/check_scanner_ip_addresses.htm) using a geo block in nginx (need to test)
- The top few are known ILRI and other CGIAR scrapers, but 80.248.237.167 is on InternetVikings in Sweden, using a normal user agent and scraping Discover
- 64.124.8.59 is making requests with a normal user agent and belongs to Castle Global or Zayo
- I ran all system updates and rebooted the server (could have just restarted PostgreSQL but I thought I might as well do everything)
- I implemented a geo mapping for both the user agent mapping AND the nginx `limit_req_zone` by extracting the networks into an external file and including it in two different `geo` blocks
- This is clever and relies on the fact that we can use defaults in both cases
- First, we map the user agent of requests from these networks to "bot" so that Tomcat and Solr handle them accordingly
- Second, we use this as the key in a `limit_req_zone`, which relies on a default mapping of '' for everyone else (nginx does not account requests with an empty key, so those clients are never rate limited)
- I noticed that CIP uploaded a number of Georgian presentations with `dcterms.language` set to English and Other, so I changed them to "ka"
- Perhaps we need to update our list of languages to include all instead of the most common ones
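The two-mapping nginx trick described above can be modelled in Python to make the role of the defaults explicit. This is a minimal sketch of the logic, not the actual nginx configuration: the network list uses hypothetical documentation ranges instead of the real networks file, the function and class names are made up for illustration, and the limiter is a simple fixed-window counter rather than the leaky bucket nginx uses for `limit_req`.

```python
import ipaddress
from collections import defaultdict

# Hypothetical shared network list (documentation ranges, NOT the real bot
# networks). In the setup described above, one external file plays this role
# and is included by both geo blocks.
BOT_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def in_bot_networks(addr: str) -> bool:
    """Return True if addr falls inside any listed network (IPv4 or IPv6)."""
    ip = ipaddress.ip_address(addr)
    # An IPv4 address is never "in" an IPv6 network (and vice versa): False
    return any(ip in net for net in BOT_NETWORKS)


def effective_user_agent(addr: str, user_agent: str) -> str:
    """First mapping: listed networks become "bot"; default is the real UA."""
    return "bot" if in_bot_networks(addr) else user_agent


def limit_key(addr: str) -> str:
    """Second mapping: listed networks get the key "bot"; default is ''."""
    return "bot" if in_bot_networks(addr) else ""


class FixedWindowLimiter:
    """Toy stand-in for limit_req: at most `rate` requests per key.

    Like nginx, a request with an empty key is not accounted, so clients that
    fell through to the default mapping are never limited.
    """

    def __init__(self, rate: int) -> None:
        self.rate = rate
        self.counts: dict[str, int] = defaultdict(int)

    def allow(self, key: str) -> bool:
        if key == "":
            return True  # empty key: not counted, never limited
        self.counts[key] += 1
        return self.counts[key] <= self.rate


if __name__ == "__main__":
    limiter = FixedWindowLimiter(rate=2)
    for addr in ["192.0.2.10", "198.51.100.7", "192.0.2.11", "203.0.113.5"]:
        ua = effective_user_agent(addr, "Mozilla/5.0")
        print(addr, ua, limiter.allow(limit_key(addr)))
```

Because every listed network maps to the same "bot" key, they all draw from one shared bucket; any address outside the list keeps its real user agent and is never counted by the limiter.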
<!-- vim: set sw=2 ts=2: -->