mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-07-18
@ -318,5 +318,22 @@ geo $ua {
- But I can't get it to work, neither for the default value nor for matching my IP...
- I will have to ask on the nginx mailing list
- The total number of requests and unique hosts was not even very high (the counts below were taken around midnight, so they cover almost the whole day):

```console
# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log | sort -u | wc -l
2776
# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log | wc -l
40325
```

## 2022-07-18

- Reading more about nginx's geo/map and doing some tests on DSpace Test, it appears that the [geo module cannot do dynamic values](https://stackoverflow.com/questions/47011497/nginx-geo-module-wont-use-variables)
- So this issue with the literal `$http_user_agent` is due to the geo block I put in place earlier this month
- I reworked the logic so that the geo block sets "bot" when a network matches and an empty string when it does not, and then re-used that value in a mapping that passes through the host's user agent when geo has set it to an empty string
- This allows me to accomplish the original goal while still only using one bot-networks.conf file for the `limit_req_zone` and the user agent mapping that we pass to Tomcat
- Unfortunately this means I will have hundreds of thousands of requests in Solr with a literal `$http_user_agent`
- I might try to purge some by enumerating all the networks in my block file and running them through `check-spider-ip-hits.sh`
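
For reference, the reworked geo/map logic described above might look roughly like the sketch below. This is not the exact production config: the variable names `$bot_network` and `$ua`, the zone name, size and rate, the include path, the example network, and the Tomcat address are all assumptions for illustration.

```nginx
# Sketch only, placed inside the http {} context.
# Names, paths, zone size and rate are assumptions, not the real config.

# geo can only assign static values, so it merely flags known bot networks.
geo $bot_network {
    default '';
    # bot-networks.conf is assumed to hold lines like: 192.0.2.0/24 'bot';
    include /etc/nginx/bot-networks.conf;
}

# map values may reference variables, so the dynamic fallback lives here.
map $bot_network $ua {
    'bot'   'bot';
    default $http_user_agent;
}

# Requests with an empty key are not accounted, so only bot networks are limited.
limit_req_zone $bot_network zone=bots:1m rate=1r/s;

server {
    location / {
        limit_req zone=bots;
        # Tomcat (and therefore Solr) sees "bot" or the client's real user agent.
        proxy_set_header User-Agent $ua;
        proxy_pass http://127.0.0.1:8080;  # hypothetical Tomcat address
    }
}
```

Keeping the static "bot" flag in `geo` and the variable fallback in `map` is what lets a single bot-networks.conf file drive both the rate limit key and the user agent passed to Tomcat.
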
<!-- vim: set sw=2 ts=2: -->