Add notes for 2022-07-18

2022-07-18 12:32:23 +03:00
parent 6fb5aa2be0
commit 92b115ef62
29 changed files with 68 additions and 34 deletions


@@ -318,5 +318,22 @@ geo $ua {
- But I can't get it to work, neither for the default value nor for matching my IP...
- I will have to ask on the nginx mailing list
- The total number of requests and unique hosts was not even very high (the counts below were taken around midnight, so they cover almost the whole day):
```console
# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log | sort -u | wc -l
2776
# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log | wc -l
40325
```
## 2022-07-18
- Reading more about nginx's geo/map and doing some tests on DSpace Test, it appears that the [geo module cannot do dynamic values](https://stackoverflow.com/questions/47011497/nginx-geo-module-wont-use-variables)
- So this issue with the literal `$http_user_agent` is due to the geo block I put in place earlier this month
- I reworked the logic so that the geo block sets "bot" or an empty string depending on whether a network matches, and then re-used that value in a mapping that passes through the host's user agent in case geo has set it to an empty string
- This allows me to accomplish the original goal while still only using one bot-networks.conf file for the `limit_req_zone` and the user agent mapping that we pass to Tomcat
- Unfortunately this means I will have hundreds of thousands of requests in Solr with a literal `$http_user_agent`
- I might try to purge some by enumerating all the networks in my block file and running them through `check-spider-ip-hits.sh`
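
For reference, the reworked geo/map logic described above might look roughly like the sketch below. This is not the actual configuration: the variable names `$ua_bot_network` and `$ua`, the zone name, the rate, the include path, and the example network are all placeholders I am assuming for illustration.

```nginx
# Sketch only, http context. All names, paths, and values are assumptions.

# geo accepts only literal values, so it emits a fixed marker
# instead of trying to reference $http_user_agent directly.
geo $ua_bot_network {
    default '';

    # The real networks would come from the shared bot-networks.conf,
    # e.g. via an include (the path here is hypothetical):
    # include /etc/nginx/bot-networks.conf;
    192.0.2.0/24 'bot';   # placeholder documentation network
}

# map values may contain variables, so the client's real user agent
# is passed through whenever geo left the marker empty.
map $ua_bot_network $ua {
    default $http_user_agent;
    'bot'   'bot';
}

# Requests with an empty key are not accounted by limit_req_zone,
# so only clients in the listed networks are rate limited.
limit_req_zone $ua_bot_network zone=bot_networks:10m rate=1r/s;

# In the relevant location block, the mapped value would then be
# passed to Tomcat, for example:
# proxy_set_header User-Agent $ua;
```

Keeping the network list in the geo block alone means the same variable can drive both the rate limit and the user agent mapping, which matches the goal of having a single bot-networks.conf file.
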
<!-- vim: set sw=2 ts=2: -->