diff --git a/content/posts/2022-07.md b/content/posts/2022-07.md
index 08f9acefa..17db51db4 100644
--- a/content/posts/2022-07.md
+++ b/content/posts/2022-07.md
@@ -318,5 +318,22 @@ geo $ua {
 - But I can't get it to work, neither for the default value or for matching my IP...
 - I will have to ask on the nginx mailing list
+- The total number of requests and unique hosts was not even very high (the counts below were taken around midnight, so they cover almost the whole day):
+
+```console
+# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log | sort -u | wc -l
+2776
+# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log | wc -l
+40325
+```
+
+## 2022-07-18
+
+- Reading more about nginx's geo/map and doing some tests on DSpace Test, it appears that the [geo module cannot do dynamic values](https://stackoverflow.com/questions/47011497/nginx-geo-module-wont-use-variables)
+  - So this issue with the literal `$http_user_agent` is due to the geo block I put in place earlier this month
+  - I reworked the logic so that the geo block sets "bot" or an empty string depending on whether a network matches, and then I re-use that value in a mapping that passes through the host's user agent in case geo has set it to an empty string
+  - This allows me to accomplish the original goal while still only using one bot-networks.conf file for the `limit_req_zone` and the user agent mapping that we pass to Tomcat
+  - Unfortunately this means I will have hundreds of thousands of requests in Solr with a literal `$http_user_agent`
+  - I might try to purge some by enumerating all the networks in my block file and running them through `check-spider-ip-hits.sh`
diff --git a/docs/2022-07/index.html b/docs/2022-07/index.html
index e82e146c0..c9ec84202 100644
--- a/docs/2022-07/index.html
+++ b/docs/2022-07/index.html
@@ -19,7 +19,7 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
-
+
@@ -44,9 +44,9 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
 "@type": "BlogPosting",
 "headline": "July, 2022",
 "url": "https://alanorth.github.io/cgspace-notes/2022-07/",
- "wordCount": "1959",
+ "wordCount": "2156",
 "datePublished": "2022-07-02T14:07:36+03:00",
- "dateModified": "2022-07-14T16:46:24+03:00",
+ "dateModified": "2022-07-17T22:45:16+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -484,6 +484,23 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
  • I will have to ask on the nginx mailing list
  • +
  • The total number of requests and unique hosts was not even very high (the counts below were taken around midnight, so they cover almost the whole day):
  • + +
    # awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log | sort -u | wc -l
    +2776
    +# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log | wc -l
    +40325
    +

    2022-07-18

+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index ee2d35a07..6002bcddb 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index ea7168db8..369b23af0 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 4ecb1ea90..71f31f177 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index ca3363911..de25155c0 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 35e781797..aa245dc46 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index e75c761bb..380a03c86 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index a97ddf2e3..b3eba0fd7 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index fb9457a5e..081af3025 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 709375011..b14ac3726 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 2993b444d..7bb02089f 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 2b6399522..8c25e8ef7 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index fdd30b5cf..70d129ee2 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index bc1ea6f4b..a6f5c7bff 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 72112afc9..b0719b969 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 57491675a..fc65a45ea 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index a0111ad52..410cea6b1 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index 38a1720bc..c4b902cc5 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index d342f1783..3ce7aa806 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 2fe05f29d..58011fee2 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 2c1b21707..b09a58363 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 32ffea037..3887f7c06 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 5af087bd1..be099cd2d 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 548022590..38a07eb9a 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index bd85d2a8f..ecbbf5746 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 7fbb94ada..700fe5ba5 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 24009dc59..826087e8a 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 30f141627..a903298c9 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml">
 https://alanorth.github.io/cgspace-notes/categories/
- 2022-07-14T16:46:24+03:00
+ 2022-07-17T22:45:16+03:00
 https://alanorth.github.io/cgspace-notes/
- 2022-07-14T16:46:24+03:00
+ 2022-07-17T22:45:16+03:00
 https://alanorth.github.io/cgspace-notes/2022-07/
- 2022-07-14T16:46:24+03:00
+ 2022-07-17T22:45:16+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-07-14T16:46:24+03:00
+ 2022-07-17T22:45:16+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2022-07-14T16:46:24+03:00
+ 2022-07-17T22:45:16+03:00
 https://alanorth.github.io/cgspace-notes/2022-06/
 2022-07-04T09:25:14+03:00