mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-22 14:45:03 +01:00
Add notes for 2022-07-18
This commit is contained in:
parent
6fb5aa2be0
commit
92b115ef62
@ -318,5 +318,22 @@ geo $ua {
- But I can't get it to work, neither for the default value nor for matching my IP...
- I will have to ask on the nginx mailing list
- The total number of requests and unique hosts was not even very high (the counts below were taken around midnight, so they cover almost the whole day):

```console
# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log | sort -u | wc -l
2776
# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log | wc -l
40325
```

## 2022-07-18

- Reading more about nginx's geo/map and doing some tests on DSpace Test, it appears that the [geo module cannot do dynamic values](https://stackoverflow.com/questions/47011497/nginx-geo-module-wont-use-variables)
- So this issue with the literal `$http_user_agent` is due to the geo block I put in place earlier this month
- I reworked the logic so that the geo block sets "bot" when a network matches and an empty string when it does not, and then re-used that value in a mapping that passes through the host's user agent when geo has set it to an empty string
- This allows me to accomplish the original goal while still only using one bot-networks.conf file for the `limit_req_zone` and the user agent mapping that we pass to Tomcat
- Unfortunately this means I will have hundreds of thousands of requests in Solr with a literal `$http_user_agent`
- I might try to purge some by enumerating all the networks in my block file and running them through `check-spider-ip-hits.sh`
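
The reworked geo/map logic described above could look roughly like the following. This is a sketch under assumptions, not the actual configuration: the variable names `$bot_network` and `$ua_for_tomcat`, the zone name and rate, the include path, and the upstream address are all hypothetical. It relies on two documented nginx behaviours: `map` values may contain variables (while `geo` values are literals), and `limit_req_zone` does not account requests whose key is empty.

```nginx
# Sketch only, meant to sit inside the http {} context. $bot_network,
# $ua_for_tomcat, the zone name, the rate, the include path, and the
# upstream address are hypothetical.

# geo values are literals, so geo only flags the network: "bot" or "".
geo $bot_network {
    default '';
    # The included file would hold lines such as:  192.0.2.0/24 'bot';
    include /etc/nginx/bot-networks.conf;
}

# map values may contain variables, so the pass-through happens here.
map $bot_network $ua_for_tomcat {
    default $bot_network;      # flagged network: send the literal "bot"
    ''      $http_user_agent;  # no geo match: keep the client's user agent
}

# Requests with an empty key are not accounted, so only flagged
# networks are rate limited.
limit_req_zone $bot_network zone=bots:10m rate=1r/s;

server {
    location / {
        limit_req zone=bots burst=5;
        proxy_set_header User-Agent $ua_for_tomcat;
        proxy_pass http://127.0.0.1:8080;
    }
}
```

With this layout a single network list drives both the rate limit key and the user agent sent upstream, which matches the goal of keeping only one bot-networks.conf file.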

<!-- vim: set sw=2 ts=2: -->
@ -19,7 +19,7 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-07/" />
<meta property="article:published_time" content="2022-07-02T14:07:36+03:00" />
<meta property="article:modified_time" content="2022-07-14T16:46:24+03:00" />
<meta property="article:modified_time" content="2022-07-17T22:45:16+03:00" />
@ -44,9 +44,9 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
"@type": "BlogPosting",
"headline": "July, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-07/",
"wordCount": "1959",
"wordCount": "2156",
"datePublished": "2022-07-02T14:07:36+03:00",
"dateModified": "2022-07-14T16:46:24+03:00",
"dateModified": "2022-07-17T22:45:16+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -484,6 +484,23 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
<li>I will have to ask on the nginx mailing list</li>
</ul>
</li>
<li>The total number of requests and unique hosts was not even very high (the counts below were taken around midnight, so they cover almost the whole day):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># awk <span style="color:#e6db74">'{print $1}'</span> /var/log/nginx/<span style="color:#f92672">{</span>access,library-access,oai,rest<span style="color:#f92672">}</span>.log | sort -u | wc -l
</span></span><span style="display:flex;"><span>2776
</span></span><span style="display:flex;"><span># awk <span style="color:#e6db74">'{print $1}'</span> /var/log/nginx/<span style="color:#f92672">{</span>access,library-access,oai,rest<span style="color:#f92672">}</span>.log | wc -l
</span></span><span style="display:flex;"><span>40325
</span></span></code></pre></div><h2 id="2022-07-18">2022-07-18</h2>
<ul>
<li>Reading more about nginx’s geo/map and doing some tests on DSpace Test, it appears that the <a href="https://stackoverflow.com/questions/47011497/nginx-geo-module-wont-use-variables">geo module cannot do dynamic values</a>
<ul>
<li>So this issue with the literal <code>$http_user_agent</code> is due to the geo block I put in place earlier this month</li>
<li>I reworked the logic so that the geo block sets “bot” when a network matches and an empty string when it does not, and then re-used that value in a mapping that passes through the host’s user agent when geo has set it to an empty string</li>
<li>This allows me to accomplish the original goal while still only using one bot-networks.conf file for the <code>limit_req_zone</code> and the user agent mapping that we pass to Tomcat</li>
<li>Unfortunately this means I will have hundreds of thousands of requests in Solr with a literal <code>$http_user_agent</code></li>
<li>I might try to purge some by enumerating all the networks in my block file and running them through <code>check-spider-ip-hits.sh</code></li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2022-07-14T16:46:24+03:00" />
<meta property="og:updated_time" content="2022-07-17T22:45:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-07-14T16:46:24+03:00" />
<meta property="og:updated_time" content="2022-07-17T22:45:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-07-14T16:46:24+03:00" />
<meta property="og:updated_time" content="2022-07-17T22:45:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-07-14T16:46:24+03:00" />
<meta property="og:updated_time" content="2022-07-17T22:45:16+03:00" />
@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2022-07-14T16:46:24+03:00</lastmod>
<lastmod>2022-07-17T22:45:16+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2022-07-14T16:46:24+03:00</lastmod>
<lastmod>2022-07-17T22:45:16+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-07/</loc>
<lastmod>2022-07-14T16:46:24+03:00</lastmod>
<lastmod>2022-07-17T22:45:16+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2022-07-14T16:46:24+03:00</lastmod>
<lastmod>2022-07-17T22:45:16+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2022-07-14T16:46:24+03:00</lastmod>
<lastmod>2022-07-17T22:45:16+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-06/</loc>
<lastmod>2022-07-04T09:25:14+03:00</lastmod>