Update notes for 2017-11-09

Alan Orth 2017-11-09 21:44:20 +02:00
parent 0340864181
commit 5c4bc14ee2
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
4 changed files with 20 additions and 20 deletions

@@ -482,8 +482,8 @@ proxy_set_header User-Agent $ua;
 - Awesome, it seems my bot mapping stuff in nginx actually reduced the number of Tomcat sessions used by the CIAT scraper today, total requests and unique sessions:
 ```
 # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep '09/Nov/2017' | grep -c 104.196.152.243
-5769
+7648
 $ grep 104.196.152.243 dspace.log.2017-11-09 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
 223
 ```
@@ -501,6 +501,6 @@ $ grep 104.196.152.243 dspace.log.2017-11-07 | grep -o -E 'session_id=[A-Z0-9]{3
 3506
 ```
-- The number of total requests vary by a few thousand, but the number of sessions is over *ten times less*!
+- The number of sessions is over *ten times less*!
 - This gets me thinking, I wonder if I can use something like nginx's rate limiter to automatically change the user agent of clients who make too many requests
 - Perhaps using a combination of geo and map, like illustrated here: https://www.nginx.com/blog/rate-limiting-nginx/
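
The two counts compared in these notes (total requests from one IP on a given day, and distinct `session_id` values logged for that IP) could also be computed with a short script. The following is only a minimal sketch: the function names `count_requests` and `count_unique_sessions` are hypothetical and not part of the original notes, and it matches the IP as a literal substring rather than as a regular expression.

```python
import re

# Same pattern as the grep -o -E expression in the notes
SESSION_RE = re.compile(r"session_id=[A-Z0-9]{32}")


def count_requests(lines, ip, date):
    """Count lines containing both the date and the IP.

    Roughly equivalent to: grep DATE | grep -c IP
    (literal substring match, so '.' is not treated as a wildcard).
    """
    return sum(1 for line in lines if date in line and ip in line)


def count_unique_sessions(lines, ip):
    """Count distinct session_id values on lines mentioning the IP.

    Roughly equivalent to: grep IP | grep -o -E PATTERN | sort | uniq | wc -l
    """
    sessions = set()
    for line in lines:
        if ip in line:
            sessions.update(SESSION_RE.findall(line))
    return len(sessions)


def read_lines(path):
    """Yield lines from a log file, tolerating stray non-UTF-8 bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        yield from handle
```

Assuming the log files are readable from the working directory, a call such as `count_unique_sessions(read_lines("dspace.log.2017-11-09"), "104.196.152.243")` would give the session count for that day.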

@@ -38,7 +38,7 @@ COPY 54701
 <meta property="article:published_time" content="2017-11-02T09:37:54&#43;02:00"/>
-<meta property="article:modified_time" content="2017-11-09T17:52:14&#43;02:00"/>
+<meta property="article:modified_time" content="2017-11-09T18:05:32&#43;02:00"/>
@@ -86,9 +86,9 @@ COPY 54701
 "@type": "BlogPosting",
 "headline": "November, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-11/",
-"wordCount": "2921",
+"wordCount": "2910",
 "datePublished": "2017-11-02T09:37:54&#43;02:00",
-"dateModified": "2017-11-09T17:52:14&#43;02:00",
+"dateModified": "2017-11-09T18:05:32&#43;02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -684,8 +684,8 @@ proxy_set_header User-Agent $ua;
 <li>Awesome, it seems my bot mapping stuff in nginx actually reduced the number of Tomcat sessions used by the CIAT scraper today, total requests and unique sessions:</li>
 </ul>
 <pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep '09/Nov/2017' | grep -c 104.196.152.243
-5769
+7648
 $ grep 104.196.152.243 dspace.log.2017-11-09 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
 223
 </code></pre>
@@ -705,7 +705,7 @@ $ grep 104.196.152.243 dspace.log.2017-11-07 | grep -o -E 'session_id=[A-Z0-9]{3
 </code></pre>
 <ul>
-<li>The number of total requests vary by a few thousand, but the number of sessions is over <em>ten times less</em>!</li>
+<li>The number of sessions is over <em>ten times less</em>!</li>
 <li>This gets me thinking, I wonder if I can use something like nginx&rsquo;s rate limiter to automatically change the user agent of clients who make too many requests</li>
 <li>Perhaps using a combination of geo and map, like illustrated here: <a href="https://www.nginx.com/blog/rate-limiting-nginx/">https://www.nginx.com/blog/rate-limiting-nginx/</a></li>
 </ul>

@@ -29,7 +29,7 @@ Disallow: /cgspace-notes/2015-12/
 Disallow: /cgspace-notes/2015-11/
 Disallow: /cgspace-notes/
 Disallow: /cgspace-notes/categories/
-Disallow: /cgspace-notes/categories/notes/
 Disallow: /cgspace-notes/tags/notes/
+Disallow: /cgspace-notes/categories/notes/
 Disallow: /cgspace-notes/post/
 Disallow: /cgspace-notes/tags/

@@ -4,7 +4,7 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-11/</loc>
-<lastmod>2017-11-09T17:52:14+02:00</lastmod>
+<lastmod>2017-11-09T18:05:32+02:00</lastmod>
 </url>
 <url>
@@ -134,7 +134,7 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2017-11-09T17:52:14+02:00</lastmod>
+<lastmod>2017-11-09T18:05:32+02:00</lastmod>
 <priority>0</priority>
 </url>
@@ -143,27 +143,27 @@
 <priority>0</priority>
 </url>
+<url>
+<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
+<lastmod>2017-11-09T18:05:32+02:00</lastmod>
+<priority>0</priority>
+</url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
 <lastmod>2017-09-28T12:00:49+03:00</lastmod>
 <priority>0</priority>
 </url>
-<url>
-<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2017-11-09T17:52:14+02:00</lastmod>
-<priority>0</priority>
-</url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/post/</loc>
-<lastmod>2017-11-09T17:52:14+02:00</lastmod>
+<lastmod>2017-11-09T18:05:32+02:00</lastmod>
 <priority>0</priority>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2017-11-09T17:52:14+02:00</lastmod>
+<lastmod>2017-11-09T18:05:32+02:00</lastmod>
 <priority>0</priority>
 </url>