Update notes for 2017-11-12

Alan Orth 2017-11-12 18:48:52 +02:00
parent f2ef00d1e9
commit 41bdd24079
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 94 additions and 8 deletions

@ -555,3 +555,44 @@ $ grep 5.9.6.51 /home/cgspace.cgiar.org/log/dspace.log.2017-11-12 | grep -o -E '
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' /home/cgspace.cgiar.org/log/dspace.log.2017-11-12
991
```
- Move some items and collections on CGSpace for Peter Ballantyne, running [`move_collections.sh`](https://gist.github.com/alanorth/e60b530ed4989df0c731afbb0c640515) with the following configuration:
```
10947/6 10947/1 10568/83389
10947/34 10947/1 10568/83389
10947/2512 10947/1 10568/83389
```
- I explored nginx rate limits as a way to aggressively throttle the Baidu bot, which doesn't seem to respect the disallowed URLs in robots.txt
- There's an interesting [blog post from Nginx's team about rate limiting](https://www.nginx.com/blog/rate-limiting-nginx/) as well as a [clever use of mapping with rate limits](https://gist.github.com/arosenhagen/8aaf5d7f94171778c0e9)
- The solution [I came up with](https://github.com/ilri/rmg-ansible-public/commit/f0646991772660c505bea9c5ac586490e7c86156) uses tricks from both of those
- I deployed the limit on CGSpace and DSpace Test and it seems to work well:
```
$ http --print h https://cgspace.cgiar.org/handle/10568/1 User-Agent:'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Sun, 12 Nov 2017 16:30:19 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
$ http --print h https://cgspace.cgiar.org/handle/10568/1 User-Agent:'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
HTTP/1.1 503 Service Temporarily Unavailable
Connection: keep-alive
Content-Length: 206
Content-Type: text/html
Date: Sun, 12 Nov 2017 16:30:21 GMT
Server: nginx
```
- The first request works, but the second one, sent two seconds later, is denied with an HTTP 503!
- I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them
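- For reference, a minimal sketch of how a `map` can be combined with `limit_req_zone` to throttle a single user agent. The variable `$limit_bots`, the zone name `bots`, the `10m` zone size, the `1r/m` rate, and the `server` block are illustrative assumptions, not the values from the linked commit:
```nginx
# Sketch only: names and values below are assumptions, not the deployed config.
# The map and limit_req_zone directives go in the http {} context.

# Map the User-Agent to a rate-limit key. Agents that do not match get an
# empty key, and nginx does not account requests whose key is empty.
map $http_user_agent $limit_bots {
    default        '';
    ~*Baiduspider  $http_user_agent;
}

# One shared zone keyed on the mapped value; 1 request per minute is an
# illustrative rate.
limit_req_zone $limit_bots zone=bots:10m rate=1r/m;

server {
    listen 80;
    server_name example.org;   # hypothetical host

    location / {
        # No burst is configured, so excess requests are rejected right away.
        limit_req zone=bots;
    }
}
```
- With a setup like this, only requests that match the pattern are counted against the limit. nginx rejects excess requests with a 503 unless `limit_req_status` is set, which is consistent with the second response above.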

@ -38,7 +38,7 @@ COPY 54701
<meta property="article:published_time" content="2017-11-02T09:37:54&#43;02:00"/>
<meta property="article:modified_time" content="2017-11-12T10:19:47&#43;02:00"/>
<meta property="article:modified_time" content="2017-11-12T10:41:44&#43;02:00"/>
@ -86,9 +86,9 @@ COPY 54701
"@type": "BlogPosting",
"headline": "November, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-11/",
"wordCount": "3150",
"wordCount": "3351",
"datePublished": "2017-11-02T09:37:54&#43;02:00",
"dateModified": "2017-11-12T10:19:47&#43;02:00",
"dateModified": "2017-11-12T10:41:44&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -767,6 +767,51 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' /home/cgspace.cgiar
991
</code></pre>
<ul>
<li>Move some items and collections on CGSpace for Peter Ballantyne, running <a href="https://gist.github.com/alanorth/e60b530ed4989df0c731afbb0c640515"><code>move_collections.sh</code></a> with the following configuration:</li>
</ul>
<pre><code>10947/6 10947/1 10568/83389
10947/34 10947/1 10568/83389
10947/2512 10947/1 10568/83389
</code></pre>
<ul>
<li>I explored nginx rate limits as a way to aggressively throttle the Baidu bot, which doesn&rsquo;t seem to respect the disallowed URLs in robots.txt</li>
<li>There&rsquo;s an interesting <a href="https://www.nginx.com/blog/rate-limiting-nginx/">blog post from Nginx&rsquo;s team about rate limiting</a> as well as a <a href="https://gist.github.com/arosenhagen/8aaf5d7f94171778c0e9">clever use of mapping with rate limits</a></li>
<li>The solution <a href="https://github.com/ilri/rmg-ansible-public/commit/f0646991772660c505bea9c5ac586490e7c86156">I came up with</a> uses tricks from both of those</li>
<li>I deployed the limit on CGSpace and DSpace Test and it seems to work well:</li>
</ul>
<pre><code>$ http --print h https://cgspace.cgiar.org/handle/10568/1 User-Agent:'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Sun, 12 Nov 2017 16:30:19 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
$ http --print h https://cgspace.cgiar.org/handle/10568/1 User-Agent:'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
HTTP/1.1 503 Service Temporarily Unavailable
Connection: keep-alive
Content-Length: 206
Content-Type: text/html
Date: Sun, 12 Nov 2017 16:30:21 GMT
Server: nginx
</code></pre>
<ul>
<li>The first request works, but the second one, sent two seconds later, is denied with an HTTP 503!</li>
<li>I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them</li>
</ul>

@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2017-11/</loc>
<lastmod>2017-11-12T10:19:47+02:00</lastmod>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
</url>
<url>
@ -134,7 +134,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2017-11-12T10:19:47+02:00</lastmod>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<priority>0</priority>
</url>
@ -145,7 +145,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2017-11-12T10:19:47+02:00</lastmod>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<priority>0</priority>
</url>
@ -157,13 +157,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2017-11-12T10:19:47+02:00</lastmod>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2017-11-12T10:19:47+02:00</lastmod>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<priority>0</priority>
</url>