diff --git a/content/post/2017-11.md b/content/post/2017-11.md
index c96fdb221..7375b667a 100644
--- a/content/post/2017-11.md
+++ b/content/post/2017-11.md
@@ -555,3 +555,44 @@ $ grep 5.9.6.51 /home/cgspace.cgiar.org/log/dspace.log.2017-11-12 | grep -o -E '
 $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' /home/cgspace.cgiar.org/log/dspace.log.2017-11-12
 991
 ```
+
+- Move some items and collections on CGSpace for Peter Ballantyne, running [`move_collections.sh`](https://gist.github.com/alanorth/e60b530ed4989df0c731afbb0c640515) with the following configuration:
+
+```
+10947/6    10947/1 10568/83389
+10947/34   10947/1 10568/83389
+10947/2512 10947/1 10568/83389
+```
+
+- I explored nginx rate limits as a way to aggressively throttle Baidu bot which doesn't seem to respect disallowed URLs in robots.txt
+- There's an interesting [blog post from Nginx's team about rate limiting](https://www.nginx.com/blog/rate-limiting-nginx/) as well as a [clever use of mapping with rate limits](https://gist.github.com/arosenhagen/8aaf5d7f94171778c0e9)
+- The solution [I came up with](https://github.com/ilri/rmg-ansible-public/commit/f0646991772660c505bea9c5ac586490e7c86156) uses tricks from both of those
+- I deployed the limit on CGSpace and DSpace Test and it seems to work well:
+
+```
+$ http --print h https://cgspace.cgiar.org/handle/10568/1 User-Agent:'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
+HTTP/1.1 200 OK
+Connection: keep-alive
+Content-Encoding: gzip
+Content-Language: en-US
+Content-Type: text/html;charset=utf-8
+Date: Sun, 12 Nov 2017 16:30:19 GMT
+Server: nginx
+Strict-Transport-Security: max-age=15768000
+Transfer-Encoding: chunked
+Vary: Accept-Encoding
+X-Cocoon-Version: 2.2.0
+X-Content-Type-Options: nosniff
+X-Frame-Options: SAMEORIGIN
+X-XSS-Protection: 1; mode=block
+$ http --print h https://cgspace.cgiar.org/handle/10568/1 User-Agent:'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
+HTTP/1.1 503 Service Temporarily Unavailable
+Connection: keep-alive
+Content-Length: 206
+Content-Type: text/html
+Date: Sun, 12 Nov 2017 16:30:21 GMT
+Server: nginx
+```
+
+- The first request works, second is denied with an HTTP 503!
+- I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them
diff --git a/public/2017-11/index.html b/public/2017-11/index.html
index 5a18b3b86..64e72cefd 100644
--- a/public/2017-11/index.html
+++ b/public/2017-11/index.html
@@ -38,7 +38,7 @@ COPY 54701
-
+
@@ -86,9 +86,9 @@ COPY 54701
  "@type": "BlogPosting",
  "headline": "November, 2017",
  "url": "https://alanorth.github.io/cgspace-notes/2017-11/",
- "wordCount": "3150",
+ "wordCount": "3351",
  "datePublished": "2017-11-02T09:37:54+02:00",
- "dateModified": "2017-11-12T10:19:47+02:00",
+ "dateModified": "2017-11-12T10:41:44+02:00",
  "author": {
  "@type": "Person",
  "name": "Alan Orth"
@@ -767,6 +767,51 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' /home/cgspace.cgiar
 991
+
+
+
10947/6    10947/1 10568/83389
+10947/34   10947/1 10568/83389
+10947/2512 10947/1 10568/83389
+
+ + + +
$ http --print h https://cgspace.cgiar.org/handle/10568/1 User-Agent:'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
+HTTP/1.1 200 OK
+Connection: keep-alive
+Content-Encoding: gzip
+Content-Language: en-US
+Content-Type: text/html;charset=utf-8
+Date: Sun, 12 Nov 2017 16:30:19 GMT
+Server: nginx
+Strict-Transport-Security: max-age=15768000
+Transfer-Encoding: chunked
+Vary: Accept-Encoding
+X-Cocoon-Version: 2.2.0
+X-Content-Type-Options: nosniff
+X-Frame-Options: SAMEORIGIN
+X-XSS-Protection: 1; mode=block
+$ http --print h https://cgspace.cgiar.org/handle/10568/1 User-Agent:'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
+HTTP/1.1 503 Service Temporarily Unavailable
+Connection: keep-alive
+Content-Length: 206
+Content-Type: text/html
+Date: Sun, 12 Nov 2017 16:30:21 GMT
+Server: nginx
+
+
+
+
diff --git a/public/sitemap.xml b/public/sitemap.xml
index 2ccafa5b4..e5545389b 100644
--- a/public/sitemap.xml
+++ b/public/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2017-11/
- 2017-11-12T10:19:47+02:00
+ 2017-11-12T10:41:44+02:00
@@ -134,7 +134,7 @@
 https://alanorth.github.io/cgspace-notes/
- 2017-11-12T10:19:47+02:00
+ 2017-11-12T10:41:44+02:00
 0
@@ -145,7 +145,7 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
- 2017-11-12T10:19:47+02:00
+ 2017-11-12T10:41:44+02:00
 0
@@ -157,13 +157,13 @@
 https://alanorth.github.io/cgspace-notes/post/
- 2017-11-12T10:19:47+02:00
+ 2017-11-12T10:41:44+02:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
- 2017-11-12T10:19:47+02:00
+ 2017-11-12T10:41:44+02:00
 0
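
The notes in this diff describe combining an nginx `map` on the user agent with a request rate limit, so that the second Baiduspider request two seconds after the first gets an HTTP 503. The actual configuration lives in the linked commit and is not reproduced here; the following is only a rough Python sketch of that idea. The pattern, the shared key name `baidu`, the one-request-per-minute interval, and the `RequestLimiter` class are all assumptions made for illustration, and the bucket logic is a simplification of what nginx does (no burst, no delay queue).

```python
import re
from typing import Dict

# Assumed values for illustration only; the real ones are in the linked commit.
BOT_PATTERN = re.compile(r"Baiduspider", re.IGNORECASE)
MIN_INTERVAL_SECONDS = 60.0  # hypothetical rate: one request per minute

BAIDU_UA = (
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)"
)


def limit_key(user_agent: str) -> str:
    """Imitate a map on the user agent: bots share one key, others get ''."""
    return "baidu" if BOT_PATTERN.search(user_agent) else ""


class RequestLimiter:
    """Simplified rate limiter without burst, keyed on limit_key()."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self.last_allowed: Dict[str, float] = {}

    def status(self, user_agent: str, now: float) -> int:
        """Return the HTTP status a request at time `now` (seconds) would get."""
        key = limit_key(user_agent)
        if not key:
            return 200  # an empty key means the request is not rate limited
        last = self.last_allowed.get(key)
        if last is not None and now - last < self.min_interval:
            return 503  # too soon after the last allowed request
        self.last_allowed[key] = now
        return 200


if __name__ == "__main__":
    limiter = RequestLimiter(MIN_INTERVAL_SECONDS)
    print(limiter.status(BAIDU_UA, now=0.0))  # first bot request
    print(limiter.status(BAIDU_UA, now=2.0))  # same bot, two seconds later
    print(limiter.status("Mozilla/5.0 (X11; Linux x86_64)", now=2.0))
```

Run as a script, this prints 200, 503, and 200, which mirrors the two `http` requests in the notes plus an ordinary browser that is left alone. Keying on an empty string for non-matching agents is what keeps regular visitors out of the limit in this sketch.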