From 6dd5e7850b5be850598816cd66ff3548b8a23e3a Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Tue, 11 Sep 2018 00:37:38 +0300
Subject: [PATCH] Update notes for 2018-09-10

---
 content/posts/2018-09.md | 59 ++++++++++++++++++++++++++++++++++
 docs/2018-08/index.html  |  6 ++--
 docs/2018-09/index.html  | 69 ++++++++++++++++++++++++++++++++++++++--
 docs/robots.txt          |  2 +-
 docs/sitemap.xml         | 22 ++++++-------
 5 files changed, 140 insertions(+), 18 deletions(-)

diff --git a/content/posts/2018-09.md b/content/posts/2018-09.md
index 6b35d1a1c..235a8fb7d 100644
--- a/content/posts/2018-09.md
+++ b/content/posts/2018-09.md
@@ -138,5 +138,64 @@ UPDATE 15
 - Start working on adding metadata for access and usage rights that we started earlier in 2018 (and also in 2017)
 - The current `cg.identifier.status` field will become "Access rights" and `dc.rights` will become "Usage rights"
 - I have some work in progress on the [`5_x-rights` branch](https://github.com/alanorth/DSpace/tree/5_x-rights)
+- Linode said that CGSpace (linode18) had a high CPU load earlier today
+- When I looked, I saw it's the same Russian IP that I noticed last month:
+
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Sep/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+   1459 157.55.39.202
+   1579 95.108.181.88
+   1615 157.55.39.147
+   1714 66.249.64.91
+   1924 50.116.102.77
+   3696 157.55.39.106
+   3763 157.55.39.148
+   4470 70.32.83.92
+   4724 35.237.175.180
+  14132 5.9.6.51
+```
+
+- And this bot is still creating more Tomcat sessions than Nginx requests (WTF?):
+
+```
+# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-09-10
+14133
+```
+
+- The user agent is still the same:
+
+```
+Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
+```
+
+- I added `.*crawl.*` to the Tomcat Crawler Session Manager Valve, so I'm not sure why the bot is creating so many sessions...
+- I just tested that user agent on CGSpace and it *does not* create a new session:
+
+```
+$ http --print Hh https://cgspace.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)'
+GET / HTTP/1.1
+Accept: */*
+Accept-Encoding: gzip, deflate
+Connection: keep-alive
+Host: cgspace.cgiar.org
+User-Agent: Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
+
+HTTP/1.1 200 OK
+Connection: keep-alive
+Content-Encoding: gzip
+Content-Language: en-US
+Content-Type: text/html;charset=utf-8
+Date: Mon, 10 Sep 2018 20:43:04 GMT
+Server: nginx
+Strict-Transport-Security: max-age=15768000
+Transfer-Encoding: chunked
+Vary: Accept-Encoding
+X-Cocoon-Version: 2.2.0
+X-Content-Type-Options: nosniff
+X-Frame-Options: SAMEORIGIN
+X-XSS-Protection: 1; mode=block
+```
+
+- I will have to keep an eye on it and perhaps add it to the list of "bad bots" that get rate limited
diff --git a/docs/2018-08/index.html b/docs/2018-08/index.html
index 61289e157..deb82eabd 100644
--- a/docs/2018-08/index.html
+++ b/docs/2018-08/index.html
@@ -29,7 +29,7 @@ I ran all system updates on DSpace Test and rebooted it
 " />
-
+
 # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep '19/Aug/2018' | grep -c 5.9.6.51
 1553
-# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-08-19
+# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-08-19
 1724
diff --git a/docs/2018-09/index.html b/docs/2018-09/index.html
index 2a1370129..ca87f22ef 100644
--- a/docs/2018-09/index.html
+++ b/docs/2018-09/index.html
@@ -18,7 +18,7 @@ I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
 " />
-
+
 Start working on adding metadata for access and usage rights that we started earlier in 2018 (and also in 2017)
  • The current cg.identifier.status field will become “Access rights” and dc.rights will become “Usage rights”
  • I have some work in progress on the 5_x-rights branch
  • +
  • Linode said that CGSpace (linode18) had a high CPU load earlier today
  • +
  • When I looked, I saw it’s the same Russian IP that I noticed last month:
  • + + +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Sep/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +   1459 157.55.39.202
    +   1579 95.108.181.88
    +   1615 157.55.39.147
    +   1714 66.249.64.91
    +   1924 50.116.102.77
    +   3696 157.55.39.106
    +   3763 157.55.39.148
    +   4470 70.32.83.92
    +   4724 35.237.175.180
    +  14132 5.9.6.51
    +
    + + + +
    # grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-09-10 
    +14133
    +
    + + + +
    Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
    +
    + + + +
    $ http --print Hh https://cgspace.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)'
    +GET / HTTP/1.1
    +Accept: */*
    +Accept-Encoding: gzip, deflate
    +Connection: keep-alive
    +Host: cgspace.cgiar.org
    +User-Agent: Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
    +
    +HTTP/1.1 200 OK
    +Connection: keep-alive
    +Content-Encoding: gzip
    +Content-Language: en-US
    +Content-Type: text/html;charset=utf-8
    +Date: Mon, 10 Sep 2018 20:43:04 GMT
    +Server: nginx
    +Strict-Transport-Security: max-age=15768000
    +Transfer-Encoding: chunked
    +Vary: Accept-Encoding
    +X-Cocoon-Version: 2.2.0
    +X-Content-Type-Options: nosniff
    +X-Frame-Options: SAMEORIGIN
    +X-XSS-Protection: 1; mode=block
    +
    +
    +
diff --git a/docs/robots.txt b/docs/robots.txt
index 5620a7e3e..3582866fa 100644
--- a/docs/robots.txt
+++ b/docs/robots.txt
@@ -39,7 +39,7 @@ Disallow: /cgspace-notes/2015-12/
 Disallow: /cgspace-notes/2015-11/
 Disallow: /cgspace-notes/
 Disallow: /cgspace-notes/categories/
-Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/categories/notes/
+Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/posts/
 Disallow: /cgspace-notes/tags/
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 1df4ecba6..7ec0184c3 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,12 +4,12 @@
 https://alanorth.github.io/cgspace-notes/2018-09/
-2018-09-10T11:59:08+03:00
+2018-09-10T18:19:00+03:00
 https://alanorth.github.io/cgspace-notes/2018-08/
-2018-09-02T11:01:40+03:00
+2018-09-10T23:35:46+03:00
@@ -184,7 +184,7 @@
 https://alanorth.github.io/cgspace-notes/
-2018-09-10T11:59:08+03:00
+2018-09-10T18:19:00+03:00
 0
@@ -193,27 +193,27 @@
 0
-
-https://alanorth.github.io/cgspace-notes/tags/notes/
-2018-09-10T11:59:08+03:00
-0
-
-
 https://alanorth.github.io/cgspace-notes/categories/notes/
 2018-03-09T22:10:33+02:00
 0
+
+https://alanorth.github.io/cgspace-notes/tags/notes/
+2018-09-10T18:19:00+03:00
+0
+
+
 https://alanorth.github.io/cgspace-notes/posts/
-2018-09-10T11:59:08+03:00
+2018-09-10T18:19:00+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2018-09-10T11:59:08+03:00
+2018-09-10T18:19:00+03:00
 0
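
The 2018-09-10 notes compare Nginx requests per IP with Tomcat session log lines for the same IP. Note that `grep -c` counts matching log lines, not distinct sessions. A minimal Python sketch of the same comparison follows, assuming the client IP is the first field of each access-log line (as `awk '{print $1}'` assumes) and that DSpace log lines contain `session_id=<32 chars>:ip_addr=<IP>` as in the grep pattern. The function names are hypothetical and are not part of DSpace or the notes.

```python
import re
from collections import Counter

# Pattern assumed from the grep in the notes: session_id=<32 chars>:ip_addr=<IPv4>
SESSION_RE = re.compile(
    r"session_id=([A-Z0-9]{32}):ip_addr=(\d{1,3}(?:\.\d{1,3}){3})"
)


def requests_per_ip(lines, date):
    """Count access-log lines per client IP for one date, e.g. '10/Sep/2018'.

    Assumes the client IP is the first whitespace-separated field.
    """
    counts = Counter()
    for line in lines:
        if date in line:
            fields = line.split()
            if fields:
                counts[fields[0]] += 1
    return counts


def sessions_per_ip(lines):
    """Return (matching log lines per IP, distinct session IDs per IP)."""
    line_counts = Counter()
    session_ids = {}
    for line in lines:
        match = SESSION_RE.search(line)
        if match:
            session_id, ip = match.groups()
            line_counts[ip] += 1
            session_ids.setdefault(ip, set()).add(session_id)
    distinct = {ip: len(ids) for ip, ids in session_ids.items()}
    return line_counts, distinct


# Example usage (the file name is taken from the notes; adjust the path):
# with open("dspace.log.2018-09-10") as handle:
#     line_counts, distinct = sessions_per_ip(handle)
# print(line_counts["5.9.6.51"], distinct["5.9.6.51"])
```

If the distinct count for an IP is much lower than its line count, the bot is reusing sessions and the high `grep -c` figure mostly reflects logging volume. If the two are close, each request really is getting its own session.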