diff --git a/content/posts/2019-02.md b/content/posts/2019-02.md index 344618ff2..d4e068074 100644 --- a/content/posts/2019-02.md +++ b/content/posts/2019-02.md @@ -758,4 +758,119 @@ UPDATE 1 - I merged the Atmire Metadata Quality Module (MQM) changes to the `5_x-prod` branch and deployed it on CGSpace ([#407](https://github.com/ilri/DSpace/pull/407)) - Then I ran all system updates on CGSpace server and rebooted it +## 2019-02-18 + +- Jesus fucking Christ, Linode sent an alert that CGSpace (linode18) was using 421% CPU for a few hours this afternoon (server time): +- There seems to have been a lot of activity in XMLUI: + +``` +# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 + 1236 18.212.208.240 + 1276 54.164.83.99 + 1277 3.83.14.11 + 1282 3.80.196.188 + 1296 3.84.172.18 + 1299 100.24.48.177 + 1299 34.230.15.139 + 1327 52.54.252.47 + 1477 5.9.6.51 + 1861 94.71.244.172 +# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 + 8 42.112.238.64 + 9 121.52.152.3 + 9 157.55.39.50 + 10 110.54.151.102 + 10 194.246.119.6 + 10 66.249.66.221 + 15 190.56.193.94 + 28 66.249.66.219 + 43 34.209.213.122 + 178 50.116.102.77 +# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq | wc -l +2727 +# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq | wc -l +186 +``` + +- 94.71.244.172 is in Greece and uses the user agent "Indy Library" +- At least they are re-using their Tomcat session: + +``` +$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=94.71.244.172' dspace.log.2019-02-18 | sort | uniq | wc -l +``` + +- The following IPs were all hitting the server hard simultaneously and are located on Amazon and use the user agent "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0": + - 52.54.252.47 + - 34.230.15.139 + - 100.24.48.177 + - 3.84.172.18 + - 3.80.196.188 + - 3.83.14.11 + - 54.164.83.99 + - 18.212.208.240 + +- Actually, even up to the top 30 IPs are almost all on Amazon and use the same user agent! +- For reference most of these IPs hitting the XMLUI this afternoon are on Amazon: + +``` +# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 30 + 1173 52.91.249.23 + 1176 107.22.118.106 + 1178 3.88.173.152 + 1179 3.81.136.184 + 1183 34.201.220.164 + 1183 3.89.134.93 + 1184 54.162.66.53 + 1187 3.84.62.209 + 1188 3.87.4.140 + 1189 54.158.27.198 + 1190 54.209.39.13 + 1192 54.82.238.223 + 1208 3.82.232.144 + 1209 3.80.128.247 + 1214 54.167.64.164 + 1219 3.91.17.126 + 1220 34.201.108.226 + 1221 3.84.223.134 + 1222 18.206.155.14 + 1231 54.210.125.13 + 1236 18.212.208.240 + 1276 54.164.83.99 + 1277 3.83.14.11 + 1282 3.80.196.188 + 1296 3.84.172.18 + 1299 100.24.48.177 + 1299 34.230.15.139 + 1327 52.54.252.47 + 1477 5.9.6.51 + 1861 94.71.244.172 +``` + +- In the case of 52.54.252.47 they are only making about 10 requests per minute during this time (albeit from dozens of concurrent IPs): + +``` +# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep 52.54.252.47 | grep -o -E '18/Feb/2019:1[0-9]:[0-9][0-9]' | uniq -c | sort -n | tail -n 10 + 10 18/Feb/2019:17:20 + 10 18/Feb/2019:17:22 + 10 18/Feb/2019:17:31 + 11 18/Feb/2019:13:21 + 11 18/Feb/2019:15:18 + 11 18/Feb/2019:16:43 + 11 18/Feb/2019:16:57 + 11 18/Feb/2019:16:58 + 11 18/Feb/2019:18:34 + 12 18/Feb/2019:14:37 +``` + +- As this user agent is not recognized as a bot by DSpace this will definitely fuck up the usage statistics +- There were 92,000 requests from these IPs alone today! + +``` +# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -c 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0' +92756 +``` + +- I will add this user agent to the ["badbots" rate limiting in our nginx configuration](https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/nginx/default.conf.j2) +- I realized that I had effectively only been applying the "badbots" rate limiting to requests at the root, so I added it to the other blocks that match Discovery, Browse, etc as well + diff --git a/docs/2019-02/index.html b/docs/2019-02/index.html index 3ea308904..2f4da4266 100644 --- a/docs/2019-02/index.html +++ b/docs/2019-02/index.html @@ -42,7 +42,7 @@ sys 0m1.979s - + @@ -89,9 +89,9 @@ sys 0m1.979s "@type": "BlogPosting", "headline": "February, 2019", "url": "https://alanorth.github.io/cgspace-notes/2019-02/", - "wordCount": "4292", + "wordCount": "4802", "datePublished": "2019-02-01T21:37:30+02:00", - "dateModified": "2019-02-15T17:37:14+02:00", + "dateModified": "2019-02-17T13:16:36+02:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -1020,6 +1020,132 @@ UPDATE 1
  • Then I ran all system updates on CGSpace server and rebooted it

  • +

    2019-02-18

    + + + +
    # zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +   1236 18.212.208.240
    +   1276 54.164.83.99
    +   1277 3.83.14.11
    +   1282 3.80.196.188
    +   1296 3.84.172.18
    +   1299 100.24.48.177
    +   1299 34.230.15.139
    +   1327 52.54.252.47
    +   1477 5.9.6.51
    +   1861 94.71.244.172
    +# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +      8 42.112.238.64
    +      9 121.52.152.3
    +      9 157.55.39.50
    +     10 110.54.151.102
    +     10 194.246.119.6
    +     10 66.249.66.221
    +     15 190.56.193.94
    +     28 66.249.66.219
    +     43 34.209.213.122
    +    178 50.116.102.77
    +# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq | wc -l
    +2727
    +# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq | wc -l
    +186
    +
    + + + +
    $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=94.71.244.172' dspace.log.2019-02-18 | sort | uniq | wc -l
    +
    + + + +
    # zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 30
    +   1173 52.91.249.23
    +   1176 107.22.118.106
    +   1178 3.88.173.152
    +   1179 3.81.136.184
    +   1183 34.201.220.164
    +   1183 3.89.134.93
    +   1184 54.162.66.53
    +   1187 3.84.62.209
    +   1188 3.87.4.140
    +   1189 54.158.27.198
    +   1190 54.209.39.13
    +   1192 54.82.238.223
    +   1208 3.82.232.144
    +   1209 3.80.128.247
    +   1214 54.167.64.164
    +   1219 3.91.17.126
    +   1220 34.201.108.226
    +   1221 3.84.223.134
    +   1222 18.206.155.14
    +   1231 54.210.125.13
    +   1236 18.212.208.240
    +   1276 54.164.83.99
    +   1277 3.83.14.11
    +   1282 3.80.196.188
    +   1296 3.84.172.18
    +   1299 100.24.48.177
    +   1299 34.230.15.139
    +   1327 52.54.252.47
    +   1477 5.9.6.51
    +   1861 94.71.244.172
    +
    + + + +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep 52.54.252.47 | grep -o -E '18/Feb/2019:1[0-9]:[0-9][0-9]' | uniq -c | sort -n | tail -n 10
    +     10 18/Feb/2019:17:20
    +     10 18/Feb/2019:17:22
    +     10 18/Feb/2019:17:31
    +     11 18/Feb/2019:13:21
    +     11 18/Feb/2019:15:18
    +     11 18/Feb/2019:16:43
    +     11 18/Feb/2019:16:57
    +     11 18/Feb/2019:16:58
    +     11 18/Feb/2019:18:34
    +     12 18/Feb/2019:14:37
    +
    + + + +
    # zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -c 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'
    +92756
    +
    + + + diff --git a/docs/sitemap.xml b/docs/sitemap.xml index d29e2b119..166172f4f 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -4,7 +4,7 @@ https://alanorth.github.io/cgspace-notes/2019-02/ - 2019-02-15T17:37:14+02:00 + 2019-02-17T13:16:36+02:00 @@ -209,7 +209,7 @@ https://alanorth.github.io/cgspace-notes/ - 2019-02-15T17:37:14+02:00 + 2019-02-17T13:16:36+02:00 0 @@ -220,7 +220,7 @@ https://alanorth.github.io/cgspace-notes/tags/notes/ - 2019-02-15T17:37:14+02:00 + 2019-02-17T13:16:36+02:00 0 @@ -232,13 +232,13 @@ https://alanorth.github.io/cgspace-notes/posts/ - 2019-02-15T17:37:14+02:00 + 2019-02-17T13:16:36+02:00 0 https://alanorth.github.io/cgspace-notes/tags/ - 2019-02-15T17:37:14+02:00 + 2019-02-17T13:16:36+02:00 0