mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-26 00:18:21 +01:00
Add notes for 2019-02-18
This commit is contained in:
parent
3eb69e79fe
commit
9edefa24cf
@ -758,4 +758,119 @@ UPDATE 1
|
|||||||
- I merged the Atmire Metadata Quality Module (MQM) changes to the `5_x-prod` branch and deployed it on CGSpace ([#407](https://github.com/ilri/DSpace/pull/407))
|
- I merged the Atmire Metadata Quality Module (MQM) changes to the `5_x-prod` branch and deployed it on CGSpace ([#407](https://github.com/ilri/DSpace/pull/407))
|
||||||
- Then I ran all system updates on CGSpace server and rebooted it
|
- Then I ran all system updates on CGSpace server and rebooted it
|
||||||
|
|
||||||
|
## 2019-02-18
|
||||||
|
|
||||||
|
- Jesus fucking Christ, Linode sent an alert that CGSpace (linode18) was using 421% CPU for a few hours this afternoon (server time):
|
||||||
|
- There seems to have been a lot of activity in XMLUI:
|
||||||
|
|
||||||
|
```
|
||||||
|
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||||
|
1236 18.212.208.240
|
||||||
|
1276 54.164.83.99
|
||||||
|
1277 3.83.14.11
|
||||||
|
1282 3.80.196.188
|
||||||
|
1296 3.84.172.18
|
||||||
|
1299 100.24.48.177
|
||||||
|
1299 34.230.15.139
|
||||||
|
1327 52.54.252.47
|
||||||
|
1477 5.9.6.51
|
||||||
|
1861 94.71.244.172
|
||||||
|
# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||||
|
8 42.112.238.64
|
||||||
|
9 121.52.152.3
|
||||||
|
9 157.55.39.50
|
||||||
|
10 110.54.151.102
|
||||||
|
10 194.246.119.6
|
||||||
|
10 66.249.66.221
|
||||||
|
15 190.56.193.94
|
||||||
|
28 66.249.66.219
|
||||||
|
43 34.209.213.122
|
||||||
|
178 50.116.102.77
|
||||||
|
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq | wc -l
|
||||||
|
2727
|
||||||
|
# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq | wc -l
|
||||||
|
186
|
||||||
|
```
|
||||||
|
|
||||||
|
- 94.71.244.172 is in Greece and uses the user agent "Indy Library"
|
||||||
|
- At least they are re-using their Tomcat session:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=94.71.244.172' dspace.log.2019-02-18 | sort | uniq | wc -l
|
||||||
|
```
|
||||||
|
|
||||||
|
- The following IPs were all hitting the server hard simultaneously and are located on Amazon and use the user agent "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0":
|
||||||
|
- 52.54.252.47
|
||||||
|
- 34.230.15.139
|
||||||
|
- 100.24.48.177
|
||||||
|
- 3.84.172.18
|
||||||
|
- 3.80.196.188
|
||||||
|
- 3.83.14.11
|
||||||
|
- 54.164.83.99
|
||||||
|
- 18.212.208.240
|
||||||
|
|
||||||
|
- Actually, even up to the top 30 IPs are almost all on Amazon and use the same user agent!
|
||||||
|
- For reference most of these IPs hitting the XMLUI this afternoon are on Amazon:
|
||||||
|
|
||||||
|
```
|
||||||
|
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 30
|
||||||
|
1173 52.91.249.23
|
||||||
|
1176 107.22.118.106
|
||||||
|
1178 3.88.173.152
|
||||||
|
1179 3.81.136.184
|
||||||
|
1183 34.201.220.164
|
||||||
|
1183 3.89.134.93
|
||||||
|
1184 54.162.66.53
|
||||||
|
1187 3.84.62.209
|
||||||
|
1188 3.87.4.140
|
||||||
|
1189 54.158.27.198
|
||||||
|
1190 54.209.39.13
|
||||||
|
1192 54.82.238.223
|
||||||
|
1208 3.82.232.144
|
||||||
|
1209 3.80.128.247
|
||||||
|
1214 54.167.64.164
|
||||||
|
1219 3.91.17.126
|
||||||
|
1220 34.201.108.226
|
||||||
|
1221 3.84.223.134
|
||||||
|
1222 18.206.155.14
|
||||||
|
1231 54.210.125.13
|
||||||
|
1236 18.212.208.240
|
||||||
|
1276 54.164.83.99
|
||||||
|
1277 3.83.14.11
|
||||||
|
1282 3.80.196.188
|
||||||
|
1296 3.84.172.18
|
||||||
|
1299 100.24.48.177
|
||||||
|
1299 34.230.15.139
|
||||||
|
1327 52.54.252.47
|
||||||
|
1477 5.9.6.51
|
||||||
|
1861 94.71.244.172
|
||||||
|
```
|
||||||
|
|
||||||
|
- In the case of 52.54.252.47 they are only making about 10 requests per minute during this time (albeit from dozens of concurrent IPs):
|
||||||
|
|
||||||
|
```
|
||||||
|
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep 52.54.252.47 | grep -o -E '18/Feb/2019:1[0-9]:[0-9][0-9]' | uniq -c | sort -n | tail -n 10
|
||||||
|
10 18/Feb/2019:17:20
|
||||||
|
10 18/Feb/2019:17:22
|
||||||
|
10 18/Feb/2019:17:31
|
||||||
|
11 18/Feb/2019:13:21
|
||||||
|
11 18/Feb/2019:15:18
|
||||||
|
11 18/Feb/2019:16:43
|
||||||
|
11 18/Feb/2019:16:57
|
||||||
|
11 18/Feb/2019:16:58
|
||||||
|
11 18/Feb/2019:18:34
|
||||||
|
12 18/Feb/2019:14:37
|
||||||
|
```
|
||||||
|
|
||||||
|
- As this user agent is not recognized as a bot by DSpace this will definitely fuck up the usage statistics
|
||||||
|
- There were 92,000 requests from these IPs alone today!
|
||||||
|
|
||||||
|
```
|
||||||
|
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -c 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'
|
||||||
|
92756
|
||||||
|
```
|
||||||
|
|
||||||
|
- I will add this user agent to the ["badbots" rate limiting in our nginx configuration](https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/nginx/default.conf.j2)
|
||||||
|
- I realized that I had effectively only been applying the "badbots" rate limiting to requests at the root, so I added it to the other blocks that match Discovery, Browse, etc as well
|
||||||
|
|
||||||
<!-- vim: set sw=2 ts=2: -->
|
<!-- vim: set sw=2 ts=2: -->
|
||||||
|
@ -42,7 +42,7 @@ sys 0m1.979s
|
|||||||
<meta property="og:type" content="article" />
|
<meta property="og:type" content="article" />
|
||||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-02/" />
|
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-02/" />
|
||||||
<meta property="article:published_time" content="2019-02-01T21:37:30+02:00"/>
|
<meta property="article:published_time" content="2019-02-01T21:37:30+02:00"/>
|
||||||
<meta property="article:modified_time" content="2019-02-15T17:37:14+02:00"/>
|
<meta property="article:modified_time" content="2019-02-17T13:16:36+02:00"/>
|
||||||
|
|
||||||
<meta name="twitter:card" content="summary"/>
|
<meta name="twitter:card" content="summary"/>
|
||||||
<meta name="twitter:title" content="February, 2019"/>
|
<meta name="twitter:title" content="February, 2019"/>
|
||||||
@ -89,9 +89,9 @@ sys 0m1.979s
|
|||||||
"@type": "BlogPosting",
|
"@type": "BlogPosting",
|
||||||
"headline": "February, 2019",
|
"headline": "February, 2019",
|
||||||
"url": "https://alanorth.github.io/cgspace-notes/2019-02/",
|
"url": "https://alanorth.github.io/cgspace-notes/2019-02/",
|
||||||
"wordCount": "4292",
|
"wordCount": "4802",
|
||||||
"datePublished": "2019-02-01T21:37:30+02:00",
|
"datePublished": "2019-02-01T21:37:30+02:00",
|
||||||
"dateModified": "2019-02-15T17:37:14+02:00",
|
"dateModified": "2019-02-17T13:16:36+02:00",
|
||||||
"author": {
|
"author": {
|
||||||
"@type": "Person",
|
"@type": "Person",
|
||||||
"name": "Alan Orth"
|
"name": "Alan Orth"
|
||||||
@ -1020,6 +1020,132 @@ UPDATE 1
|
|||||||
<li><p>Then I ran all system updates on CGSpace server and rebooted it</p></li>
|
<li><p>Then I ran all system updates on CGSpace server and rebooted it</p></li>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
|
<h2 id="2019-02-18">2019-02-18</h2>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>Jesus fucking Christ, Linode sent an alert that CGSpace (linode18) was using 421% CPU for a few hours this afternoon (server time):</li>
|
||||||
|
<li>There seems to have been a lot of activity in XMLUI:</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code># zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||||
|
1236 18.212.208.240
|
||||||
|
1276 54.164.83.99
|
||||||
|
1277 3.83.14.11
|
||||||
|
1282 3.80.196.188
|
||||||
|
1296 3.84.172.18
|
||||||
|
1299 100.24.48.177
|
||||||
|
1299 34.230.15.139
|
||||||
|
1327 52.54.252.47
|
||||||
|
1477 5.9.6.51
|
||||||
|
1861 94.71.244.172
|
||||||
|
# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||||
|
8 42.112.238.64
|
||||||
|
9 121.52.152.3
|
||||||
|
9 157.55.39.50
|
||||||
|
10 110.54.151.102
|
||||||
|
10 194.246.119.6
|
||||||
|
10 66.249.66.221
|
||||||
|
15 190.56.193.94
|
||||||
|
28 66.249.66.219
|
||||||
|
43 34.209.213.122
|
||||||
|
178 50.116.102.77
|
||||||
|
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq | wc -l
|
||||||
|
2727
|
||||||
|
# zcat --force /var/log/nginx/{oai,rest,statistics}.log /var/log/nginx/{oai,rest,statistics}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq | wc -l
|
||||||
|
186
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>94.71.244.172 is in Greece and uses the user agent “Indy Library”</li>
|
||||||
|
<li>At least they are re-using their Tomcat session:</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code>$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=94.71.244.172' dspace.log.2019-02-18 | sort | uniq | wc -l
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li><p>The following IPs were all hitting the server hard simultaneously and are located on Amazon and use the user agent “Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0”:</p>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>52.54.252.47</li>
|
||||||
|
<li>34.230.15.139</li>
|
||||||
|
<li>100.24.48.177</li>
|
||||||
|
<li>3.84.172.18</li>
|
||||||
|
<li>3.80.196.188</li>
|
||||||
|
<li>3.83.14.11</li>
|
||||||
|
<li>54.164.83.99</li>
|
||||||
|
<li>18.212.208.240</li>
|
||||||
|
</ul></li>
|
||||||
|
|
||||||
|
<li><p>Actually, even up to the top 30 IPs are almost all on Amazon and use the same user agent!</p></li>
|
||||||
|
|
||||||
|
<li><p>For reference most of these IPs hitting the XMLUI this afternoon are on Amazon:</p></li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code># zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "18/Feb/2019:1(2|3|4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 30
|
||||||
|
1173 52.91.249.23
|
||||||
|
1176 107.22.118.106
|
||||||
|
1178 3.88.173.152
|
||||||
|
1179 3.81.136.184
|
||||||
|
1183 34.201.220.164
|
||||||
|
1183 3.89.134.93
|
||||||
|
1184 54.162.66.53
|
||||||
|
1187 3.84.62.209
|
||||||
|
1188 3.87.4.140
|
||||||
|
1189 54.158.27.198
|
||||||
|
1190 54.209.39.13
|
||||||
|
1192 54.82.238.223
|
||||||
|
1208 3.82.232.144
|
||||||
|
1209 3.80.128.247
|
||||||
|
1214 54.167.64.164
|
||||||
|
1219 3.91.17.126
|
||||||
|
1220 34.201.108.226
|
||||||
|
1221 3.84.223.134
|
||||||
|
1222 18.206.155.14
|
||||||
|
1231 54.210.125.13
|
||||||
|
1236 18.212.208.240
|
||||||
|
1276 54.164.83.99
|
||||||
|
1277 3.83.14.11
|
||||||
|
1282 3.80.196.188
|
||||||
|
1296 3.84.172.18
|
||||||
|
1299 100.24.48.177
|
||||||
|
1299 34.230.15.139
|
||||||
|
1327 52.54.252.47
|
||||||
|
1477 5.9.6.51
|
||||||
|
1861 94.71.244.172
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>In the case of 52.54.252.47 they are only making about 10 requests per minute during this time (albeit from dozens of concurrent IPs):</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep 52.54.252.47 | grep -o -E '18/Feb/2019:1[0-9]:[0-9][0-9]' | uniq -c | sort -n | tail -n 10
|
||||||
|
10 18/Feb/2019:17:20
|
||||||
|
10 18/Feb/2019:17:22
|
||||||
|
10 18/Feb/2019:17:31
|
||||||
|
11 18/Feb/2019:13:21
|
||||||
|
11 18/Feb/2019:15:18
|
||||||
|
11 18/Feb/2019:16:43
|
||||||
|
11 18/Feb/2019:16:57
|
||||||
|
11 18/Feb/2019:16:58
|
||||||
|
11 18/Feb/2019:18:34
|
||||||
|
12 18/Feb/2019:14:37
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>As this user agent is not recognized as a bot by DSpace this will definitely fuck up the usage statistics</li>
|
||||||
|
<li>There were 92,000 requests from these IPs alone today!</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code># zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -c 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'
|
||||||
|
92756
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>I will add this user agent to the <a href="https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/nginx/default.conf.j2">“badbots” rate limiting in our nginx configuration</a></li>
|
||||||
|
<li>I realized that I had effectively only been applying the “badbots” rate limiting to requests at the root, so I added it to the other blocks that match Discovery, Browse, etc as well</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
<!-- vim: set sw=2 ts=2: -->
|
<!-- vim: set sw=2 ts=2: -->
|
||||||
|
|
||||||
|
|
||||||
|
@ -4,7 +4,7 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/2019-02/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/2019-02/</loc>
|
||||||
<lastmod>2019-02-15T17:37:14+02:00</lastmod>
|
<lastmod>2019-02-17T13:16:36+02:00</lastmod>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
@ -209,7 +209,7 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
||||||
<lastmod>2019-02-15T17:37:14+02:00</lastmod>
|
<lastmod>2019-02-17T13:16:36+02:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
@ -220,7 +220,7 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
|
||||||
<lastmod>2019-02-15T17:37:14+02:00</lastmod>
|
<lastmod>2019-02-17T13:16:36+02:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
@ -232,13 +232,13 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
||||||
<lastmod>2019-02-15T17:37:14+02:00</lastmod>
|
<lastmod>2019-02-17T13:16:36+02:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
|
||||||
<lastmod>2019-02-15T17:37:14+02:00</lastmod>
|
<lastmod>2019-02-17T13:16:36+02:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
|
Loading…
Reference in New Issue
Block a user