- I tested the `dspace stats-util -s` process on my local machine and it failed the same way
- It doesn't seem to be helpful, but the dspace log shows this:

```
2018-01-10 10:51:19,301 INFO org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
2018-01-10 10:51:19,301 INFO org.dspace.statistics.SolrLogger @ Moving: 3821 records into core statistics-2016
```
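
- For reference, Solr's CoreAdmin API is a quick way to see which yearly statistics cores actually exist after sharding (a sketch only; the host and port depend on how you reach the Solr instance — 8081 here is just an example):

```
$ http 'http://localhost:8081/solr/admin/cores?action=STATUS&wt=json' | grep -oE 'statistics-[0-9]+' | sort -u
```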

- Terry Brady has written some notes on the DSpace Wiki about Solr sharding issues: https://wiki.duraspace.org/display/%7Eterrywbrady/Statistics+Import+Export+Issues
- Uptime Robot said that CGSpace went down at around 9:43 AM
- I looked at PostgreSQL's `pg_stat_activity` table and saw 161 active connections (see the query sketch below), but no pool errors in the DSpace logs:

```
$ grep -c "Timeout: Pool empty." dspace.log.2018-01-10
0
```
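
- For reference, a quick way to see connection counts by database and state from `pg_stat_activity` (run as the postgres superuser, since other users only see details for their own backends):

```
$ psql -c 'SELECT datname, state, count(*) FROM pg_stat_activity GROUP BY datname, state ORDER BY count(*) DESC;'
```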

- The XMLUI logs show quite a bit of activity today:

```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
    951 207.46.13.159
    954 157.55.39.123
   1217 95.108.181.88
   1503 104.196.152.243
   6455 70.36.107.50
  11412 70.36.107.190
  16730 70.36.107.49
  17386 2607:fa98:40:9:26b6:fdff:feff:1c96
  21566 2607:fa98:40:9:26b6:fdff:feff:195d
  45384 2607:fa98:40:9:26b6:fdff:feff:1888
```
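
- One way to check which user agents a given IP is sending (the `awk` field number assumes nginx's default combined log format):

```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep '70.36.107.49' | awk -F'"' '{print $6}' | sort | uniq -c | sort -h | tail
```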

- The user agent for the top six or so IPs is the same:

```
"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36"
```

- `whois` says they come from [Perfect IP](http://www.perfectip.net/)
- I've never seen those top IPs before, but they have created almost 50,000 Tomcat sessions today:

```
$ grep -E '(2607:fa98:40:9:26b6:fdff:feff:1888|2607:fa98:40:9:26b6:fdff:feff:195d|2607:fa98:40:9:26b6:fdff:feff:1c96|70.36.107.49|70.36.107.190|70.36.107.50)' /home/cgspace.cgiar.org/log/dspace.log.2018-01-10 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
49096
```

- Rather than blocking their IPs, I think I might just add their user agent to the "badbots" zone with Baidu (sketched below), because they seem to be the only ones using that user agent:

```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
   6796 70.36.107.50
  11870 70.36.107.190
  17323 70.36.107.49
  19204 2607:fa98:40:9:26b6:fdff:feff:1c96
  23401 2607:fa98:40:9:26b6:fdff:feff:195d
  47875 2607:fa98:40:9:26b6:fdff:feff:1888
```
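
- The "badbots" zone is keyed off a `map` of the user agent, roughly like the sketch below (the variable name, rate, and zone size here are illustrative; the real CGSpace config differs in the details):

```
# requests whose user agent maps to a non-empty value are limited in the badbots zone
map $http_user_agent $badbots_key {
    default         '';
    ~*baiduspider   'badbot';
    # the Perfect IP bots all send this exact Chrome/38 user agent; a long literal key
    # like this is what overflows the default 64-byte map hash bucket (see nginx -t below)
    'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36' 'badbot';
}

# requests with an empty key are not rate limited at all
limit_req_zone $badbots_key zone=badbots:10m rate=12r/m;
```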

- I added the user agent to nginx's badbots `limit_req` zone, but upon testing the config I got an error:

```
# nginx -t
nginx: [emerg] could not build map_hash, you should increase map_hash_bucket_size: 64
nginx: configuration file /etc/nginx/nginx.conf test failed
```

- According to the nginx docs the [bucket size should be a multiple of the CPU's cache alignment](https://nginx.org/en/docs/hash.html), which is 64 for us:

```
# cat /proc/cpuinfo | grep cache_alignment | head -n1
cache_alignment : 64
```
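
- Concretely, the fix is a one-line directive in the `http` block of `nginx.conf` (a minimal sketch, everything else elided):

```
http {
    # the default (32, 64, or 128, depending on the CPU cache line size) is too small
    # to hold the ~110-character user agent key above, so double it
    map_hash_bucket_size 128;
}
```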

- So I increased `map_hash_bucket_size` to 128 and deployed the changes to nginx
- Almost immediately the PostgreSQL connections dropped back down to 40 or so, and UptimeRobot said the site was back up
- So it's interesting that we're not out of PostgreSQL connections (current pool maxActive is 300!), yet the system is "down" to UptimeRobot and very slow to use
- Linode continues to test mitigations for Meltdown and Spectre: https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/
- I rebooted DSpace Test to see if the kernel will be updated (currently Linux 4.14.12-x86_64-linode92)... nope.
- It looks like Linode will reboot the KVM hosts later this week, though