+++
date = "2015-11-23T17:00:57+03:00"
author = "Alan Orth"
title = "November, 2015"
Tags = ["notes"]
+++
## 2015-11-22
- CGSpace went down
- Looks like DSpace exhausted its PostgreSQL connection pool
- Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
```
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
```
- For now I have increased the limit from 60 to 90, run updates, and rebooted the server
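- The "limit" here is the `db.maxconnections` setting in DSpace's `dspace.cfg` (a rough sketch of the relevant line after this change; the exact place it lives may differ in our deployment):
```
# dspace/config/dspace.cfg: size of DSpace's PostgreSQL connection pool (sketch)
db.maxconnections = 90
```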
## 2015-11-24
- CGSpace went down again
- Getting emails from uptimeRobot and uptimeButler that it's down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors
- Looks like there are still a bunch of idle PostgreSQL connections:
```
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
96
```
- For some reason the number of idle connections is very high since we upgraded to DSpace 5
## 2015-11-25
- Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config
- The OAI application requests stylesheets and javascript files with the path `/oai/static/css`, which gets matched here:
```
# static assets we can load from the file system directly with nginx
location ~ /(themes|static|aspects/ReportingSuite) {
    try_files $uri @tomcat;
...
```
- The document root is relative to the xmlui app, so this gets a 404—I'm not sure why it doesn't pass to `@tomcat`
- Anyways, I can't find any URIs with path `/static`, and the more important point is to handle all the static theme assets, so we can just remove `static` from the regex for now (who cares if we can't use nginx to send Etags for OAI CSS!)
- Also, I noticed we aren't setting CSP headers on the static assets: in nginx a child block inherits the parent's `add_header` directives, but as soon as you use `add_header` inside the child block the inherited ones are dropped
- We simply need to add `include extra-security.conf;` to the above location block (but research and test first)
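- Roughly like this, pending testing (a sketch, assuming `extra-security.conf` holds our `add_header` directives):
```
# static assets we can load from the file system directly with nginx
location ~ /(themes|aspects/ReportingSuite) {
    # add_header in a child block drops inherited headers, so re-include them here
    include extra-security.conf;
    try_files $uri @tomcat;
}
```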
- We should add WOFF assets to the list of things to set expires for:
```
location ~* \.(?:ico|css|js|gif|jpe?g|png|woff)$ {
```
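- Presumably the rest of that block keeps whatever expires/cache settings it already has, something like (sketch only, actual values may differ):
```
location ~* \.(?:ico|css|js|gif|jpe?g|png|woff)$ {
    # cache static assets aggressively; the exact expiry is whatever we already use
    expires 1M;
    add_header Cache-Control "public";
}
```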
- We should also add `aspects/Statistics` to the location block for static assets (minus `static` from above):
```
location ~ /(themes|aspects/ReportingSuite|aspects/Statistics) {
```
- Need to check `/about` on CGSpace, as it's blank on my local test server and we might need to add something there
- CGSpace has been up and down all day due to PostgreSQL idle connections (current DSpace pool is 90):
```
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
93
```
- I looked closer at the idle connections and saw that many have been idle for hours (current time on server is `2015-11-25T20:20:42+0000`):
```
$ psql -c 'SELECT * from pg_stat_activity;' | less -S
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start |
-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+---
20951 | cgspace | 10966 | 18205 | cgspace | | 127.0.0.1 | | 37731 | 2015-11-25 13:13:02.837624+00 | | 20
20951 | cgspace | 10967 | 18205 | cgspace | | 127.0.0.1 | | 37737 | 2015-11-25 13:13:03.069421+00 | | 20
...
```
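- A quick way to see how long each connection has been idle (assuming PostgreSQL 9.2+, where `pg_stat_activity` has `state` and `state_change` columns):
```
$ psql -c "SELECT pid, usename, state, now() - state_change AS idle_time FROM pg_stat_activity WHERE state = 'idle' AND datname = 'cgspace' ORDER BY idle_time DESC;" | head
```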
- There is a relevant Jira issue about this: https://jira.duraspace.org/browse/DS-1458
- It seems there is some sense in changing DSpace's default `db.maxidle` from unlimited (-1) to something like 8 (Tomcat's default) or 10 (Confluence's default)
- Change `db.maxidle` from -1 to 10, reduce `db.maxconnections` from 90 to 50, and restart postgres and tomcat7 (sketch of the resulting settings below)
- Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well
- Also deploy the nginx fixes for the `try_files` location block as well as the expires block
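- The resulting pool settings (sketch):
```
db.maxconnections = 50
db.maxidle = 10
```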
## 2015-11-26
- CGSpace behaving much better since changing `db.maxidle` yesterday, but still two up/down notices from monitoring this morning (better than 50!)
- CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item
- Not as bad for me, but still unsustainable if you have to fetch many items:
```
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
8.415
```