+++
date = "2015-12-02T13:18:00+03:00"
author = "Alan Orth"
title = "December, 2015"
tags = ["notes"]
image = "../images/bg.jpg"
+++

## 2015-12-02

- Replace `lzop` with `xz` in log compression cron jobs on DSpace Test—it uses less space:

```
# cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
```
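
- For reference, a minimal sketch of what the xz cron entry might look like (the actual job isn't recorded in these notes, so the schedule and path here are assumptions):

```
# hypothetical /etc/cron.d entry: compress yesterday's DSpace log with xz
0 0 * * * root xz /home/dspacetest.cgiar.org/log/dspace.log.$(date --date=yesterday +\%Y-\%m-\%d)
```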

- I had used lrzip once, but it needs more memory and is harder to use, as it requires the lrztar wrapper
- I need to remember to check that everything is OK in a few days, and then make the same change on CGSpace
- CGSpace went down again (due to PostgreSQL idle connections, of course)
- The current database settings for DSpace are `db.maxconnections = 30` and `db.maxidle = 8`, yet the number of idle connections is exceeding this:

```
$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
39
```
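
- A more direct way to count only the idle connections for that database is to ask `pg_stat_activity` itself (assuming the database is actually named `cgspace`):

```
$ psql -c "SELECT count(*) FROM pg_stat_activity WHERE datname = 'cgspace' AND state = 'idle';"
```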

- I restarted PostgreSQL and Tomcat and it's back up
- On a related note (why is CGSpace so slow?), I finally decided to try the `pgtune` script to tune the PostgreSQL settings:

```
# apt-get install pgtune
# pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune
# mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig
# mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf
```
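
- Note that settings like `shared_buffers` and `max_connections` only take effect after a full restart, so the new configuration needs:

```
# service postgresql restart
```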

- It introduced the following new settings:

```
default_statistics_target = 50
maintenance_work_mem = 480MB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 5632MB
work_mem = 48MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 1920MB
max_connections = 80
```
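
- To verify which values the running server actually picked up, `SHOW` works, for example:

```
$ psql -c 'SHOW shared_buffers;'
$ psql -c 'SHOW work_mem;'
```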

- Now I need to go read the PostgreSQL docs about these options, and watch the memory graphs in Munin, etc
- For what it's worth, the REST API should be faster now (because of these PostgreSQL tweaks):

```
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.474
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
2.141
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.685
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.995
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.786
```
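
- The same test as a quick loop, for future reference:

```
$ for i in $(seq 1 5); do curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all; done
```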

- Last week it was an average of 8 seconds... now this is 1/4 of that
- CCAFS noticed that one of their items displays only the Atmire statlets: https://cgspace.cgiar.org/handle/10568/42445

![CCAFS item](../images/2015/12/ccafs-item-no-metadata.png)

- The authorizations for the item are all public READ, and I don't see any errors in dspace.log when browsing that item (see the query after this list)
- I filed a ticket on Atmire's issue tracker
- I also filed a ticket on Atmire's issue tracker for the PostgreSQL stuff
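
- For what it's worth, the item's policies can also be checked straight in the database; a sketch using DSpace's standard schema (resource_type_id 2 is ITEM, action_id 0 is READ, and group 0 is Anonymous):

```
$ psql -c "SELECT * FROM resourcepolicy WHERE resource_type_id = 2 AND resource_id = (SELECT resource_id FROM handle WHERE handle = '10568/42445') AND action_id = 0;"
```
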
## 2015-12-03

- CGSpace is very slow, and monitoring is emailing me to say it's down, even though I can load the page (very slowly)
- The idle PostgreSQL connections look like this (with no recent change in DSpace's database settings):

```
$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
29
```

- I restarted Tomcat and PostgreSQL...
- Atmire commented that we should raise the JVM heap size by ~500M, so it is now `-Xms3584m -Xmx3584m` (see the snippet after the timing results below)
- We weren't out of heap yet, but it's probably fair to say that the DSpace 5 upgrade (and the new Atmire modules) simply need more memory, so it's OK
- A possible side effect is that the REST API is now twice as fast for the request above:

```
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.368
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.968
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
1.006
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.849
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.806
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.854
```
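
- For reference, the heap is set via Tomcat's `JAVA_OPTS`; a minimal sketch assuming the stock Ubuntu `/etc/default/tomcat7` (the file location and the other flags are assumptions, not our exact config):

```
# /etc/default/tomcat7 (hypothetical; the real file and flags may differ)
JAVA_OPTS="-Djava.awt.headless=true -Xms3584m -Xmx3584m"
```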

## 2015-12-05

- CGSpace has been up and down all day, and the REST API is completely unresponsive
- PostgreSQL idle connections are currently:

```
postgres@linode01:~$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
28
```

- I have reverted all the pgtune tweaks from the other day, as they didn't fix the stability issues, so I'd rather not have them introducing more variables into the equation (the revert is sketched after this list)
- The PostgreSQL stats from Munin all point to something database-related coinciding with the DSpace 5 upgrade in mid–late November
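
- Since pgtune left the original config behind as `postgresql.conf.orig`, the revert was presumably just the inverse of the earlier moves:

```
# mv /etc/postgresql/9.3/main/postgresql.conf.orig /etc/postgresql/9.3/main/postgresql.conf
# service postgresql restart
```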

![PostgreSQL bgwriter (year)](../images/2015/12/postgres_bgwriter-year.png)
![PostgreSQL cache (year)](../images/2015/12/postgres_cache_cgspace-year.png)
![PostgreSQL locks (year)](../images/2015/12/postgres_locks_cgspace-year.png)
![PostgreSQL scans (year)](../images/2015/12/postgres_scans_cgspace-year.png)

## 2015-12-07

- Atmire sent [some fixes](https://github.com/ilri/DSpace/pull/161) for DSpace's REST API code, which was leaving contexts open (causing the slow performance and database issues)
- After deploying the fix to CGSpace, the REST API is consistently faster:

```
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.675
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.599
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.588
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.566
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.497
```

## 2015-12-08

- Switch the CGSpace log compression cron jobs from lzop to xz—the compression isn't as good as lrzip's, but it's much faster and causes less IO/CPU load
- Since we figured out (and fixed) the cause of the performance issue, I reverted Googlebot's crawl rate to the "Let Google optimize" setting