Add notes for 2017-10-26

2017-10-26 17:23:36 +03:00
parent ee2a713c2a
commit 85b84e6458
4 changed files with 72 additions and 14 deletions

@@ -159,3 +159,30 @@ http://library.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subje
- Re-deploy CGSpace from latest `5_x-prod` (adds ISI Journal to search filters and adds Discovery index for CGIAR Library `systemsubject`)
- Deploy nginx redirect fixes to catch CGIAR Library browse links (redirect to their community and translate subject→systemsubject)
- Run migration of CGSpace server (linode18) for Linode security alert, which took 42 minutes of downtime
## 2017-10-26
- In the last 24 hours we've gotten a few alerts from Linode that there was high CPU and outgoing traffic on CGSpace
- Uptime Robot even noticed CGSpace go "down" for a few minutes
- In other news, I was trying to look into a question about stats raised by Magdalena when CGSpace went down because the SQL connection pool was exhausted
- Looking at the PostgreSQL activity I saw 93 connections, but after a minute or two the number dropped and CGSpace came back up
- And then I reloaded the Atmire Usage Stats module, the connections shot back up, and CGSpace went down again
- Still not sure where the load is coming from right now, but it's clear why there were so many alerts yesterday on the 25th!
```
# grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-25 | sort -n | uniq | wc -l
18022
```
- Compared to other days there were far more unique sessions yesterday: about 5.7 times as many as on the 23rd, and about 2.3 times as many as so far today (the 26th):
```
# grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-23 | sort -n | uniq | wc -l
3141
# grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-26 | sort -n | uniq | wc -l
7851
```
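- The per-day checks above could be wrapped in a small helper to make comparing days easier. This is only a rough sketch: the function names `count_sessions`, `compare_days`, and `ratio` are made up for illustration, and it assumes the same `dspace.log.YYYY-MM-DD` files and session ID pattern as the commands above. It uses `sort -u` in place of `sort -n | uniq`, which should give the same count.

```shell
#!/usr/bin/env bash

# Count unique session IDs in one DSpace log file.
# (Hypothetical helper; same grep pattern as the commands above.)
count_sessions() {
    grep -o -E 'session_id=[A-Z0-9]{32}' "$1" | sort -u | wc -l | tr -d ' '
}

# Print one "<file> <unique sessions>" line per log file given.
compare_days() {
    local f
    for f in "$@"; do
        printf '%s %s\n' "$f" "$(count_sessions "$f")"
    done
}

# Print how many times larger the first count is than the second,
# rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}
```

- For example, `compare_days dspace.log.2017-10-2[3-6]` would list the counts for several days at once, and `ratio 18022 3141` puts a number on how much busier the 25th was than the 23rd.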
- I still have no idea what was causing the load to go up today
- I finally investigated Magdalena's issue with the item download stats and now I can't reproduce it: I get the same number of downloads reported in the stats widget on the item page, the "Most Popular Items" page, and in Usage Stats
- I think it might have been an issue with the statistics not being fresh