Add notes for 2017-10-29

This commit is contained in:
2017-10-29 10:02:34 +02:00
parent 8ee7949429
commit 5acf458937
3 changed files with 64 additions and 8 deletions

View File

@ -198,3 +198,29 @@ http://library.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subje
## 2017-10-28
- Linode alerted about high CPU usage again on CGSpace around 2AM this morning
## 2017-10-29
- Linode alerted about high CPU usage again on CGSpace around 2AM and 4AM
- I'm still not sure why this started causing alerts so repeatadely the past week
- I don't see any tell tale signs in the REST or OAI logs, so trying to do rudimentary analysis in DSpace logs:
```
# grep '2017-10-29 02:' dspace.log.2017-10-29 | grep -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
2049
```
- So there were 2049 unique sessions during the hour of 2AM
- Looking at my notes, the number of unique sessions was about the same during the same hour on other days when there were no alerts
- I think I'll need to enable access logging in nginx to figure out what's going on
- After enabling logging on requests to XMLUI on `/` I see some new bot I've never seen before:
```
137.108.70.6 - - [29/Oct/2017:07:39:49 +0000] "GET /discover?filtertype_0=type&filter_relational_operator_0=equals&filter_0=Internal+Document&filtertype=author&filter_relational_operator=equals&filter=CGIAR+Secretariat HTTP/1.1" 200 7776 "-" "Mozilla/5.0 (compatible; CORE/0.6; +http://core.ac.uk; http://core.ac.uk/intro/contact)"
```
- CORE seems to be some bot that is "Aggregating the worlds open access research papers"
- The contact address listed in their bot's user agent is incorrect, correct page is simply: https://core.ac.uk/contact
- I will check the logs in a few days to see if they are harvesting us regularly, then add their bot's user agent to the Tomcat Crawler Session Valve
- After browsing the CORE site it seems that the CGIAR Library is somehow a member of CORE, so they have probably only been harvesting CGSpace since we did the migration, as library.cgiar.org directs to us now
- For now I will just contact them to have them update their contact info in the bot's user agent, but eventually I think I'll tell them to swap out the CGIAR Library entry for CGSpace