mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2017-10-29
@ -198,3 +198,29 @@ http://library.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subje
## 2017-10-28

- Linode alerted about high CPU usage again on CGSpace around 2AM this morning

## 2017-10-29

- Linode alerted about high CPU usage again on CGSpace around 2AM and 4AM
- I'm still not sure why this started causing alerts so repeatedly in the past week
- I don't see any telltale signs in the REST or OAI logs, so I'm trying some rudimentary analysis of the DSpace logs:

```
# grep '2017-10-29 02:' dspace.log.2017-10-29 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
2049
```
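
For comparing the same hour across several days, the session count above could also be sketched in Python. This is only a rough equivalent of the grep pipeline: the function name `count_unique_sessions` and the sample log lines are made up for illustration, and only the `session_id=` pattern and the date prefix come from the command above.

```python
import re
from typing import Iterable

# Same pattern as the grep above: "session_id=" followed by 32 characters in A-Z or 0-9
SESSION_RE = re.compile(r"session_id=[A-Z0-9]{32}")


def count_unique_sessions(lines: Iterable[str], hour_prefix: str) -> int:
    """Count distinct session IDs on lines that contain hour_prefix."""
    sessions = set()
    for line in lines:
        if hour_prefix in line:
            sessions.update(SESSION_RE.findall(line))
    return len(sessions)


# Made-up lines for illustration only, not real DSpace log output
SAMPLE_LINES = [
    "2017-10-29 02:00:01,000 INFO example @ anonymous:session_id=" + "A" * 32 + ":view_item",
    "2017-10-29 02:15:42,000 INFO example @ anonymous:session_id=" + "A" * 32 + ":search",
    "2017-10-29 02:59:59,000 INFO example @ anonymous:session_id=" + "B" * 32 + ":view_item",
    "2017-10-29 03:00:00,000 INFO example @ anonymous:session_id=" + "C" * 32 + ":view_item",
]

print(count_unique_sessions(SAMPLE_LINES, "2017-10-29 02:"))  # two distinct sessions in the 2AM hour
```

To run it against a real log, pass an open file handle instead of the sample list, for example `count_unique_sessions(open("dspace.log.2017-10-29", errors="replace"), "2017-10-29 02:")`.
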
- So there were 2049 unique sessions during the hour of 2AM
- Looking at my notes, the number of unique sessions was about the same during the same hour on other days when there were no alerts
- I think I'll need to enable access logging in nginx to figure out what's going on
- After enabling logging on requests to XMLUI on `/` I see some new bot I've never seen before:

```
137.108.70.6 - - [29/Oct/2017:07:39:49 +0000] "GET /discover?filtertype_0=type&filter_relational_operator_0=equals&filter_0=Internal+Document&filtertype=author&filter_relational_operator=equals&filter=CGIAR+Secretariat HTTP/1.1" 200 7776 "-" "Mozilla/5.0 (compatible; CORE/0.6; +http://core.ac.uk; http://core.ac.uk/intro/contact)"
```
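
Once the access log has collected a few days of requests, a tally of user agents would show how often a bot like this comes back. The following is a minimal sketch, assuming the log uses nginx's standard "combined" format (which the line above appears to follow); the names `LOG_RE` and `tally_user_agents` are hypothetical.

```python
import re
from collections import Counter

# Assumes nginx's "combined" log format:
# ip - user [time] "request" status bytes "referer" "user agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def tally_user_agents(lines):
    """Return a Counter of user agent strings; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match:
            counts[match.group("agent")] += 1
    return counts


# The request logged above, split across string literals for readability
SAMPLE = (
    '137.108.70.6 - - [29/Oct/2017:07:39:49 +0000] '
    '"GET /discover?filtertype_0=type&filter_relational_operator_0=equals'
    '&filter_0=Internal+Document&filtertype=author'
    '&filter_relational_operator=equals&filter=CGIAR+Secretariat HTTP/1.1" '
    '200 7776 "-" '
    '"Mozilla/5.0 (compatible; CORE/0.6; +http://core.ac.uk; http://core.ac.uk/intro/contact)"'
)

for agent, hits in tally_user_agents([SAMPLE]).most_common(10):
    print(hits, agent)
```
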
- CORE seems to be some bot that is "Aggregating the world’s open access research papers"
- The contact address listed in their bot's user agent is incorrect; the correct page is simply: https://core.ac.uk/contact
- I will check the logs in a few days to see if they are harvesting us regularly, then add their bot's user agent to the Tomcat Crawler Session Manager Valve
- After browsing the CORE site it seems that the CGIAR Library is somehow a member of CORE, so they have probably only been harvesting CGSpace since we did the migration, as library.cgiar.org redirects to us now
- For now I will just contact them to have them update their contact info in the bot's user agent, but eventually I think I'll tell them to swap out the CGIAR Library entry for CGSpace
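
Before touching the valve it may be worth checking whether CORE's user agent would already be caught. The sketch below assumes the valve's `crawlerUserAgents` default is the pattern given in the Tomcat documentation as I recall it (verify against the Tomcat version in use) and that the valve requires the pattern to match the whole user agent string; the `|.*CORE/.*` extension and the function name `is_crawler` are hypothetical.

```python
import re

# Assumed default for crawlerUserAgents (check the docs for the deployed Tomcat version)
DEFAULT_CRAWLER_UA = r".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*"

# Hypothetical extension that would also catch CORE's harvester
EXTENDED_CRAWLER_UA = DEFAULT_CRAWLER_UA + r"|.*CORE/.*"

# User agent copied from the access log entry above
CORE_UA = (
    "Mozilla/5.0 (compatible; CORE/0.6; +http://core.ac.uk; "
    "http://core.ac.uk/intro/contact)"
)


def is_crawler(user_agent: str, pattern: str) -> bool:
    """True if the whole user agent matches the pattern (assumed valve behaviour)."""
    return re.fullmatch(pattern, user_agent) is not None


print(is_crawler(CORE_UA, DEFAULT_CRAWLER_UA))   # False: the string contains no "bot" or "Bot"
print(is_crawler(CORE_UA, EXTENDED_CRAWLER_UA))  # True with the hypothetical extension
```

If the assumptions hold, CORE's requests would each get a fresh session until the pattern is extended, since its user agent does not contain "bot".
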