mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2020-07-01
186
content/posts/2020-07.md
Normal file
@@ -0,0 +1,186 @@
---
title: "July, 2020"
date: 2020-07-01T10:53:54+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2020-07-01

- A few users noticed that CGSpace wasn't loading items today; item pages seem blank
- I looked at the PostgreSQL locks but they don't seem unusual
- I guess this is the same "blank item page" issue that we had a few times in 2019 that we never solved
- I restarted Tomcat and PostgreSQL and the issue was gone
- Since I was restarting Tomcat anyway, I decided to redeploy the latest changes from the `5_x-prod` branch, and I added a note about COVID-19 items to the CGSpace frontpage at Peter's request

<!--more-->

- Also, Linode is alerting that we had high outbound traffic rate early this morning around midnight AND high CPU load later in the morning
- First looking at the traffic in the morning:

```
# cat /var/log/nginx/*.log.1 /var/log/nginx/*.log | grep -E "01/Jul/2020:(00|01|02|03|04)" | goaccess --log-format=COMBINED -
...
9659 33.56% 1 0.08% 340.94 MiB 64.39.99.13
3317 11.53% 1 0.08% 871.71 MiB 199.47.87.140
2986 10.38% 1 0.08% 17.39 MiB 199.47.87.144
2286 7.94% 1 0.08% 13.04 MiB 199.47.87.142
```
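
- As a rough cross-check of the goaccess table above, the same per-IP hit tally can be sketched with standard tools. This assumes the combined log format with the client IP in the first field; `top_ips` is a hypothetical helper name, and it only counts hits, not bandwidth:

```shell
# Tally requests per client IP from access log lines read on stdin.
# $1: extended regex a line must match (e.g. a timestamp prefix)
# $2: how many IPs to print (defaults to 10)
# Assumes the client IP is the first whitespace-separated field.
top_ips() {
  top_ips_pattern="$1"
  top_ips_limit="${2:-10}"
  grep -E "$top_ips_pattern" \
    | awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' \
    | sort -k1,1nr -k2,2 \
    | head -n "$top_ips_limit"
}
```

- For example: `cat /var/log/nginx/*.log.1 /var/log/nginx/*.log | top_ips '01/Jul/2020:0[0-4]' 5`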

- 64.39.99.13 belongs to Qualys, but I see they are using a normal desktop user agent:

```
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15
```

- I will purge hits from that IP from Solr
- The 199.47.87.x IPs belong to Turnitin, and apparently they are NOT marked as bots and we have more than 40,000 hits from them in 2020 statistics alone:

```
$ curl -s "http://localhost:8081/solr/statistics/select" -d "q=userAgent:/Turnitin.*/&rows=0" | grep -oE 'numFound="[0-9]+"'
numFound="41694"
```
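
- For reference, a minimal sketch of how hits like these could be purged with a Solr delete-by-query. It assumes the statistics core is on the same `localhost:8081` as the query above and that the client address is stored in a field named `ip` (an assumption; `userAgent` is the field used above). The `solr_delete_body` and `solr_purge` helpers and the `SOLR_URL` and `DRY_RUN` variables are hypothetical names:

```shell
# Build the XML body for a Solr delete-by-query request.
# $1: field name (e.g. ip or userAgent), $2: value or /regex/ to match
solr_delete_body() {
  printf '<delete><query>%s:%s</query></delete>\n' "$1" "$2"
}

# Send the delete to the statistics core and soft-commit it.
# With DRY_RUN=1 the body is only printed, nothing is sent.
# SOLR_URL defaults to the host and port used in the query above (assumed).
solr_purge() {
  solr_purge_url="${SOLR_URL:-http://localhost:8081/solr/statistics}"
  solr_purge_body=$(solr_delete_body "$1" "$2")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$solr_purge_body"
  else
    curl -s "$solr_purge_url/update?softCommit=true" \
      -H "Content-Type: text/xml" --data-binary "$solr_purge_body"
  fi
}
```

- For example, `DRY_RUN=1 solr_purge ip 64.39.99.13` prints the request body without touching Solr, which is worth checking before running a delete for real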

- They used to be "TurnitinBot"... hhmmmm, seems they use both: https://turnitin.com/robot/crawlerinfo.html
- I will add Turnitin to the DSpace bot user agent list, but I see they are requesting `robots.txt` and only requesting item pages, so that's impressive! I don't need to add them to the "bad bot" rate limit list in nginx
- While looking at the logs I noticed eighty-one IPs in the range 185.152.250.x making a small number of requests with this user agent:

```
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:76.0) Gecko/20100101 Firefox/76.0
```

- The IPs all belong to HostRoyale:

```
# cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep '01/Jul/2020' | awk '{print $1}' | grep 185.152.250. | sort | uniq | wc -l
81
# cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep '01/Jul/2020' | awk '{print $1}' | grep 185.152.250. | sort | uniq | sort -h
185.152.250.1
185.152.250.101
185.152.250.103
185.152.250.105
185.152.250.107
185.152.250.111
185.152.250.115
185.152.250.119
185.152.250.121
185.152.250.123
185.152.250.125
185.152.250.129
185.152.250.13
185.152.250.131
185.152.250.133
185.152.250.135
185.152.250.137
185.152.250.141
185.152.250.145
185.152.250.149
185.152.250.153
185.152.250.155
185.152.250.157
185.152.250.159
185.152.250.161
185.152.250.163
185.152.250.165
185.152.250.167
185.152.250.17
185.152.250.171
185.152.250.183
185.152.250.189
185.152.250.191
185.152.250.197
185.152.250.201
185.152.250.205
185.152.250.209
185.152.250.21
185.152.250.213
185.152.250.217
185.152.250.219
185.152.250.221
185.152.250.223
185.152.250.225
185.152.250.227
185.152.250.229
185.152.250.231
185.152.250.233
185.152.250.235
185.152.250.239
185.152.250.243
185.152.250.247
185.152.250.249
185.152.250.25
185.152.250.251
185.152.250.253
185.152.250.255
185.152.250.27
185.152.250.29
185.152.250.3
185.152.250.31
185.152.250.39
185.152.250.41
185.152.250.47
185.152.250.5
185.152.250.59
185.152.250.63
185.152.250.65
185.152.250.67
185.152.250.7
185.152.250.71
185.152.250.73
185.152.250.77
185.152.250.81
185.152.250.85
185.152.250.89
185.152.250.9
185.152.250.93
185.152.250.95
185.152.250.97
185.152.250.99
```
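
- Two details of the pipeline above: the dots in `grep 185.152.250.` are unescaped, so they match any character, and `sort -h` leaves the addresses in what is effectively lexical order (`.13` lands after `.129`). A minimal sketch of a stricter variant, assuming the client IP is the first field of each log line; `unique_ips_in_prefix` is a hypothetical helper name:

```shell
# Print the unique client IPs under a three-octet prefix, ordered by last octet.
# Reads access log lines on stdin; $1 is a prefix such as 185.152.250
unique_ips_in_prefix() {
  # Escape the dots so they only match literal dots
  uip_escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
  awk '{ print $1 }' \
    | grep -E "^${uip_escaped}\.[0-9]+$" \
    | sort -u \
    | sort -t. -k4,4n
}
```

- For example: `cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep '01/Jul/2020' | unique_ips_in_prefix 185.152.250 | wc -l`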

- It's only a few hundred requests each, but I am very suspicious so I will record it here and purge their IPs from Solr
- Then I see 185.187.30.14 and 185.187.30.13 making requests also, with several different "normal" user agents
- They are both apparently in France, belonging to Scalair FR hosting
- I will purge their requests from Solr too
- Now I see some other new bots I hadn't noticed before:
  - `Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) LinkCheck by Siteimprove.com`
  - `Consilio (WebHare Platform 4.28.2-dev); LinkChecker)`, which appears to be a [university CMS](https://www.utwente.nl/en/websites/webhare/)
- I will add `LinkCheck`, `Consilio`, and `WebHare` to the list of DSpace bot agents and purge them from Solr stats
- COUNTER-Robots list already has `link.?check` but for some reason DSpace didn't match that and I see hits for some of these...
- Maybe I should add `[Ll]ink.?[Cc]heck.?` to a custom list for now?
- For now I added `Turnitin` to the [new bots pull request on COUNTER-Robots](https://github.com/atmire/COUNTER-Robots/pull/34)
- I purged 20,000 hits from IPs and 45,000 hits from user agents
- I will revert the default "example" agents file back to the upstream master branch of COUNTER-Robots, and then add all my custom ones that are pending in pull requests they haven't merged yet:

```
$ diff --unchanged-line-format= --old-line-format= --new-line-format='%L' dspace/config/spiders/agents/example ~/src/git/COUNTER-Robots/COUNTER_Robots_list.txt
Citoid
ecointernet
GigablastOpenSource
Jersey\/\d
MarcEdit
OgScrper
okhttp
^Pattern\/\d
ReactorNetty\/\d
sqlmap
Typhoeus
7siters
```
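
- On the earlier `link.?check` puzzle: one possible explanation, which is an assumption and not something confirmed here, is that the patterns are being matched case-sensitively, so `link.?check` would miss `LinkCheck`. A minimal sketch for testing a user agent against a patterns file with `grep -E`; `agent_matches` is a hypothetical helper, and `grep -E` is only a stand-in for however DSpace actually evaluates these patterns (it does not understand `\d`, for instance):

```shell
# Succeed if the user agent matches any regex in the patterns file.
# $1: user agent string
# $2: file with one extended regex per line (no empty lines)
# $3: pass -i to match case-insensitively (optional)
agent_matches() {
  if [ "${3:-}" = "-i" ]; then
    printf '%s\n' "$1" | grep -Eiq -f "$2"
  else
    printf '%s\n' "$1" | grep -Eq -f "$2"
  fi
}
```

- With a patterns file containing only `link.?check`, the Siteimprove agent string is rejected by default and accepted with `-i`; a file containing `[Ll]ink.?[Cc]heck.?` accepts it either way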

- Just a note that I *still* can't deploy the `6_x-dev-atmire-modules` branch as it fails at `ant update`:

```
[java] java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'DefaultStorageUpdateConfig': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire method: public void com.atmire.statistics.util.StorageReportsUpdater.setStorageReportServices(java.util.List); nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cuaEPersonStorageReportService': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.dspace.cua.dao.storage.CUAEPersonStorageReportDAO com.atmire.dspace.cua.CUAStorageReportServiceImpl$CUAEPersonStorageReportServiceImpl.CUAEPersonStorageReportDAO; nested exception is org.springframework.beans.factory.NoUniqueBeanDefinitionException: No qualifying bean of type [com.atmire.dspace.cua.dao.storage.CUAEPersonStorageReportDAO] is defined: expected single matching bean but found 2: com.atmire.dspace.cua.dao.impl.CUAStorageReportDAOImpl$CUAEPersonStorageReportDAOImpl#0,com.atmire.dspace.cua.dao.impl.CUAStorageReportDAOImpl$CUAEPersonStorageReportDAOImpl#1
```

- I had told Atmire about this several weeks ago... but I reminded them again in the ticket

<!-- vim: set sw=2 ts=2: -->