2020-07-01

- A few users noticed that CGSpace wasn’t loading items today; item pages appear blank
- I looked at the PostgreSQL locks but they don’t seem unusual (see the sketch just below)
- I guess this is the same “blank item page” issue that we had a few times in 2019 and never solved
- I restarted Tomcat and PostgreSQL and the issue was gone

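For reference, a quick way to eyeball the locks is to join `pg_locks` with `pg_stat_activity` and count them per application; this is a generic sketch rather than the exact query I ran, and the `dspace` database name is an assumption:

```console
$ psql -d dspace -c "SELECT psa.application_name, COUNT(*) FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid GROUP BY psa.application_name;"
```
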
- Since I was restarting Tomcat anyways I decided to redeploy the latest changes from the `5_x-prod` branch, and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request
- Also, Linode is alerting that we had a high outbound traffic rate early this morning around midnight AND high CPU load later in the morning
- First looking at the traffic in the morning:

```console
# cat /var/log/nginx/*.log.1 /var/log/nginx/*.log | grep -E "01/Jul/2020:(00|01|02|03|04)" | goaccess --log-format=COMBINED -
...
9659 33.56% 1 0.08% 340.94 MiB 64.39.99.13
3317 11.53% 1 0.08% 871.71 MiB 199.47.87.140
2986 10.38% 1 0.08% 17.39 MiB 199.47.87.144
2286 7.94% 1 0.08% 13.04 MiB 199.47.87.142
```

- 64.39.99.13 belongs to Qualys, but I see they are using a normal desktop user agent:

```console
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15
```

- I will purge hits from that IP from Solr (a sketch of such a purge is below)
- The 199.47.87.x IPs belong to Turnitin, and apparently they are NOT marked as bots, and we have over 40,000 hits from them in the 2020 statistics alone:

```console
$ curl -s "http://localhost:8081/solr/statistics/select" -d "q=userAgent:/Turnitin.*/&rows=0" | grep -oE 'numFound="[0-9]+"'
numFound="41694"
```

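As for purging the Qualys hits mentioned above, that ultimately boils down to a delete-by-query on the `ip` field of the statistics core; a minimal sketch, using the same core and port as the queries in these notes (not necessarily the exact command I ran):

```console
$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>ip:64.39.99.13</query></delete>"
```
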
- They used to be “TurnitinBot”… hmmm, it seems they use both: https://turnitin.com/robot/crawlerinfo.html
- I will add Turnitin to the DSpace bot user agent list, but I see they are requesting `robots.txt` and only requesting item pages, so that’s impressive! I don’t need to add them to the “bad bot” rate limit list in nginx
- While looking at the logs I noticed eighty-one IPs in the range 185.152.250.x making a small number of requests each with this user agent:

```console
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:76.0) Gecko/20100101 Firefox/76.0
```

- The IPs all belong to HostRoyale:

```console
# cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep '01/Jul/2020' | awk '{print $1}' | grep 185.152.250. | sort | uniq | wc -l
81
# cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep '01/Jul/2020' | awk '{print $1}' | grep 185.152.250. | sort | uniq | sort -h
185.152.250.1
185.152.250.101
185.152.250.103
185.152.250.105
185.152.250.107
185.152.250.111
185.152.250.115
185.152.250.119
185.152.250.121
185.152.250.123
185.152.250.125
185.152.250.129
185.152.250.13
185.152.250.131
185.152.250.133
185.152.250.135
185.152.250.137
185.152.250.141
185.152.250.145
185.152.250.149
185.152.250.153
185.152.250.155
185.152.250.157
185.152.250.159
185.152.250.161
185.152.250.163
185.152.250.165
185.152.250.167
185.152.250.17
185.152.250.171
185.152.250.183
185.152.250.189
185.152.250.191
185.152.250.197
185.152.250.201
185.152.250.205
185.152.250.209
185.152.250.21
185.152.250.213
185.152.250.217
185.152.250.219
185.152.250.221
185.152.250.223
185.152.250.225
185.152.250.227
185.152.250.229
185.152.250.231
185.152.250.233
185.152.250.235
185.152.250.239
185.152.250.243
185.152.250.247
185.152.250.249
185.152.250.25
185.152.250.251
185.152.250.253
185.152.250.255
185.152.250.27
185.152.250.29
185.152.250.3
185.152.250.31
185.152.250.39
185.152.250.41
185.152.250.47
185.152.250.5
185.152.250.59
185.152.250.63
185.152.250.65
185.152.250.67
185.152.250.7
185.152.250.71
185.152.250.73
185.152.250.77
185.152.250.81
185.152.250.85
185.152.250.89
185.152.250.9
185.152.250.93
185.152.250.95
185.152.250.97
185.152.250.99
```

- It’s only a few hundred requests each, but I am very suspicious so I will record it here and purge their IPs from Solr
- Then I see 185.187.30.14 and 185.187.30.13 making requests as well, with several different “normal” user agents
- They are both apparently in France, belonging to Scalair FR hosting
- I will purge their requests from Solr too

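For reference, a whois lookup is enough to confirm who owns IPs like these, and a whole range like 185.152.250.x can be purged with a wildcard delete-by-query; a rough sketch, assuming the same Solr core and port as above (not necessarily the exact commands I ran):

```console
$ whois 185.187.30.14 | grep -iE '^(org-name|orgname|country):'
$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>ip:185.152.250.*</query></delete>"
```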

- Now I see some other new bots I hadn’t noticed before:

```console
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0) LinkCheck by Siteimprove.com
Consilio (WebHare Platform 4.28.2-dev); LinkChecker)
```

- The second agent, with the WebHare platform, appears to be a university CMS
- I will add `LinkCheck`, `Consilio`, and `WebHare` to the list of DSpace bot agents and purge them from Solr stats (a sketch of the purge is below)
- The COUNTER-Robots list already has `link.?check`, but for some reason DSpace didn’t match that and I see hits for some of these…
- Maybe I should add `[Ll]ink.?[Cc]heck.?` to a custom list for now?
- For now I added `Turnitin` to the new bots pull request on COUNTER-Robots

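Purging by user agent is the same kind of delete-by-query as the IP purges, just against the `userAgent` field; a minimal sketch using the regex style from the Turnitin check above (the other agents would get similar queries; not necessarily the exact commands I ran):

```console
$ curl -s "http://localhost:8081/solr/statistics/select" -d "q=userAgent:/.*[Ll]ink.?[Cc]heck.*/&rows=0" | grep -oE 'numFound="[0-9]+"'
$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>userAgent:/.*[Ll]ink.?[Cc]heck.*/</query></delete>"
```
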
- I purged 20,000 hits from IPs and 45,000 hits from user agents
- I will revert the default “example” agents file back to the upstream master branch of COUNTER-Robots, and then add all my custom ones that are pending in pull requests they haven’t merged yet (as sketched after the diff below):

```console
$ diff --unchanged-line-format= --old-line-format= --new-line-format='%L' dspace/config/spiders/agents/example ~/src/git/COUNTER-Robots/COUNTER_Robots_list.txt
Citoid
ecointernet
GigablastOpenSource
Jersey\/\d
MarcEdit
OgScrper
okhttp
^Pattern\/\d
ReactorNetty\/\d
sqlmap
Typhoeus
7siters
```

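The revert-and-append could look something like this, using the local COUNTER-Robots checkout from the diff above (a sketch; `/tmp/pending-agents.txt` is a hypothetical file holding the pending patterns such as Turnitin and the LinkCheck variants):

```console
$ git -C ~/src/git/COUNTER-Robots checkout master
$ cp ~/src/git/COUNTER-Robots/COUNTER_Robots_list.txt dspace/config/spiders/agents/example
$ cat /tmp/pending-agents.txt >> dspace/config/spiders/agents/example
```
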
- Just a note that I still can’t deploy the `6_x-dev-atmire-modules` branch, as it fails at `ant update`:

```console
     [java] java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'DefaultStorageUpdateConfig': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire method: public void com.atmire.statistics.util.StorageReportsUpdater.setStorageReportServices(java.util.List); nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cuaEPersonStorageReportService': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.dspace.cua.dao.storage.CUAEPersonStorageReportDAO com.atmire.dspace.cua.CUAStorageReportServiceImpl$CUAEPersonStorageReportServiceImpl.CUAEPersonStorageReportDAO; nested exception is org.springframework.beans.factory.NoUniqueBeanDefinitionException: No qualifying bean of type [com.atmire.dspace.cua.dao.storage.CUAEPersonStorageReportDAO] is defined: expected single matching bean but found 2: com.atmire.dspace.cua.dao.impl.CUAStorageReportDAOImpl$CUAEPersonStorageReportDAOImpl#0,com.atmire.dspace.cua.dao.impl.CUAStorageReportDAOImpl$CUAEPersonStorageReportDAOImpl#1
```

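For context, the ant step that fails is just the standard DSpace installer update; roughly (the path is the usual default and may differ on our setup):

```console
$ cd dspace/target/dspace-installer
$ ant update
```
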
- I had told Atmire about this several weeks ago… but I reminded them again in the ticket