Update notes for 2020-07-24

2020-07-24 23:23:15 +03:00
parent 6b75032413
commit 9e6ff5d999
21 changed files with 223 additions and 30 deletions

- I closed all issues in the [OpenRXV](https://github.com/ilri/OpenRXV/issues) and [AReS](https://github.com/ilri/AReS/issues) GitHub repositories with screenshots so that Moayad can use them for his invoice
- The statistics-2018 core always crashes with the same error even after I deleted the "id:10" records...
- I started the statistics-2017 core and it finished in 3:44:15
- I started the statistics-2016 core and it finished in 2:27:08
- I started the statistics-2015 core and it finished in 1:07:38

## 2020-07-24

- I'm looking at the statistics-2019 Solr stats and I see some interesting user agents and IPs
- For example, I see 568,000 requests from 66.109.27.x in 2019-10, all with the exact same user agent:
```
Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1
```
- Also, in the same month and with the *exact* same user agent, I see 300,000 requests from 192.157.89.x
- The 66.109.27.x IPs belong to galaxyvisions.com
- The 192.157.89.x IPs belong to cologuard.com
- All these hosts were reported in late 2019 on abuseipdb.com
- Then I see another one, 163.172.71.23, that made 215,000 requests in 2019-08 and 2019-09
- It belongs to poneytelecom.eu and is also reported on abuseipdb.com for PHP injection and directory traversal
- It uses this user agent:
```
Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)
```
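The `66.109.27.x` and `192.157.89.x` notation above treats several neighboring addresses as a single source. A minimal sketch of that grouping with Python's standard `ipaddress` module, assuming a /24 boundary for IPv4 and a /64 boundary for IPv6 (a common allocation size, but not a universal one). The helper names and the hit counts in the usage line are hypothetical, for illustration only:

```python
import ipaddress
from collections import Counter


def network_of(ip: str, v4_prefix: int = 24, v6_prefix: int = 64) -> str:
    """Map an address to its enclosing network (assumed /24 for IPv4, /64 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get the network containing it
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def hits_per_network(hits_by_ip: dict) -> Counter:
    """Sum per-IP hit counts into per-network totals."""
    totals = Counter()
    for ip, hits in hits_by_ip.items():
        totals[network_of(ip)] += hits
    return totals


# Made-up counts, only to show the shape of the result.
print(hits_per_network({"66.109.27.142": 3, "66.109.27.139": 2, "192.157.89.4": 5}))
```

Grouping by network rather than by single address makes scrapers that rotate through a small block of IPs stand out in the totals.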
- In statistics-2018 I see more weird IPs
- 54.214.112.202 made 839,000 requests with no user agent...
- It is on Amazon Web Services (AWS) and 100% of its hits are `statistics_type:view`, so I guess it was harvesting via the REST API
- A few IPs owned by perfectip.net made 400,000 requests in 2018-01
- They are 2607:fa98:40:9:26b6:fdff:feff:195d, 2607:fa98:40:9:26b6:fdff:feff:1888, and 2607:fa98:40:9:26b6:fdff:feff:1c96
- All the requests used this user agent:
```
Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36
```
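Per-IP counts like the ones above can be pulled from Solr with facet queries. The notes don't show the exact query used, so this is a sketch that only builds the URL. It assumes a local core at `http://localhost:8081/solr/statistics-2018` and the DSpace statistics schema's `ip`, `time`, and `userAgent` fields, and it assumes that hits without a user agent have no `userAgent` value stored at all. The function name is hypothetical:

```python
from urllib.parse import urlencode

# Assumed location of a yearly DSpace statistics core; adjust host, port, and core name.
SOLR_BASE = "http://localhost:8081/solr/statistics-2018"


def facet_query(field, start, end, extra_fq=(), limit=20):
    """Build a Solr select URL that counts hits per value of `field` in a time range.

    `rows=0` asks Solr for the facet counts only, without returning the hit
    documents themselves.
    """
    params = [
        ("q", "*:*"),
        ("fq", f"time:[{start} TO {end}]"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
        ("facet.mincount", 1),
        ("wt", "json"),
    ]
    # Additional filter queries, e.g. to keep only hits without a user agent.
    params.extend(("fq", fq) for fq in extra_fq)
    return f"{SOLR_BASE}/select?{urlencode(params)}"


# Top IPs in 2018-01 (the month of the perfectip.net requests)...
print(facet_query("ip", "2018-01-01T00:00:00Z", "2018-01-31T23:59:59Z"))
# ...and top IPs in 2018 for hits that have no user agent at all.
print(facet_query("ip", "2018-01-01T00:00:00Z", "2018-12-31T23:59:59Z",
                  extra_fq=["-userAgent:[* TO *]"]))
```

Swapping `field` for `userAgent` gives the per-user-agent counts instead.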
- Then there is 213.139.53.62 in 2018, which is on Orange Telecom Jordan, so it's definitely CodeObia / ICARDA and I will purge them
- Jesus, and then there are 100,000 requests from the ILRI harvester on Linode on 2a01:7e00::f03c:91ff:fe0a:d645
- Jesus, there is also 46.101.86.248 making 15,000 requests per month in 2018 with no user agent...
- I will purge the hits from all the following IPs:
```
192.157.89.4
192.157.89.5
192.157.89.6
192.157.89.7
66.109.27.142
66.109.27.139
66.109.27.138
66.109.27.140
66.109.27.141
2607:fa98:40:9:26b6:fdff:feff:1888
2607:fa98:40:9:26b6:fdff:feff:195d
2607:fa98:40:9:26b6:fdff:feff:1c96
213.139.53.62
2a01:7e00::f03c:91ff:fe0a:d645
46.101.86.248
```
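One way to purge hits like these is a Solr delete-by-query, POSTed as `application/json` to each yearly core's `/update` handler and followed by a commit. The notes don't say how the purge was actually run, so this is only a sketch assuming the `ip` field name; the function name is hypothetical:

```python
import json


def delete_by_ip_body(ips):
    """Build a Solr JSON update body that deletes all hits from the given IPs.

    Each address is wrapped in double quotes so that the colons in IPv6
    addresses are not parsed as field separators by the query parser.
    """
    clauses = " OR ".join(f'"{ip}"' for ip in ips)
    return json.dumps({"delete": {"query": f"ip:({clauses})"}})


# A few of the addresses from the list above.
print(delete_by_ip_body(["66.109.27.142", "2607:fa98:40:9:26b6:fdff:feff:1888", "46.101.86.248"]))
```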
- In total these accounted for the following number of requests in each year:
  - 2020: 1,436
  - 2019: 933,148
  - 2018: 613,936
- I noticed a few other user agents that should be purged too:
```
^Java\/\d{1,2}.\d
FlipboardProxy\/\d
API scraper
RebelMouse\/\d
Iframely\/\d
Python\/\d
Ruby
NING\/\d
ubermetrics-technologies\.com
Jetty\/\d
scalaj-http\/\d
mailto\:team@impactstory\.org
```
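To check which user agents the patterns above would catch, they can be compiled as Python regular expressions. This sketch assumes search semantics, i.e. a pattern may match anywhere in the string unless it is anchored with `^` (only the Java pattern is). Note that the unescaped `.` in the Java pattern matches any character, not just a literal dot. The function name is hypothetical:

```python
import re

# The user agent patterns listed above, unchanged.
PURGE_PATTERNS = [
    r"^Java\/\d{1,2}.\d",
    r"FlipboardProxy\/\d",
    r"API scraper",
    r"RebelMouse\/\d",
    r"Iframely\/\d",
    r"Python\/\d",
    r"Ruby",
    r"NING\/\d",
    r"ubermetrics-technologies\.com",
    r"Jetty\/\d",
    r"scalaj-http\/\d",
    r"mailto\:team@impactstory\.org",
]
COMPILED = [re.compile(p) for p in PURGE_PATTERNS]


def should_purge(user_agent: str) -> bool:
    """Return True if any pattern matches somewhere in the user agent."""
    return any(p.search(user_agent) for p in COMPILED)


print(should_purge("Java/1.8.0_181"))
print(should_purge("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36"))
```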
- I purged them from the stats too:
  - 2020: 18,153
  - 2019: 29,745
  - 2018: 18,083
  - 2017: 19,399
  - 2016: 16,283
  - 2015: 16,659
  - 2014: 713
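Adding up the per-year figures from the two purges above gives the overall totals. This is a quick check that assumes each figure is the number of hits removed from that year's core:

```python
# Per-year counts copied from the two lists above.
ip_purge = {2020: 1436, 2019: 933148, 2018: 613936}
ua_purge = {2020: 18153, 2019: 29745, 2018: 18083, 2017: 19399,
            2016: 16283, 2015: 16659, 2014: 713}

print(f"IP purge total: {sum(ip_purge.values()):,}")  # 1,548,520
print(f"User agent purge total: {sum(ua_purge.values()):,}")  # 119,035
```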
<!-- vim: set sw=2 ts=2: -->