From 9e6ff5d999c990172ab83d6e374b0e81a989817d Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Fri, 24 Jul 2020 23:23:15 +0300
Subject: [PATCH] Update notes for 2020-07-24

---
 content/posts/2020-07.md                |  88 ++++++++++++++++++
 docs/2019-03/index.html                 |   6 +-
 docs/2020-07/index.html                 | 113 +++++++++++++++++++++++-
 docs/categories/index.html              |   2 +-
 docs/categories/notes/index.html        |   2 +-
 docs/categories/notes/page/2/index.html |   2 +-
 docs/categories/notes/page/3/index.html |   2 +-
 docs/categories/notes/page/4/index.html |   2 +-
 docs/index.html                         |   2 +-
 docs/page/2/index.html                  |   2 +-
 docs/page/3/index.html                  |   2 +-
 docs/page/4/index.html                  |   2 +-
 docs/page/5/index.html                  |   2 +-
 docs/page/6/index.html                  |   2 +-
 docs/posts/index.html                   |   2 +-
 docs/posts/page/2/index.html            |   2 +-
 docs/posts/page/3/index.html            |   2 +-
 docs/posts/page/4/index.html            |   2 +-
 docs/posts/page/5/index.html            |   2 +-
 docs/posts/page/6/index.html            |   2 +-
 docs/sitemap.xml                        |  12 +--
 21 files changed, 223 insertions(+), 30 deletions(-)

diff --git a/content/posts/2020-07.md b/content/posts/2020-07.md
index 1e7e34142..57e88fa5e 100644
--- a/content/posts/2020-07.md
+++ b/content/posts/2020-07.md
@@ -670,5 +670,93 @@ $ curl -s "http://localhost:8081/solr/statistics-2018/update?softCommit=true" -H
 - I closed all issues in the [OpenRXV](https://github.com/ilri/OpenRXV/issues) and [AReS](https://github.com/ilri/AReS/issues) GitHub repositories with screenshots so that Moayad can use them for his invoice
 - The statistics-2018 core always crashes with the same error even after I deleted the "id:10" records...
+  - I started the statistics-2017 core and it finished in 3:44:15
+  - I started the statistics-2016 core and it finished in 2:27:08
+  - I started the statistics-2015 core and it finished in 1:07:38
+
+## 2020-07-24
+
+- Looking at the statistics-2019 Solr stats and see some interesting user agents and IPs
+  - For example, I see 568,000 requests from 66.109.27.x in 2019-10, all with the same exact user agent:
+
+```
+Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1
+```
+
+- Also, in the same month with the same *exact* user agent, I see 300,000 from 192.157.89.x
+  - The 66.109.27.x IPs belong to galaxyvisions.com
+  - The 192.157.89.x IPs belong to cologuard.com
+  - All these hosts were reported in late 2019 on abuseipdb.com
+- Then I see another one 163.172.71.23 that made 215,000 requests in 2019-09 and 2019-08
+  - It belongs to poneytelecom.eu and is also in abuseipdb.com for PHP injection and directory traversal
+  - It uses this user agent:
+
+```
+Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)
+```
+
+- In statistics-2018 I see more weird IPs
+  - 54.214.112.202 made 839,000 requests with no user agent...
+    - It is on Amazon Web Services (AWS) and made 100% `statistics_type:view` so I guess it was harvesting via the REST API
+  - A few IPs owned by perfectip.net made 400,000 requests in 2018-01
+    - They are 2607:fa98:40:9:26b6:fdff:feff:195d and 2607:fa98:40:9:26b6:fdff:feff:1888 and 2607:fa98:40:9:26b6:fdff:feff:1c96
+    - All the requests used this user agent:
+
+```
+Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36
+```
+
+- Then there is 213.139.53.62 in 2018, which is on Orange Telecom Jordan, so it's definitely CodeObia / ICARDA and I will purge them
+- Jesus, and then there are 100,000 from the ILRI harvestor on Linode on 2a01:7e00::f03c:91ff:fe0a:d645
+- Jesus fuck there is 46.101.86.248 making 15,000 requests per month in 2018 with no user agent...
+- I will purge the hits from all the following IPs:
+
+```
+192.157.89.4
+192.157.89.5
+192.157.89.6
+192.157.89.7
+66.109.27.142
+66.109.27.139
+66.109.27.138
+66.109.27.140
+66.109.27.141
+2607:fa98:40:9:26b6:fdff:feff:1888
+2607:fa98:40:9:26b6:fdff:feff:195d
+2607:fa98:40:9:26b6:fdff:feff:1c96
+213.139.53.62
+2a01:7e00::f03c:91ff:fe0a:d645
+46.101.86.248
+```
+
+- In total these accounted for the following amount of requests in each year:
+  - 2020: 1436
+  - 2019: 933148
+  - 2018: 613936
+- I noticed a few other user agents that should be purged too:
+
+```
+^Java\/\d{1,2}.\d
+FlipboardProxy\/\d
+API scraper
+RebelMouse\/\d
+Iframely\/\d
+Python\/\d
+Ruby
+NING\/\d
+ubermetrics-technologies\.com
+Jetty\/\d
+scalaj-http\/\d
+mailto\:team@impactstory\.org
+```
+
+- I purged them from the stats too:
+  - 2020: 18153
+  - 2019: 29745
+  - 2018: 18083
+  - 2017: 19399
+  - 2016: 16283
+  - 2015: 16659
+  - 2014: 713
diff --git a/docs/2019-03/index.html b/docs/2019-03/index.html
index 43410728c..80ae51f6a 100644
--- a/docs/2019-03/index.html
+++ b/docs/2019-03/index.html
@@ -24,7 +24,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
-
+
@@ -55,7 +55,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
 "url": "https://alanorth.github.io/cgspace-notes/2019-03/",
 "wordCount": "7105",
 "datePublished": "2019-03-01T12:16:30+01:00",
- "dateModified": "2020-04-13T15:30:24+03:00",
+ "dateModified": "2020-07-24T21:57:55+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -951,7 +951,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-03-22 | sort -u | wc -l
 712 35.174.184.209
 784 2a01:4f8:13b:1296::2