From dbff911e21f1a22cc08cafb93e90ecf745e01a09 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Mon, 1 Nov 2021 10:07:51 +0200
Subject: [PATCH] Add notes for 2021-10-31

---
 content/posts/2021-10.md | 76 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 1 deletion(-)

diff --git a/content/posts/2021-10.md b/content/posts/2021-10.md
index 470fb5f62..09b2b55e8 100644
--- a/content/posts/2021-10.md
+++ b/content/posts/2021-10.md
@@ -1,4 +1,4 @@
----
+e--
 title: "October, 2021"
 date: 2021-10-01T11:14:07+03:00
 author: "Alan Orth"
@@ -531,4 +531,78 @@ $ wc -l /tmp/ips-to-purge.txt
 - Help ICARDA colleagues with GLDC reports on AReS
   - There was an issue due to differences in CRP metadata between repositories
+## 2021-10-28
+
+- Meeting with Medha and a bunch of others about the FAIRscribe tool they have been developing
+  - Seems it is a submission tool like MEL
+
+## 2021-10-29
+
+- Linode alerted me that CGSpace (linode18) has high outbound traffic for the last two hours
+  - This has happened a few other times this week so I decided to go look at the Solr stats for today
+  - I see 93.158.91.62 is making thousands of requests to Discover with a normal user agent:
+
+```
+Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36
+```
+
+- Even more annoying, they are not re-using their session ID:
+
+```console
+$ grep 93.158.91.62 log/dspace.log.2021-10-29 | grep -oE 'session_id=[A-Z0-9]{32}:ip_addr=' | sort | uniq | wc -l
+4888
+```
+
+- This IP has made 36,000 requests to CGSpace...
+- The IP is owned by [Internet Vikings](internetvikings.com) in Sweden
+- I purged their statistics and set up a temporary HTTP 403 telling them to use a real user agent
+- I see another one in Sweden a few days ago (192.36.109.131), also using the same exact user agent as above, but belonging to [Resilans AB](http://webb.resilans.se/)
+  - I purged another 74,619 hits from this bot
+- I added these two IPs to the nginx IP bot identifier
+- Jesus, I found a few Russian IPs attempting SQL injection and path traversal, e.g.:
+
+```
+45.9.20.71 - - [20/Oct/2021:02:31:15 +0200] "GET /bitstream/handle/10568/1820/Rhodesgrass.pdf?sequence=4&OoxD=6591%20AND%201%3D1%20UNION%20ALL%20SELECT%201%2CNULL%2C%27%3Cscript%3Ealert%28%22XSS%22%29%3C%2Fscript%3E%27%2Ctable_name%20FROM%20information_schema.tables%20WHERE%202%3E1--%2F%2A%2A%2F%3B%20EXEC%20xp_cmdshell%28%27cat%20..%2F..%2F..%2Fetc%2Fpasswd%27%29%23 HTTP/1.1" 200 143070 "https://cgspace.cgiar.org:443/bitstream/handle/10568/1820/Rhodesgrass.pdf" "Mozilla/5.0 (X11; U; Linux i686; es-AR; rv:1.8.1.11) Gecko/20071204 Ubuntu/7.10 (gutsy) Firefox/2.0.0.11"
+```
+
+- I reported them to AbuseIPDB.com and purged their hits:
+
+```console
+$ ./ilri/check-spider-ip-hits.sh -f /tmp/ip.txt -p
+Purging 6364 hits from 45.9.20.71 in statistics
+Purging 8039 hits from 45.146.166.157 in statistics
+Purging 3383 hits from 45.155.204.82 in statistics
+
+Total number of bot hits purged: 17786
+```
+
+## 2021-10-31
+
+- Update Docker containers for AReS on linode20 and run a fresh harvest
+- Found some strange IP (94.71.3.44) making 51,000 requests today with the user agent "Microsoft Internet Explorer"
+  - It is in Greece, and it seems to be requesting each item's XMLUI full metadata view, so I suspect it's Gardian actually
+  - I found it making another 25,000 requests yesterday...
+  - I purged them from Solr
+- Found 20,000 hits from Qualys (according to AbuseIPDB.com) using normal user agents... ugh, must be some ILRI ICT scan
+- Found more requests from a Swedish IP (93.158.90.34) using that weird Firefox user agent that I noticed a few weeks ago:
+
+```
+Mozilla/5.0 (Macintosh; Intel Mac OS X 11.1; rv:84.0) Gecko/20100101 Firefox/84.0
+```
+
+- That's from ASN 12552 (IPO-EU, SE), which is operated by Internet Vikings, though AbuseIPDB.com says it's [Availo Networks AB](availo.se)
+- There's another IP (3.225.28.105) that made a few thousand requests to the REST API from Amazon, though it's using a normal user agent
+
+```console
+# zgrep 3.225.28.105 /var/log/nginx/rest.log.* | wc -l
+3991
+~# zgrep 3.225.28.105 /var/log/nginx/rest.log.* | grep -oE 'GET /rest/(collections|handle|items)' | sort | uniq -c
+   3154 GET /rest/collections
+    427 GET /rest/handle
+    410 GET /rest/items
+```
+
+- It requested the [CIAT Story Maps](https://cgspace.cgiar.org/handle/10568/75560) collection over 3,000 times last month...
+  - I will purge those hits
+