+
+ 2021-05-01
+
+- I looked at the top user agents and IPs in the Solr statistics for last month and I see these user agents:
+  - “RI/1.0”, 1337
+  - “Microsoft Office Word 2014”, 941
+- I will add the `RI/1.0` pattern to our DSpace agents overload and purge its hits from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I will leave the Microsoft Word one, as that’s an actual user
+- I should probably add the `RI/1.0` pattern to the COUNTER-Robots project as well
+- I also see these top IPs:
+  - 193.169.254.178, 21648
+  - 181.62.166.177, 20323
+  - 45.146.166.180, 19376
+- The first IP seems to be in Estonia, and their requests to the REST API switch user agents from curl to Mac OS X to Windows and more
+- They also seem to be probing the `expand` parameter for SQL injection:
+
+```console
+193.169.254.178 - - [21/Apr/2021:01:59:01 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata\x22%20and%20\x2221\x22=\x2221 HTTP/1.1" 400 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
+193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata-21%2B21*01 HTTP/1.1" 200 458201 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
+193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata'||lower('')||' HTTP/1.1" 400 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
+193.169.254.178 - - [21/Apr/2021:02:02:10 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata'%2Brtrim('')%2B' HTTP/1.1" 200 458209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
+```
+
+- I will report the IP on abuseipdb.com and purge their hits from Solr
+- The second IP is in Colombia and is making thousands of requests with a referrer that looks like a test site:
+
+```console
+181.62.166.177 - - [20/Apr/2021:22:48:42 +0200] "GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0" 200 123613 "http://cassavalighthousetest.org/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36"
+181.62.166.177 - - [20/Apr/2021:22:55:39 +0200] "GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0" 200 123613 "http://cassavalighthousetest.org/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36"
+```
+
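Claims like the first IP rotating its user agent, or the second one sending the same referrer on every request, are easy to check with a per-IP tally over the raw access log. A minimal Python sketch, assuming nginx's default `combined` log format (which the lines above appear to follow); the function name `summarize_ips` is hypothetical:

```python
import re
from collections import Counter, defaultdict

# nginx "combined" log format (assumed; adjust the pattern if your log_format differs)
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_ips(lines):
    """Per client IP: total hits, the set of distinct user agents, and hits per referrer."""
    summary = defaultdict(lambda: {"hits": 0, "agents": set(), "referrers": Counter()})
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        entry = summary[match["ip"]]
        entry["hits"] += 1
        entry["agents"].add(match["agent"])
        entry["referrers"][match["referrer"]] += 1
    return dict(summary)
```

Passing an open access log file (its path depends on the server) and looking up `193.169.254.178` would list every user agent that IP sent, while the `referrers` counter for `181.62.166.177` shows how many hits carried the test-site referrer.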
+- But this site does not exist (yet?)
+- I will purge them from Solr
+- The third IP is apparently in Russia, and their user agent has the `pl-PL` locale; they made thousands of requests like this:
+
+```console
+45.146.166.180 - - [18/Apr/2021:16:28:44 +0200] "GET /bitstream/handle/10947/4153/.AAS%202014%20Annual%20Report.pdf?sequence=1%22%29%29%20AND%201691%3DUTL_INADDR.GET_HOST_ADDRESS%28CHR%28113%29%7C%7CCHR%28118%29%7C%7CCHR%28113%29%7C%7CCHR%28106%29%7C%7CCHR%28113%29%7C%7C%28SELECT%20%28CASE%20WHEN%20%281691%3D1691%29%20THEN%201%20ELSE%200%20END%29%20FROM%20DUAL%29%7C%7CCHR%28113%29%7C%7CCHR%2898%29%7C%7CCHR%28122%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%29%20AND%20%28%28%22RKbp%22%3D%22RKbp&isAllowed=y HTTP/1.1" 200 918998 "http://cgspace.cgiar.org:80/bitstream/handle/10947/4153/.AAS 2014 Annual Report.pdf" "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl-PL) AppleWebKit/523.15 (KHTML, like Gecko) Version/3.0 Safari/523.15"
+```
+
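The payloads in these requests are hard to read because they are encoded twice: the client percent-encodes them, and nginx writes characters such as a double quote as `\x22` in its access log. A small Python sketch that undoes both layers; `decode_logged_query` is a hypothetical helper name and the inputs are fragments copied from the log lines above:

```python
import re
from urllib.parse import unquote

# nginx logs a literal double quote (and other unsafe bytes) as \xHH
NGINX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_logged_query(value: str) -> str:
    """Turn a query-string value copied from an nginx access log back into plain text.

    First undo nginx's \\xHH escaping, then the client's %HH URL encoding.
    unquote() (not unquote_plus) keeps an encoded %2B as a literal "+".
    """
    raw = NGINX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)
    return unquote(raw)


# Fragments copied from the access log lines above
for payload in (
    r"metadata\x22%20and%20\x2221\x22=\x2221",
    "metadata'%2Brtrim('')%2B'",
    "1%22%29%29%20AND%201691%3DUTL_INADDR.GET_HOST_ADDRESS%28CHR%28113%29%7C%7CCHR%28118%29",
):
    print(decode_logged_query(payload))
```

The first fragment decodes to `metadata" and "21"="21` and the last one to `1")) AND 1691=UTL_INADDR.GET_HOST_ADDRESS(CHR(113)||CHR(118)`. `UTL_INADDR` and `FROM DUAL` are Oracle-specific, which suggests an automated SQL injection scanner cycling through database back ends rather than a targeted attack.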
+- I will purge all of these with my `check-spider-ip-hits.sh` script:
+
+```console
+$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
+Purging 21648 hits from 193.169.254.178 in statistics
+Purging 20323 hits from 181.62.166.177 in statistics
+Purging 19376 hits from 45.146.166.180 in statistics
+
+Total number of bot hits purged: 61347
+```
+
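The script's source isn't reproduced here. As a rough sketch of what a per-IP purge can look like, assuming the Solr statistics core is reachable at the hypothetical `SOLR_UPDATE_URL` below and that each hit stores the client address in an `ip` field, every IP in the file becomes one XML delete-by-query. This is an illustration, not how `check-spider-ip-hits.sh` is implemented. As a sanity check on the output above, 21648 + 20323 + 19376 = 61347, matching the reported total.

```python
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape

# Hypothetical endpoint: adjust host, port, and core name to your Solr setup
SOLR_UPDATE_URL = "http://localhost:8081/solr/statistics/update?softCommit=true"


def read_ips(path):
    """Read one IP address per line, skipping blank lines and # comments."""
    ips = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                ips.append(line)
    return ips


def purge_request(ip, url=SOLR_UPDATE_URL):
    """Build (but do not send) an XML delete-by-query for all hits from one IP."""
    body = f'<delete><query>ip:"{escape(ip)}"</query></delete>'
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def purge_all(path):
    """Send one delete request per IP listed in the file and print the HTTP status."""
    for ip in read_ips(path):
        with urlopen(purge_request(ip)) as response:
            print(ip, response.status)
```

Building the request separately from sending it makes it easy to inspect the exact query before anything is deleted; `purge_all("/tmp/ips.txt")` would then perform the deletes.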
+ 2021-05-02
+
+- Check the AReS Harvester indexes:
+
+```console
+$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
+yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 0 0 283b 283b
+yellow open openrxv-items-final ul3SKsa7Q9Cd_K7qokBY_w 1 1 103951 0 254mb 254mb
+$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
+...
+    "openrxv-items-temp": {
+        "aliases": {}
+    },
+    "openrxv-items-final": {
+        "aliases": {
+            "openrxv-items": {}
+        }
+    },
+```
+
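Instead of eyeballing the `_alias` output, the alias target can be checked programmatically. A small Python sketch; `fetch_aliases` and `indices_for_alias` are hypothetical helpers, and `http://localhost:9200` is the same local Elasticsearch queried with curl above:

```python
import json
from urllib.request import urlopen


def fetch_aliases(base_url="http://localhost:9200"):
    """GET /_alias from Elasticsearch and decode the JSON response."""
    with urlopen(f"{base_url}/_alias") as response:
        return json.load(response)


def indices_for_alias(aliases, alias):
    """Return the sorted index names whose "aliases" object contains `alias`."""
    return sorted(
        index
        for index, info in aliases.items()
        if alias in info.get("aliases", {})
    )
```

For the indices shown above, `indices_for_alias(fetch_aliases(), "openrxv-items")` would return `["openrxv-items-final"]`; an empty list, or more than one index, would mean the alias needs attention.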
+- I think they look OK (`openrxv-items` is an alias of `openrxv-items-final`), but I took a backup just in case:
+
+```console
+$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
+$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
+```
+
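To confirm the data dump is complete, its document count can be compared with the 103951 docs that `_cat/indices` reported for `openrxv-items-final`. A minimal sketch, assuming elasticdump's `--type=data` output holds one JSON document per line (worth verifying against the elasticdump version in use); `count_dumped_docs` is a hypothetical helper:

```python
import json


def count_dumped_docs(path):
    """Count documents in an elasticdump data file.

    Assumes one JSON object per line; json.loads() raises ValueError on a
    malformed line, which also catches a dump that was cut off mid-write.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)
                count += 1
    return count
```

For the backup above, `count_dumped_docs("/home/aorth/openrxv-items_data.json")` should come out equal to the index's `docs.count` if the dump finished cleanly.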
+- Then I started a new indexing run from the AReS Explorer admin dashboard
+- The indexing finished, but it looks like the aliases are messed up again:
+
+```console
+$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
+yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 104165 105024 487.7mb 487.7mb
+yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0 2.2mb 2.2mb
+```
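The visible symptom is in the document counts: `openrxv-items-final` holds only 937 documents while `openrxv-items-temp` holds 104165, roughly the reverse of the state before indexing. A minimal sketch that flags this from the `_cat/indices` text, assuming the default column order without the `?v` header row; both helper names are hypothetical and the comparison is only a heuristic:

```python
def docs_per_index(cat_output):
    """Map index name to docs.count from `GET _cat/indices` text output.

    Assumes the default columns:
    health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
    """
    counts = {}
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[6].isdigit():
            continue  # skip blank lines, header rows, and closed indices without counts
        counts[fields[2]] = int(fields[6])
    return counts


def final_smaller_than_temp(counts, temp="openrxv-items-temp", final="openrxv-items-final"):
    """Heuristic: True if the final index holds fewer documents than the temp index."""
    return counts.get(final, 0) < counts.get(temp, 0)
```

Fed the output of `curl -s http://localhost:9200/_cat/indices`, this returns `False` for the state before indexing (0 vs 103951) and `True` for the state after it (104165 vs 937).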