---
title: "May, 2021"
date: 2021-05-02T09:50:54+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2021-05-01

- I looked at the top user agents and IPs in the Solr statistics for last month and I see these user agents:
  - "RI/1.0", 1337
  - "Microsoft Office Word 2014", 941
- I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one... as that's an actual user...
- I should probably add the `RI/1.0` pattern to COUNTER-Robots project
- As well as these IPs:
  - 193.169.254.178, 21648
  - 181.62.166.177, 20323
  - 45.146.166.180, 19376
- The first IP seems to be in Estonia and their requests to the REST API change user agents from curl to Mac OS X to Windows and more
- Also, they seem to be trying to exploit something:

```console
193.169.254.178 - - [21/Apr/2021:01:59:01 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata\x22%20and%20\x2221\x22=\x2221 HTTP/1.1" 400 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata-21%2B21*01 HTTP/1.1" 200 458201 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
193.169.254.178 - - [21/Apr/2021:02:00:36 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata'||lower('')||' HTTP/1.1" 400 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
193.169.254.178 - - [21/Apr/2021:02:02:10 +0200] "GET /rest/collections/1179/items?limit=812&expand=metadata'%2Brtrim('')%2B' HTTP/1.1" 200 458209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
```

- I will report the IP on abuseipdb.com and purge their hits from Solr
- The second IP is in Colombia and is making thousands of requests for what looks like some test site:

```console
181.62.166.177 - - [20/Apr/2021:22:48:42 +0200] "GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0" 200 123613 "http://cassavalighthousetest.org/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36"
181.62.166.177 - - [20/Apr/2021:22:55:39 +0200] "GET /rest/collections/d1e11546-c62a-4aee-af91-fd482b3e7653/items?expand=metadata HTTP/2.0" 200 123613 "http://cassavalighthousetest.org/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36"
```

- But this site does not exist (yet?)
- I will purge them from Solr
- The third IP is in Russia apparently, and the user agent has the `pl-PL` locale with thousands of requests like this:

```console
45.146.166.180 - - [18/Apr/2021:16:28:44 +0200] "GET /bitstream/handle/10947/4153/.AAS%202014%20Annual%20Report.pdf?sequence=1%22%29%29%20AND%201691%3DUTL_INADDR.GET_HOST_ADDRESS%28CHR%28113%29%7C%7CCHR%28118%29%7C%7CCHR%28113%29%7C%7CCHR%28106%29%7C%7CCHR%28113%29%7C%7C%28SELECT%20%28CASE%20WHEN%20%281691%3D1691%29%20THEN%201%20ELSE%200%20END%29%20FROM%20DUAL%29%7C%7CCHR%28113%29%7C%7CCHR%2898%29%7C%7CCHR%28122%29%7C%7CCHR%28120%29%7C%7CCHR%28113%29%29%20AND%20%28%28%22RKbp%22%3D%22RKbp&isAllowed=y HTTP/1.1" 200 918998 "http://cgspace.cgiar.org:80/bitstream/handle/10947/4153/.AAS 2014 Annual Report.pdf" "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl-PL) AppleWebKit/523.15 (KHTML, like Gecko) Version/3.0 Safari/523.15"
```

- I will purge these all with my `check-spider-ip-hits.sh` script:

```console
$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
Purging 21648 hits from 193.169.254.178 in statistics
Purging 20323 hits from 181.62.166.177 in statistics
Purging 19376 hits from 45.146.166.180 in statistics
Total number of bot hits purged: 61347
```

## 2021-05-02

- Check the AReS Harvester indexes:

```console
$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 0 0 283b 283b
yellow open openrxv-items-final ul3SKsa7Q9Cd_K7qokBY_w 1 1 103951 0 254mb 254mb
$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
...
    "openrxv-items-temp": {
        "aliases": {}
    },
    "openrxv-items-final": {
        "aliases": {
            "openrxv-items": {}
        }
    },
```

- I think they look OK (`openrxv-items` is an alias of `openrxv-items-final`), but I took a backup just in case:

```console
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
```

- Then I started an indexing in the AReS Explorer admin dashboard
- The indexing finished, but it looks like the aliases are messed up again:

```console
$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 104165 105024 487.7mb 487.7mb
yellow open openrxv-items-final d0tbMM_SRWimirxr_gm9YA 1 1 937 0 2.2mb 2.2mb
```

## 2021-05-05

- Peter noticed that we no longer display `cg.link.reference` on the item view
- It seems that this got dropped accidentally when we migrated to `dcterms.relation` in CG Core v2
- I fixed it in the `6_x-prod` branch and told him it will be live soon
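
Looking back at the alias check in the 2021-05-02 entry, the manual reading of the `_alias` output could be scripted. The following is only a minimal sketch, not part of the AReS/OpenRXV tooling: the function name `alias_targets` and the `sample` dictionary are hypothetical, and `sample` simply mirrors the two indices shown in the `python -m json.tool` output from that day. It assumes the full response is a JSON object that maps each index name to an object with an `aliases` key.

```python
def alias_targets(alias_response, alias):
    """Return the sorted names of the indices that carry the given alias.

    alias_response is assumed to be the parsed JSON from the `_alias`
    endpoint: {index_name: {"aliases": {alias_name: {...}}}}.
    """
    return sorted(
        index
        for index, info in alias_response.items()
        if alias in info.get("aliases", {})
    )


# Hypothetical sample, copied from the two indices in the 2021-05-02 output
sample = {
    "openrxv-items-temp": {"aliases": {}},
    "openrxv-items-final": {"aliases": {"openrxv-items": {}}},
}

print(alias_targets(sample, "openrxv-items"))
```

With the sample above this prints `['openrxv-items-final']`. An empty list, or a list that names `openrxv-items-temp`, would be the cue to inspect the aliases by hand before or after an indexing run.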