- Export all affiliations on CGSpace and run them against the latest RoR data dump:
```console
localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
$ csvcut -c 1 /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
- Start looking at the last month of Solr statistics on CGSpace
- I see a number of IPs with "normal" user agents who clearly behave like bots
- 198.15.130.18: 21,000 requests to /discover with a normal-looking user agent, from ASN 11282 (SERVERYOU, US)
- 93.158.90.107: 8,500 requests to handle and browse links with a Firefox 84.0 user agent, from ASN 12552 (IPO-EU, SE)
- 193.235.141.162: 4,800 requests to handle, browse, and discovery links with a Firefox 84.0 user agent, from ASN 51747 (INTERNETBOLAGET, SE)
- 3.225.28.105: 2,900 requests to REST API for the CIAT Story Maps collection with a normal user agent, from ASN 14618 (AMAZON-AES, US)
- 34.228.236.6: 2,800 requests to discovery for the CGIAR System community with user agent `Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)`, from ASN 14618 (AMAZON-AES, US)
- 18.212.137.2: 2,800 requests to discovery for the CGIAR System community with user agent `Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)`, from ASN 14618 (AMAZON-AES, US)
- 3.81.123.72: 2,800 requests to discovery and handles for the CGIAR System community with user agent `Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)`, from ASN 14618 (AMAZON-AES, US)
- 3.227.16.188: 2,800 requests to discovery and handles for the CGIAR System community with user agent `Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)`, from ASN 14618 (AMAZON-AES, US)
- Looking closer into the requests with this Mozilla/4.0 user agent, I see 500+ IPs using it: