- Looking at the Solr statistics for 2022-03 on CGSpace
- I see 54.229.218.204 on Amazon AWS made 49,000 requests, some with this user agent: `Apache-HttpClient/4.5.9 (Java/1.8.0_322)`, and many others with a normal browser agent, so that's fishy!
- The DSpace agent pattern `http.?agent` seems to have caught the first ones, but I'll purge the rest by IP
- I see 40.77.167.80 is Bing or MSN Bot, but using a normal browser user agent, and if I search Solr for `dns:*msnbot* AND dns:*.msn.com.` I see over 100,000 hits, which is a problem I noticed a few months ago too...
- I extracted the MSN Bot IPs from Solr using an IP facet, then used the `check-spider-ip-hits.sh` script to purge them
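
A minimal sketch of the "IP facet" step, assuming the statistics core exposes an `ip` field and that Solr returns facet counts in its default flat JSON layout (`[value, count, value, count, ...]`). The sample response, the `min_hits` threshold, and the function names are hypothetical, and the addresses are documentation placeholders rather than real MSN Bot IPs:

```python
def facet_pairs(flat):
    """Turn Solr's flat facet list [value, count, ...] into (value, count) tuples."""
    return list(zip(flat[0::2], flat[1::2]))


def ips_over_threshold(response, field="ip", min_hits=1):
    """Return the faceted IPs with at least min_hits hits, in Solr's order."""
    flat = response["facet_counts"]["facet_fields"][field]
    return [ip for ip, count in facet_pairs(flat) if count >= min_hits]


# Hypothetical response for a query like dns:*msnbot* AND dns:*.msn.com.
# faceted on the assumed "ip" field
sample = {
    "facet_counts": {
        "facet_fields": {
            "ip": ["192.0.2.10", 5200, "192.0.2.11", 310, "192.0.2.12", 4],
        }
    }
}

# One IP per line, ready to save to a file for a purge script
print("\n".join(ips_over_threshold(sample, min_hits=100)))
```

The resulting list could then be saved to a file and handed to a purge script such as `check-spider-ip-hits.sh`.
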
- I woke up to several notices from UptimeRobot that CGSpace had gone down and up in the night (of course I'm on holiday out of the country for Easter)
- I see there are many locks in use from the XMLUI:
```console
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi)' | sort | uniq -c
8932 dspaceWeb
```
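
The `grep | sort | uniq -c` pipeline above can also be sketched in Python, which is handy when the counts need further processing. This is only an illustration: the sample rows are made up and heavily simplified, and `count_pools` is a hypothetical helper name:

```python
import re
from collections import Counter

# Same alternation as the grep -E pattern in the psql pipeline
POOL_PATTERN = re.compile(r"dspaceWeb|dspaceApi")


def count_pools(psql_output):
    """Count every occurrence of each pool name, like grep -o | sort | uniq -c."""
    return Counter(POOL_PATTERN.findall(psql_output))


# Made-up, simplified rows standing in for the pg_locks / pg_stat_activity join
sample_output = """\
relation | AccessShareLock | dspaceWeb | idle in transaction
relation | AccessShareLock | dspaceWeb | idle
relation | RowExclusiveLock | dspaceApi | active
"""

for name, total in count_pools(sample_output).most_common():
    print(total, name)
```
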
- Looking at the top IPs making requests it seems they are Yandex, bingbot, and Googlebot
- A handful of spider user agents that I identified were merged into COUNTER-Robots so I updated the ILRI override in our DSpace and regenerated the `example` file that contains most patterns
- I updated CGSpace, then ran all system updates and rebooted the host
- I also ran `dspace cleanup -v` to prune the database
- Looking at the countries on AReS I decided to collect a list to remind Jacquie at WorldFish again about how many incorrect ones they have
- There are about sixty incorrect ones, some of which I can correct via the value mappings on AReS, but most I can't
- I set up value mappings for seventeen countries, then sent another sixty or so to Jacquie and Salem to hopefully delete
- I notice we have over 1,000 items with region `Africa South of Sahara`
- I am surprised to see these because we did a mass migration to `Sub-Saharan Africa` in 2020-10 when we aligned to UN M.49
- Oh! It seems I used a capital O in `Of`!
- This is curious, I see we missed `East Asia` and `North America`, because those are still in our list, but UN M.49 uses `Eastern Asia` and `Northern America`... I will have to raise that with Peter and Abenet later
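
The capital-O problem is a case-sensitivity trap: an exact match on `Africa South Of Sahara` never touches `Africa South of Sahara`. A minimal sketch of a case-insensitive lookup that would catch both spellings, assuming a simple old-to-new mapping (the `REGION_MAPPING` dictionary and `normalize_region` helper are hypothetical, not how the original migration was done):

```python
def normalize_region(value, mapping):
    """Look up a region case-insensitively; return it unchanged if unmapped."""
    key = value.strip().casefold()
    return mapping.get(key, value)


# Keys are stored casefolded so any capitalization of the old name matches
REGION_MAPPING = {
    "africa south of sahara": "Sub-Saharan Africa",
}

for region in ["Africa South of Sahara", "Africa South Of Sahara", "Eastern Asia"]:
    print(region, "->", normalize_region(region, REGION_MAPPING))
```

Values that are not in the mapping pass through untouched, so names that are already aligned are left alone.
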