mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2020-09-10
@ -190,5 +190,20 @@ Would fix 3 occurences of: SOUTHWEST ASIA
- I think we need to wait for the web team, though, as they need to update their mappings
- Not to mention that we'll need to give WLE and CCAFS time to update their harvesters as well... hmmm
- Looking at the top user agents active on CGSpace in 2020-08, I see:
- `Delphi 2009`: 235353 (this is the GARDIAN harvester, I guess, as the IP is in Greece)
- `Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)`: 57004 (IP is 18.196.100.94, and the requests seem to be for CTA's content)
- `RTB website BOT`: 12282
- `ILRI Livestock Website Publications importer BOT`: 9393
- Shit, I meant to add Delphi to the DSpace spider agents list last month but I guess I didn't commit the change
- HTTrack is in the agents list so I'm not sure why DSpace registers a hit from that request
- Also, I am surprised to see the RTB and ILRI bots here because they have "BOT" in the name and that should also be dropped
- I also see hits from `curl` and `Java/1.8.0_66` and `Apache-HttpClient` so WTF... those are supposed to be dropped by the default agents list
- Some IP `2607:f298:5:101d:f816:3eff:fed9:a484` made 9,000 requests with the `RI/1.0` user agent this year...
- That's on DreamHost...?
- I purged 448658 hits from these agents and added `Delphi` to our local agents overload for Solr, as well as to Tomcat's Crawler Session Manager Valve so that it forces those clients to re-use a single session
- I made a pull request on the COUNTER-Robots project for the Daum robot: https://github.com/atmire/COUNTER-Robots/pull/38
- This bot made 8,000 requests to CGSpace this year
- I purged about 20,000 total requests from this bot from our Solr stats for the last few years
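
A quick way to sanity-check the surprises above (agents containing `BOT`, `curl`, `Java/` and `HTTrack` still registering hits) is to test the user agent strings against a pattern list offline. The sketch below is only illustrative: the `PATTERNS` list is a hypothetical subset written for this example, not the actual contents of DSpace's spider agents files or the COUNTER-Robots list, and the `ignore_case` switch is an assumption used to show how case sensitivity alone could let `BOT` slip past a lowercase `bot` pattern.

```python
import re

# Hypothetical subset of agent patterns, for illustration only.
# The real patterns live in DSpace's spider agents files and COUNTER-Robots.
PATTERNS = ["bot", "^curl", "^Java/?", "Apache-HttpClient", "HTTrack", "Delphi"]

# User agents taken from the notes above.
AGENTS = [
    "Delphi 2009",
    "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)",
    "RTB website BOT",
    "ILRI Livestock Website Publications importer BOT",
    "Java/1.8.0_66",
    "RI/1.0",
]


def matching_pattern(user_agent, patterns=PATTERNS, ignore_case=True):
    """Return the first pattern that matches user_agent, or None."""
    flags = re.IGNORECASE if ignore_case else 0
    for pattern in patterns:
        if re.search(pattern, user_agent, flags=flags):
            return pattern
    return None


for agent in AGENTS:
    # Compare case-insensitive and case-sensitive matching side by side.
    print(
        f"{agent!r}: "
        f"insensitive={matching_pattern(agent)!r}, "
        f"sensitive={matching_pattern(agent, ignore_case=False)!r}"
    )
```

If an agent matches here but DSpace still records the hit, the problem is more likely in how or when the list is applied (for example case handling, or the list not being loaded) than in the pattern itself.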
<!-- vim: set sw=2 ts=2: -->