mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-11-26
@ -203,4 +203,40 @@ Total number of bot hits purged: 10893
- According to my notes we actually completed this in 2021-08, but for some reason we are no longer on the list and I can't validate again
- There seems to be a problem with their website because every link I try to validate says it received an HTTP 500 response from CGSpace

## 2021-11-23

- Help RTB colleagues with thumbnail issues on their [2020 Annual Report](https://hdl.handle.net/10568/114576)
  - The PDF seems to be in landscape mode or something and the first page is half width, so the thumbnail renders with the left half being white
  - I generated a new one manually with libvips and it is better:

```console
$ vipsthumbnail AR\ RTB\ 2020.pdf -s 600 -o '%s.jpg[Q=85,optimize_coding,strip]'
```

- I sent an email to the OpenArchives.org contact to ask for help with the OAI validator
  - Someone responded to say that there have been a number of complaints about this on the oai-pmh mailing list recently...
- I sent an email to Pythagoras from GARDIAN to ask if they can use a more specific user agent than "Microsoft Internet Explorer" for their scraper
  - He said he will change the user agent

## 2021-11-24

- I had an idea to check our Solr statistics for hits from all the IPs that I have listed in nginx as being bots
  - Other than a few that I ruled out that *may* be humans, these are all making requests within one month or with no user agent, which is highly suspicious:

```console
$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt
Found 8352 hits from 138.201.49.199 in statistics
Found 9374 hits from 78.46.89.18 in statistics
Found 2112 hits from 93.179.69.74 in statistics
Found 1 hits from 31.6.77.23 in statistics
Found 5 hits from 34.209.213.122 in statistics
Found 86772 hits from 163.172.68.99 in statistics
Found 77 hits from 163.172.70.248 in statistics
Found 15842 hits from 163.172.71.24 in statistics
Found 172954 hits from 104.154.216.0 in statistics
Found 3 hits from 188.134.31.88 in statistics

Total number of hits from bots: 295492
```
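- The per-IP counts can be cross-checked against the reported total by summing them from the script's output
  - This is only a minimal sketch that assumes the `Found N hits from IP in statistics` line format shown above; `tally_hits` is a hypothetical helper for illustration and is not part of `check-spider-ip-hits.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of check-spider-ip-hits.sh): sum the
# per-IP counts from lines of the form
#   Found <count> hits from <ip> in statistics
# Any other line (blank lines, the "Total number of hits" summary) is ignored.
tally_hits() {
  awk '$1 == "Found" && $3 == "hits" { total += $2 } END { print total + 0 }'
}

# Feed it the output captured above; the sum should match the reported total.
tally_hits <<'EOF'
Found 8352 hits from 138.201.49.199 in statistics
Found 9374 hits from 78.46.89.18 in statistics
Found 2112 hits from 93.179.69.74 in statistics
Found 1 hits from 31.6.77.23 in statistics
Found 5 hits from 34.209.213.122 in statistics
Found 86772 hits from 163.172.68.99 in statistics
Found 77 hits from 163.172.70.248 in statistics
Found 15842 hits from 163.172.71.24 in statistics
Found 172954 hits from 104.154.216.0 in statistics
Found 3 hits from 188.134.31.88 in statistics

Total number of hits from bots: 295492
EOF
# prints 295492
```

- Matching on the first and third fields keeps the summary line out of the sum, so the whole captured output can be piped in unmodified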
<!-- vim: set sw=2 ts=2: -->