mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes
@@ -354,4 +354,53 @@ $ wc -l /tmp/bot-ips.txt
1946968 /tmp/bot-ips.txt
```

- I started running `check-spider-ip-hits.sh` with the 1946968 IPs and left it running in dry run mode

## 2022-07-19

- Patrizio and Fabio emailed me to ask if their IP was banned from CGSpace
- It's one of the Hetzner ones so I said yes definitely, and asked more about how they are using the API
- Add ORCID identifiers for Ram Dhulipala, Lillian Wambua, and Dan Masiga to CGSpace and tag them and some other existing items:

```console
dc.contributor.author,cg.creator.identifier
"Dhulipala, Ram K","Ram Dhulipala: 0000-0002-9720-3247"
"Dhulipala, Ram","Ram Dhulipala: 0000-0002-9720-3247"
"Dhulipala, R.","Ram Dhulipala: 0000-0002-9720-3247"
"Wambua, Lillian","Lillian Wambua: 0000-0003-3632-7411"
"Wambua, Lilian","Lillian Wambua: 0000-0003-3632-7411"
"Masiga, D.K.","Daniel Masiga: 0000-0001-7513-0887"
"Masiga, Daniel K.","Daniel Masiga: 0000-0001-7513-0887"
"Jores, Joerg","Joerg Jores: 0000-0003-3790-5746"
"Schieck, Elise","Elise Schieck: 0000-0003-1756-6337"
"Schieck, Elise G.","Elise Schieck: 0000-0003-1756-6337"
$ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2022-07-19-add-orcids.csv -db dspace -u dspace -p 'fuuu'
```
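
Before running a mapping like the one above, it can be worth checking that every `cg.creator.identifier` value has the `Name: 0000-0000-0000-0000` shape and that the ORCID check character is valid. The following is only a minimal Python sketch of such a check, not what `add-orcid-identifiers-csv.py` itself does. The names `orcid_checksum_ok` and `check_rows` are hypothetical, and the check character is computed with the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers.

```python
import csv
import io
import re

# Hypothetical helpers for illustration; not part of add-orcid-identifiers-csv.py.
ORCID_LABEL = re.compile(r"^(?P<name>.+): (?P<orcid>\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the final character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "")
    total = 0
    for digit in chars[:-1]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[-1] == expected


def check_rows(csv_text: str) -> list[str]:
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        label = row["cg.creator.identifier"]
        match = ORCID_LABEL.match(label)
        if match is None:
            problems.append(f"bad label format: {label!r}")
        elif not orcid_checksum_ok(match["orcid"]):
            problems.append(f"bad checksum: {label!r}")
    return problems


# Two rows copied from the CSV above.
sample = '''dc.contributor.author,cg.creator.identifier
"Dhulipala, Ram K","Ram Dhulipala: 0000-0002-9720-3247"
"Wambua, Lillian","Lillian Wambua: 0000-0003-3632-7411"
'''
print(check_rows(sample))
```

An empty list means no problems were found. A check like this only catches malformed or mistyped identifiers; it cannot tell whether an identifier belongs to the named author.
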
- Review the AfricaRice records from earlier this month again
- I found one more duplicate and one more suspicious item, so the total after removing those is now forty-two
- I took all the ~560 IPs that had hits so far in `check-spider-ip-hits.sh` above (about 270,000 into the list of 1946968 above) and ran them directly on CGSpace
- This purged 199,032 hits from Solr. Very many of them were from Qualys, but some were from that Chinese bot on 124.17.34.0/24 that was grabbing PDFs a few years ago, which I blocked in nginx at the time but whose hits I never purged
- Then I deleted all IPs up to the last one where I found hits in the large file of 1946968 IPs and restarted the script
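
The trimming step above (drop everything up to and including the last IP that had hits, then resume from there) can be sketched in Python. This is an illustration under assumptions, not the method actually used in these notes: the function name `remaining_after` is hypothetical and the addresses are documentation-range placeholders.

```python
def remaining_after(ips: list[str], last_with_hits: str) -> list[str]:
    """Return the IPs that come after the last one known to have hits."""
    # Search from the end so a duplicated IP resolves to its final position.
    for index in range(len(ips) - 1, -1, -1):
        if ips[index] == last_with_hits:
            return ips[index + 1:]
    raise ValueError(f"{last_with_hits} is not in the list")


# Documentation-range placeholder addresses, not real bot IPs.
ips = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
print(remaining_after(ips, "192.0.2.2"))
```

IPs between the last hit and the point the script had actually reached get checked a second time on the next run, which costs a little time but cannot skip anything.
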
## 2022-07-20

- Did a few more minor edits to the forty-two AfricaRice records (including generating thumbnails for the handful that are Creative Commons licensed) then did a test import on my local instance
- Once it worked well I did an import to CGSpace:

```console
$ dspace import -a -e fuuu@example.com -m 2022-07-20-africarice.map -s /tmp/SimpleArchiveFormat
```

- Also make edits to ~62 affiliations on CGSpace because I noticed they were messed up
- Extract another ~1,600 IPs that had hits since I started the second round of `check-spider-ip-hits.sh` yesterday and purge another 303,594 hits
- This is about 999846 into the original list of 1946968 from yesterday
- A metric fuck ton of the IPs in this batch were from Hetzner
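
To put a number on how much of a batch belongs to one network, the IPs can be matched against CIDR ranges. Below is a small sketch using Python's standard `ipaddress` module. The labels and the function names are hypothetical. Only 124.17.34.0/24 comes from these notes; 198.51.100.0/24 is a documentation range standing in for a provider's real networks, which would have to come from something like the nginx bot networks file.

```python
import ipaddress
from collections import Counter

# Only 124.17.34.0/24 comes from these notes; 198.51.100.0/24 is a
# documentation range standing in for a real provider's networks.
NETWORKS = {
    "pdf-bot": ipaddress.ip_network("124.17.34.0/24"),
    "example-provider": ipaddress.ip_network("198.51.100.0/24"),
}


def label_for(ip: str) -> str:
    """Return the label of the first network containing ip, else 'other'."""
    address = ipaddress.ip_address(ip)
    for label, network in NETWORKS.items():
        if address in network:
            return label
    return "other"


def tally(ips):
    """Count how many IPs fall under each network label."""
    return Counter(label_for(ip) for ip in ips)


# Placeholder batch, not real data from the Solr purge.
batch = ["124.17.34.10", "198.51.100.7", "198.51.100.8", "203.0.113.5"]
print(tally(batch))
```
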
## 2022-07-21

- Extract another ~2,100 IPs that had hits since I started the third round of `check-spider-ip-hits.sh` last night and purge another 763,843 hits
- This is about 1441221 into the original list of 1946968 from two days ago
- Again these are overwhelmingly Hetzner (not surprising since my `bot-networks.conf` file in nginx is mostly Hetzner)
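
The progress figures from the three rounds can be tallied with a few lines of Python. This is just arithmetic on the numbers quoted in these notes. The list positions are the approximate ones given above, and the name `summarize` is hypothetical.

```python
TOTAL_IPS = 1946968

# (date, approximate position in the list, hits purged), all taken from the notes
rounds = [
    ("2022-07-19", 270000, 199032),
    ("2022-07-20", 999846, 303594),
    ("2022-07-21", 1441221, 763843),
]


def summarize(rounds, total_ips):
    """Return (date, percent of list checked, cumulative hits purged) rows."""
    purged = 0
    rows = []
    for date, position, hits in rounds:
        purged += hits
        rows.append((date, round(100 * position / total_ips, 1), purged))
    return rows


for date, percent, purged in summarize(rounds, TOTAL_IPS):
    print(f"{date}: {percent}% of the list checked, {purged:,} hits purged so far")
```

Across the three rounds that comes to 1,266,469 purged hits, with roughly three quarters of the list checked.
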
<!-- vim: set sw=2 ts=2: -->