diff --git a/content/posts/2022-07.md b/content/posts/2022-07.md
index 8ab9be97e..de38c26d0 100644
--- a/content/posts/2022-07.md
+++ b/content/posts/2022-07.md
@@ -354,4 +354,53 @@ $ wc -l /tmp/bot-ips.txt
1946968 /tmp/bot-ips.txt
```
+- I started running `check-spider-ip-hits.sh` with the 1946968 IPs and left it running in dry run mode
+
+## 2022-07-19
+
+- Patrizio and Fabio emailed me to ask if their IP was banned from CGSpace
+ - It's one of the Hetzner ones so I said yes definitely, and asked more about how they are using the API
+- Add ORCID identifiers for Ram Dhulipala, Lilian Wambua, and Dan Masiga to CGSpace, then tag their existing items along with a few others:
+
+```console
+dc.contributor.author,cg.creator.identifier
+"Dhulipala, Ram K","Ram Dhulipala: 0000-0002-9720-3247"
+"Dhulipala, Ram","Ram Dhulipala: 0000-0002-9720-3247"
+"Dhulipala, R.","Ram Dhulipala: 0000-0002-9720-3247"
+"Wambua, Lillian","Lillian Wambua: 0000-0003-3632-7411"
+"Wambua, Lilian","Lillian Wambua: 0000-0003-3632-7411"
+"Masiga, D.K.","Daniel Masiga: 0000-0001-7513-0887"
+"Masiga, Daniel K.","Daniel Masiga: 0000-0001-7513-0887"
+"Jores, Joerg","Joerg Jores: 0000-0003-3790-5746"
+"Schieck, Elise","Elise Schieck: 0000-0003-1756-6337"
+"Schieck, Elise G.","Elise Schieck: 0000-0003-1756-6337"
+$ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2022-07-19-add-orcids.csv -db dspace -u dspace -p 'fuuu'
+```
+
+- Review the AfricaRice records from earlier this month again
+ - I found one more duplicate and one more suspicious item, so the total after removing those is now forty-two
+- I took the ~560 IPs that had hits so far in the `check-spider-ip-hits.sh` dry run (about 270,000 into the list of 1946968 above) and ran them directly on CGSpace
+ - This purged 199,032 hits from Solr, very many of them from Qualys, but also some from that Chinese bot on 124.17.34.0/24 that was grabbing PDFs a few years ago, which I blocked in nginx at the time but never purged the hits from
+ - Then I deleted all IPs up to the last one where I found hits in the large file of 1946968 IPs and re-started the script
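
The trimming step above can be sketched roughly as follows. This is a minimal sketch, not the command that was actually used: the function name `remaining_ips` and the output file name are hypothetical, and it assumes the list has one IP per line, as produced by `prips`.

```shell
# Hypothetical helper (an assumption, not from the original notes): print
# every line of an IP list that comes after the first line exactly matching
# the given IP, so the script can be restarted on the remainder only.
remaining_ips() {
    # $1 = last IP that had hits, $2 = path to the IP list
    # "found" is unset until the matching line has been seen, so the
    # matching line itself and everything before it are skipped.
    awk -v ip="$1" 'found; $0 == ip { found = 1 }' "$2"
}

# Example usage (hypothetical IP and output path):
#   remaining_ips 192.0.2.10 /tmp/bot-ips.txt > /tmp/bot-ips-remaining.txt
```

Comparing whole lines with `$0 == ip` instead of a regular expression avoids the dots in an IP address being treated as wildcards.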
+
+## 2022-07-20
+
+- Did a few more minor edits to the forty-two AfricaRice records (including generating thumbnails for the handful that are Creative Commons licensed) then did a test import on my local instance
+ - Once it worked well I did an import to CGSpace:
+
+```console
+$ dspace import -a -e fuuu@example.com -m 2022-07-20-africarice.map -s /tmp/SimpleArchiveFormat
+```
+
+- Also make edits to ~62 affiliations on CGSpace because I noticed they were messed up
+- Extract another ~1,600 IPs that had hits since I started the second round of `check-spider-ip-hits.sh` yesterday and purge another 303,594 hits
+ - This is about 999846 into the original list of 1946968 from yesterday
+ - A metric fuck ton of the IPs in this batch were from Hetzner
+
+## 2022-07-21
+
+- Extract another ~2,100 IPs that had hits since I started the third round of `check-spider-ip-hits.sh` last night and purge another 763,843 hits
+ - This is about 1441221 into the original list of 1946968 from two days ago
+ - Again these are overwhelmingly Hetzner (not surprising since my bot-networks.conf file in nginx is mostly Hetzner)
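
To put the positions and purge counts from the three rounds in context, here is a small sketch of the arithmetic. The function names `progress_pct` and `sum_hits` are made up for illustration; the numbers in the usage comments are the ones recorded in the notes above.

```shell
# Total number of IPs in the expanded list, as reported by wc -l above
total_ips=1946968

# Integer percentage of the list covered at a given position (rounds down)
progress_pct() {
    echo $(( $1 * 100 / total_ips ))
}

# Sum the purged hit counts given as arguments
sum_hits() {
    total=0
    for n in "$@"; do
        total=$(( total + n ))
    done
    echo "$total"
}

# Example usage with the figures from the notes:
#   progress_pct 999846
#   progress_pct 1441221
#   sum_hits 199032 303594 763843
```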
+
diff --git a/docs/2022-07/index.html b/docs/2022-07/index.html
index 848c43d96..67250e7cb 100644
--- a/docs/2022-07/index.html
+++ b/docs/2022-07/index.html
@@ -19,7 +19,7 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
-
+
@@ -44,9 +44,9 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
"@type": "BlogPosting",
"headline": "July, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-07/",
- "wordCount": "2266",
+ "wordCount": "2679",
"datePublished": "2022-07-02T14:07:36+03:00",
- "dateModified": "2022-07-18T12:32:23+03:00",
+ "dateModified": "2022-07-18T16:45:55+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -521,7 +521,71 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
$ while read -r line; do prips "$line" | sed -e '1d; $d'; done < /tmp/bot-networks.conf > /tmp/bot-ips.txt
$ wc -l /tmp/bot-ips.txt
1946968 /tmp/bot-ips.txt
-
+
+- I started running check-spider-ip-hits.sh with the 1946968 IPs and left it running in dry run mode
+
+2022-07-19
+
+- Patrizio and Fabio emailed me to ask if their IP was banned from CGSpace
+
+- It’s one of the Hetzner ones so I said yes definitely, and asked more about how they are using the API
+
+
+- Add ORCID identifiers for Ram Dhulipala, Lilian Wambua, and Dan Masiga to CGSpace, then tag their existing items along with a few others:
+
+dc.contributor.author,cg.creator.identifier
+"Dhulipala, Ram K","Ram Dhulipala: 0000-0002-9720-3247"
+"Dhulipala, Ram","Ram Dhulipala: 0000-0002-9720-3247"
+"Dhulipala, R.","Ram Dhulipala: 0000-0002-9720-3247"
+"Wambua, Lillian","Lillian Wambua: 0000-0003-3632-7411"
+"Wambua, Lilian","Lillian Wambua: 0000-0003-3632-7411"
+"Masiga, D.K.","Daniel Masiga: 0000-0001-7513-0887"
+"Masiga, Daniel K.","Daniel Masiga: 0000-0001-7513-0887"
+"Jores, Joerg","Joerg Jores: 0000-0003-3790-5746"
+"Schieck, Elise","Elise Schieck: 0000-0003-1756-6337"
+"Schieck, Elise G.","Elise Schieck: 0000-0003-1756-6337"
+$ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2022-07-19-add-orcids.csv -db dspace -u dspace -p 'fuuu'
+
+- Review the AfricaRice records from earlier this month again
+
+- I found one more duplicate and one more suspicious item, so the total after removing those is now forty-two
+
+
+- I took the ~560 IPs that had hits so far in the check-spider-ip-hits.sh dry run (about 270,000 into the list of 1946968 above) and ran them directly on CGSpace
+
+- This purged 199,032 hits from Solr, very many of them from Qualys, but also some from that Chinese bot on 124.17.34.0/24 that was grabbing PDFs a few years ago, which I blocked in nginx at the time but never purged the hits from
+- Then I deleted all IPs up to the last one where I found hits in the large file of 1946968 IPs and re-started the script
+
+
+
+2022-07-20
+
+- Did a few more minor edits to the forty-two AfricaRice records (including generating thumbnails for the handful that are Creative Commons licensed) then did a test import on my local instance
+
+- Once it worked well I did an import to CGSpace:
+
+
+
+$ dspace import -a -e fuuu@example.com -m 2022-07-20-africarice.map -s /tmp/SimpleArchiveFormat
+
+- Also make edits to ~62 affiliations on CGSpace because I noticed they were messed up
+- Extract another ~1,600 IPs that had hits since I started the second round of check-spider-ip-hits.sh yesterday and purge another 303,594 hits
+
+- This is about 999846 into the original list of 1946968 from yesterday
+- A metric fuck ton of the IPs in this batch were from Hetzner
+
+
+
+2022-07-21
+
+- Extract another ~2,100 IPs that had hits since I started the third round of check-spider-ip-hits.sh last night and purge another 763,843 hits
+
+- This is about 1441221 into the original list of 1946968 from two days ago
+- Again these are overwhelmingly Hetzner (not surprising since my bot-networks.conf file in nginx is mostly Hetzner)
+
+
+
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index ae4e4602a..f89bdbe08 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index b1a7e0175..52de8ac06 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 0c6845648..a33ff9b7c 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 0f3351aad..8cb8ff946 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 1634ece89..d3ceb4031 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 144711a18..2bb084d11 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index 84ae6d816..e675f414b 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index 796797dfd..04bd704a9 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index e2429bec3..acb0ce20d 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index dd3ba8e09..54c4a1160 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 1d2510117..4cb3657e7 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 426b28c44..a63285042 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 4d459354c..91077562e 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 133cd5689..eb8a0df31 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 9a843246e..3f52c682b 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 631aa884a..9270b4168 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index 2cccf680d..ce9b4d9c9 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 75887b5cb..ee61406fa 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 66b20da96..aaf3381f1 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 385d2ff11..aa255bbf5 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 46f827716..7457b95f0 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 4f114ddff..9c2fcbb02 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 555e110f4..3bb0fed10 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 9dd0bed47..687ac7da1 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index a3ddbd76d..6ee8432ae 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 9c79413b9..d5e68b588 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 32e40f5ce..28e0ceeb3 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2022-07-18T12:32:23+03:00
+ 2022-07-18T16:45:55+03:00
https://alanorth.github.io/cgspace-notes/
- 2022-07-18T12:32:23+03:00
+ 2022-07-18T16:45:55+03:00
https://alanorth.github.io/cgspace-notes/2022-07/
- 2022-07-18T12:32:23+03:00
+ 2022-07-18T16:45:55+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-07-18T12:32:23+03:00
+ 2022-07-18T16:45:55+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2022-07-18T12:32:23+03:00
+ 2022-07-18T16:45:55+03:00
https://alanorth.github.io/cgspace-notes/2022-06/
2022-07-04T09:25:14+03:00