diff --git a/content/posts/2021-11.md b/content/posts/2021-11.md
index 749efe0f6..48827d8ad 100644
--- a/content/posts/2021-11.md
+++ b/content/posts/2021-11.md
@@ -203,4 +203,40 @@ Total number of bot hits purged: 10893
- According to my notes we actually completed this in 2021-08, but for some reason we are no longer on the list and I can't validate again
- There seems to be a problem with their website because every link I try to validate says it received an HTTP 500 response from CGSpace
+## 2021-11-23
+
+- Help RTB colleagues with thumbnail issues on their [2020 Annual Report](https://hdl.handle.net/10568/114576)
+ - The PDF seems to use a landscape layout and its first page is only half width, so the thumbnail renders with the left half white
+ - I generated a new one manually with libvips and it is better:
+
+```console
+$ vipsthumbnail AR\ RTB\ 2020.pdf -s 600 -o '%s.jpg[Q=85,optimize_coding,strip]'
+```
+
+- I sent an email to the OpenArchives.org contact to ask for help with the OAI validator
+ - Someone responded to say that there have been a number of complaints about this on the oai-pmh mailing list recently...
+- I sent an email to Pythagoras from GARDIAN to ask if they can use a more specific user agent than "Microsoft Internet Explorer" for their scraper
+ - He said he will change the user agent
+
+## 2021-11-24
+
+- I had an idea to check our Solr statistics for hits from all the IPs that I have listed in nginx as being bots
+ - Other than a few that I ruled out because they *may* be humans, these all made their requests within a single month or with no user agent, which is highly suspicious:
+
+```console
+$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt
+Found 8352 hits from 138.201.49.199 in statistics
+Found 9374 hits from 78.46.89.18 in statistics
+Found 2112 hits from 93.179.69.74 in statistics
+Found 1 hits from 31.6.77.23 in statistics
+Found 5 hits from 34.209.213.122 in statistics
+Found 86772 hits from 163.172.68.99 in statistics
+Found 77 hits from 163.172.70.248 in statistics
+Found 15842 hits from 163.172.71.24 in statistics
+Found 172954 hits from 104.154.216.0 in statistics
+Found 3 hits from 188.134.31.88 in statistics
+
+Total number of hits from bots: 295492
+```
+
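The total reported above can be re-checked by hand. The following is a minimal Python sketch, not part of `check-spider-ip-hits.sh` itself: the names `HIT_LINE` and `sum_hits` are hypothetical, and it only assumes the `Found N hits from IP in statistics` line format shown in the output.

```python
import re
from typing import Dict, Tuple

# Matches report lines such as "Found 8352 hits from 138.201.49.199 in statistics"
HIT_LINE = re.compile(r"^Found (\d+) hits from (\S+) in statistics$")


def sum_hits(report: str) -> Tuple[Dict[str, int], int]:
    """Return per-IP hit counts and their grand total from the script's output."""
    per_ip: Dict[str, int] = {}
    for line in report.splitlines():
        match = HIT_LINE.match(line.strip())
        if match:
            count, ip = int(match.group(1)), match.group(2)
            per_ip[ip] = per_ip.get(ip, 0) + count
    return per_ip, sum(per_ip.values())


# Output copied from the 2021-11-24 entry
REPORT = """\
Found 8352 hits from 138.201.49.199 in statistics
Found 9374 hits from 78.46.89.18 in statistics
Found 2112 hits from 93.179.69.74 in statistics
Found 1 hits from 31.6.77.23 in statistics
Found 5 hits from 34.209.213.122 in statistics
Found 86772 hits from 163.172.68.99 in statistics
Found 77 hits from 163.172.70.248 in statistics
Found 15842 hits from 163.172.71.24 in statistics
Found 172954 hits from 104.154.216.0 in statistics
Found 3 hits from 188.134.31.88 in statistics
"""

per_ip, total = sum_hits(REPORT)
print(f"Total number of hits from bots: {total}")
```

Summing the ten counts this way gives the same 295492 that the script prints, and the per-IP dictionary shows that `104.154.216.0` and `163.172.68.99` account for most of the hits.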
diff --git a/docs/2021-11/index.html b/docs/2021-11/index.html
index d0e47a950..ae49ff714 100644
--- a/docs/2021-11/index.html
+++ b/docs/2021-11/index.html
@@ -18,7 +18,7 @@ $ zstd statistics-2019.json
-
+
@@ -42,9 +42,9 @@ $ zstd statistics-2019.json
"@type": "BlogPosting",
"headline": "November, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-11/",
- "wordCount": "1339",
+ "wordCount": "1604",
"datePublished": "2021-11-02T22:27:07+02:00",
- "dateModified": "2021-11-21T13:45:30+02:00",
+ "dateModified": "2021-11-22T16:47:50+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -335,7 +335,50 @@ Purging 10893 hits from 87.203.87.141 in statistics
-
+
2021-11-23
+
+- Help RTB colleagues with thumbnail issues on their 2020 Annual Report
+
+- The PDF seems to use a landscape layout and its first page is only half width, so the thumbnail renders with the left half white
+- I generated a new one manually with libvips and it is better:
+
+
+
+$ vipsthumbnail AR\ RTB\ 2020.pdf -s 600 -o '%s.jpg[Q=85,optimize_coding,strip]'
+
+- I sent an email to the OpenArchives.org contact to ask for help with the OAI validator
+
+- Someone responded to say that there have been a number of complaints about this on the oai-pmh mailing list recently…
+
+
+- I sent an email to Pythagoras from GARDIAN to ask if they can use a more specific user agent than “Microsoft Internet Explorer” for their scraper
+
+- He said he will change the user agent
+
+
+
+2021-11-24
+
+- I had an idea to check our Solr statistics for hits from all the IPs that I have listed in nginx as being bots
+
+- Other than a few that I ruled out because they may be humans, these all made their requests within a single month or with no user agent, which is highly suspicious:
+
+
+
+$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt
+Found 8352 hits from 138.201.49.199 in statistics
+Found 9374 hits from 78.46.89.18 in statistics
+Found 2112 hits from 93.179.69.74 in statistics
+Found 1 hits from 31.6.77.23 in statistics
+Found 5 hits from 34.209.213.122 in statistics
+Found 86772 hits from 163.172.68.99 in statistics
+Found 77 hits from 163.172.70.248 in statistics
+Found 15842 hits from 163.172.71.24 in statistics
+Found 172954 hits from 104.154.216.0 in statistics
+Found 3 hits from 188.134.31.88 in statistics
+
+Total number of hits from bots: 295492
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 1e02fc1bf..a47fbe472 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 39843a78a..8044dcc8f 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index e3062bab2..df553d1ba 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 13f3b82b1..35814e8e2 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 2c1b91de2..d91011a96 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index e5174c510..41301cf0a 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index 58483dacf..8cdda1037 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 3d20e2fba..be2fd85ea 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 10bbe72e8..067be5248 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 3ff419671..49d689270 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index d583f9e22..855c1fe25 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 890759dcb..ffd9e9acf 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 9a2723ea6..b4cd912f3 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 23e59559e..d31f3daec 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 20be87823..4c0506208 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 7986b229a..76352aa1f 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index efa47b9e4..7886451c0 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index ef2f5e189..8572cc67c 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 35e787223..04f1cd7ba 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 861325d11..122f8c5fd 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 7c88e030a..b172856fb 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 68e8a6ce3..59b13a938 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 855d2bc7d..e53cbfcdd 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 637f75dda..0c6cb434f 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2021-11-21T13:45:30+02:00
+ 2021-11-22T16:47:50+02:00
https://alanorth.github.io/cgspace-notes/
- 2021-11-21T13:45:30+02:00
+ 2021-11-22T16:47:50+02:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2021-11-21T13:45:30+02:00
+ 2021-11-22T16:47:50+02:00
https://alanorth.github.io/cgspace-notes/2021-11/
- 2021-11-21T13:45:30+02:00
+ 2021-11-22T16:47:50+02:00
https://alanorth.github.io/cgspace-notes/posts/
- 2021-11-21T13:45:30+02:00
+ 2021-11-22T16:47:50+02:00
https://alanorth.github.io/cgspace-notes/2021-10/
2021-11-01T10:48:13+02:00