diff --git a/content/posts/2022-04.md b/content/posts/2022-04.md
index fe0acd455..e2d763de4 100644
--- a/content/posts/2022-04.md
+++ b/content/posts/2022-04.md
@@ -21,5 +21,11 @@ sys 3m43.037s
- Start a full harvest on AReS
- Help Marianne with submit/approve access on a new collection on CGSpace
- Go back in Gaia's batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc)
+- Looking at the Solr statistics for 2022-03 on CGSpace
+ - I see 54.229.218.204 on Amazon AWS made 49,000 requests, some of which with this user agent: `Apache-HttpClient/4.5.9 (Java/1.8.0_322)`, and many others with a normal browser agent, so that's fishy!
+ - The DSpace agent pattern `http.?agent` seems to have caught the first ones, but I'll purge the IP ones
+ - I see 40.77.167.80 is Bing or MSN Bot, but using a normal browser user agent, and if I search Solr for `dns:*msnbot* AND dns:*.msn.com.` I see over 100,000, which is a problem I noticed a few months ago too...
+ - I extracted the MSN Bot IPs from Solr using an IP facet, then used the `check-spider-ip-hits.sh` script to purge them
+ -
diff --git a/docs/2022-03/index.html b/docs/2022-03/index.html
index 33e34f156..b85e01dd0 100644
--- a/docs/2022-03/index.html
+++ b/docs/2022-03/index.html
@@ -19,7 +19,7 @@ $ csvjoin -c id /tmp/2022-03-01-tac-batch4-701-980.csv /tmp/tac4-filenames.csv &
-
+
@@ -46,7 +46,7 @@ $ csvjoin -c id /tmp/2022-03-01-tac-batch4-701-980.csv /tmp/tac4-filenames.csv &
"url": "https://alanorth.github.io/cgspace-notes/2022-03/",
"wordCount": "1836",
"datePublished": "2022-03-01T16:46:54+03:00",
- "dateModified": "2022-03-31T16:09:14+03:00",
+ "dateModified": "2022-04-04T19:15:58+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
diff --git a/docs/categories/index.html b/docs/categories/index.html
index bcc69495f..80b3ed888 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 370272ad7..b1f61916a 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
@@ -116,7 +116,7 @@
- 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc)
+ 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc) Looking at the Solr statistics for 2022-03 on CGSpace I see 54.
Read more →
diff --git a/docs/categories/notes/index.xml b/docs/categories/notes/index.xml
index 2ddea7c88..4d960654b 100644
--- a/docs/categories/notes/index.xml
+++ b/docs/categories/notes/index.xml
@@ -30,7 +30,7 @@
Tue, 01 Mar 2022 10:53:39 +0300
https://alanorth.github.io/cgspace-notes/2022-03/
- 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc)
+ 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc) Looking at the Solr statistics for 2022-03 on CGSpace I see 54.
-
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index fa2625a6c..72eb86671 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 596b8c5d8..a1c8a8879 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index c6653e89c..73891edc5 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 3bf0efdd3..49b1e118e 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index d0ec9e9bc..4be83afb0 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 1e414e2ab..7365c5e3c 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
@@ -131,7 +131,7 @@
- 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc)
+ 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc) Looking at the Solr statistics for 2022-03 on CGSpace I see 54.
Read more →
diff --git a/docs/index.xml b/docs/index.xml
index 32bf563e8..2b03396b1 100644
--- a/docs/index.xml
+++ b/docs/index.xml
@@ -30,7 +30,7 @@
Tue, 01 Mar 2022 10:53:39 +0300
https://alanorth.github.io/cgspace-notes/2022-03/
- 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc)
+ 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc) Looking at the Solr statistics for 2022-03 on CGSpace I see 54.
-
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 48ed906c7..9fdcfd672 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index fe9036aa9..0293f3fee 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index a03e17424..aab54155d 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 19b1e2eda..62bc08be5 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index d7429d23a..4b0961863 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 740408c60..6b753d0ec 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 2cddb652b..a5ce61913 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index 91ef19fbd..a64509474 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 5b4d45898..ac5e6e362 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
@@ -131,7 +131,7 @@
- 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc)
+ 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc) Looking at the Solr statistics for 2022-03 on CGSpace I see 54.
Read more →
diff --git a/docs/posts/index.xml b/docs/posts/index.xml
index 3da4bb6c8..22bcc06c6 100644
--- a/docs/posts/index.xml
+++ b/docs/posts/index.xml
@@ -30,7 +30,7 @@
Tue, 01 Mar 2022 10:53:39 +0300
https://alanorth.github.io/cgspace-notes/2022-03/
- 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc)
+ 2022-04-01 I did G1GC tests on DSpace Test (linode26) to compliment the CMS tests I did yesterday The Discovery indexing took this long: real 334m33.625s user 227m51.331s sys 3m43.037s 2022-04-04 Start a full harvest on AReS Help Marianne with submit/approve access on a new collection on CGSpace Go back in Gaia’s batch reports to find records that she indicated for replacing on CGSpace (ie, those with better new copies, new versions, etc) Looking at the Solr statistics for 2022-03 on CGSpace I see 54.
-
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 95bb7c4a9..4c9738dba 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 547c1a0e2..ef33551d2 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 98b94e592..f14f6124e 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 80852bce6..3bdb8b56f 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index f4d67b2a7..729c76750 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index ee5df5b2a..23c0ef783 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 42d5ace08..9bff2d497 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 907c36c25..2e79cb2d2 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 0667c6386..3981daa00 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,22 +3,22 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2022-03-31T16:09:14+03:00
+ 2022-04-04T19:15:58+03:00
https://alanorth.github.io/cgspace-notes/
- 2022-03-31T16:09:14+03:00
+ 2022-04-04T19:15:58+03:00
https://alanorth.github.io/cgspace-notes/2022-03/
- 2022-03-31T16:09:14+03:00
+ 2022-04-04T19:15:58+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-03-31T16:09:14+03:00
+ 2022-04-04T19:15:58+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2022-03-31T16:09:14+03:00
+ 2022-04-04T19:15:58+03:00
https://alanorth.github.io/cgspace-notes/2022-03/
- 2022-03-01T10:53:39+03:00
+ 2022-04-04T19:15:58+03:00
https://alanorth.github.io/cgspace-notes/2022-02/
2022-03-01T17:17:27+03:00