diff --git a/content/posts/2021-06.md b/content/posts/2021-06.md
index 4ce1b6ba5..895b3226a 100644
--- a/content/posts/2021-06.md
+++ b/content/posts/2021-06.md
@@ -194,5 +194,43 @@ $ curl -s -H "Accept: application/json" "https://demo.dspace.org/rest/items?offs
- I tested with filter "farmer managed irrigation systems" on DSpace Test
- Before the patch I got 293 results, and the few I checked didn't have the expected metadata value
- After the patch I got 162 results, and all the items I checked had the exact metadata value I was expecting
+- I tested a fresh harvest from my local AReS on DSpace Test with the DS-4065 REST API patch and here are my results:
+ - 90459 in final from the last harvest
+ - 90307 in temp after the new harvest
+ - 90327 in temp after starting the plugins
+- The 90327 number seems closer to the "real" number of items on CGSpace...
+ - It is close, but not entirely correct yet, because the export still contains 10 repeated handles (90327 total versus 90317 unique):
+
+```console
+$ grep -oE '"handle":"[[:digit:]]+/[[:digit:]]+"' openrxv-items_data-local-ds-4065.json | wc -l
+90327
+$ grep -oE '"handle":"[[:digit:]]+/[[:digit:]]+"' openrxv-items_data-local-ds-4065.json | sort -u | wc -l
+90317
+```
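- To see which handles are repeated rather than only how many, a minimal Python sketch along these lines might help. It assumes the export contains `"handle":"prefix/suffix"` pairs exactly as matched by the `grep` pattern above; the `duplicate_handles` helper and the sample handles are made up for illustration:

```python
import re
from collections import Counter

# Same shape as the grep pattern: "handle":"<digits>/<digits>"
HANDLE_RE = re.compile(r'"handle":"([0-9]+/[0-9]+)"')


def duplicate_handles(text):
    """Return {handle: occurrences} for every handle seen more than once."""
    counts = Counter(HANDLE_RE.findall(text))
    return {handle: n for handle, n in counts.items() if n > 1}


# Made-up sample data, not taken from the real export
sample = '{"handle":"10568/1"},{"handle":"10568/2"},{"handle":"10568/1"}'
print(duplicate_handles(sample))  # {'10568/1': 2}
```

- To run it on the real export, read the file into a string first, for example `duplicate_handles(open("openrxv-items_data-local-ds-4065.json").read())`; the surplus occurrences it reports should add up to the difference between the two `wc -l` counts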
+
+## 2021-06-22
+
+- Make a [pull request](https://github.com/atmire/COUNTER-Robots/pull/43) to the COUNTER-Robots project to add two new user agents: crusty and newspaper
+ - These two bots have made ~3,000 requests on CGSpace
+ - Then I added them to our local bot override in CGSpace (until the above pull request is merged) and ran my bot checking script:
+
+```console
+$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
+Purging 1339 hits from RI\/1\.0 in statistics
+Purging 447 hits from crusty in statistics
+Purging 3736 hits from newspaper in statistics
+
+Total number of bot hits purged: 5522
+```
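- As a quick sanity check on the totals, the per-agent lines can be tallied with a short Python sketch. It only assumes the `Purging N hits from AGENT in statistics` line format shown in the output above; the `purged_hits` helper is hypothetical and not part of the script:

```python
import re

# Matches lines such as: Purging 447 hits from crusty in statistics
PURGE_RE = re.compile(
    r"^Purging ([0-9]+) hits from (.+) in statistics$", re.MULTILINE
)


def purged_hits(output):
    """Map each agent pattern to the number of hits purged for it."""
    return {agent: int(hits) for hits, agent in PURGE_RE.findall(output)}


# Output copied from the run above
OUTPUT = r"""Purging 1339 hits from RI\/1\.0 in statistics
Purging 447 hits from crusty in statistics
Purging 3736 hits from newspaper in statistics

Total number of bot hits purged: 5522
"""

hits = purged_hits(OUTPUT)
print(sum(hits.values()))  # 5522, matching the reported total
```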
+
+- Surprised to see RI/1.0 in there because it's been in the override file for a while
+- Looking at the 2021 statistics in Solr I see a few more suspicious user agents:
+ - `PostmanRuntime/7.26.8`
+ - `node-fetch/1.0 (+https://github.com/bitinn/node-fetch)`
+ - `Photon/1.0`
+ - `StatusCake_Pagespeed_indev`
+ - `node-superagent/3.8.3`
+ - `cortex/1.0`
+- These bots account for ~42,000 hits in our statistics... I will just purge them and add them to our local override, but I can't be bothered to submit them to COUNTER-Robots since I'd have to look up the information for each one
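- Before adding new entries to the override it can be useful to check which user agents they would catch. The sketch below assumes the override file holds one regular expression per line (the escaped `RI\/1\.0` in the script output suggests as much); the pattern list and the `matching_pattern` helper are illustrative guesses, not the contents of the real file:

```python
import re

# Hypothetical patterns for the agents listed above, one regex per entry
OVERRIDE_PATTERNS = [
    r"PostmanRuntime",
    r"node-fetch",
    r"Photon\/1\.0",
    r"StatusCake_Pagespeed_indev",
    r"node-superagent",
    r"cortex\/1\.0",
]


def matching_pattern(user_agent, patterns=OVERRIDE_PATTERNS):
    """Return the first pattern found anywhere in user_agent, or None."""
    for pattern in patterns:
        if re.search(pattern, user_agent):
            return pattern
    return None


print(matching_pattern("PostmanRuntime/7.26.8"))
print(matching_pattern("Mozilla/5.0 (X11; Linux x86_64)"))
```

- Matching here is case sensitive and unanchored, which is an assumption; the real matching rules depend on how DSpace and the checking script read the patterns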
diff --git a/docs/2021-06/index.html b/docs/2021-06/index.html
index 237d719b8..8983a62e1 100644
--- a/docs/2021-06/index.html
+++ b/docs/2021-06/index.html
@@ -20,7 +20,7 @@ I simply started it and AReS was running again:
-
+
@@ -46,9 +46,9 @@ I simply started it and AReS was running again:
"@type": "BlogPosting",
"headline": "June, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-06/",
- "wordCount": "1419",
+ "wordCount": "1665",
"datePublished": "2021-06-01T10:51:07+03:00",
- "dateModified": "2021-06-17T18:16:18+03:00",
+ "dateModified": "2021-06-21T16:24:40+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -342,6 +342,51 @@ $ curl -s -H "Accept: application/json" "https://demo.dspace.org/
After the patch I got 162 results, and all the items I checked had the exact metadata value I was expecting
+I tested a fresh harvest from my local AReS on DSpace Test with the DS-4065 REST API patch and here are my results:
+
+- 90459 in final from the last harvest
+- 90307 in temp after the new harvest
+- 90327 in temp after starting the plugins
+
+
+The 90327 number seems closer to the “real” number of items on CGSpace…
+
+- It is close, but not entirely correct yet, because the export still contains 10 repeated handles (90327 total versus 90317 unique):
+
+
+
+$ grep -oE '"handle":"[[:digit:]]+/[[:digit:]]+"' openrxv-items_data-local-ds-4065.json | wc -l
+90327
+$ grep -oE '"handle":"[[:digit:]]+/[[:digit:]]+"' openrxv-items_data-local-ds-4065.json | sort -u | wc -l
+90317
+
2021-06-22
+
+- Make a pull request to the COUNTER-Robots project to add two new user agents: crusty and newspaper
+
+- These two bots have made ~3,000 requests on CGSpace
+- Then I added them to our local bot override in CGSpace (until the above pull request is merged) and ran my bot checking script:
+
+
+
+$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
+Purging 1339 hits from RI\/1\.0 in statistics
+Purging 447 hits from crusty in statistics
+Purging 3736 hits from newspaper in statistics
+
+Total number of bot hits purged: 5522
+
+- Surprised to see RI/1.0 in there because it’s been in the override file for a while
+- Looking at the 2021 statistics in Solr I see a few more suspicious user agents:
+
+PostmanRuntime/7.26.8
+node-fetch/1.0 (+https://github.com/bitinn/node-fetch)
+Photon/1.0
+StatusCake_Pagespeed_indev
+node-superagent/3.8.3
+cortex/1.0
+
+
+- These bots account for ~42,000 hits in our statistics… I will just purge them and add them to our local override, but I can’t be bothered to submit them to COUNTER-Robots since I’d have to look up the information for each one
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 832047628..b818cd054 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index e79ac811c..1de5dc72f 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 4f92fdb7a..ca14076a0 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index ebc147de4..c03168b20 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index ff16557a1..6b6264987 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index d9b0db2ce..e27af1461 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index cf3872552..fb314d7dd 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 6552d9fd8..da7e9ad3c 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 92a90fb6e..16001d89c 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index b502ebb5f..9c3082ad2 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 675b88b4d..95ffaee34 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index b34131e22..129773daf 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 5bfce3dab..af4d1cebf 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index c13e179d5..9257775ac 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index be7067d62..ad9164f99 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 3f0c5ffdf..9762f6c10 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 0295765d3..b3f22c942 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index bcae6a634..1f6b8eba3 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index a311235fb..d695ff236 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 416bf9e64..795f7695a 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index cde988a15..c737001a0 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index bb57c4b20..ff0f63835 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 113ea5051..9f6d2fd01 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2021-06-17T18:16:18+03:00
+ 2021-06-21T16:24:40+03:00
https://alanorth.github.io/cgspace-notes/
- 2021-06-17T18:16:18+03:00
+ 2021-06-21T16:24:40+03:00
https://alanorth.github.io/cgspace-notes/2021-06/
- 2021-06-17T18:16:18+03:00
+ 2021-06-21T16:24:40+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2021-06-17T18:16:18+03:00
+ 2021-06-21T16:24:40+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2021-06-17T18:16:18+03:00
+ 2021-06-21T16:24:40+03:00
https://alanorth.github.io/cgspace-notes/2021-05/
2021-05-30T22:09:06+03:00