diff --git a/content/posts/2023-11.md b/content/posts/2023-11.md
index 4b6f5dd68..a3b744a98 100644
--- a/content/posts/2023-11.md
+++ b/content/posts/2023-11.md
@@ -142,4 +142,69 @@ $ du -sh images-*
- Export CGSpace to check for missing Initiative collection mappings
- Start a harvest on AReS
+## 2023-11-22
+
+- I was checking out the [DSpace 7 statistics](https://github.com/DSpace/RestContract/blob/main/statistics-reports.md) again and found that we have total visits and total downloads for each DSpace object, for example [this item](https://dspace7test.ilri.org/items/3f1b9605-f5ff-4bbb-8c89-d6fe4157f748):
+ - TotalVisits: https://dspace7test.ilri.org/server/api/statistics/usagereports/3f1b9605-f5ff-4bbb-8c89-d6fe4157f748_TotalVisits
+ - TotalDownloads: https://dspace7test.ilri.org/server/api/statistics/usagereports/3f1b9605-f5ff-4bbb-8c89-d6fe4157f748_TotalDownloads
+- And the numbers match those in my dspace-statistics-api *exactly*!
+- This is useful for getting an individual DSpace object's stats, but there is no way to iterate over all objects, for example all items...
+ - We can look at using this to draw stats on the community, collection, and item pages
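+
+- A minimal Python sketch of fetching both reports for one object with only the standard library. The URL pattern is the one above. The JSON handling is an assumption based on my reading of the REST contract and has not been checked against every report type: each report has a `points` list whose entries carry a `values` object with a `views` count, so the sketch sums those. `API`, `usage_report_url()`, `fetch_report()`, and `total_views()` are hypothetical names:
+
+```python
+import json
+from urllib.request import urlopen
+
+# Assumption: REST API base URL of the DSpace 7 test server mentioned above
+API = "https://dspace7test.ilri.org/server/api"
+
+
+def usage_report_url(uuid: str, report: str) -> str:
+    """Build {API}/statistics/usagereports/{uuid}_{report}, e.g. report="TotalVisits"."""
+    return f"{API}/statistics/usagereports/{uuid}_{report}"
+
+
+def fetch_report(uuid: str, report: str) -> dict:
+    """Download and decode one usage report (needs network access)."""
+    with urlopen(usage_report_url(uuid, report)) as response:
+        return json.load(response)
+
+
+def total_views(report_json: dict) -> int:
+    """Sum the "views" value of every point in a usage report.
+
+    Assumption: the report has a "points" list and each point has a
+    "values" dict with a "views" key; anything missing counts as zero.
+    """
+    return sum(
+        point.get("values", {}).get("views", 0)
+        for point in report_json.get("points", [])
+    )
+```
+
+- For example, `total_views(fetch_report("3f1b9605-f5ff-4bbb-8c89-d6fe4157f748", "TotalDownloads"))` should return the item's download total, if the assumed JSON shape holds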
+
+## 2023-11-23
+
+- Brian King asked me how many PDFs we have in CGSpace, so I got a rough estimate using this SQL query:
+
+```console
+localhost/dspace7= ☘ SELECT COUNT(uuid) FROM bitstream WHERE bitstream_format_id=(SELECT bitstream_format_id FROM bitstreamformatregistry WHERE mimetype='application/pdf');
+ count
+───────
+ 47818
+(1 row)
+```
+
+- It's been some time since I looked at our Solr statistics to find new bots
+ - I found a few new ones that I [submitted to COUNTER-Robots](https://github.com/atmire/COUNTER-Robots/pull/60) and added to our local bot list:
+ - GuzzleHttp/7
+ - Owler@ows.eu/1
+ - newspaperjs
+- I ran my old `check-spider-hits.sh` script with a list of bots from our local overrides to purge hits from Solr:
+
+```console
+$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
+Purging 30 hits from ubermetrics in statistics
+Purging 59 hits from curb in statistics
+Purging 36 hits from bitdiscovery in statistics
+Purging 87 hits from omgili in statistics
+Purging 47 hits from Vizzit in statistics
+Purging 109 hits from Java\/17-ea in statistics
+Purging 40 hits from AdobeUxTechC4-Async in statistics
+Purging 21 hits from ZaloPC-win32-24v473 in statistics
+Purging 21 hits from nbertaupete95 in statistics
+Purging 52 hits from Scoop\.it in statistics
+Purging 16 hits from WebAPIClient in statistics
+Purging 241 hits from RStudio in statistics
+Purging 1255 hits from ^MEL in statistics
+Purging 47850 hits from GuzzleHttp in statistics
+Purging 8714 hits from Owler in statistics
+Purging 1083 hits from newspaperjs in statistics
+Purging 369 hits from ^Chrome$ in statistics
+Purging 1474 hits from curl in statistics
+
+Total number of bot hits purged: 61504
+```
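+
+- The agent patterns are regular expressions, but Solr regex queries go through Lucene's regex engine, which does not support the `^` or `$` anchors and always has to match the entire term, so patterns like `^MEL` and `^Chrome$` can't be sent verbatim. A literal `/` also has to be escaped, as it is in `Java\/17-ea`, because the query syntax uses `/` to delimit the regex. A minimal Python sketch of that translation, assuming `userAgent` is an unanalyzed string field in the statistics core; `to_lucene_regex()` and `solr_query()` are hypothetical helpers, not the actual code in `check-spider-hits.sh`:
+
+```python
+import re
+
+
+def to_lucene_regex(pattern: str) -> str:
+    """Translate an agent pattern into an implicitly anchored Lucene regex.
+
+    Lucene regexes must match the whole term, so an unanchored side gets
+    ".*", while a leading "^" or trailing "$" is dropped instead.
+    """
+    if pattern.startswith("^"):
+        pattern = pattern[1:]
+    else:
+        pattern = ".*" + pattern
+    # A trailing "\$" is an escaped literal dollar sign, not an anchor
+    if pattern.endswith("$") and not pattern.endswith("\\$"):
+        pattern = pattern[:-1]
+    else:
+        pattern = pattern + ".*"
+    return pattern
+
+
+def solr_query(pattern: str, field: str = "userAgent") -> str:
+    """Build a standard query parser regex query such as userAgent:/MEL.*/."""
+    regex = to_lucene_regex(pattern)
+    # Escape any "/" that is not already escaped, since "/" delimits the regex
+    regex = re.sub(r"(?<!\\)/", r"\\/", regex)
+    return f"{field}:/{regex}/"
+```
+
+- This sketch does not escape other characters that Lucene's optional regex operators may treat specially, such as `@`, so patterns containing them need extra care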
+
+- I also noticed more than 35,000 requests in the past few years from all-lowercase user agents, which is [definitely weird](https://developers.whatismybrowser.com/api/features/user-agent-checks/weird/#all_lower_case), for example:
+ - `mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/89.0.4389.90 safari/537.36`
+ - `mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/90.0.4430.93 safari/537.36`
+- I'm gonna add those to our overrides and purge them:
+
+```console
+$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
+Purging 35816 hits from ^mozilla in statistics
+
+Total number of bot hits purged: 35816
+```
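+
+- A minimal Python sketch of a check for spotting more of these in a list of exported user agents. The heuristic (starts with `mozilla/` and has no uppercase letters) is my own assumption, not how whatismybrowser.com implements its check, and `is_lowercase_browser()` and `count_lowercase_browsers()` are hypothetical names. Assuming `userAgent` is an unanalyzed string field, the Solr regex behind `^mozilla` is case-sensitive, so it should only match the lowercase agents and not normal `Mozilla/5.0` ones:
+
+```python
+from collections import Counter
+from typing import Iterable
+
+
+def is_lowercase_browser(user_agent: str) -> bool:
+    """Flag browser-like agents whose letters are all lowercase.
+
+    Assumption: real browsers send "Mozilla/5.0 (...)" in mixed case, so an
+    agent starting with "mozilla/" with no uppercase letters is probably a bot.
+    str.islower() also requires at least one cased character.
+    """
+    return user_agent.startswith("mozilla/") and user_agent.islower()
+
+
+def count_lowercase_browsers(agents: Iterable[str]) -> Counter:
+    """Tally how often each suspicious lowercase agent appears."""
+    return Counter(agent for agent in agents if is_lowercase_browser(agent))
+```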
+
+
diff --git a/docs/2023-11/index.html b/docs/2023-11/index.html
index 8c58e8ad2..8ccdba81f 100644
--- a/docs/2023-11/index.html
+++ b/docs/2023-11/index.html
@@ -23,7 +23,7 @@ Start a harvest on AReS
-
+
@@ -52,9 +52,9 @@ Start a harvest on AReS
"@type": "BlogPosting",
"headline": "November, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-11/",
- "wordCount": "889",
+ "wordCount": "1278",
"datePublished": "2023-11-02T12:59:36+03:00",
- "dateModified": "2023-11-16T17:25:15+03:00",
+ "dateModified": "2023-11-19T14:29:52+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -284,7 +284,79 @@ tomcat9[732]: [9955.666s][info ][gc] GC(6292) To-space exhausted
Export CGSpace to check for missing Initiative collection mappings
Start a harvest on AReS
-
+2023-11-22
+
+- I was checking out the DSpace 7 statistics again and found that we have total visits and total downloads for each DSpace object, for example this item:
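+- TotalVisits: https://dspace7test.ilri.org/server/api/statistics/usagereports/3f1b9605-f5ff-4bbb-8c89-d6fe4157f748_TotalVisits
+- TotalDownloads: https://dspace7test.ilri.org/server/api/statistics/usagereports/3f1b9605-f5ff-4bbb-8c89-d6fe4157f748_TotalDownloads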
+- And the numbers match those in my dspace-statistics-api exactly!
+- This is useful for getting an individual DSpace object’s stats, but there is no way to iterate over all objects, for example all items…
+
+- We can look at using this to draw stats on the community, collection, and item pages
+
+
+
+2023-11-23
+
+- Brian King asked me how many PDFs we have in CGSpace, so I got a rough estimate using this SQL query:
+
+localhost/dspace7= ☘ SELECT COUNT(uuid) FROM bitstream WHERE bitstream_format_id=(SELECT bitstream_format_id FROM bitstreamformatregistry WHERE mimetype='application/pdf');
+ count
+───────
+ 47818
+(1 row)
+
+- It’s been some time since I looked at our Solr statistics to find new bots
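+- I found a few new ones that I submitted to COUNTER-Robots and added to our local bot list:
+- GuzzleHttp/7
+- Owler@ows.eu/1
+- newspaperjs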
+- I ran my old check-spider-hits.sh script with a list of bots from our local overrides to purge hits from Solr:
+
+$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
+Purging 30 hits from ubermetrics in statistics
+Purging 59 hits from curb in statistics
+Purging 36 hits from bitdiscovery in statistics
+Purging 87 hits from omgili in statistics
+Purging 47 hits from Vizzit in statistics
+Purging 109 hits from Java\/17-ea in statistics
+Purging 40 hits from AdobeUxTechC4-Async in statistics
+Purging 21 hits from ZaloPC-win32-24v473 in statistics
+Purging 21 hits from nbertaupete95 in statistics
+Purging 52 hits from Scoop\.it in statistics
+Purging 16 hits from WebAPIClient in statistics
+Purging 241 hits from RStudio in statistics
+Purging 1255 hits from ^MEL in statistics
+Purging 47850 hits from GuzzleHttp in statistics
+Purging 8714 hits from Owler in statistics
+Purging 1083 hits from newspaperjs in statistics
+Purging 369 hits from ^Chrome$ in statistics
+Purging 1474 hits from curl in statistics
+
+Total number of bot hits purged: 61504
+
+- I also noticed more than 35,000 requests in the past few years from all-lowercase user agents, which is definitely weird, for example:
+
+mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/89.0.4389.90 safari/537.36
+mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/90.0.4430.93 safari/537.36
+
+
+- I’m gonna add those to our overrides and purge them:
+
+$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
+Purging 35816 hits from ^mozilla in statistics
+
+Total number of bot hits purged: 35816
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 08339cf11..e433475e9 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 453443709..b02eeacb1 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index bc3739c84..1c150b021 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 28097342f..652a52d8e 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 02e09ecd2..59107560d 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index aa6a2039d..645f671ae 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index f95619b32..2c8ca3a63 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index 01b794510..bd1722279 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/8/index.html b/docs/categories/notes/page/8/index.html
index a0c000a2e..9f5fa7117 100644
--- a/docs/categories/notes/page/8/index.html
+++ b/docs/categories/notes/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index bac7d0e97..3c42ae62c 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/10/index.html b/docs/page/10/index.html
index 4e8202d24..0df999ff2 100644
--- a/docs/page/10/index.html
+++ b/docs/page/10/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index dd78e0d82..816a40511 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index a02ef1994..86b2712fb 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index fe4775ad6..da92e62fc 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index c6f291c0a..4a1cd36e7 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 0cd3744f7..c709280fb 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index f949cb9f8..b80b94cdc 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index e900caa92..9d4a4e586 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index a51eb15ed..8d8505535 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 424d6642d..c491885fa 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/10/index.html b/docs/posts/page/10/index.html
index 35bef706f..7ca3ec467 100644
--- a/docs/posts/page/10/index.html
+++ b/docs/posts/page/10/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index e495d6880..73da1c122 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 8104a5699..19e0a7fc5 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index fe407e679..e4926533e 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index c6fb8eff6..d2204611a 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 790c52f23..d5093f5c6 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index b49dfbfb4..843e0c2f4 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 77e3e5250..028daa51d 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 985e658a9..52f77b5b9 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 22b9a907e..28daee1cd 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2023-11-16T17:25:15+03:00
+ 2023-11-19T14:29:52+03:00
https://alanorth.github.io/cgspace-notes/
- 2023-11-16T17:25:15+03:00
+ 2023-11-19T14:29:52+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2023-11-16T17:25:15+03:00
+ 2023-11-19T14:29:52+03:00
https://alanorth.github.io/cgspace-notes/2023-11/
- 2023-11-16T17:25:15+03:00
+ 2023-11-19T14:29:52+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2023-11-16T17:25:15+03:00
+ 2023-11-19T14:29:52+03:00
https://alanorth.github.io/cgspace-notes/2023-10/
2023-11-02T20:58:43+03:00