mirror of
https://github.com/alanorth/cgspace-notes.git
Add notes for 2023-11-23
- Export CGSpace to check for missing Initiative collection mappings
- Start a harvest on AReS

## 2023-11-22

- I was checking out the [DSpace 7 statistics](https://github.com/DSpace/RestContract/blob/main/statistics-reports.md) again and found that we have total visits and total downloads for each DSpace object, for example [this item](https://dspace7test.ilri.org/items/3f1b9605-f5ff-4bbb-8c89-d6fe4157f748):
  - TotalVisits: https://dspace7test.ilri.org/server/api/statistics/usagereports/3f1b9605-f5ff-4bbb-8c89-d6fe4157f748_TotalVisits
  - TotalDownloads: https://dspace7test.ilri.org/server/api/statistics/usagereports/3f1b9605-f5ff-4bbb-8c89-d6fe4157f748_TotalDownloads
- And the numbers match those in my dspace-statistics-api *exactly*!
- This can be useful for getting an individual DSpace object's stats, but there is no way to iterate over all objects, such as all items...
- We could use this to display stats on the community, collection, and item pages
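- A minimal Python sketch (standard library only) of pulling both reports for one object. The URL pattern is the one from the links above; the response shape, a `points` list whose entries each hold a `values` dict of counts, is my assumption based on the RestContract docs and should be checked against a real response. `usage_report_url`, `fetch_report`, and `sum_report` are hypothetical helpers:

```python
import json
import urllib.request


def usage_report_url(server: str, uuid: str, report_type: str) -> str:
    """Build a usage report URL like <server>/api/statistics/usagereports/<uuid>_TotalVisits."""
    return f"{server.rstrip('/')}/api/statistics/usagereports/{uuid}_{report_type}"


def fetch_report(server: str, uuid: str, report_type: str) -> dict:
    """GET one usage report and decode its JSON body (needs network access)."""
    url = usage_report_url(server, uuid, report_type)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def sum_report(report: dict) -> int:
    """Add up every count in every point of a report.

    ASSUMPTION: the report JSON has a "points" list and each point has a
    "values" dict mapping a label (e.g. "views") to an integer count.
    """
    return sum(
        count
        for point in report.get("points", [])
        for count in point.get("values", {}).values()
    )
```

- For example, `sum_report(fetch_report("https://dspace7test.ilri.org/server", "3f1b9605-f5ff-4bbb-8c89-d6fe4157f748", "TotalVisits"))` would give the item's total visits, but covering every item would still mean one request per UUID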

## 2023-11-23

- Brian King was asking me how many PDFs we have in CGSpace, so I got a rough estimate using this SQL query:

```console
localhost/dspace7= ☘ SELECT COUNT(uuid) FROM bitstream WHERE bitstream_format_id=(SELECT bitstream_format_id FROM bitstreamformatregistry WHERE mimetype='application/pdf');
 count
───────
 47818
(1 row)
```

- It's been some time since I looked at our Solr statistics to find new bots
- I found a few new ones that I [submitted to COUNTER-Robots](https://github.com/atmire/COUNTER-Robots/pull/60) and added to our local bot list:
  - GuzzleHttp/7
  - Owler@ows.eu/1
  - newspaperjs
- I ran my old `check-spider-hits.sh` script with a list of bots from our local overrides to purge hits from Solr:

```console
$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
Purging 30 hits from ubermetrics in statistics
Purging 59 hits from curb in statistics
Purging 36 hits from bitdiscovery in statistics
Purging 87 hits from omgili in statistics
Purging 47 hits from Vizzit in statistics
Purging 109 hits from Java\/17-ea in statistics
Purging 40 hits from AdobeUxTechC4-Async in statistics
Purging 21 hits from ZaloPC-win32-24v473 in statistics
Purging 21 hits from nbertaupete95 in statistics
Purging 52 hits from Scoop\.it in statistics
Purging 16 hits from WebAPIClient in statistics
Purging 241 hits from RStudio in statistics
Purging 1255 hits from ^MEL in statistics
Purging 47850 hits from GuzzleHttp in statistics
Purging 8714 hits from Owler in statistics
Purging 1083 hits from newspaperjs in statistics
Purging 369 hits from ^Chrome$ in statistics
Purging 1474 hits from curl in statistics

Total number of bot hits purged: 61504
```
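- The override entries are regular expressions (note the `\.`, `^`, and `$` in the output), so here is a rough Python sketch of which entries would flag a given user agent. It assumes case-sensitive, unanchored matching like Python's `re.search`; that is my approximation, not how `check-spider-hits.sh` or Solr actually evaluate the patterns. `OVERRIDE_PATTERNS` is a subset copied from the output above and `matching_patterns` is a hypothetical helper:

```python
import re

# Subset of the local override patterns, as printed in the purge output above
OVERRIDE_PATTERNS = [
    r"Java\/17-ea",
    r"Scoop\.it",
    r"^MEL",
    r"GuzzleHttp",
    r"Owler",
    r"newspaperjs",
    r"^Chrome$",
    r"curl",
]

# Compile once; keep the original pattern text for reporting
COMPILED = [(pattern, re.compile(pattern)) for pattern in OVERRIDE_PATTERNS]


def matching_patterns(user_agent: str) -> list[str]:
    """Return every override pattern found anywhere in user_agent (case-sensitive)."""
    return [pattern for pattern, regex in COMPILED if regex.search(user_agent)]
```

- The `^Chrome$` entry shows why anchors matter: it only catches a user agent that is literally `Chrome`, not every real Chrome browser, whose user agent starts with `Mozilla/5.0`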

- I also noticed about 35,000 requests over the past few years from all-lowercase user agents, which is [definitely weird](https://developers.whatismybrowser.com/api/features/user-agent-checks/weird/#all_lower_case), for example:
  - `mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/89.0.4389.90 safari/537.36`
  - `mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/90.0.4430.93 safari/537.36`
- I'm going to add a `^mozilla` pattern to our overrides and purge them:

```console
$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
Purging 35816 hits from ^mozilla in statistics

Total number of bot hits purged: 35816
```
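- A small Python sketch of the reasoning behind the `^mozilla` pattern: real browsers send mixed case (`Mozilla/5.0 ... AppleWebKit ...`), so a user agent with letters but no uppercase ones is suspicious, and a case-sensitive `^mozilla` only catches the lowercase variants. The case-sensitive matching is an assumption, and `is_all_lowercase` and `flagged_by_override` are hypothetical helpers:

```python
import re

# ASSUMPTION: override patterns are matched case-sensitively
LOWERCASE_MOZILLA = re.compile(r"^mozilla")


def is_all_lowercase(user_agent: str) -> bool:
    """True if the string contains at least one letter and no uppercase letters."""
    has_letters = any(c.isalpha() for c in user_agent)
    return has_letters and user_agent == user_agent.lower()


def flagged_by_override(user_agent: str) -> bool:
    """True if the ^mozilla override would match this user agent."""
    return LOWERCASE_MOZILLA.search(user_agent) is not None
```

- This only holds if the patterns are matched case-sensitively; with case-insensitive matching, `^mozilla` would also match every real browser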

<!-- vim: set sw=2 ts=2: -->