Update notes for 2018-09-25

2018-09-25 19:05:02 +03:00
parent 2634aff405
commit f4c053ef76
3 changed files with 50 additions and 8 deletions

@@ -468,5 +468,24 @@ $ psql -h localhost -U postgres dspacestatistics
dspacestatistics=> CREATE TABLE IF NOT EXISTS items
dspacestatistics-> (id INT PRIMARY KEY, views INT DEFAULT 0, downloads INT DEFAULT 0)
```
## 2018-09-25
- I deployed the DSpace statistics API on CGSpace, but when I ran the indexer it wanted to index 180,000 pages of item views
- I'm not even sure how that's possible, as we only have 74,000 items!
- I need to inspect the `id` values that are returned for views and cross-check them with the `owningItem` values for bitstream downloads...
- Also, I could try to check all IDs against the items table to see if they are actually items (perhaps the Solr `id` field doesn't correspond with *actual* DSpace items?)
- I want to purge the bot hits from the Solr statistics core, as I am now realizing that I don't give a shit about tens of millions of hits by Google and Bing indexing my shit every day (at least not in Solr!)
- CGSpace's Solr core has 150,000,000 documents in it... and it's still pretty fast to query, but it's really a maintenance and backup burden
- DSpace Test currently has about 2,000,000 documents with `isBot:true` in its Solr statistics core, and the size on disk is 2GB (it's not much, but I have to test this somewhere!)
- According to the [DSpace 5.x Solr documentation](https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance) I can use `dspace stats-util -f`, so let's try it:
```
$ dspace stats-util -f
```
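To compare the number of `isBot:true` documents before and after running the command, a count query against the statistics core can be sketched roughly as below. This is only a minimal sketch: the Solr base URL (host, port, and core path) is an assumption and not something these notes specify, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr is reachable at this URL and the core is called "statistics";
# adjust host, port, and path to match the actual deployment.
SOLR_STATISTICS_URL = "http://localhost:8081/solr/statistics"


def build_count_url(base_url, query):
    """Build a select URL that asks only for the hit count (rows=0)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base_url}/select?{params}"


def parse_num_found(response_body):
    """Extract numFound from a Solr JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


def count_documents(base_url, query):
    """Query Solr and return the number of documents matching the query."""
    with urlopen(build_count_url(base_url, query)) as response:
        return parse_num_found(response.read().decode("utf-8"))
```

Calling `count_documents(SOLR_STATISTICS_URL, "isBot:true")` before and after `dspace stats-util -f` would give the two numbers to compare. Requesting `rows=0` keeps the response small because only the count is needed.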
- The command comes back after a few seconds and I still see 2,000,000 documents in the statistics core with `isBot:true`
- I was just writing a message to the dspace-tech mailing list when I decided to check the number of bot view events on DSpace Test again: now it's 201 instead of 2,000,000, and the statistics core is only 30MB!
- I will set the `logBots = false` property in `dspace/config/modules/usage-statistics.cfg` on DSpace Test and check if the number of `isBot:true` events goes up any more...
- I restarted the server with `logBots = false` and after it came back up I see 266 events with `isBot:true` (maybe they were buffered)... I will check again tomorrow
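
The ID cross-check mentioned earlier in this entry (Solr `id` values for views and `owningItem` values for downloads, compared against the `items` table) might look something like the sketch below. It assumes the three ID lists have already been fetched somehow and that IDs are integers, as in the `items` table definition. The function name and the sample values are hypothetical and only there for illustration.

```python
def cross_check_ids(view_ids, download_owning_item_ids, known_item_ids):
    """Split the IDs seen in Solr into those that are known items and those that are not.

    view_ids: `id` values returned for view events (duplicates allowed)
    download_owning_item_ids: `owningItem` values from bitstream downloads
    known_item_ids: IDs of actual items, e.g. from the items table
    """
    known = set(known_item_ids)
    seen = set(view_ids) | set(download_owning_item_ids)
    return {
        "matched": sorted(seen & known),   # IDs that correspond to real items
        "unknown": sorted(seen - known),   # IDs that Solr has but the items table lacks
    }


# Made-up sample values, not real CGSpace data
result = cross_check_ids(
    view_ids=[1, 2, 2, 99],
    download_owning_item_ids=[2, 3, 100],
    known_item_ids=[1, 2, 3, 4],
)
print(result["unknown"])
```

A large `unknown` list would suggest that the Solr `id` field does not always correspond to actual DSpace items, which is one possible explanation for the indexer seeing more IDs than there are items.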
<!-- vim: set sw=2 ts=2: -->