Mirror of https://github.com/ilri/dspace-statistics-api.git (synced 2024-12-22 12:42:19 +01:00)
A simple REST API to expose Solr view and download statistics for items in a DSpace repository.
Alan Orth
e604d8ca81
Basically Solr's numFound has nothing to do with the actual number of distinct facets that are returned. You need to use Solr's stats component to get the number of distinct facets, aka countDistinct. This is apparently deprecated in newer Solr versions, but we're on version 4.10 and it works there.
Also, I realized that there is no need to return facets for items without any views or downloads. Using facet.mincount=1 reduces the result set size and also means we can store less data in the database. The API returns HTTP 404 Not Found if an item is not in the database anyways.
I can't figure it out exactly, but there is some weird issue with Solr's facet results when you don't use facet.mincount=1. For some reason you get tons of results with an id that doesn't even exist in the document database, let alone as an actual DSpace item!
See: https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
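The query shape described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names, the `id` facet field, and the `type:2` filter are assumptions about the DSpace statistics core, and `stats.calcdistinct` is the stats component parameter that the linked Solr guide marks as deprecated in newer versions.

```python
from urllib.parse import urlencode


def build_distinct_count_params(field="id", query="type:2"):
    """Build Solr parameters that request the distinct count of `field`.

    Assumptions: `field` and `query` are placeholders for whatever the
    statistics core actually uses; adjust them to the real schema.
    """
    return {
        "q": query,
        "rows": 0,                    # we only want facets and stats, not documents
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 1,          # skip facets with no views or downloads
        "facet.limit": 1,             # the facet values themselves are not needed here
        "stats": "true",
        "stats.field": field,
        "stats.calcdistinct": "true", # ask the stats component for countDistinct
        "wt": "json",
    }


def distinct_count(solr_response, field="id"):
    """Read countDistinct from a parsed Solr JSON response.

    numFound counts matching documents, so it is deliberately ignored.
    """
    return solr_response["stats"]["stats_fields"][field]["countDistinct"]


def query_string(params):
    """Encode the parameters for a request to a Solr select handler."""
    return urlencode(params)
```

The number of result pages would then be derived from `distinct_count(...)` rather than from `numFound`.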
- contrib
- .gitignore
- .travis.yml
- app.py
- CHANGELOG.md
- config.py
- database.py
- indexer.py
- LICENSE.txt
- README.md
- requirements.txt
- solr.py
DSpace Statistics API
A quick and dirty REST API to expose Solr view and download statistics for items in a DSpace repository.
Written and tested in Python 3.5, 3.6, and 3.7. Requires PostgreSQL version 9.5 or greater for UPSERT
support.
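To illustrate why UPSERT matters here, the sketch below inserts an item's statistics and overwrites them when the item already exists. It is a sketch under stated assumptions: the `items` table and its columns are hypothetical, and it runs against an in-memory SQLite database (version 3.24 or newer) as a stand-in, because SQLite accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` form that PostgreSQL introduced in 9.5. With a PostgreSQL driver the placeholder style would differ.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
UPSERT_SQL = """
INSERT INTO items (id, views, downloads) VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE
SET views = excluded.views, downloads = excluded.downloads
"""


def upsert_item(connection, item_id, views, downloads):
    """Insert a row for item_id, or update its counts if the row exists."""
    connection.execute(UPSERT_SQL, (item_id, views, downloads))


def demo():
    """Upsert the same item twice and return the resulting rows."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, views INTEGER, downloads INTEGER)"
    )
    upsert_item(connection, 1, 10, 2)  # first call inserts the row
    upsert_item(connection, 1, 15, 3)  # second call updates it in place
    rows = connection.execute("SELECT id, views, downloads FROM items").fetchall()
    connection.close()
    return rows
```

Without UPSERT, an indexer would need a separate existence check before each write, or would have to handle the unique-key violation itself.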
Installation
Create a virtual environment, install the dependencies, and start the API with gunicorn:
$ python -m venv venv
$ . venv/bin/activate
$ pip install -r requirements.txt
$ gunicorn app:api
Using the API
The API exposes the following endpoints:
- GET /items: return views and downloads for all items that Solr knows about¹. Accepts limit and page query parameters for pagination of results.
- GET /item/id: return views and downloads for a single item (id must be a positive integer). Returns HTTP 404 if an item id is not found.
¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads.
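A small client for the two endpoints above might look like the following sketch. The base URL is an assumption (gunicorn binds to 127.0.0.1:8000 unless configured otherwise), the helper names are hypothetical, and the shape of the JSON body is not documented here, so the code returns the parsed response without interpreting it.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: gunicorn's default bind address


def items_url(limit=None, page=None, base_url=BASE_URL):
    """Build the URL for GET /items with optional pagination parameters."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if page is not None:
        params["page"] = page
    query = "?" + urlencode(params) if params else ""
    return "{}/items{}".format(base_url, query)


def item_url(item_id, base_url=BASE_URL):
    """Build the URL for GET /item/id; the id must be a positive integer."""
    if isinstance(item_id, bool) or not isinstance(item_id, int) or item_id <= 0:
        raise ValueError("item id must be a positive integer")
    return "{}/item/{}".format(base_url, item_id)


def fetch_json(url):
    """Return the parsed JSON body, or None when the API answers HTTP 404."""
    try:
        with urlopen(url) as response:
            return json.loads(response.read().decode("utf-8"))
    except HTTPError as error:
        if error.code == 404:
            return None  # item id not found in the database
        raise
```

For example, `fetch_json(item_url(42))` would return `None` for an item with no recorded views or downloads, since such items are never indexed.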
Todo
- Add API documentation
- Close the DB connection when gunicorn shuts down gracefully
- Better logging
- Tests
License
This work is licensed under the GPLv3.