# DSpace Statistics API [![Build Status](https://travis-ci.org/alanorth/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/alanorth/dspace-statistics-api)
A simple REST API to expose Solr view and download statistics for items in a DSpace repository. This project contains a standalone indexing component and a WSGI application.
## Requirements
- Python 3.5+
- PostgreSQL version 9.5+ (due to [`UPSERT` support](https://wiki.postgresql.org/wiki/UPSERT))
- DSpace 4+ with [Solr usage statistics enabled](https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics)
## Installation and Testing
Create a Python virtual environment and install the dependencies:

    $ python -m venv venv
    $ . venv/bin/activate
    $ pip install -r requirements.txt

Set up the environment variables for Solr and PostgreSQL:

    $ export SOLR_SERVER=http://localhost:8080/solr
    $ export DATABASE_NAME=dspacestatistics
    $ export DATABASE_USER=dspacestatistics
    $ export DATABASE_PASS=dspacestatistics
    $ export DATABASE_HOST=localhost
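
The variables above are read from the process environment. As a rough illustration only, a Python program could collect them as shown below. The function name `load_settings` and the fallback defaults are assumptions made for this sketch, not the project's actual configuration code.

```python
import os

# Hypothetical fallbacks for this sketch; they mirror the example values above.
DEFAULTS = {
    "SOLR_SERVER": "http://localhost:8080/solr",
    "DATABASE_NAME": "dspacestatistics",
    "DATABASE_USER": "dspacestatistics",
    "DATABASE_PASS": "dspacestatistics",
    "DATABASE_HOST": "localhost",
}


def load_settings(environ=None):
    """Return a dict of settings, preferring values found in the environment."""
    if environ is None:
        environ = os.environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    settings = load_settings()
    for name in sorted(settings):
        # Mask the password instead of printing it.
        shown = "***" if name == "DATABASE_PASS" else settings[name]
        print("{}={}".format(name, shown))
```
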
Index the Solr statistics core to populate the PostgreSQL database:

    $ ./indexer.py

Run the REST API:

    $ gunicorn app:api

Test to see if there are any statistics:

    $ curl 'http://localhost:8000/items?limit=1'
## Deployment
There are example systemd service and timer units in the `contrib` directory. The API service listens on localhost by default, so you will need to expose it publicly using a web server like nginx.
An example nginx configuration is:
```
server {
    #...

    location ~ /rest/statistics/?(.*) {
        access_log /var/log/nginx/statistics.log;
        proxy_pass http://statistics_api/$1$is_args$args;
    }
}

upstream statistics_api {
    server 127.0.0.1:5000;
}
```
This would expose the API at `/rest/statistics`.
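
To make the rewrite concrete, the following Python sketch imitates how the `location` regex and the `proxy_pass` line in the example map a public request path to the path sent upstream. It is only an approximation for illustration: the function name `upstream_path` is made up here, and real nginx performs additional URI normalization that this ignores.

```python
import re

# Same pattern as the nginx `location ~` line; regex locations are unanchored.
LOCATION = re.compile(r"/rest/statistics/?(.*)")


def upstream_path(path, query=""):
    """Return the path and query proxied to the API, or None if the location does not match."""
    match = LOCATION.search(path)
    if match is None:
        return None
    # $is_args expands to "?" only when the request has a query string.
    is_args = "?" if query else ""
    return "/" + match.group(1) + is_args + query


if __name__ == "__main__":
    print(upstream_path("/rest/statistics/items", "limit=1"))
    print(upstream_path("/rest/statistics/item/17"))
```

Under these assumptions, a request for `/rest/statistics/items?limit=1` reaches the API as `/items?limit=1`.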
## Using the API
The API exposes the following endpoints:
- GET `/items`: return views and downloads for all items that Solr knows about¹. Accepts `limit` and `page` query parameters for pagination of results.
- GET `/item/id`: return views and downloads for a single item (*id* must be a positive integer). Returns HTTP 404 if the item id is not found.

¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads.
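
As an illustration of calling these endpoints from Python, here is a minimal client sketch that uses only the standard library. The base URL assumes the local gunicorn address from the testing step, the helper names (`items_url`, `item_url`, `fetch_json`) are invented for this example, and the response is returned as parsed JSON without assuming any particular field names.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the API is reachable at the address used in the curl test.
BASE_URL = "http://localhost:8000"


def items_url(limit=None, page=None, base=BASE_URL):
    """Build the URL for GET /items, adding only the parameters that were given."""
    params = [(k, v) for k, v in (("limit", limit), ("page", page)) if v is not None]
    query = "?" + urlencode(params) if params else ""
    return "{}/items{}".format(base, query)


def item_url(item_id, base=BASE_URL):
    """Build the URL for GET /item/id; the id must be a positive integer."""
    if not isinstance(item_id, int) or item_id <= 0:
        raise ValueError("item id must be a positive integer")
    return "{}/item/{}".format(base, item_id)


def fetch_json(url):
    """Return the decoded JSON body, or None when the API answers HTTP 404."""
    try:
        with urlopen(url) as response:
            return json.loads(response.read().decode("utf-8"))
    except HTTPError as error:
        if error.code == 404:
            return None
        raise
```

With the API running, `fetch_json(items_url(limit=1))` would fetch the first page with a single item, and `fetch_json(item_url(17))` would return `None` if item 17 has no statistics.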
## Todo
- Add API documentation
- Close DB connection when gunicorn shuts down gracefully
- Better logging
- Tests
- Check if database exists (try/except)
- Version API
- Use JSON in PostgreSQL
- Switch to [Python 3.6+ f-string syntax](https://realpython.com/python-f-strings/)

## License
This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).