mirror of
https://github.com/ilri/dspace-statistics-api.git
synced 2024-11-10 16:55:45 +01:00
Alan Orth
c027f01b48
This follows guidance from several well-known Python best practices guides. Basically, the idea is create a package for the application that is comprised of several re-usable modules. See: https://docs.python-guide.org/writing/structure/ See: https://realpython.com/python-application-layouts/
80 lines
2.7 KiB
Markdown
80 lines
2.7 KiB
Markdown
# DSpace Statistics API [![Build Status](https://travis-ci.org/alanorth/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/alanorth/dspace-statistics-api)
|
||
A simple REST API to expose Solr view and download statistics for items in a DSpace repository. This project contains a standalone indexing component and a WSGI application.
|
||
|
||
## Requirements
|
||
|
||
- Python 3.5+
|
||
- PostgreSQL version 9.5+ (due to [`UPSERT` support](https://wiki.postgresql.org/wiki/UPSERT))
|
||
- DSpace 4+ with [Solr usage statistics enabled](https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics)
|
||
|
||
## Installation and Testing
|
||
Create a Python virtual environment and install the dependencies:
|
||
|
||
$ python -m venv venv
|
||
$ . venv/bin/activate
|
||
$ pip install -r requirements.txt
|
||
|
||
Set up the environment variables for Solr and PostgreSQL:
|
||
|
||
$ export SOLR_SERVER=http://localhost:8080/solr
|
||
$ export DATABASE_NAME=dspacestatistics
|
||
$ export DATABASE_USER=dspacestatistics
|
||
$ export DATABASE_PASS=dspacestatistics
|
||
$ export DATABASE_HOST=localhost
|
||
|
||
Index the Solr statistics core to populate the PostgreSQL database:
|
||
|
||
$ python -m dspace_statistics_api.indexer
|
||
|
||
Run the REST API:
|
||
|
||
$ gunicorn dspace_statistics_api.app:api
|
||
|
||
Test to see if there are any statistics:
|
||
|
||
$ curl 'http://localhost:8000/items?limit=1'
|
||
|
||
## Deployment
|
||
There are example systemd service and timer units in the `contrib` directory. The API service listens on localhost by default so you will need to expose it publicly using a web server like nginx.
|
||
|
||
An example nginx configuration is:
|
||
|
||
```
|
||
server {
|
||
#...
|
||
|
||
location ~ /rest/statistics/?(.*) {
|
||
access_log /var/log/nginx/statistics.log;
|
||
proxy_pass http://statistics_api/$1$is_args$args;
|
||
}
|
||
}
|
||
|
||
upstream statistics_api {
|
||
server 127.0.0.1:5000;
|
||
}
|
||
```
|
||
|
||
This would expose the API at `/rest/statistics`.
|
||
|
||
## Using the API
|
||
The API exposes the following endpoints:
|
||
|
||
- GET `/items` — return views and downloads for all items that Solr knows about¹. Accepts `limit` and `page` query parameters for pagination of results.
|
||
- GET `/item/id` — return views and downloads for a single item (*id* must be a positive integer). Returns HTTP 404 if an item id is not found.
|
||
|
||
¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads.
|
||
|
||
## Todo
|
||
|
||
- Add API documentation
|
||
- Close DB connection when gunicorn shuts down gracefully
|
||
- Better logging
|
||
- Tests
|
||
- Check if database exists (try/except)
|
||
- Version API
|
||
- Use JSON in PostgreSQL
|
||
- Switch to [Python 3.6+ f-string syntax](https://realpython.com/python-f-strings/)
|
||
|
||
## License
|
||
This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).
|