mirror of https://github.com/ilri/dspace-statistics-api.git synced 2024-11-25 07:40:17 +01:00
A simple REST API to expose Solr view and download statistics for items in a DSpace repository.
Alan Orth 6fd2827a7c
Use Python's native json instead of ujson
Falcon can optionally use ujson to speed up JSON (de)serialization,
but Falcon's already really fast and requiring ujson actually makes
deployment trickier in some cases (for example in Docker containers
that are based on Alpine Linux).

Here are some tests of Falcon 1.4.1 on Python 3.5 from my laptop:

    1. falcon...............60172 req/sec or 16.62 μs/req (36x)
    2. falcon-ext...........34186 req/sec or 29.25 μs/req (20x)
    3. bottle...............32924 req/sec or 30.37 μs/req (20x)
    4. werkzeug.............11948 req/sec or 83.70 μs/req (7x)
    5. flask.................6654 req/sec or 150.30 μs/req (4x)
    6. django................4565 req/sec or 219.04 μs/req (3x)
    7. pecan.................1672 req/sec or 598.19 μs/req (1x)

The tests were conducted with Falcon's official Docker benchmarking
tools on my Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz on Arch Linux.

See: https://github.com/falconry/falcon/tree/master/docker
2018-10-24 14:08:23 +03:00
contrib contrib: Adjust example path 2018-10-23 14:34:29 +03:00
.gitignore Update docs to remove SQLite stuff 2018-09-25 00:56:01 +03:00
.travis.yml .travis.yml: Only build master branch 2018-10-14 19:00:31 +03:00
app.py app.py: Don't initialize Solr connection 2018-10-24 11:59:50 +03:00
CHANGELOG.md CHANGELOG.md: Move unreleased changes to v0.5.0 2018-10-24 12:02:42 +03:00
config.py Use PostgreSQL instead of SQLite 2018-09-25 00:49:47 +03:00
database.py database.py: Use one line for psycopg2 imports 2018-09-26 22:23:24 +03:00
indexer.py Use Python's native json instead of ujson 2018-10-24 14:08:23 +03:00
LICENSE.txt Add GPLv3 license 2018-09-18 14:16:07 +03:00
README.md README.md: Add example nginx configuration 2018-10-23 14:55:36 +03:00
requirements.txt Use Python's native json instead of ujson 2018-10-24 14:08:23 +03:00
solr.py Refactor Solr components 2018-09-23 13:24:30 +03:00

DSpace Statistics API

A simple REST API to expose Solr view and download statistics for items in a DSpace repository. This project contains a standalone indexing component and a WSGI application.

Requirements

Installation and Testing

Create a Python virtual environment and install the dependencies:

$ python -m venv venv
$ . venv/bin/activate
$ pip install -r requirements.txt

Set up the environment variables for Solr and PostgreSQL:

$ export SOLR_SERVER=http://localhost:8080/solr
$ export DATABASE_NAME=dspacestatistics
$ export DATABASE_USER=dspacestatistics
$ export DATABASE_PASS=dspacestatistics
$ export DATABASE_HOST=localhost
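
As an illustration of how an application could consume these variables, here is a minimal sketch. The function names `load_settings` and `postgres_dsn` and the fallback values are assumptions made for this example; they are not taken from the project's config.py.

```python
import os


def load_settings(env=os.environ):
    """Read the Solr and PostgreSQL settings from a mapping of environment variables.

    The fallback values are assumptions for local testing, not project defaults.
    """
    return {
        "solr_server": env.get("SOLR_SERVER", "http://localhost:8080/solr"),
        "database_name": env.get("DATABASE_NAME", "dspacestatistics"),
        "database_user": env.get("DATABASE_USER", "dspacestatistics"),
        "database_pass": env.get("DATABASE_PASS", "dspacestatistics"),
        "database_host": env.get("DATABASE_HOST", "localhost"),
    }


def postgres_dsn(settings):
    """Build a libpq keyword/value connection string.

    Values containing spaces or quotes would need escaping, which this
    sketch does not attempt.
    """
    return (
        "dbname={database_name} user={database_user} "
        "password={database_pass} host={database_host}"
    ).format(**settings)


if __name__ == "__main__":
    print(postgres_dsn(load_settings()))
```

The resulting string is in the form that PostgreSQL client libraries built on libpq accept as a connection string.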

Index the Solr statistics core to populate the PostgreSQL database:

$ ./indexer.py

Run the REST API:

$ gunicorn app:api

Test to see if there are any statistics:

$ curl 'http://localhost:8000/items?limit=1'
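
The same check can be scripted with Python's standard library. This is a rough sketch: the helper names `items_url` and `fetch_items` are hypothetical, and because the response layout is not documented here, the function returns the decoded JSON unchanged.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def items_url(base_url, limit=None, page=None):
    """Build the /items URL, adding only the query parameters that were given."""
    params = []
    if limit is not None:
        params.append(("limit", limit))
    if page is not None:
        params.append(("page", page))
    query = urlencode(params)
    return base_url.rstrip("/") + "/items" + ("?" + query if query else "")


def fetch_items(base_url, limit=None, page=None):
    """Request /items and return the decoded JSON body as-is."""
    with urlopen(items_url(base_url, limit, page)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumes the API is running locally, as started with gunicorn above.
    print(fetch_items("http://localhost:8000", limit=1))
```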

Deployment

There are example systemd service and timer units in the contrib directory. The API service listens on localhost by default so you will need to expose it publicly using a web server like nginx.

An example nginx configuration is:

server {
    #...

    location ~ /rest/statistics/?(.*) {
        access_log /var/log/nginx/statistics.log;
        proxy_pass http://statistics_api/$1$is_args$args;
    }
}

upstream statistics_api {
    server 127.0.0.1:5000;
}

This would expose the API at /rest/statistics.
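
To see how the location block maps public URLs onto the API's own paths, the rewrite can be modeled in a few lines of Python. This is a simplified sketch of the regex capture and the `$1$is_args$args` substitution only. It ignores nginx's URI normalization, and `upstream_uri` is a hypothetical helper name.

```python
import re

# Same pattern as the nginx location; nginx regex locations are unanchored.
LOCATION = re.compile(r"/rest/statistics/?(.*)")


def upstream_uri(path, args=""):
    """Return the URI passed to the upstream, or None if the location does not match.

    `path` is the request path without the query string and `args` is the
    raw query string, mirroring nginx's $1 and $args variables.
    """
    match = LOCATION.search(path)
    if match is None:
        return None
    is_args = "?" if args else ""  # nginx's $is_args
    return "/" + match.group(1) + is_args + args


if __name__ == "__main__":
    print(upstream_uri("/rest/statistics/items", "limit=1"))
```

Under this model a request for /rest/statistics/items?limit=1 reaches the API as /items?limit=1.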

Using the API

The API exposes the following endpoints:

  • GET /items: return views and downloads for all items that Solr knows about¹. Accepts limit and page query parameters for pagination of results.
  • GET /item/id: return views and downloads for a single item (id must be a positive integer). Returns HTTP 404 if an item id is not found.

¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads.
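
A client for the single-item endpoint might look like the following sketch. It validates the id on the client side and treats HTTP 404 as "no statistics for this item". The names `item_url` and `fetch_item` are hypothetical, and the JSON body is returned as-is because its fields are not described here.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def item_url(base_url, item_id):
    """Build the /item/id URL, rejecting ids that are not positive integers."""
    if isinstance(item_id, bool) or not isinstance(item_id, int) or item_id <= 0:
        raise ValueError("item id must be a positive integer")
    return "{}/item/{}".format(base_url.rstrip("/"), item_id)


def fetch_item(base_url, item_id):
    """Return the decoded JSON for one item, or None if the API answers 404."""
    try:
        with urlopen(item_url(base_url, item_id)) as response:
            return json.loads(response.read().decode("utf-8"))
    except HTTPError as error:
        if error.code == 404:
            return None
        raise  # any other HTTP error is unexpected and is propagated


if __name__ == "__main__":
    # Assumes the API is reachable locally; item id 17 is an arbitrary example.
    print(fetch_item("http://localhost:8000", 17))
```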

Todo

  • Add API documentation
  • Close DB connection when gunicorn shuts down gracefully
  • Better logging
  • Tests
  • Check if database exists (try/except)
  • Version API
  • Use JSON in PostgreSQL
  • Switch to Python 3.6+ f-string syntax

License

This work is licensed under the GPLv3.