DSpace Statistics API

A simple REST API to expose Solr view and download statistics for items in a DSpace repository.

DSpace stores item view and download events in a Solr "statistics" core. This information is available for use in the various DSpace user interfaces, but is not exposed externally via any APIs. The DSpace 4+ REST API, for example, only exposes information about communities, collections, item metadata, and bitstreams.

This project contains an indexer and a Falcon-based web application to make the statistics available via a simple REST API. You can read more about the Solr queries used to gather the item view and download statistics on the DSpace wiki.
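
For reference, the indexer gathers per-item counts by querying the Solr statistics core. The following is only a sketch of that kind of query; the exact filters used by the indexer may differ, and the core URL assumes the SOLR_SERVER example shown below:

$ curl 'http://localhost:8080/solr/statistics/select?q=type:2+AND+isBot:false&fq=statistics_type:view&facet=true&facet.field=id&rows=0&wt=json'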

Requirements

  • Python 3.5+
  • PostgreSQL version 9.5 or greater
  • DSpace with a Solr statistics core

Installation and Testing

Create a Python virtual environment and install the dependencies using pipenv:

$ pipenv install --dev
$ pipenv shell
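
If you prefer not to use pipenv, the repository also includes a requirements.txt, so a plain virtual environment works too (sketch):

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt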

Set up the environment variables for Solr and PostgreSQL:

$ export SOLR_SERVER=http://localhost:8080/solr
$ export DATABASE_NAME=dspacestatistics
$ export DATABASE_USER=dspacestatistics
$ export DATABASE_PASS=dspacestatistics
$ export DATABASE_HOST=localhost
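
The indexer expects the PostgreSQL database and user to exist already. A minimal sketch for a local PostgreSQL server, assuming the names above:

$ sudo -u postgres createuser --pwprompt dspacestatistics
$ sudo -u postgres createdb -O dspacestatistics dspacestatistics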

Index the Solr statistics core to populate the PostgreSQL database:

$ python -m dspace_statistics_api.indexer
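
To verify that the indexer populated something, you can peek at the database directly. This assumes the indexer stores counts in a table named items; check the source if the schema differs:

$ psql -h localhost -U dspacestatistics -c 'SELECT * FROM items LIMIT 1' dspacestatistics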

Run the REST API:

$ gunicorn dspace_statistics_api.app
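
By default gunicorn binds to 127.0.0.1:8000. To use a different address or port, for example to match the nginx upstream shown in the Deployment section below, pass --bind:

$ gunicorn --bind 127.0.0.1:5000 dspace_statistics_api.app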

Test to see if there are any statistics:

$ curl 'http://localhost:8000/items?limit=1'
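
Pagination works as described under "Using the API" below, for example:

$ curl 'http://localhost:8000/items?limit=10&page=2'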

Run tests:

$ pytest

Deployment

There are example systemd service and timer units in the contrib directory. The API service listens on localhost by default, so you will need to expose it publicly using a web server like nginx.
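
For example, after copying the units into /etc/systemd/system you would enable and start them. The unit names below are hypothetical; use the names of the files in contrib:

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now dspace-statistics-api.service
$ sudo systemctl enable --now dspace-statistics-indexer.timer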

An example nginx configuration is:

server {
    #...

    location ~ /rest/statistics/?(.*) {
        access_log /var/log/nginx/statistics.log;
        proxy_pass http://statistics_api/$1$is_args$args;
    }
}

upstream statistics_api {
    server 127.0.0.1:5000;
}

This would expose the API at /rest/statistics.
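
Assuming a hypothetical hostname of dspace.example.org, you could then test the proxied API with:

$ curl 'https://dspace.example.org/rest/statistics/items?limit=1'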

Using the API

The API exposes the following endpoints:

  • GET / returns a basic API documentation page.
  • GET /items returns views and downloads for all items that Solr knows about¹. Accepts limit and page query parameters for pagination of results (limit must be an integer between 1 and 100, and page must be an integer greater than or equal to 0).
  • GET /item/id returns views and downloads for a single item (id must be a positive integer). Returns HTTP 404 if an item id is not found.

The item id is the internal id for an item. You can get these from the standard DSpace REST API.

¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads. If an item is not present here you can assume it has zero views and zero downloads, but not necessarily that it does not exist in the repository.
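
For example, you could look up an item's internal id via the standard DSpace REST API and then fetch its statistics from this API. The hostname and the item id 17 below are hypothetical:

$ curl -s -H 'Accept: application/json' 'https://dspace.example.org/rest/items?limit=1'
$ curl 'https://dspace.example.org/rest/statistics/item/17'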

Todo

License

This work is licensed under the GPLv3.