mirror of https://github.com/ilri/dspace-statistics-api.git synced 2024-11-21 22:05:02 +01:00
A simple REST API to expose Solr view and download statistics for items in a DSpace repository.
Alan Orth be83514de1
Re-work Swagger UI configuration
It turns out that Swagger UI mostly does the "right" thing for our
use cases here, but it assumes that API paths are relative to the
root of the host where it is being served. This works in the local
development environment because we are serving on "/", but it does
not work in production where the API is deployed beneath the DSpace
REST API, for example at "/rest/statistics".

The solution here is to allow configuration of the DSpace Statistics
API path and use that when registering the Swagger UI as well as in
a new "server" block in the OpenAPI JSON schema.

By default it is configured to work out of the box in a development
environment. Set the DSPACE_STATISTICS_API_URL environment variable
to something like "/rest/statistics" when running in production.
2020-12-23 13:25:17 +02:00
contrib contrib: Update systemd unit files for refactor 2018-10-28 11:14:21 +02:00
dspace_statistics_api Re-work Swagger UI configuration 2020-12-23 13:25:17 +02:00
tests Add /status route 2020-12-22 11:30:09 +02:00
.build.yml .build.yml: Use poetry instead of pipenv 2020-10-05 22:37:42 +03:00
.drone.yml .drone.yml: Install gcc for Python 3.9 2020-12-14 22:50:21 +02:00
.flake8 Enable Flake8 validation in Hound CI 2018-11-04 00:48:06 +02:00
.gitignore Update docs to remove SQLite stuff 2018-09-25 00:56:01 +03:00
.hound.yml .hound.yml: Set pull requests to failed if build fails 2018-11-04 00:53:37 +02:00
CHANGELOG.md CHANGELOG.md: Add note about the /status page 2020-12-22 11:30:50 +02:00
LICENSE.txt Add GPLv3 license 2018-09-18 14:16:07 +03:00
poetry.lock Add Swagger UI on /swagger 2020-12-22 11:18:47 +02:00
pyproject.toml Bump version to 1.4.0-dev 2020-12-22 11:31:46 +02:00
pytest.ini pytest.ini: Change --strict to --strict-markers 2020-12-14 19:07:02 +02:00
README.md Update changelog and docs 2020-12-20 16:45:49 +02:00
requirements-dev.txt Update requirements 2020-12-22 12:07:07 +02:00
requirements.txt Update requirements 2020-12-22 12:07:07 +02:00
setup.cfg Add configuration for isort and black 2019-11-27 12:26:55 +02:00

DSpace Statistics API

DSpace stores item view and download events in a Solr "statistics" core. This information is available for use in the various DSpace user interfaces, but is not exposed externally via any APIs. The DSpace 4/5/6 REST API, for example, only exposes metadata about communities, collections, items, and bitstreams.

This project contains an indexer and a Falcon-based web application to make the item, community, and collection statistics available via a simple REST API. You can read more about the Solr queries used to gather the item view and download statistics on the DSpace wiki.

If you use the DSpace Statistics API please cite:

Orth, A. 2018. DSpace statistics API. Nairobi, Kenya: ILRI. https://hdl.handle.net/10568/99143.

Requirements

Installation

Create a Python virtual environment and install the dependencies:

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

Running

Set up the environment variables for Solr and PostgreSQL:

$ export SOLR_SERVER=http://localhost:8080/solr
$ export DATABASE_NAME=dspacestatistics
$ export DATABASE_USER=dspacestatistics
$ export DATABASE_PASS=dspacestatistics
$ export DATABASE_HOST=localhost
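
As a rough sketch of how a Python application might pick up these settings, the snippet below reads the same variables with `os.environ`. The `load_settings` helper and its fallback values are assumptions for illustration (the fallbacks simply mirror the example exports above); the project may read its configuration differently.

```python
import os


def load_settings(environ=None):
    """Read the Solr and PostgreSQL settings from environment variables.

    The fallback values mirror the example exports above. They are
    illustrative assumptions, not documented defaults.
    """
    env = os.environ if environ is None else environ
    return {
        "SOLR_SERVER": env.get("SOLR_SERVER", "http://localhost:8080/solr"),
        "DATABASE_NAME": env.get("DATABASE_NAME", "dspacestatistics"),
        "DATABASE_USER": env.get("DATABASE_USER", "dspacestatistics"),
        "DATABASE_PASS": env.get("DATABASE_PASS", "dspacestatistics"),
        "DATABASE_HOST": env.get("DATABASE_HOST", "localhost"),
    }


if __name__ == "__main__":
    for name, value in load_settings().items():
        # Avoid echoing the password to the terminal.
        shown = "***" if name == "DATABASE_PASS" else value
        print(f"{name}={shown}")
```

Passing a dictionary instead of relying on the real environment makes the helper easy to exercise in tests.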

Index the Solr statistics core to populate the PostgreSQL database:

$ python -m dspace_statistics_api.indexer

Run the REST API:

$ gunicorn dspace_statistics_api.app

Test to see if there are any statistics:

$ curl 'http://localhost:8000/items?limit=1'

Testing

Install development packages using pip:

$ pip install -r requirements-dev.txt

Run tests:

$ pytest

Deployment

There are example systemd service and timer units in the contrib directory. The API service listens on localhost by default so you will need to expose it publicly using a web server like nginx.

An example nginx configuration is:

server {
    #...

    location ~ /rest/statistics/?(.*) {
        access_log /var/log/nginx/statistics.log;
        proxy_pass http://statistics_api/$1$is_args$args;
    }
}

upstream statistics_api {
    server 127.0.0.1:5000;
}

This would expose the API at /rest/statistics.
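
To see how the location block above rewrites requests, the following sketch applies the same regular expression in Python and rebuilds the URI that `proxy_pass` would send upstream. The `upstream_uri` function is a hypothetical helper written for this illustration. Python's `re` module is not the engine nginx uses, but for a pattern this simple the two should agree.

```python
import re

# Same regular expression as the nginx "location ~" block above.
# nginx regex locations are unanchored, hence re.search() below.
LOCATION = re.compile(r"/rest/statistics/?(.*)")


def upstream_uri(path, query=""):
    """Return the URI passed to the upstream, or None if the location
    does not match. Mirrors "/$1$is_args$args" from the proxy_pass line.
    """
    match = LOCATION.search(path)
    if match is None:
        return None
    is_args = "?" if query else ""  # nginx sets $is_args to "?" only when a query string exists
    return "/" + match.group(1) + is_args + query


if __name__ == "__main__":
    print(upstream_uri("/rest/statistics/items", "limit=1"))
    print(upstream_uri("/rest/statistics"))
    print(upstream_uri("/rest/items"))
```

In other words, the `/rest/statistics` prefix is stripped and the remainder of the path, plus any query string, reaches the API unchanged.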

Using the API

The API exposes the following endpoints:

  • GET /: return a basic API documentation page.
  • GET /items: return views and downloads for all items that Solr knows about¹. Accepts limit and page query parameters for pagination of results (limit must be an integer between 1 and 100, and page must be an integer greater than or equal to 0).
  • POST /items: return views and downloads for an arbitrary list of items with an optional date range. Accepts limit, page, dateFrom, and dateTo parameters².
  • GET /item/id: return views and downloads for a single item (id must be a UUID). Returns HTTP 404 if the item id is not found.
  • GET /communities: return views and downloads for all communities that Solr knows about¹. Accepts limit and page query parameters for pagination of results (limit must be an integer between 1 and 100, and page must be an integer greater than or equal to 0).
  • POST /communities: return views and downloads for an arbitrary list of communities with an optional date range. Accepts limit, page, dateFrom, and dateTo parameters².
  • GET /community/id: return views and downloads for a single community (id must be a UUID). Returns HTTP 404 if the community id is not found.
  • GET /collections: return views and downloads for all collections that Solr knows about¹. Accepts limit and page query parameters for pagination of results (limit must be an integer between 1 and 100, and page must be an integer greater than or equal to 0).
  • POST /collections: return views and downloads for an arbitrary list of collections with an optional date range. Accepts limit, page, dateFrom, and dateTo parameters².
  • GET /collection/id: return views and downloads for a single collection (id must be a UUID). Returns HTTP 404 if the collection id is not found.

The id is the internal UUID for an item, community, or collection. You can get these from the standard DSpace REST API.
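
The parameter rules for the GET endpoints can be checked on the client side before sending a request. The sketch below builds request URLs while enforcing the constraints listed above (limit between 1 and 100, page at least 0, id a valid UUID). The helper names `build_list_url` and `build_single_url` are hypothetical, and the base URL is whatever address your deployment uses.

```python
import uuid
from urllib.parse import urlencode


def build_list_url(base_url, resource, limit=100, page=0):
    """Build a URL for GET /items, /communities, or /collections."""
    if resource not in ("items", "communities", "collections"):
        raise ValueError(f"unknown resource: {resource!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 100:
        raise ValueError("limit must be an integer between 1 and 100")
    if isinstance(page, bool) or not isinstance(page, int) or page < 0:
        raise ValueError("page must be an integer greater than or equal to 0")
    query = urlencode({"limit": limit, "page": page})
    return f"{base_url.rstrip('/')}/{resource}?{query}"


def build_single_url(base_url, kind, identifier):
    """Build a URL for GET /item/id, /community/id, or /collection/id."""
    if kind not in ("item", "community", "collection"):
        raise ValueError(f"unknown kind: {kind!r}")
    # uuid.UUID raises ValueError if the identifier is not a valid UUID.
    canonical = str(uuid.UUID(identifier))
    return f"{base_url.rstrip('/')}/{kind}/{canonical}"


if __name__ == "__main__":
    print(build_list_url("http://localhost:8000", "items", limit=1))
    print(build_single_url(
        "http://localhost:8000", "item", "f44cf173-2344-4eb2-8f00-ee55df32c76f"
    ))
```

Validating locally is optional, since the API presumably rejects bad parameters itself, but it gives clearer error messages in client code.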

¹ We are querying the Solr statistics core, which technically only knows about items, communities, or collections that have either views or downloads. If an item, community, or collection is not present here you can assume it has zero views and zero downloads, but not necessarily that it does not exist in the repository.
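
A client can apply this rule by treating an HTTP 404 from a single-item request as "no recorded activity". The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the `id`, `views`, and `downloads` field names in the fallback value are assumed rather than taken from a documented response schema.

```python
import json
import urllib.error
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def item_statistics(base_url, item_id, fetch=fetch_json):
    """Return statistics for one item, treating HTTP 404 as zero activity.

    A 404 only means Solr has no views or downloads for the item. It does
    not tell us whether the item exists in the repository.
    """
    url = f"{base_url.rstrip('/')}/item/{item_id}"
    try:
        return fetch(url)
    except urllib.error.HTTPError as error:
        if error.code != 404:
            raise  # other errors are real failures
        # Field names below are assumptions for illustration.
        return {"id": item_id, "views": 0, "downloads": 0}
```

The `fetch` parameter exists only so the function can be tested without a running API.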

² POST requests to /items, /communities, and /collections should be in JSON format with the following parameters (replace the "items" list with "communities" or "collections" as appropriate):

{
    "limit": 100, // optional, integer between 1 and 100, default 100
    "page": 0, // optional, integer greater than or equal to 0, default 0
    "dateFrom": "2020-01-01T00:00:00Z", // optional, default *
    "dateTo": "2020-09-09T00:00:00Z", // optional, default *
    "items": [
        "f44cf173-2344-4eb2-8f00-ee55df32c76f",
        "2324aa41-e9de-4a2b-bc36-16241464683e",
        "8542f9da-9ce1-4614-abf4-f2e3fdb4b305",
        "0fe573e7-042a-4240-a4d9-753b61233908"
    ]
}
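
Note that strict JSON does not allow `//` comments, so they must be left out of a real request body. As a hedged sketch, the snippet below builds such a POST request for /items with Python's standard library. The `build_items_request` helper is hypothetical, and the `Content-Type` header is an assumption about what the API expects.

```python
import json
import urllib.request


def build_items_request(base_url, item_ids, date_from=None, date_to=None,
                        limit=100, page=0):
    """Build (but do not send) a POST /items request."""
    payload = {"limit": limit, "page": page, "items": list(item_ids)}
    # The date range is optional, so only include the keys that were given.
    if date_from is not None:
        payload["dateFrom"] = date_from
    if date_to is not None:
        payload["dateTo"] = date_to
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/items",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


if __name__ == "__main__":
    request = build_items_request(
        "http://localhost:8000",
        ["f44cf173-2344-4eb2-8f00-ee55df32c76f"],
        date_from="2020-01-01T00:00:00Z",
        date_to="2020-09-09T00:00:00Z",
    )
    # Sending requires a running API instance.
    with urllib.request.urlopen(request) as response:
        print(json.load(response))
```

For communities or collections, the same approach applies with the path and list key changed accordingly.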

TODO

  • Better logging
  • Version API (or at least include a /version endpoint?)
    • Probably use /status with a version in the response
  • Use JSON in PostgreSQL
  • Add top items endpoint, perhaps /top/items or /items/top?
    • Actually we could add /items?limit=10&sort=views
  • Add Swagger with OpenAPI 3.0.x with falcon-swagger-ui

License

This work is licensed under the GPLv3.

The license allows you to use and modify the work for personal and commercial purposes, but if you distribute the work you must provide users with a means to access the source code for the version you are distributing. Read more about the GPLv3 at TL;DR Legal.