mirror of https://github.com/ilri/dspace-statistics-api.git synced 2024-12-22 12:42:19 +01:00
A simple REST API to expose Solr view and download statistics for items in a DSpace repository.
Alan Orth e604d8ca81
indexer.py: Major refactor
Basically Solr's numFound has nothing to do with the actual number
of distinct facets that are returned. You need to use Solr's stats
component to get the number of distinct facets, aka countDistinct.
This is apparently deprecated in newer Solr versions, but we're on
version 4.10 and it works there.

Also, I realized that there is no need to return facets for items
without any views or downloads. Using facet.mincount=1 reduces the
result set size and also means we can store less data in the
database. The API returns HTTP 404 Not Found if an item is not in
the database anyway.

I can't figure it out exactly, but there is some weird issue with
Solr's facet results when you don't use facet.mincount=1. For some
reason you get tons of results with an id that doesn't even exist
in the document database, let alone as an actual DSpace item!

See: https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
2018-09-26 02:41:10 +03:00
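The commit message above describes the paging approach in prose. What follows is a minimal sketch of what such a Solr query could look like, not the actual indexer.py. It requests facets on id with facet.mincount=1 and turns on the stats component with stats.calcdistinct, which returns countDistinct. That value, not numFound, determines how many pages of facets there are. The Solr URL, the q/fq filters, and the helper names build_facet_query, parse_response, and num_pages are hypothetical.

```python
import math
from urllib.parse import urlencode

# Hypothetical Solr URL; the core name and port depend on the DSpace install.
SOLR_STATS_URL = "http://localhost:8080/solr/statistics/select"


def build_facet_query(offset, limit):
    """Query parameters for one page of per-item facet counts.

    q and fq are assumptions about the DSpace statistics schema
    (type:2 marks item records, isBot flags robot traffic).
    """
    return {
        "q": "type:2",
        "fq": "isBot:false",
        "rows": 0,               # no documents, only facets and stats
        "wt": "json",
        "facet": "true",
        "facet.field": "id",
        "facet.mincount": 1,     # drop ids with zero hits
        "facet.limit": limit,
        "facet.offset": offset,
        "stats": "true",         # enable the stats component...
        "stats.field": "id",
        "stats.calcdistinct": "true",  # ...so it reports countDistinct for "id"
    }


def parse_response(resp):
    """Return (number of distinct ids, {item id: hit count}) from Solr JSON."""
    distinct = resp["stats"]["stats_fields"]["id"]["countDistinct"]
    # Solr's default JSON facet format is a flat [value, count, value, count, ...] list
    flat = resp["facet_counts"]["facet_fields"]["id"]
    counts = {int(flat[i]): flat[i + 1] for i in range(0, len(flat), 2)}
    return distinct, counts


def num_pages(distinct, limit):
    """Pages needed to walk all distinct ids (numFound counts hits, not ids)."""
    return math.ceil(distinct / limit)


# URL for the first page of 100 facets
url = SOLR_STATS_URL + "?" + urlencode(build_facet_query(offset=0, limit=100))
```

Even with rows=0, numFound still reports the number of matching hit documents. A single item can have many hits, which is why numFound cannot size the facet pages.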
contrib contrib/dspace-statistics-indexer.timer: Fix syntax 2018-09-25 23:07:03 +03:00
.gitignore Update docs to remove SQLite stuff 2018-09-25 00:56:01 +03:00
.travis.yml .travis.yml: Add Python 3.7 2018-09-25 12:17:20 +03:00
app.py Return HTTP 404 when an item id is not found 2018-09-25 13:12:53 +03:00
CHANGELOG.md CHANGELOG.md: Add unreleased changes 2018-09-25 23:09:44 +03:00
config.py Use PostgreSQL instead of SQLite 2018-09-25 00:49:47 +03:00
database.py database.py: Use psycopg2.extras.DictCursor 2018-09-25 02:06:29 +03:00
indexer.py indexer.py: Major refactor 2018-09-26 02:41:10 +03:00
LICENSE.txt Add GPLv3 license 2018-09-18 14:16:07 +03:00
README.md Return HTTP 404 when an item id is not found 2018-09-25 13:12:53 +03:00
requirements.txt requirements.txt: Use kazoo 2.5.0 2018-09-25 12:08:28 +03:00
solr.py Refactor Solr components 2018-09-23 13:24:30 +03:00

DSpace Statistics API

A quick and dirty REST API to expose Solr view and download statistics for items in a DSpace repository.

Written and tested in Python 3.5, 3.6, and 3.7. Requires PostgreSQL version 9.5 or greater for UPSERT support.
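PostgreSQL 9.5 added INSERT ... ON CONFLICT ... DO UPDATE (UPSERT). With it, an indexer can write or refresh a row without first checking whether the row exists. A minimal sketch, assuming a hypothetical items table with id, views, and downloads columns that are filled in by separate passes (also an assumption). It uses Python's built-in sqlite3 only so the example runs without a PostgreSQL server, because SQLite 3.24 and later accept the same ON CONFLICT syntax. With psycopg2 the placeholders would be %s instead of ?.

```python
import sqlite3

# Hypothetical schema: one row per item.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    id INTEGER PRIMARY KEY,
    views INTEGER DEFAULT 0,
    downloads INTEGER DEFAULT 0
)
"""

# UPSERT: insert a new row, or update only this column if the id already exists.
UPSERT_VIEWS = """
INSERT INTO items (id, views) VALUES (?, ?)
ON CONFLICT (id) DO UPDATE SET views = excluded.views
"""
UPSERT_DOWNLOADS = """
INSERT INTO items (id, downloads) VALUES (?, ?)
ON CONFLICT (id) DO UPDATE SET downloads = excluded.downloads
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(UPSERT_VIEWS, [(1, 10), (2, 4)])   # first pass: views
conn.executemany(UPSERT_DOWNLOADS, [(1, 3)])        # second pass: downloads
conn.executemany(UPSERT_VIEWS, [(1, 12)])           # re-index: update, no duplicate row
rows = conn.execute("SELECT id, views, downloads FROM items ORDER BY id").fetchall()
# rows == [(1, 12, 3), (2, 4, 0)]
```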

Installation

Create a virtual environment, install the requirements, and run the API with gunicorn:

$ python -m venv venv
$ . venv/bin/activate
$ pip install -r requirements.txt
$ gunicorn app:api

Using the API

The API exposes the following endpoints:

  • GET /items: returns views and downloads for all items that Solr knows about¹. Accepts limit and page query parameters for pagination of results.
  • GET /item/id: returns views and downloads for a single item (id must be a positive integer). Returns HTTP 404 if an item id is not found.

¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads.
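To illustrate the two endpoints, here is a sketch of them as a plain WSGI callable named api. This is the kind of object that gunicorn app:api loads from app.py. It is not the project's app.py: it uses only the standard library, and a hypothetical in-memory ITEMS dict stands in for the PostgreSQL table. The default limit/page values, the 400 responses, and the JSON field names are also assumptions.

```python
import json
import re
from urllib.parse import parse_qs

# Hypothetical stand-in for the database: item id -> statistics.
ITEMS = {
    1: {"views": 12, "downloads": 3},
    2: {"views": 4, "downloads": 0},
}

ITEM_PATH = re.compile(r"/item/(\d+)")  # digits only, so no negative or non-numeric ids


def _respond(start_response, status, body):
    """Serialize body as JSON and send it with the given status line."""
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def api(environ, start_response):
    """WSGI entry point; `gunicorn app:api` serves a callable like this."""
    path = environ.get("PATH_INFO", "")

    if path == "/items":
        query = parse_qs(environ.get("QUERY_STRING", ""))
        try:
            # Assumed defaults: 100 results per page, first page is 0.
            limit = int(query.get("limit", ["100"])[0])
            page = int(query.get("page", ["0"])[0])
        except ValueError:
            return _respond(start_response, "400 Bad Request",
                            {"error": "limit and page must be integers"})
        if limit < 1 or page < 0:
            return _respond(start_response, "400 Bad Request",
                            {"error": "limit must be >= 1 and page >= 0"})
        # Offset pagination over a stable (sorted) ordering of ids
        ids = sorted(ITEMS)[page * limit:(page + 1) * limit]
        statistics = [{"id": i, **ITEMS[i]} for i in ids]
        return _respond(start_response, "200 OK",
                        {"limit": limit, "page": page, "statistics": statistics})

    match = ITEM_PATH.fullmatch(path)
    if match:
        item_id = int(match.group(1))
        if item_id > 0 and item_id in ITEMS:
            return _respond(start_response, "200 OK", {"id": item_id, **ITEMS[item_id]})
        return _respond(start_response, "404 Not Found", {"error": "item not found"})

    return _respond(start_response, "404 Not Found", {"error": "no such endpoint"})
```

If saved as app.py, this could be served with the same gunicorn app:api command. Gunicorn binds to 127.0.0.1:8000 by default, so a request such as curl 'http://127.0.0.1:8000/items?limit=1&page=1' would return the second item.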

Todo

  • Add API documentation
  • Close the database connection when gunicorn shuts down gracefully
  • Better logging
  • Tests

License

This work is licensed under the GPLv3.