mirror of https://github.com/ilri/dspace-statistics-api.git synced 2025-01-24 19:33:25 +01:00
Alan Orth e604d8ca81
indexer.py: Major refactor
Basically, Solr's numFound has nothing to do with the actual number
of distinct facets that are returned: it counts matching documents,
not distinct facet values. You need to use Solr's stats component to
get the number of distinct facets, aka countDistinct. This is
apparently deprecated in newer Solr versions, but we're on version
4.10 and it works there.

Also, I realized that there is no need to return facets for items
without any views or downloads. Using facet.mincount=1 reduces the
result set size and also means we can store less data in the
database. The API returns HTTP 404 Not Found if an item is not in
the database anyway.

I can't pin it down exactly, but there is some weird issue with
Solr's facet results when you don't use facet.mincount=1: you get
tons of results with ids that don't even exist in the document
database, let alone as actual DSpace items!

See: https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
2018-09-26 02:41:10 +03:00
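The facet and stats parameters described in this commit can be sketched in Python as follows. This is an illustration under assumptions, not the project's actual indexer.py: the query `type:2` (items), the field name `id`, the helper names `build_facet_params`, `count_distinct`, `parse_facets`, and `facet_offsets`, and the hand-written `sample` response are all hypothetical. The `stats.calcdistinct` parameter and the flat `[value, count, ...]` facet list follow Solr's JSON output conventions, but check them against your Solr version.

```python
import math
from typing import Any, Dict, Iterator, List, Tuple


def build_facet_params(offset: int, limit: int) -> Dict[str, Any]:
    """Hypothetical Solr parameters for one page of item-id facets plus stats."""
    return {
        "q": "type:2",                 # assumption: DSpace object type 2 = item
        "rows": 0,                     # only facets and stats are needed, no documents
        "wt": "json",
        "facet": "true",
        "facet.field": "id",
        "facet.mincount": 1,           # drop ids with zero hits
        "facet.limit": limit,
        "facet.offset": offset,
        "stats": "true",
        "stats.field": "id",
        "stats.calcdistinct": "true",  # ask the stats component for countDistinct
    }


def count_distinct(response: Dict[str, Any], field: str = "id") -> int:
    """Distinct values of `field`; unlike numFound, this is not a document count."""
    return int(response["stats"]["stats_fields"][field]["countDistinct"])


def parse_facets(response: Dict[str, Any], field: str = "id") -> List[Tuple[str, int]]:
    """Pair up Solr's flat [value, count, value, count, ...] facet list."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


def facet_offsets(total: int, limit: int) -> Iterator[int]:
    """facet.offset values needed to page through `total` distinct facets."""
    for page in range(math.ceil(total / limit)):
        yield page * limit


# Hand-written example response: 12 matching documents, but only 2 distinct ids.
sample = {
    "response": {"numFound": 12, "start": 0, "docs": []},
    "facet_counts": {"facet_fields": {"id": ["1234", 7, "5678", 5]}},
    "stats": {"stats_fields": {"id": {"countDistinct": 2}}},
}

print(sample["response"]["numFound"])   # 12
print(count_distinct(sample))           # 2
print(parse_facets(sample))             # [('1234', 7), ('5678', 5)]
print(list(facet_offsets(250, 100)))    # [0, 100, 200]
```

Using countDistinct to drive facet.offset is one way to page through every item id. numFound counts individual statistics documents, so it cannot tell you how many pages of facets there are.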

DSpace Statistics API

A quick and dirty REST API to expose Solr view and download statistics for items in a DSpace repository.

Written and tested in Python 3.5, 3.6, and 3.7. Requires PostgreSQL version 9.5 or greater for UPSERT support.
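PostgreSQL 9.5 is the first version with UPSERT, i.e. `INSERT ... ON CONFLICT ... DO UPDATE`, which inserts a row or updates the existing one in a single statement. The sketch below shows that pattern using Python's built-in sqlite3 module only as a stand-in so it runs without a database server (SQLite 3.24 and later accept the same clause). The `items` table, its columns, and the `upsert_views` helper are hypothetical, not this project's schema; against PostgreSQL with a driver such as psycopg2 the SQL is the same, but the placeholders are `%s` rather than `?`.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical schema: one row per item with its view count.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS items ("
    "id INTEGER PRIMARY KEY, views INTEGER NOT NULL DEFAULT 0)"
)

# Insert a new row, or update the existing one if the id is already present.
UPSERT = (
    "INSERT INTO items (id, views) VALUES (?, ?) "
    "ON CONFLICT (id) DO UPDATE SET views = excluded.views"
)


def upsert_views(conn: sqlite3.Connection, rows: Iterable[Tuple[int, int]]) -> None:
    """Apply (id, views) pairs; the `with` block commits, or rolls back on error."""
    with conn:
        conn.executemany(UPSERT, rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
upsert_views(conn, [(1234, 7), (5678, 5)])
upsert_views(conn, [(1234, 9)])  # 1234 already exists, so its count is replaced
print(conn.execute("SELECT id, views FROM items ORDER BY id").fetchall())
# [(1234, 9), (5678, 5)]
```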

Installation

Create a virtual environment, install the dependencies, and run the API with gunicorn:

$ python -m venv venv
$ . venv/bin/activate
$ pip install -r requirements.txt
$ gunicorn app:api

Using the API

The API exposes the following endpoints:

  • GET /items: return views and downloads for all items that Solr knows about¹. Accepts limit and page query parameters for pagination of results.
  • GET /item/id: return views and downloads for a single item (id must be a positive integer). Returns HTTP 404 if an item id is not found.

¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads.
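A client for the two endpoints might look like the sketch below. It uses only the standard library; the base URL `http://127.0.0.1:8000` (gunicorn's default bind address), the helper names `items_url`, `item_url`, and `fetch_item`, and the expectation that responses are JSON are all assumptions. Whether `page` starts at 0 or 1 is not stated here, so the helper simply passes through whatever the caller gives.

```python
import json
import urllib.request
from typing import Any, Optional
from urllib.error import HTTPError
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"  # assumption: gunicorn's default bind address


def items_url(limit: int, page: int, base: str = BASE_URL) -> str:
    """URL for one page of GET /items results."""
    return "{}/items?{}".format(base, urlencode({"limit": limit, "page": page}))


def item_url(item_id: int, base: str = BASE_URL) -> str:
    """URL for GET /item/id; the API only accepts positive integer ids."""
    if item_id <= 0:
        raise ValueError("item id must be a positive integer")
    return "{}/item/{}".format(base, item_id)


def fetch_item(item_id: int, base: str = BASE_URL) -> Optional[Any]:
    """Return the decoded JSON body (assumed format), or None on HTTP 404."""
    try:
        with urllib.request.urlopen(item_url(item_id, base)) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:  # item id not found in the database
            return None
        raise


print(items_url(limit=100, page=0))  # http://127.0.0.1:8000/items?limit=100&page=0
print(item_url(1234))                # http://127.0.0.1:8000/item/1234
```

Validating the id on the client side mirrors the API's own rule and avoids a round trip for ids that can never succeed.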

Todo

  • Add API documentation
  • Close the database connection when gunicorn shuts down gracefully
  • Better logging
  • Tests

License

This work is licensed under the GPLv3.
