1
0
mirror of https://github.com/ilri/dspace-statistics-api.git synced 2025-05-10 07:06:01 +02:00

Compare commits

...

19 Commits

Author SHA1 Message Date
78900b5d85 CHANGELOG.md: Add changes for v0.6.1 2018-11-01 00:39:12 +02:00
eb08832bf8 Sync API documentation HTML with README.md 2018-11-01 00:37:52 +02:00
c2ec780ad9 README.md: Improve API documentation 2018-11-01 00:37:40 +02:00
df8ebc8bf1 README.md: Improve API endpoint documentation 2018-11-01 00:31:16 +02:00
0d4be5f4c8 README.md: Add API documentation endpoint 2018-11-01 00:22:16 +02:00
30dc7f1939 Add basic API documentation on root (/)
I had imagined plugging in an interactive Swagger or OpenAPI instance
here, but that's actually much more involved in Falcon than I want to
deal with right now.
2018-11-01 00:19:39 +02:00
77194707fd README.md: Improve introduction 2018-11-01 00:08:24 +02:00
10c1f8bdcc README.md: Update Travis CI badge 2018-10-31 23:14:38 +02:00
da74943da2 README.md: Update introduction 2018-10-31 22:40:36 +02:00
fc8348ab29 README.md: Add acknoledgement about the Solr queries 2018-10-31 19:36:50 +02:00
15c3299b99 CHANGELOG.md: Add changes for v0.6.0 2018-10-31 19:26:45 +02:00
d36be5ee50 contrib: Update systemd unit files for refactor 2018-10-28 11:14:21 +02:00
2f45d27554 dspace_statistics_api/app.py: remove unused code
This was added accidentally when I refactored. I was trying to see
if I could use Falcon's on_exit() hook.
2018-10-28 11:14:21 +02:00
b8356f7a87 Add "application" alias to API object
By default gunicorn looks for an "application" object to run, so this
saves us having to type api:app.
2018-10-28 11:14:21 +02:00
2136dc79ce Remove shebang from indexer.py
This is run as a Python module now so does not need a shebang.
2018-10-28 11:14:21 +02:00
ed60120cef Remove executable bit from indexer.py
Now it is run as a Python module.
2018-10-28 11:14:21 +02:00
c027f01b48 Refactor project structure
This follows guidance from several well-known Python best practices
guides. Basically, the idea is create a package for the application
that is comprised of several re-usable modules.

See: https://docs.python-guide.org/writing/structure/
See: https://realpython.com/python-application-layouts/
2018-10-28 11:14:21 +02:00
754663f062 CHANGELOG.md: Add changes for version 0.5.2 2018-10-28 11:12:27 +02:00
507699e58a requirements.txt: Update libraries
Switch to a personal fork of SolrClient so that we can use kazoo 2.5.0
and get rid of the error about the 'async' keyword on Python 3.7. Also
this bumps some of the other libraries to their latest versions.
2018-10-28 11:09:47 +02:00
12 changed files with 67 additions and 27 deletions

View File

@ -4,6 +4,18 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
### [0.6.1] - 2018-10-31
## Added
- API documentation at root path (/)
### [0.6.0] - 2018-10-31
## Changed
- Refactor project structure (note breaking changes to API and indexing invocation, see contrib and README.md)
### [0.5.2] - 2018-10-28
## Changed
- Update library versions in requirements.txt
### [0.5.1] - 2018-10-24
## Changed
- Use Python's native json instead of ujson

View File

@ -1,5 +1,7 @@
# DSpace Statistics API [![Build Status](https://travis-ci.org/alanorth/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/alanorth/dspace-statistics-api)
A simple REST API to expose Solr view and download statistics for items in a DSpace repository. This project contains a standalone indexing component and a WSGI application.
# DSpace Statistics API [![Build Status](https://travis-ci.org/ilri/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/ilri/dspace-statistics-api)
DSpace versions 4.0 and up include a [REST API](https://wiki.duraspace.org/display/DSDOC5x/REST+API) that allows the repository to be queried programmatically. The API exposes information about communities, collections, items, and bitstreams, but not item views or downloads. This project contains a lightweight indexer and a web application to make the view and download statistics available via a simple REST API that can be deployed simultaneously with DSpace's own.
You can read more about the Solr queries used to gather the item view and download statistics on the [DSpace wiki](https://wiki.duraspace.org/display/DSPACE/Solr).
## Requirements
@ -24,11 +26,11 @@ Set up the environment variables for Solr and PostgreSQL:
Index the Solr statistics core to populate the PostgreSQL database:
$ ./indexer.py
$ python -m dspace_statistics_api.indexer
Run the REST API:
$ gunicorn app:api
$ gunicorn dspace_statistics_api.app
Test to see if there are any statistics:
@ -59,14 +61,16 @@ This would expose the API at `/rest/statistics`.
## Using the API
The API exposes the following endpoints:
- GET `/items`return views and downloads for all items that Solr knows about¹. Accepts `limit` and `page` query parameters for pagination of results.
- GET `/item/id`return views and downloads for a single item (*id* must be a positive integer). Returns HTTP 404 if an item id is not found.
- GET `/`return a basic API documentation page.
- GET `/items`return views and downloads for all items that Solr knows about¹. Accepts `limit` and `page` query parameters for pagination of results (`limit` must be an integer between 1 and 100, and `page` must be an integer greater than or equal to 0).
- GET `/item/id`return views and downloads for a single item (`id` must be a positive integer). Returns HTTP 404 if an item id is not found.
¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads.
The item id is the *internal* id for an item. You can get these from the standard DSpace REST API.
¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads. If an item is not present here you can assume it has zero views and zero downloads, but not necessarily that it does not exist in the repository.
## Todo
- Add API documentation
- Close DB connection when gunicorn shuts down gracefully
- Better logging
- Tests

View File

@ -12,7 +12,7 @@ Group=nogroup
WorkingDirectory=/var/lib/dspace-statistics-api
ExecStart=/var/lib/dspace-statistics-api/venv/bin/gunicorn \
--bind 127.0.0.1:5000 \
app:api
dspace_statistics_api.app
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID

View File

@ -11,7 +11,7 @@ Environment=DATABASE_HOST=localhost
User=nobody
Group=nogroup
WorkingDirectory=/var/lib/dspace-statistics-api
ExecStart=/var/lib/dspace-statistics-api/venv/bin/python indexer.py
ExecStart=/var/lib/dspace-statistics-api/venv/bin/python -m dspace_statistics_api.indexer
[Install]
WantedBy=multi-user.target

View File

View File

@ -1,9 +1,16 @@
from database import database_connection
from .database import database_connection
import falcon
db = database_connection()
db.set_session(readonly=True)
class RootResource:
def on_get(self, req, resp):
resp.status = falcon.HTTP_200
resp.content_type = 'text/html'
with open('dspace_statistics_api/docs/index.html', 'r') as f:
resp.body = f.read()
class AllItemsResource:
def on_get(self, req, resp):
"""Handles GET requests"""
@ -63,10 +70,8 @@ class ItemResource:
cursor.close()
def on_exit(api):
print("Shutting down DB")
api = falcon.API()
api = application = falcon.API()
api.add_route('/', RootResource())
api.add_route('/items', AllItemsResource())
api.add_route('/item/{item_id:int}', ItemResource())

View File

@ -1,7 +1,7 @@
from config import DATABASE_NAME
from config import DATABASE_USER
from config import DATABASE_PASS
from config import DATABASE_HOST
from .config import DATABASE_NAME
from .config import DATABASE_USER
from .config import DATABASE_PASS
from .config import DATABASE_HOST
import psycopg2, psycopg2.extras
def database_connection():

View File

@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>DSpace Statistics API</title>
</head>
<body>
<h1>DSpace Statistics API</h1>
<p>This site is running the <a href="https://github.com/ilri/dspace-statistics-api" title="DSpace Statistics API project">DSpace Statistics API</a>. The following endpoints are available:</p>
<ul>
<li>GET <code>/</code>return a basic API documentation page.</li>
<li>GET <code>/items</code>return views and downloads for all items that Solr knows about¹. Accepts <code>limit</code> and <code>page</code> query parameters for pagination of results (<code>limit</code> must be an integer between 1 and 100, and <code>page</code> must be an integer greater than or equal to 0).</li>
<li>GET <code>/item/id</code>return views and downloads for a single item (<code>id</code> must be a positive integer). Returns HTTP 404 if an item id is not found.</li>
</ul>
<p>The item id is the <em>internal</em> id for an item. You can get these from the standard DSpace REST API.</p>
<p>¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads. If an item is not present here you can assume it has zero views and zero downloads, but not necessarily that it does not exist in the repository.</code>
</body>
</html>

5
indexer.py → dspace_statistics_api/indexer.py Executable file → Normal file
View File

@ -1,4 +1,3 @@
#!/usr/bin/env python
#
# indexer.py
#
@ -30,10 +29,10 @@
# See: https://solrclient.readthedocs.io/en/latest/SolrClient.html
# See: https://wiki.duraspace.org/display/DSPACE/Solr
from database import database_connection
from .database import database_connection
import json
import psycopg2.extras
from solr import solr_connection
from .solr import solr_connection
def index_views():
# get total number of distinct facets for items with a minimum of 1 view,

View File

@ -1,4 +1,4 @@
from config import SOLR_SERVER
from .config import SOLR_SERVER
from SolrClient import SolrClient
def solr_connection():

View File

@ -1,4 +1,4 @@
certifi==2018.8.24
certifi==2018.10.15
chardet==3.0.4
falcon==1.4.1
gunicorn==19.9.0
@ -6,7 +6,7 @@ idna==2.7
kazoo==2.5.0
psycopg2-binary==2.7.5
python-mimeparse==1.6.0
requests==2.19.1
requests==2.20.0
six==1.11.0
SolrClient==0.2.1
urllib3==1.23
-e git://github.com/alanorth/SolrClient.git@c629e3475be37c82770b2be61748be7e29882648#egg=SolrClient
urllib3==1.24