mirror of https://github.com/ilri/dspace-statistics-api.git synced 2025-05-15 17:33:02 +02:00

Compare commits

...

28 Commits

Author SHA1 Message Date
15c3299b99 CHANGELOG.md: Add changes for v0.6.0 2018-10-31 19:26:45 +02:00
d36be5ee50 contrib: Update systemd unit files for refactor 2018-10-28 11:14:21 +02:00
2f45d27554 dspace_statistics_api/app.py: remove unused code
This was added accidentally when I refactored. I was trying to see
if I could use Falcon's on_exit() hook.
2018-10-28 11:14:21 +02:00
b8356f7a87 Add "application" alias to API object
By default gunicorn looks for an "application" object to run, so this
saves us having to type api:app.
2018-10-28 11:14:21 +02:00
2136dc79ce Remove shebang from indexer.py
This is run as a Python module now so does not need a shebang.
2018-10-28 11:14:21 +02:00
ed60120cef Remove executable bit from indexer.py
Now it is run as a Python module.
2018-10-28 11:14:21 +02:00
c027f01b48 Refactor project structure
This follows guidance from several well-known Python best practices
guides. Basically, the idea is to create a package for the application
that is comprised of several re-usable modules.

See: https://docs.python-guide.org/writing/structure/
See: https://realpython.com/python-application-layouts/
2018-10-28 11:14:21 +02:00
754663f062 CHANGELOG.md: Add changes for version 0.5.2 2018-10-28 11:12:27 +02:00
507699e58a requirements.txt: Update libraries
Switch to a personal fork of SolrClient so that we can use kazoo 2.5.0
and get rid of the error about the 'async' keyword on Python 3.7. Also
this bumps some of the other libraries to their latest versions.
2018-10-28 11:09:47 +02:00
a016916995 CHANGELOG.md: Add note about ujson 2018-10-24 14:15:03 +03:00
6fd2827a7c Use Python's native json instead of ujson
Falcon can optionally use ujson to speed up JSON (de)serialization,
but Falcon's already really fast and requiring ujson actually makes
deployment trickier in some cases (for example in Docker containers
that are based on Alpine Linux).

Here are some tests of Falcon 1.4.1 on Python 3.5 from my laptop:

    1. falcon...............60172 req/sec or 16.62 μs/req (36x)
    2. falcon-ext...........34186 req/sec or 29.25 μs/req (20x)
    3. bottle...............32924 req/sec or 30.37 μs/req (20x)
    4. werkzeug.............11948 req/sec or 83.70 μs/req (7x)
    5. flask.................6654 req/sec or 150.30 μs/req (4x)
    6. django................4565 req/sec or 219.04 μs/req (3x)
    7. pecan.................1672 req/sec or 598.19 μs/req (1x)

The tests were conducted with Falcon's official Docker benchmarking
tools on my Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz on Arch Linux.

See: https://github.com/falconry/falcon/tree/master/docker
2018-10-24 14:08:23 +03:00
62142eb79e CHANGELOG.md: Move unreleased changes to v0.5.0 2018-10-24 12:02:42 +03:00
fda0321942 CHANGELOG.md: Add note about Solr in API component 2018-10-24 12:01:47 +03:00
963aa245c8 app.py: Don't initialize Solr connection
We only need Solr in the indexing component, not for the API itself.
2018-10-24 11:59:50 +03:00
568ff2eebb CHANGELOG.md: Add note about nginx configuration 2018-10-23 14:56:44 +03:00
deecb8a10b README.md: Add example nginx configuration 2018-10-23 14:55:36 +03:00
12f45d7c08 contrib: Adjust example path 2018-10-23 14:34:29 +03:00
f65089f9ce CHANGELOG.md: Update and move to 0.4.3 release 2018-10-17 09:51:44 +03:00
1db5cf1c29 README.md: Grammar 2018-10-17 09:51:35 +03:00
e581c4b1aa README.md: Improve documentation 2018-10-17 09:50:30 +03:00
e8d356c9ca README.md: Add TODO about Python 3.6+ f-string syntax
They are faster.
2018-10-17 09:13:25 +03:00
34a9b8d629 CHANGELOG.md: Add unreleased changes for Travis CI 2018-10-14 19:02:09 +03:00
41e3d66a0e .travis.yml: Only build master branch 2018-10-14 19:00:31 +03:00
9b2a6137b4 README.md: Add Travis CI badge
For now this is only an indicator that the Python requirements can
be satisfied and installed.
2018-10-14 18:58:12 +03:00
600b986f99 .travis.yml: Use Python 3.7-dev instead of 3.7
I don't think Travis supports Python 3.7 yet because the builds for
that version keep failing.
2018-10-14 18:57:30 +03:00
49a7790794 .travis.yml: Move script to one line 2018-10-14 18:53:45 +03:00
f2deba627c .travis.yml: Run pip install as script
Basically for now there are no tests so I just want to check
that requirements.txt is correct and that all dependencies can be
installed.
2018-10-14 18:47:14 +03:00
9323513794 README.md: Update instructions 2018-10-14 18:45:40 +03:00
12 changed files with 92 additions and 34 deletions

@@ -2,8 +2,10 @@ language: python
 python:
   - "3.5"
   - "3.6"
-  - "3.7"
-install:
-  - pip install -r requirements.txt
+  - "3.7-dev"
+script: pip install -r requirements.txt
+branches:
+  only:
+    - master
 # vim: ts=2 sw=2 et

@@ -4,6 +4,32 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+### [0.6.0] - 2018-10-31
+## Changed
+- Refactor project structure (note breaking changes to API and indexing invocation, see contrib and README.md)
+### [0.5.2] - 2018-10-28
+## Changed
+- Update library versions in requirements.txt
+### [0.5.1] - 2018-10-24
+## Changed
+- Use Python's native json instead of ujson
+### [0.5.0] - 2018-10-24
+## Added
+- Example nginx configuration to README.md
+## Changed
+- Don't initialize Solr connection in API
+### [0.4.3] - 2018-10-17
+## Changed
+- Use pip install as script for Travis CI
+## Improved
+- Documentation for deployment and testing
 ## [0.4.2] - 2018-10-04
 ### Changed
 - README.md introduction and requirements

@@ -1,4 +1,4 @@
-# DSpace Statistics API
+# DSpace Statistics API [![Build Status](https://travis-ci.org/alanorth/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/alanorth/dspace-statistics-api)
 A simple REST API to expose Solr view and download statistics for items in a DSpace repository. This project contains a standalone indexing component and a WSGI application.
 ## Requirements
@@ -14,14 +14,47 @@ Create a Python virtual environment and install the dependencies:
 $ . venv/bin/activate
 $ pip install -r requirements.txt
-Set up the environment variables Solr and PostgreSQL:
+Set up the environment variables for Solr and PostgreSQL:
 $ export SOLR_SERVER=http://localhost:8080/solr
-$
-$ gunicorn app:api
+$ export DATABASE_NAME=dspacestatistics
+$ export DATABASE_USER=dspacestatistics
+$ export DATABASE_PASS=dspacestatistics
+$ export DATABASE_HOST=localhost
+Index the Solr statistics core to populate the PostgreSQL database:
+$ python -m dspace_statistics_api.indexer
+Run the REST API:
+$ gunicorn dspace_statistics_api.app
+Test to see if there are any statistics:
+$ curl 'http://localhost:8000/items?limit=1'
 ## Deployment
-There are example systemd service and timer units in the `contrib` directory.
+There are example systemd service and timer units in the `contrib` directory. The API service listens on localhost by default so you will need to expose it publicly using a web server like nginx.
+An example nginx configuration is:
+```
+server {
+    #...
+    location ~ /rest/statistics/?(.*) {
+        access_log /var/log/nginx/statistics.log;
+        proxy_pass http://statistics_api/$1$is_args$args;
+    }
+}
+upstream statistics_api {
+    server 127.0.0.1:5000;
+}
+```
+This would expose the API at `/rest/statistics`.
 ## Using the API
 The API exposes the following endpoints:
@@ -34,12 +67,13 @@ The API exposes the following endpoints:
 ## Todo
 - Add API documentation
-- Close up DB connection when gunicorn shuts down gracefully
+- Close DB connection when gunicorn shuts down gracefully
 - Better logging
 - Tests
 - Check if database exists (try/except)
 - Version API
 - Use JSON in PostgreSQL
+- Switch to [Python 3.6+ f-string syntax](https://realpython.com/python-f-strings/)
 ## License
 This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).
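
The example nginx configuration added to the README uses a regex location with a capture group, so the `/rest/statistics` prefix is stripped before the request reaches gunicorn. The following is a rough Python simulation of that path mapping, not nginx itself. The function name `upstream_uri` is made up for illustration, and the sketch assumes the usual nginx behaviour that an unanchored regex location can match anywhere in the URI and that `$is_args` expands to `?` only when a query string is present.

```python
import re

# Regex taken from the `location ~ /rest/statistics/?(.*)` line in the README example.
LOCATION = re.compile(r"/rest/statistics/?(.*)")


def upstream_uri(path, query=""):
    """Return the URI that would be passed upstream, or None if the location does not match.

    Hypothetical helper: it only imitates `proxy_pass http://statistics_api/$1$is_args$args`.
    """
    match = LOCATION.search(path)  # unanchored, like an nginx regex location without ^
    if match is None:
        return None
    is_args = "?" if query else ""  # mirrors nginx's $is_args variable
    return "/" + match.group(1) + is_args + query


print(upstream_uri("/rest/statistics/items", "limit=1"))  # /items?limit=1
print(upstream_uri("/rest/statistics/item/17"))           # /item/17
```

With this mapping, a public request for `/rest/statistics/items?limit=1` would arrive at the API as `/items?limit=1`, which is the same path used in the README's `curl` test.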

@@ -9,10 +9,10 @@ Environment=DATABASE_PASS=dspacestatistics
 Environment=DATABASE_HOST=localhost
 User=nobody
 Group=nogroup
-WorkingDirectory=/opt/ilri/dspace-statistics-api
-ExecStart=/opt/ilri/dspace-statistics-api/venv/bin/gunicorn \
+WorkingDirectory=/var/lib/dspace-statistics-api
+ExecStart=/var/lib/dspace-statistics-api/venv/bin/gunicorn \
     --bind 127.0.0.1:5000 \
-    app:api
+    dspace_statistics_api.app
 ExecReload=/bin/kill -s HUP $MAINPID
 ExecStop=/bin/kill -s TERM $MAINPID

@@ -10,8 +10,8 @@ Environment=DATABASE_PASS=dspacestatistics
 Environment=DATABASE_HOST=localhost
 User=nobody
 Group=nogroup
-WorkingDirectory=/opt/ilri/dspace-statistics-api
-ExecStart=/opt/ilri/dspace-statistics-api/venv/bin/python indexer.py
+WorkingDirectory=/var/lib/dspace-statistics-api
+ExecStart=/var/lib/dspace-statistics-api/venv/bin/python -m dspace_statistics_api.indexer
 [Install]
 WantedBy=multi-user.target

@@ -1,10 +1,8 @@
-from database import database_connection
+from .database import database_connection
 import falcon
-from solr import solr_connection
 db = database_connection()
 db.set_session(readonly=True)
-solr = solr_connection()
 class AllItemsResource:
     def on_get(self, req, resp):
@@ -65,7 +63,7 @@ class ItemResource:
         cursor.close()
-api = falcon.API()
+api = application = falcon.API()
 api.add_route('/items', AllItemsResource())
 api.add_route('/item/{item_id:int}', ItemResource())
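
The `application` alias is what lets the README and the systemd unit run `gunicorn dspace_statistics_api.app` without the `:api` suffix, since gunicorn falls back to a callable named `application` when no variable name is given. Below is a minimal standard-library sketch of the same aliasing pattern. The `app` function is a hypothetical stand-in for the project's Falcon object, and `call` is an illustrative helper, not part of the project.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny WSGI callable standing in for the Falcon API object (hypothetical)."""
    body = b'{"status": "ok"}'
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Same idea as `api = application = falcon.API()`: two names bound to one object.
application = app


def call(wsgi_callable, path):
    """Invoke a WSGI callable once and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI environ keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(wsgi_callable(environ, start_response))
    return captured["status"], body


print(call(application, "/items"))
```

Because both names refer to the same object, existing code that uses `api` keeps working while a WSGI server can discover `application` by its default name.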

@@ -1,7 +1,7 @@
-from config import DATABASE_NAME
-from config import DATABASE_USER
-from config import DATABASE_PASS
-from config import DATABASE_HOST
+from .config import DATABASE_NAME
+from .config import DATABASE_USER
+from .config import DATABASE_PASS
+from .config import DATABASE_HOST
 import psycopg2, psycopg2.extras
 def database_connection():
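
The `config` module imported here is not part of this diff, so its contents are unknown. As a sketch only, assuming it reads the environment variables that the README and the systemd units export, it might look something like the following. The `load_settings` function and the fallback defaults (copied from the README's example values) are assumptions made for illustration.

```python
import os


def load_settings(environ=os.environ):
    """Read connection settings from environment variables (hypothetical helper).

    The variable names come from the README; the defaults are only the
    README's example values, not confirmed project defaults.
    """
    return {
        "SOLR_SERVER": environ.get("SOLR_SERVER", "http://localhost:8080/solr"),
        "DATABASE_NAME": environ.get("DATABASE_NAME", "dspacestatistics"),
        "DATABASE_USER": environ.get("DATABASE_USER", "dspacestatistics"),
        "DATABASE_PASS": environ.get("DATABASE_PASS", "dspacestatistics"),
        "DATABASE_HOST": environ.get("DATABASE_HOST", "localhost"),
    }


settings = load_settings({"DATABASE_HOST": "db.example.org"})
print(settings["DATABASE_HOST"])  # db.example.org
print(settings["DATABASE_NAME"])  # dspacestatistics
```

Passing the environment in as a parameter keeps the sketch easy to test without touching the real process environment.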

indexer.py → dspace_statistics_api/indexer.py Executable file → Normal file

@@ -1,4 +1,3 @@
-#!/usr/bin/env python
 #
 # indexer.py
 #
@@ -30,10 +29,10 @@
 # See: https://solrclient.readthedocs.io/en/latest/SolrClient.html
 # See: https://wiki.duraspace.org/display/DSPACE/Solr
-from database import database_connection
-import ujson
+from .database import database_connection
+import json
 import psycopg2.extras
-from solr import solr_connection
+from .solr import solr_connection
 def index_views():
     # get total number of distinct facets for items with a minimum of 1 view,
@@ -56,7 +55,7 @@ def index_views():
     }, rows=0)
     # get total number of distinct facets (countDistinct)
-    results_totalNumFacets = ujson.loads(res.get_json())['stats']['stats_fields']['id']['countDistinct']
+    results_totalNumFacets = json.loads(res.get_json())['stats']['stats_fields']['id']['countDistinct']
     # divide results into "pages" (cast to int to effectively round down)
     results_per_page = 100
@@ -115,7 +114,7 @@ def index_downloads():
     }, rows=0)
     # get total number of distinct facets (countDistinct)
-    results_totalNumFacets = ujson.loads(res.get_json())['stats']['stats_fields']['owningItem']['countDistinct']
+    results_totalNumFacets = json.loads(res.get_json())['stats']['stats_fields']['owningItem']['countDistinct']
     # divide results into "pages" (cast to int to effectively round down)
     results_per_page = 100
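
The two changed lines parse Solr's stats response with the standard library's `json.loads` and read `countDistinct`, which the indexer then divides into pages of 100. The paging arithmetic can be sketched as below. The sample JSON payload, the `count_pages` helper, and the inclusive page loop are assumptions for illustration, since the rest of the indexer is not shown in this diff.

```python
import json

# Made-up sample of the relevant part of a Solr stats response.
SAMPLE = '{"stats": {"stats_fields": {"id": {"countDistinct": 250}}}}'


def count_pages(stats_json, field, results_per_page=100):
    """Return (total distinct facets, number of full pages) for one stats field."""
    total = json.loads(stats_json)["stats"]["stats_fields"][field]["countDistinct"]
    # int() truncates, which rounds down for non-negative counts
    return total, int(total / results_per_page)


total, num_pages = count_pages(SAMPLE, "id")
# Assumed loop shape: visit pages 0..num_pages inclusive so the remainder is covered.
offsets = [page * 100 for page in range(num_pages + 1)]
print(total, num_pages, offsets)  # 250 2 [0, 100, 200]
```

Rounding down and then iterating one extra page is one way to make sure the final partial page (here, facets 200 to 249) is still fetched.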

@@ -1,4 +1,4 @@
-from config import SOLR_SERVER
+from .config import SOLR_SERVER
 from SolrClient import SolrClient
 def solr_connection():

@@ -1,4 +1,4 @@
-certifi==2018.8.24
+certifi==2018.10.15
 chardet==3.0.4
 falcon==1.4.1
 gunicorn==19.9.0
@@ -6,8 +6,7 @@ idna==2.7
 kazoo==2.5.0
 psycopg2-binary==2.7.5
 python-mimeparse==1.6.0
-requests==2.19.1
+requests==2.20.0
 six==1.11.0
-SolrClient==0.2.1
-ujson==1.35
-urllib3==1.23
+-e git://github.com/alanorth/SolrClient.git@c629e3475be37c82770b2be61748be7e29882648#egg=SolrClient
+urllib3==1.24