1
0
mirror of https://github.com/ilri/dspace-statistics-api.git synced 2025-05-10 07:06:01 +02:00

Compare commits

...

21 Commits

Author SHA1 Message Date
754663f062 CHANGELOG.md: Add changes for version 0.5.2 2018-10-28 11:12:27 +02:00
507699e58a requirements.txt: Update libraries
Switch to a personal fork of SolrClient so that we can use kazoo 2.5.0
and get rid of the error about the 'async' keyword on Python 3.7. Also
this bumps some of the other libraries to their latest versions.
2018-10-28 11:09:47 +02:00
a016916995 CHANGELOD.md: Add note about ujson 2018-10-24 14:15:03 +03:00
6fd2827a7c Use Python's native json instead of ujson
Falcon can optionally use ujson to speed up JSON (de)serialization,
but Falcon's already really fast and requiring ujson actually makes
deployment trickier in some cases (for example in Docker containers
that are based on Alpine Linux).

Here are some tests of Falcon 1.4.1 on Python 3.5 from my laptop:

    1. falcon...............60172 req/sec or 16.62 μs/req (36x)
    2. falcon-ext...........34186 req/sec or 29.25 μs/req (20x)
    3. bottle...............32924 req/sec or 30.37 μs/req (20x)
    4. werkzeug.............11948 req/sec or 83.70 μs/req (7x)
    5. flask.................6654 req/sec or 150.30 μs/req (4x)
    6. django................4565 req/sec or 219.04 μs/req (3x)
    7. pecan.................1672 req/sec or 598.19 μs/req (1x)

The tests were conducted with Falcon's official Docker benchmarking
tools on my Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz on Arch Linux.

See: https://github.com/falconry/falcon/tree/master/docker
2018-10-24 14:08:23 +03:00
62142eb79e CHANGELOG.md: Move unreleased changes to v0.5.0 2018-10-24 12:02:42 +03:00
fda0321942 CHANGELOG.md: Add note about Solr in API component 2018-10-24 12:01:47 +03:00
963aa245c8 app.py: Don't initialize Solr connection
We only need Solr in the indexing component, not for the API itself.
2018-10-24 11:59:50 +03:00
568ff2eebb CHANGELOG.md: Add note about nginx configuration 2018-10-23 14:56:44 +03:00
deecb8a10b README.md: Add example nginx configuration 2018-10-23 14:55:36 +03:00
12f45d7c08 contrib: Adjust example path 2018-10-23 14:34:29 +03:00
f65089f9ce CHANGELOG.md: Update and move to 0.4.3 release 2018-10-17 09:51:44 +03:00
1db5cf1c29 README.md: Grammar 2018-10-17 09:51:35 +03:00
e581c4b1aa README.md: Improve documentation 2018-10-17 09:50:30 +03:00
e8d356c9ca README.md: Add TODO about Python 3.6+ f-string syntax
They are faster.
2018-10-17 09:13:25 +03:00
34a9b8d629 CHANGELOG.md: Add unreleased changes for Travis CI 2018-10-14 19:02:09 +03:00
41e3d66a0e .travis.yml: Only build master branch 2018-10-14 19:00:31 +03:00
9b2a6137b4 README.md: Add Travis CI badge
For now this is only an indicator that the Python requirements can
be satisfied and installed.
2018-10-14 18:58:12 +03:00
600b986f99 .travis.yml: Use Python 3.7-dev instead of 3.7
I don't think Travis supports Python 3.7 yet because the builds for
that version keep failing.
2018-10-14 18:57:30 +03:00
49a7790794 .travis.yml: Move script to one line 2018-10-14 18:53:45 +03:00
f2deba627c .travis.yml: Run pip install as script
Basically for now there are no tests so I just want to just check
that requirements.txt is correct and that all dependencies can be
installed.
2018-10-14 18:47:14 +03:00
9323513794 README.md: Update instructions 2018-10-14 18:45:40 +03:00
8 changed files with 80 additions and 22 deletions

View File

@ -2,8 +2,10 @@ language: python
python:
- "3.5"
- "3.6"
- "3.7"
install:
- pip install -r requirements.txt
- "3.7-dev"
script: pip install -r requirements.txt
branches:
only:
- master
# vim: ts=2 sw=2 et

View File

@ -4,6 +4,28 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
### [0.5.2] - 2018-10-28
## Changed
- Update library versions in requirements.txt
### [0.5.1] - 2018-10-24
## Changed
- Use Python's native json instead of ujson
### [0.5.0] - 2018-10-24
## Added
- Example nginx configuration to README.md
## Changed
- Don't initialize Solr connection in API
### [0.4.3] - 2018-10-17
## Changed
- Use pip install as script for Travis CI
## Improved
- Documentation for deployment and testing
## [0.4.2] - 2018-10-04
### Changed
- README.md introduction and requirements

View File

@ -1,4 +1,4 @@
# DSpace Statistics API
# DSpace Statistics API [![Build Status](https://travis-ci.org/alanorth/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/alanorth/dspace-statistics-api)
A simple REST API to expose Solr view and download statistics for items in a DSpace repository. This project contains a standalone indexing component and a WSGI application.
## Requirements
@ -14,14 +14,47 @@ Create a Python virtual environment and install the dependencies:
$ . venv/bin/activate
$ pip install -r requirements.txt
Set up the environment variables Solr and PostgreSQL:
Set up the environment variables for Solr and PostgreSQL:
$ export SOLR_SERVER=http://localhost:8080/solr
$
$ export DATABASE_NAME=dspacestatistics
$ export DATABASE_USER=dspacestatistics
$ export DATABASE_PASS=dspacestatistics
$ export DATABASE_HOST=localhost
Index the Solr statistics core to populate the PostgreSQL database:
$ ./indexer.py
Run the REST API:
$ gunicorn app:api
Test to see if there are any statistics:
$ curl 'http://localhost:8000/items?limit=1'
## Deployment
There are example systemd service and timer units in the `contrib` directory.
There are example systemd service and timer units in the `contrib` directory. The API service listens on localhost by default so you will need to expose it publicly using a web server like nginx.
An example nginx configuration is:
```
server {
#...
location ~ /rest/statistics/?(.*) {
access_log /var/log/nginx/statistics.log;
proxy_pass http://statistics_api/$1$is_args$args;
}
}
upstream statistics_api {
server 127.0.0.1:5000;
}
```
This would expose the API at `/rest/statistics`.
## Using the API
The API exposes the following endpoints:
@ -34,12 +67,13 @@ The API exposes the following endpoints:
## Todo
- Add API documentation
- Close up DB connection when gunicorn shuts down gracefully
- Close DB connection when gunicorn shuts down gracefully
- Better logging
- Tests
- Check if database exists (try/except)
- Version API
- Use JSON in PostgreSQL
- Switch to [Python 3.6+ f-string syntax](https://realpython.com/python-f-strings/)
## License
This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).

5
app.py
View File

@ -1,10 +1,8 @@
from database import database_connection
import falcon
from solr import solr_connection
db = database_connection()
db.set_session(readonly=True)
solr = solr_connection()
class AllItemsResource:
def on_get(self, req, resp):
@ -65,6 +63,9 @@ class ItemResource:
cursor.close()
def on_exit(api):
print("Shutting down DB")
api = falcon.API()
api.add_route('/items', AllItemsResource())
api.add_route('/item/{item_id:int}', ItemResource())

View File

@ -9,8 +9,8 @@ Environment=DATABASE_PASS=dspacestatistics
Environment=DATABASE_HOST=localhost
User=nobody
Group=nogroup
WorkingDirectory=/opt/ilri/dspace-statistics-api
ExecStart=/opt/ilri/dspace-statistics-api/venv/bin/gunicorn \
WorkingDirectory=/var/lib/dspace-statistics-api
ExecStart=/var/lib/dspace-statistics-api/venv/bin/gunicorn \
--bind 127.0.0.1:5000 \
app:api
ExecReload=/bin/kill -s HUP $MAINPID

View File

@ -10,8 +10,8 @@ Environment=DATABASE_PASS=dspacestatistics
Environment=DATABASE_HOST=localhost
User=nobody
Group=nogroup
WorkingDirectory=/opt/ilri/dspace-statistics-api
ExecStart=/opt/ilri/dspace-statistics-api/venv/bin/python indexer.py
WorkingDirectory=/var/lib/dspace-statistics-api
ExecStart=/var/lib/dspace-statistics-api/venv/bin/python indexer.py
[Install]
WantedBy=multi-user.target

View File

@ -31,7 +31,7 @@
# See: https://wiki.duraspace.org/display/DSPACE/Solr
from database import database_connection
import ujson
import json
import psycopg2.extras
from solr import solr_connection
@ -56,7 +56,7 @@ def index_views():
}, rows=0)
# get total number of distinct facets (countDistinct)
results_totalNumFacets = ujson.loads(res.get_json())['stats']['stats_fields']['id']['countDistinct']
results_totalNumFacets = json.loads(res.get_json())['stats']['stats_fields']['id']['countDistinct']
# divide results into "pages" (cast to int to effectively round down)
results_per_page = 100
@ -115,7 +115,7 @@ def index_downloads():
}, rows=0)
# get total number of distinct facets (countDistinct)
results_totalNumFacets = ujson.loads(res.get_json())['stats']['stats_fields']['owningItem']['countDistinct']
results_totalNumFacets = json.loads(res.get_json())['stats']['stats_fields']['owningItem']['countDistinct']
# divide results into "pages" (cast to int to effectively round down)
results_per_page = 100

View File

@ -1,4 +1,4 @@
certifi==2018.8.24
certifi==2018.10.15
chardet==3.0.4
falcon==1.4.1
gunicorn==19.9.0
@ -6,8 +6,7 @@ idna==2.7
kazoo==2.5.0
psycopg2-binary==2.7.5
python-mimeparse==1.6.0
requests==2.19.1
requests==2.20.0
six==1.11.0
SolrClient==0.2.1
ujson==1.35
urllib3==1.23
-e git://github.com/alanorth/SolrClient.git@c629e3475be37c82770b2be61748be7e29882648#egg=SolrClient
urllib3==1.24