Mirror of https://github.com/ilri/dspace-statistics-api.git, synced 2025-05-10 15:16:02 +02:00
Compare commits
52 Commits
15c3299b99
d36be5ee50
2f45d27554
b8356f7a87
2136dc79ce
ed60120cef
c027f01b48
754663f062
507699e58a
a016916995
6fd2827a7c
62142eb79e
fda0321942
963aa245c8
568ff2eebb
deecb8a10b
12f45d7c08
f65089f9ce
1db5cf1c29
e581c4b1aa
e8d356c9ca
34a9b8d629
41e3d66a0e
9b2a6137b4
600b986f99
49a7790794
f2deba627c
9323513794
daf15610f2
4ede966dbb
3580473a6d
071c24535f
4291aecac4
46bf537e88
eaca5354d3
4600288ee4
8179563378
b14c3eef4d
71a789b13f
c68ddacaa4
9c9e79769e
2ad5ade556
7412a09670
bb744a00b8
7499b89d99
2c1e4952b1
379f202c3f
560fa6056d
385a34e5d0
d0ea62d2bd
366ae25b8e
0f3054ae03
.travis.yml
@@ -2,8 +2,10 @@ language: python
 python:
   - "3.5"
   - "3.6"
-  - "3.7"
-install:
-  - pip install -r requirements.txt
+  - "3.7-dev"
+script: pip install -r requirements.txt
+branches:
+  only:
+    - master
 
 # vim: ts=2 sw=2 et
CHANGELOG.md
@@ -4,6 +4,45 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.6.0] - 2018-10-31
+### Changed
+- Refactor project structure (note breaking changes to API and indexing invocation, see contrib and README.md)
+
+## [0.5.2] - 2018-10-28
+### Changed
+- Update library versions in requirements.txt
+
+## [0.5.1] - 2018-10-24
+### Changed
+- Use Python's native json instead of ujson
+
+## [0.5.0] - 2018-10-24
+### Added
+- Example nginx configuration to README.md
+
+### Changed
+- Don't initialize Solr connection in API
+
+## [0.4.3] - 2018-10-17
+### Changed
+- Use pip install as script for Travis CI
+
+### Improved
+- Documentation for deployment and testing
+
+## [0.4.2] - 2018-10-04
+### Changed
+- README.md introduction and requirements
+- Use ujson instead of json
+- Iterate directly on SQL cursor in `/items` route
+
+### Fixed
+- Logic error in SQL for item views
+
+## [0.4.1] - 2018-09-26
+### Changed
+- Use execute_values() to batch insert records to PostgreSQL
+
 ## [0.4.0] - 2018-09-25
 ### Fixed
 - Invalid OnCalendar syntax in dspace-statistics-indexer.timer
README.md
@@ -1,15 +1,60 @@
-# DSpace Statistics API
-A quick and dirty REST API to expose Solr view and download statistics for items in a DSpace repository.
+# DSpace Statistics API [![Build Status](https://travis-ci.org/alanorth/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/alanorth/dspace-statistics-api)
+A simple REST API to expose Solr view and download statistics for items in a DSpace repository. This project contains a standalone indexing component and a WSGI application.
 
-Written and tested in Python 3.5, 3.6, and 3.7. Requires PostgreSQL version 9.5 or greater for [`UPSERT` support](https://wiki.postgresql.org/wiki/UPSERT).
+## Requirements
 
-## Installation
-Create a virtual environment and run it:
+- Python 3.5+
+- PostgreSQL version 9.5+ (due to [`UPSERT` support](https://wiki.postgresql.org/wiki/UPSERT))
+- DSpace 4+ with [Solr usage statistics enabled](https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics)
+
+## Installation and Testing
+Create a Python virtual environment and install the dependencies:
 
     $ python -m venv venv
     $ . venv/bin/activate
     $ pip install -r requirements.txt
-    $ gunicorn app:api
+
+Set up the environment variables for Solr and PostgreSQL:
+
+    $ export SOLR_SERVER=http://localhost:8080/solr
+    $ export DATABASE_NAME=dspacestatistics
+    $ export DATABASE_USER=dspacestatistics
+    $ export DATABASE_PASS=dspacestatistics
+    $ export DATABASE_HOST=localhost
+
+Index the Solr statistics core to populate the PostgreSQL database:
+
+    $ python -m dspace_statistics_api.indexer
+
+Run the REST API:
+
+    $ gunicorn dspace_statistics_api.app
+
+Test to see if there are any statistics:
+
+    $ curl 'http://localhost:8000/items?limit=1'
+
+## Deployment
+There are example systemd service and timer units in the `contrib` directory. The API service listens on localhost by default so you will need to expose it publicly using a web server like nginx.
+
+An example nginx configuration is:
+
+```
+server {
+    #...
+
+    location ~ /rest/statistics/?(.*) {
+        access_log /var/log/nginx/statistics.log;
+        proxy_pass http://statistics_api/$1$is_args$args;
+    }
+}
+
+upstream statistics_api {
+    server 127.0.0.1:5000;
+}
+```
+
+This would expose the API at `/rest/statistics`.
 
 ## Using the API
 The API exposes the following endpoints:
@@ -22,9 +67,13 @@ The API exposes the following endpoints:
 ## Todo
 
 - Add API documentation
-- Close up DB connection when gunicorn shuts down gracefully
+- Close DB connection when gunicorn shuts down gracefully
 - Better logging
 - Tests
+- Check if database exists (try/except)
+- Version API
+- Use JSON in PostgreSQL
+- Switch to [Python 3.6+ f-string syntax](https://realpython.com/python-f-strings/)
 
 ## License
 This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).
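The environment variables exported in the README are read by the application's config module. A minimal sketch of that pattern (the variable names match the README; the fallback defaults shown here are illustrative assumptions, not taken from this diff):

```python
import os

# Read connection settings from the environment, falling back to defaults.
# Variable names are the ones exported in the README; the fallback values
# here are illustrative assumptions.
SOLR_SERVER = os.environ.get('SOLR_SERVER', 'http://localhost:8080/solr')
DATABASE_NAME = os.environ.get('DATABASE_NAME', 'dspacestatistics')
DATABASE_USER = os.environ.get('DATABASE_USER', 'dspacestatistics')
DATABASE_PASS = os.environ.get('DATABASE_PASS', 'dspacestatistics')
DATABASE_HOST = os.environ.get('DATABASE_HOST', 'localhost')
```

Because the values are resolved at import time, exporting a variable before starting gunicorn or the indexer is enough to override a default.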
contrib/dspace-statistics-api.service
@@ -9,10 +9,10 @@ Environment=DATABASE_PASS=dspacestatistics
 Environment=DATABASE_HOST=localhost
 User=nobody
 Group=nogroup
-WorkingDirectory=/opt/ilri/dspace-statistics-api
-ExecStart=/opt/ilri/dspace-statistics-api/venv/bin/gunicorn \
+WorkingDirectory=/var/lib/dspace-statistics-api
+ExecStart=/var/lib/dspace-statistics-api/venv/bin/gunicorn \
     --bind 127.0.0.1:5000 \
-    app:api
+    dspace_statistics_api.app
 ExecReload=/bin/kill -s HUP $MAINPID
 ExecStop=/bin/kill -s TERM $MAINPID
 
contrib/dspace-statistics-indexer.service
@@ -10,8 +10,8 @@ Environment=DATABASE_PASS=dspacestatistics
 Environment=DATABASE_HOST=localhost
 User=nobody
 Group=nogroup
-WorkingDirectory=/opt/ilri/dspace-statistics-api
-ExecStart=/opt/ilri/dspace-statistics-api/venv/bin/python indexer.py
+WorkingDirectory=/var/lib/dspace-statistics-api
+ExecStart=/var/lib/dspace-statistics-api/venv/bin/python -m dspace_statistics_api.indexer
 
 [Install]
 WantedBy=multi-user.target
dspace_statistics_api/__init__.py (new empty file)
dspace_statistics_api/app.py
@@ -1,10 +1,8 @@
-from database import database_connection
+from .database import database_connection
 import falcon
-from solr import solr_connection
 
 db = database_connection()
 db.set_session(readonly=True)
-solr = solr_connection()
 
 class AllItemsResource:
     def on_get(self, req, resp):
@@ -22,16 +20,16 @@ class AllItemsResource:
 
         # get statistics, ordered by id, and use limit and offset to page through results
         cursor.execute('SELECT id, views, downloads FROM items ORDER BY id ASC LIMIT {} OFFSET {}'.format(limit, offset))
-        results = cursor.fetchmany(limit)
-        cursor.close()
 
         # create a list to hold dicts of item stats
         statistics = list()
 
         # iterate over results and build statistics object
-        for item in results:
+        for item in cursor:
             statistics.append({ 'id': item['id'], 'views': item['views'], 'downloads': item['downloads'] })
 
+        cursor.close()
+
         message = {
             'currentPage': page,
             'totalPages': pages,
@@ -65,7 +63,7 @@ class ItemResource:
 
         cursor.close()
 
-api = falcon.API()
+api = application = falcon.API()
 api.add_route('/items', AllItemsResource())
 api.add_route('/item/{item_id:int}', ItemResource())
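The `api = application = falcon.API()` change is what lets the README invoke plain `gunicorn dspace_statistics_api.app` without an `app:api` suffix: by default gunicorn looks for a module-level WSGI callable named `application`. A minimal stand-in (not the project's actual app) shows the interface gunicorn expects, exercised without a server via a synthetic environ:

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    # The WSGI contract gunicorn relies on: report status and headers via
    # the callback, return the body as an iterable of bytes.
    start_response('200 OK', [('Content-Type', 'application/json')])
    return [b'{"status": "ok"}']

# Exercise the callable directly with a synthetic WSGI environ.
environ = {}
setup_testing_defaults(environ)
captured = {}

def start_response(status, headers):
    captured['status'] = status

body = b''.join(application(environ, start_response))
```

Keeping the `api` name as well preserves the old `gunicorn module:api` invocation style for anyone still using it.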
dspace_statistics_api/database.py
@@ -1,9 +1,8 @@
-from config import DATABASE_NAME
-from config import DATABASE_USER
-from config import DATABASE_PASS
-from config import DATABASE_HOST
-import psycopg2
-import psycopg2.extras
+from .config import DATABASE_NAME
+from .config import DATABASE_USER
+from .config import DATABASE_PASS
+from .config import DATABASE_HOST
+import psycopg2, psycopg2.extras
 
 def database_connection():
     connection = psycopg2.connect("dbname={} user={} password={} host='{}'".format(DATABASE_NAME, DATABASE_USER, DATABASE_PASS, DATABASE_HOST), cursor_factory=psycopg2.extras.DictCursor)
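The `cursor_factory=psycopg2.extras.DictCursor` argument is what lets app.py index rows by column name (`item['views']`) while iterating the cursor directly. A sketch of the same row-building loop, using SQLite's `Row` factory as a stand-in for DictCursor so it runs without a PostgreSQL server:

```python
import sqlite3

# sqlite3.Row gives dict-style access by column name, standing in here for
# psycopg2.extras.DictCursor from the real code.
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row
conn.execute('CREATE TABLE items (id INTEGER, views INTEGER, downloads INTEGER)')
conn.executemany('INSERT INTO items VALUES (?, ?, ?)', [(1, 4, 2), (2, 7, 0)])

cursor = conn.execute(
    'SELECT id, views, downloads FROM items ORDER BY id ASC LIMIT ? OFFSET ?',
    (100, 0),
)

# Iterate the cursor directly (as the patched /items route does) instead of
# copying rows out first with fetchmany().
statistics = [
    {'id': item['id'], 'views': item['views'], 'downloads': item['downloads']}
    for item in cursor
]
cursor.close()
```

Iterating the cursor avoids materializing an intermediate list of rows before building the response, which is why the route was changed to close the cursor only after the loop.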
indexer.py → dspace_statistics_api/indexer.py (executable file → normal file)
@@ -1,4 +1,3 @@
-#!/usr/bin/env python
 #
 # indexer.py
 #
@@ -30,9 +29,10 @@
 # See: https://solrclient.readthedocs.io/en/latest/SolrClient.html
 # See: https://wiki.duraspace.org/display/DSPACE/Solr
 
-from database import database_connection
+from .database import database_connection
 import json
-from solr import solr_connection
+import psycopg2.extras
+from .solr import solr_connection
 
 def index_views():
     # get total number of distinct facets for items with a minimum of 1 view,
@@ -64,6 +64,9 @@ def index_views():
 
     cursor = db.cursor()
 
+    # create an empty list to store values for batch insertion
+    data = []
+
     while results_current_page <= results_num_pages:
         print('Indexing item views (page {} of {})'.format(results_current_page, results_num_pages))
 
@@ -77,19 +80,20 @@ def index_views():
             'facet.offset':results_current_page * results_per_page
         }, rows=0)
 
-        # check number of facets returned in the last query
-        #results_currentNumFacets = len(res.get_facets()['id'])
-
         # SolrClient's get_facets() returns a dict of dicts
         views = res.get_facets()
         # in this case iterate over the 'id' dict and get the item ids and views
         for item_id, item_views in views['id'].items():
-            cursor.execute('''INSERT INTO items(id, views) VALUES(%s, %s)
-                           ON CONFLICT(id) DO UPDATE SET downloads=excluded.views''',
-                           (item_id, item_views))
+            data.append((item_id, item_views))
+
+        # do a batch insert of values from the current "page" of results
+        sql = 'INSERT INTO items(id, views) VALUES %s ON CONFLICT(id) DO UPDATE SET views=excluded.views'
+        psycopg2.extras.execute_values(cursor, sql, data, template='(%s, %s)')
         db.commit()
 
+        # clear all items from the list so we can populate it with the next batch
+        data.clear()
+
         results_current_page += 1
 
     cursor.close()
@@ -119,6 +123,9 @@ def index_downloads():
 
     cursor = db.cursor()
 
+    # create an empty list to store values for batch insertion
+    data = []
+
     while results_current_page <= results_num_pages:
         print('Indexing item downloads (page {} of {})'.format(results_current_page, results_num_pages))
 
@@ -136,12 +143,16 @@ def index_downloads():
         downloads = res.get_facets()
         # in this case iterate over the 'owningItem' dict and get the item ids and downloads
         for item_id, item_downloads in downloads['owningItem'].items():
-            cursor.execute('''INSERT INTO items(id, downloads) VALUES(%s, %s)
-                           ON CONFLICT(id) DO UPDATE SET downloads=excluded.downloads''',
-                           (item_id, item_downloads))
+            data.append((item_id, item_downloads))
+
+        # do a batch insert of values from the current "page" of results
+        sql = 'INSERT INTO items(id, downloads) VALUES %s ON CONFLICT(id) DO UPDATE SET downloads=excluded.downloads'
+        psycopg2.extras.execute_values(cursor, sql, data, template='(%s, %s)')
         db.commit()
 
+        # clear all items from the list so we can populate it with the next batch
+        data.clear()
+
         results_current_page += 1
 
     cursor.close()
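Besides batching with execute_values(), this change fixes the logic error noted in the changelog: the old views statement updated `downloads` with `excluded.views`. The ON CONFLICT semantics of the corrected UPSERT can be sketched with SQLite, which has supported the same clause as PostgreSQL 9.5+ since version 3.24; `executemany` stands in here for psycopg2's `execute_values`:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (id INTEGER PRIMARY KEY, views INTEGER, downloads INTEGER)')

def upsert_views(rows):
    # Same UPSERT shape as the corrected indexer SQL; on a conflicting id
    # the proposed row's value is available as excluded.views.
    conn.executemany(
        'INSERT INTO items(id, views) VALUES(?, ?) '
        'ON CONFLICT(id) DO UPDATE SET views=excluded.views',
        rows,
    )
    conn.commit()

upsert_views([(1, 10), (2, 5)])   # first batch inserts new rows
upsert_views([(1, 12)])           # a later batch updates the existing row

views = dict(conn.execute('SELECT id, views FROM items ORDER BY id'))
```

Sending each Solr facet page as one batched statement, then clearing the list, keeps the number of round trips to PostgreSQL at one per page rather than one per item.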
dspace_statistics_api/solr.py
@@ -1,4 +1,4 @@
-from config import SOLR_SERVER
+from .config import SOLR_SERVER
 from SolrClient import SolrClient
 
 def solr_connection():
requirements.txt
@@ -1,4 +1,4 @@
-certifi==2018.8.24
+certifi==2018.10.15
 chardet==3.0.4
 falcon==1.4.1
 gunicorn==19.9.0
@@ -6,7 +6,7 @@ idna==2.7
 kazoo==2.5.0
 psycopg2-binary==2.7.5
 python-mimeparse==1.6.0
-requests==2.19.1
+requests==2.20.0
 six==1.11.0
-SolrClient==0.2.1
+-e git://github.com/alanorth/SolrClient.git@c629e3475be37c82770b2be61748be7e29882648#egg=SolrClient
-urllib3==1.23
+urllib3==1.24