1
0
mirror of https://github.com/ilri/dspace-statistics-api.git synced 2025-05-10 15:16:02 +02:00

Compare commits

...

180 Commits

Author SHA1 Message Date
0c8fb21f80 README.md: Update DSpace wiki URLs 2020-04-13 15:25:17 +03:00
b359c2466f .travis.yml: Don't build in a container
I didn't realize these LXD containers are not available on AMD64.
Now I understand why the build was so slow: because it was ARM64!
2020-03-29 16:36:47 +03:00
0eaed3e8c4 .travis.yml: Use Python 3.8-dev instead of master
See: https://docs.travis-ci.com/user/languages/python/#specifying-python-versions
2020-03-29 16:26:57 +03:00
70e96214c8 .travis.yml: Go→Python
Fix incorrect language.
2020-03-29 16:24:39 +03:00
cab9f16dbc .travis.yml: Try to run in an LXD container
According to the build environment documentation we need to specify
an OS of Linux in order to get a container instead of a VM.

See: https://docs.travis-ci.com/user/reference/overview/
2020-03-29 16:24:00 +03:00
bd49e1d1f6 .travis.yml: Correctly specify PostgreSQL 10
See: https://docs.travis-ci.com/user/database-setup/#postgresql
2020-03-29 16:19:08 +03:00
144ed9a7c4 .travis.yml: Use PostgreSQL 10.0
Production is still PostgreSQL 9.6, but I have been using 10.0 in
local development and staging environments.
2020-03-29 16:08:29 +03:00
48eef8c8e3 .travis.yml: Test on Python master
But allow failures!
2020-03-29 16:07:44 +03:00
fa9325e8a3 CHANGELOG.md: Add changes for v1.2.1 2020-03-02 14:32:07 +02:00
998e833470 dspace_statistics_api/docs/index.html: Adjust help text 2020-03-02 14:30:16 +02:00
dd8252601f README.md: Adjust API help text 2020-03-02 14:29:13 +02:00
9a9555853f README.md: Add note about versions 2020-03-02 14:28:22 +02:00
385e92cc5e README.md: Update
Remove TODOs that I've recently completed and update introduction.
2020-03-02 14:25:47 +02:00
b0e6481961 tests/dspacestatistics.sql: Update
New database snapshot that uses UUIDs.
2020-03-02 12:36:06 +02:00
f96a903be3 README.md: Update Python requirement 2020-03-02 11:47:03 +02:00
fcf8fa4c29 CHANGELOG.md: Minor syntax and spelling changes 2020-03-02 11:45:26 +02:00
5dd50ff998 CHANGELOG.md: Version 1.2.0
This version only works with DSpace 6+ where the internal item id-
entifiers are UUIDs instead of integers. Version 1.1.1 was the last
version to work with DSpace 4 and 5.
2020-03-02 11:34:58 +02:00
6704e7375f CHANGELOG.md: Add note about Python dependencies 2020-03-02 11:34:13 +02:00
37630d8dac CHANGELOG.md: Add note about DSpace 6+ UUIDs 2020-03-02 11:27:10 +02:00
0ef071a91d dspace_statistics_api: Use f-strings instead of format()
We had previously been avoiding the f-strings because we needed to
run on Python 3.5 and they were only available in Python 3.6+, but
now the black formatter requires Python 3.6 and all our systems are
running Python 3.6+ anyways.
2020-03-02 11:24:29 +02:00
9e7dd28156 dspace_statistics_api/app.py: Use parameterized SQL queries
This is a better way to run SQL queries because psycopg2 takes care
of the quoting for us.
2020-03-02 11:16:05 +02:00
60e6ea57b1 tests/test_api.py: Use UUID
DSpace 6+ uses a UUID for item identifiers instead of an integer so
we need to adapt our tests accordingly. The Python UUID object must
be cast to a string to use it elsewhere in the code.
2020-03-02 11:10:41 +02:00
5955868b9a dspace_statistics_api/app.py: Use UUID
DSpace 6+ uses a UUID for item identifiers instead of an integer so
we need to adapt our PostgreSQL queries to use those. Note that we
can no longer sort results in the "all items" endpoint by ID. Also,
we need to use parameterized psycopg2 queries instead of strings to
support queries with UUIDs properly. To use the Python UUID objects
elsewhere in the code we need to make sure that we cast them to str.
2020-03-02 11:06:48 +02:00
250fd8164f dspace_statistics_api/indexer.py: Use UUID
DSpace 6+ uses a UUID for item identifiers instead of an integer so
we need to update the PostgreSQL schema accordingly. Solr still re-
fers to them as "id" in its schema so we don't need to change anyt-
hing there.
2020-03-01 21:22:10 +02:00
82be1a4d00 Update requirements
Generated from pipenv with:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2020-03-01 21:21:13 +02:00
0615064e3d Add pytest-clarity to pipenv
Makes pytest output easier to understand.
2020-03-01 21:19:28 +02:00
76be1b749a Run pipenv update 2020-03-01 21:13:32 +02:00
92146fe426 tests/test_api.py: Format with black 2019-12-14 12:39:58 +02:00
440b2f2dfa Pipfile.lock: Run pipenv update 2019-12-14 12:38:11 +02:00
67bc30ead0 Pipfile: Specify exact version of black
Black only releases pre-release versions, which causes issues with
pipenv. Instead of always running pipenv with "--pre" and potenti-
ally letting in some other pre-release versions for other depende-
ncies, I would rather specify the latest black version explicitly.

See: https://github.com/psf/black/issues/517
See: https://github.com/microsoft/vscode-python/issues/5171
2019-12-14 12:37:10 +02:00
142959acdb CHANGELOG.md: Unreleased changes 2019-11-27 12:56:39 +02:00
322f5a8db8 .travis.yml: Remove Python 3.5
black does not work with Python 3.5. It's not such a big deal, as
this is only required for running tests, not for running the app.
2019-11-27 12:55:34 +02:00
90dcaa6ec6 CHANGELOG.md: Fix typo 2019-11-27 12:47:07 +02:00
9aca827d69 Update requirements-dev.txt
Generated with pipenv:

    $ pipenv lock -r -d > requirements-dev.txt
2019-11-27 12:36:05 +02:00
1b394ec50e CHANGELOG.md: Move unreleased changes to 1.1.1 2019-11-27 12:32:54 +02:00
3e9753b600 CHANGELOG.md: Add unreleased changes 2019-11-27 12:32:16 +02:00
cb3c3d37fa Sort imports with isort 2019-11-27 12:31:04 +02:00
4ff1fd4a22 Format code with black 2019-11-27 12:30:06 +02:00
d2fe420a9a Add configuration for isort and black
This does linting and automatic code formatting according to PEP8.

See: https://sourcery.ai/blog/python-best-practices/
2019-11-27 12:26:55 +02:00
3197b79578 CHANGELOG.md: Update unreleased changes 2019-11-27 12:14:49 +02:00
eeb8e6bba1 dspace_statistics_api/indexer.py: Fix minor issues raised by flake8 2019-11-27 12:12:05 +02:00
3540ce328b Update requirements
Generated from pipenv with:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-11-27 12:08:32 +02:00
520e04f9be Pipfile.lock: run pipenv update
Brings gunicorn 20.0.4, pytest 5.3.1, and others. I hadn't noticed
that gunicorn was bumped from 19.x.x to 20.x.x last week.

See: https://docs.gunicorn.org/en/stable/news.html#id6
2019-11-27 12:06:09 +02:00
8a46a64cfc CHANGELOG.md: Use Python 3.8 for pipenv 2019-11-27 10:53:38 +02:00
b8442f8cce .travis.yml: Remove pipenv-specific environment variables 2019-11-15 00:48:57 +02:00
95f7871cc1 .travis.yml: Use vanilla pip 2019-11-15 00:46:58 +02:00
3bc07027e5 .travis.yml: Test with Python 3.8 2019-11-15 00:46:04 +02:00
afcc445855 Update requirements
Generated from pipenv with:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-11-15 00:41:12 +02:00
494548c691 Use Python 3.8.0 for pipenv
Python 3.8.0 was released several months ago and has made it into
Arch Linux's core repositories so it's time to start moving.
2019-11-15 00:38:45 +02:00
feb60b6adf CHANGELOG.md: Update unreleased changes 2019-11-15 00:06:49 +02:00
1541ae3e3b .travis.yml: Use Ubuntu 18.04 "Bionic" 2019-11-14 23:57:46 +02:00
1aedc0ca29 CHANGELOG.md: Add note about Python dependencies 2019-08-29 00:31:31 +03:00
a648183f35 Update requirements
Generated from pipenv with:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-08-29 00:31:06 +03:00
b8f379e7fa Pipfile.lock: Run pipenv update
This brings in, among others, psycogpg 2.8.3, requests 2.22.0, and
pytest 5.1.1.
2019-08-29 00:30:06 +03:00
78f9949ecb CHANGELOG.md: Release version 1.1.0 2019-05-05 23:38:04 +03:00
af80c4b447 CHANGELOG.md: Add falcon 2.0.0 to unreleased changes 2019-05-03 16:33:00 +03:00
edd9e90f59 Update requirements
Generated using pipenv:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-05-03 16:32:17 +03:00
1806d50a51 Pipfile: Use falcon 2.0.0
See: https://github.com/falconry/falcon/releases/tag/2.0.0
2019-05-03 16:31:06 +03:00
a459e66fd9 Use falcon 2.0.0rc2 2019-04-18 10:04:43 +03:00
5a3b392a1d dspace_statistics_api/app.py: Fix Falcon 2.0 syntax
See: dspace_statistics_api/app.py
2019-04-18 09:57:18 +03:00
9dcda114c6 Bump Falcon version to 2.0.0b1
See: https://github.com/falconry/falcon/releases/tag/2.0.0b1
2019-04-18 09:57:18 +03:00
2b8aba5835 CHANGELOG.md: Move unreleased changes to v1.0.0 2019-04-15 10:39:48 +03:00
9eb30a98e3 Update requirements
Generated using pipenv:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-04-15 10:31:19 +03:00
622e9a86f1 CHANGELOG.md: Add notes about Python updates 2019-04-15 10:30:29 +03:00
2acd08e0ab Use one-based paging in indexer output
It is easier for humans to understand one-based paging output like
"page 1 of 3" than "page 0 of 2" in the indexer.
2019-04-15 10:25:54 +03:00
f75bcf292c README.md: Remove TODO about SolrClient
I switched to using the vanilla requests library.
2019-04-15 10:24:24 +03:00
8f46ceb8d8 Refactor to use vanilla requests library
The SolrClient library is unmaintained, which is starting to cause
problems due to the moving Python ecosystem. Switching to requests
does not change my code in any meaningful way and makes maintenance
easier.
2019-04-15 10:19:50 +03:00
18e1e1a227 README.md: Add TODO about checking IDs in the database
Theoretically some items could be deleted and we should remove them
from the database.
2019-04-04 18:33:45 +03:00
fd46041698 README.md: Add build badge for sourcehut (sr.ht) 2019-03-17 23:45:33 +02:00
4ce7231ece CHANGELOG.md: Add unreleased changes 2019-03-17 23:40:51 +02:00
60689d9014 Disable emojis and animated output in CI
Makes for cleaner logs.

See: https://docs.travis-ci.com/user/environment-variables/
See: https://man.sr.ht/builds.sr.ht/manifest.md
2019-03-17 23:39:38 +02:00
7bca32189a .travis.yml: Use PostgreSQL 9.6
This matches what we're using in production.
2019-03-17 23:28:06 +02:00
94c5d91d3c CHANGELOG.md: Add unreleased changes 2019-03-17 22:51:39 +02:00
a640f734c8 Pipfile.lock: run pipenv update 2019-03-17 22:46:39 +02:00
d56a3420f7 README.md: Add TODO about SolrClient
SolrClient works, but hasn't been updated in some time and this is
starting to cause issues with some of its dependencies (kazoo). We
can probably get by with using Python requests library and getting
JSON directly from Solr.
2019-02-19 13:54:34 -08:00
7add0d6164 README.md: Add TODO about top items endpoint
This might be something useful that would be trivial to provide from
the data we already have in PostgreSQL.
2019-02-10 14:20:09 +02:00
c86bec4d8f .travis.yml: Use Ubuntu 16.04 xenial image
This is a newer userland and allows us to use Python 3.7, for example.

See: https://docs.travis-ci.com/user/reference/xenial/
2019-02-07 17:41:36 +02:00
5429fe5cc8 Update requirements
Generated from pipenv with:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-02-07 17:39:50 +02:00
f8a4cfd3da CHANGELOG.md: Add notes about updated python modules 2019-02-07 17:30:08 +02:00
be94c94433 Pipfile.lock: Run pipenv update 2019-02-07 17:29:47 +02:00
ba49b78a25 CHANGELOG.md: Add build configuration for build.sr.ht
See: https://man.sr.ht/builds.sr.ht/
2019-02-07 17:28:41 +02:00
842f80036f .build.yml: Fix PostgreSQL import
When building on sr.ht the default environment is the home directory
so we need to change to the source directory before trying to import
the SQL file.
2019-02-07 17:25:19 +02:00
f738b8029b Rename sr.ht build.yml to .build.yml
This means git.sr.ht will trigger builds automatically on push.

See: https://man.sr.ht/builds.sr.ht/
2019-02-07 17:09:48 +02:00
d08c43f3d5 build.yml: Functioning build
Finally got this working after testing the manifest manually a few
times on the web UI.
2019-02-07 17:09:48 +02:00
819f8e6b0d Add build.yml for sr.ht
Trying to figure out how to run builds on this new platform.

See: https://man.sr.ht/builds.sr.ht/#build-manifests
2019-02-07 17:09:48 +02:00
c79e50a364 README.md: Add TODO about DSpace 6 UUIDs
I'm not sure how this will affect us, especially if we want to keep
support for DSpace 4, 5, and 6 in the same code base. At least the
REST API endpoint will have to change from an integer, our database
schema will have to change depending on whether the repository is
using IDs or UUIDs, and maybe even the Solr queries will change.
2019-02-07 16:52:36 +02:00
71006d8bbf README.md: Add citation 2019-01-23 16:19:58 +02:00
b7d723ef7c README.md: Fix sentence 2019-01-22 14:23:13 +02:00
914ec52fbb CHANGELOG.md: Move unrelease changes to 0.9.0 2019-01-22 09:02:29 +02:00
5524066656 CHANGELOG.md: Add note about catching errors 2019-01-22 09:01:54 +02:00
043d897cef dspace_statistics_api/indexer.py: Catch case of no views/downloads
Don't fail with an exception when there are no views or downloads,
for example on a new DSpace installation.
2019-01-22 09:00:22 +02:00
bd28353cda README.md: Remove TODO for fixing querying of shards 2019-01-22 08:41:39 +02:00
e23d66c2a2 CHANGELOG.md: Add note about fixing querying of sharded cores 2019-01-22 08:41:31 +02:00
40e284dac0 dspace_statistics_api/indexer.py: Query multiple shards
DSpace's stats-util script splits the Solr statistics core into yearly
shards. We need to use Solr's `shards` query parameter in order to get
the statistics for previous years. This commit adds a helper function
to enumerate the active Solr cores to find yearly shards matching the
statistics-YYYY pattern and add them to the query.
2019-01-22 08:39:36 +02:00
934fa9db9b README.md: Add TODO about sharded statistics cores 2019-01-21 12:55:43 +02:00
1fabb72b58 Update requirements
Generated from pipenv with:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-01-16 12:34:50 +02:00
c7f95f0b60 README.md: Update TODO
I think it might be possible to compute community and collection
statistics from Solr and make them available at new endpoints:

  - /communities
  - /community/id
  - /collections
  - /collection/id
2019-01-16 09:59:29 +02:00
c95a98dd2d Pipfile.lock: update dependencies
Updated with `pipenv update`.
2019-01-15 10:22:46 +02:00
3f70f94a10 Pipfile.lock: Run pipenv update 2018-11-26 11:53:37 +02:00
9b8ad9defd Merge pull request #9 from ilri/pipenv-update
Pipenv update
2018-11-19 23:50:44 +02:00
d69ab20220 CHANGELOG.md: pytest version 4.0.0 2018-11-19 23:46:03 +02:00
378f56ddc2 Pipfile.lock: Run pipenv update 2018-11-19 23:34:34 +02:00
5a2a7d684c CHANGELOG.md: Move unreleased changes to version 0.8.1 2018-11-14 09:37:00 +02:00
18276e910f CHANGELOG.md: Add notes about pipenv 2018-11-14 09:36:13 +02:00
8de8c2765f Merge pull request #8 from ilri/update-dependencies
Update dependencies
2018-11-14 09:34:45 +02:00
11a1755e59 Update requirements.txt
Generated from pipenv with:

    $ pipenv lock -r > requirements.txt
2018-11-14 09:19:47 +02:00
a835b0fdc5 Re-create pipenv environment from scratch
When I originally created the pipenv environment I used the standard
pip requirements.txt that I already had, which captured all the mod-
ules and their exact versions at the time. This makes it hard to se-
parate the project's actual dependencies from the dependencies' dep-
endencies, complicating the Pipfile and making it hard to update mo-
dule versions later.

I've re-created the environment with the following commands:

    $ pipenv install gunicorn falcon psycopg2-binary git+https://github.com/alanorth/SolrClient.git@kazoo-2.5.0#egg=SolrClient
    $ pipenv install --dev ipython flake8 pytest
2018-11-14 09:07:32 +02:00
a88600c92b README.md: Add note about GPLv3 2018-11-13 12:34:31 +02:00
019d9242c9 Merge pull request #7 from ilri/use-pip
Rework to use pip instead of pipenv
2018-11-12 09:17:16 +02:00
f4d7312a3f CHANGELOG.md: Add unreleased changes 2018-11-12 09:02:04 +02:00
9c46cfc7e2 Use Python 3.7 for pipenv
Now that I'm only using pipenv locally it shouldn't create problems
for people. They can still just create a vanilla virtualenv and use
pip to install the dependencies.
2018-11-12 08:54:54 +02:00
c1c2e319ac README.md: Rework to use pip instead of pipenv
Pipenv is great for local development, but I don't think many people
are using it yet. I can use it locally and on Travis, but still keep
vanilla requirements.txt for use with pip. The requirements.txt file
can be generated easily from pipenv itself:

    $ pipenv lock -r > requirements.txt

The same for the development requirements:

    $ pipenv lock -r -d > requirements-dev.txt
2018-11-12 08:49:02 +02:00
0895b4f469 Add requirements-dev.txt for pip
Generated with pipenv lock -r -d. Will be used for separating the
development dependencies.
2018-11-12 08:48:45 +02:00
dcfef06a65 Pipfile.lock: Run pipenv update 2018-11-12 08:20:47 +02:00
13736d6359 CHANGELOG.md: Move unreleased changes to verison 0.8.0 2018-11-11 17:16:26 +02:00
4fc64edeb8 Merge pull request #6 from ilri/pytest
Pytest
2018-11-11 17:14:49 +02:00
2a8901dc4f CHANGELOG.md: Update notes 2018-11-11 17:10:45 +02:00
e25c974796 README.md: We have tests now 2018-11-11 17:08:51 +02:00
ffc62e9ee6 tests/test_api.py: Use response.text for all json.loads()
This allows the code to work in Python 3.5 as well as 3.6+.
2018-11-11 17:05:31 +02:00
556c5ae088 tests/test_api.py: Use response.text instead of content
Falcon's response content is raw bytes, while its text is a string.
Let's use the latter so we can use json.loads() in Python 3.5, 3.6,
and 3.7 with the same code.

See: https://falcon.readthedocs.io/en/stable/api/testing.html
2018-11-11 17:01:17 +02:00
d94134f80a tests/test_api.py: Try to add workaround for Python 3.5
In Python 3.5 it seems that json.loads() cannot decode a bytes, but
it works in Python 3.6 and 3.7. Let's try a workaround to see if we
can get it working on both Python 3.5 and 3.6+.

See: https://docs.python.org/3.5/library/json.html#json.loads
See: https://docs.python.org/3.6/library/json.html#json.loads
2018-11-11 17:00:20 +02:00
586231eb2d .travis.yml: Use PostgreSQL 9.5
Default PostgreSQL in Travis CI is 9.2 which is very old, so let's
try to use 9.5.

See: https://docs.travis-ci.com/user/database-setup/#postgresql
2018-11-11 16:41:35 +02:00
766b77a3b6 .travis.yml: Use PostgreSQL directly
It seems that Travis CI already has a PostgreSQL service running.
2018-11-11 16:35:28 +02:00
1959e8154e .travis.yml: Use localhost for Docker's PostgreSQL ports
See: https://docs.travis-ci.com/user/docker/
2018-11-11 16:28:18 +02:00
d40b2f0b2e Test API using pytest and PostgreSQL on Travis
First attempt at getting the Travis Docker setup correct. Inspired
by the Travis pipenv setup used in Responder.

See: https://docs.travis-ci.com/user/docker/
See: https://github.com/kennethreitz/responder/blob/master/.travis.yml
2018-11-11 16:25:16 +02:00
061d0a8f5f CHANGELOG.md: Add API tests to unreleased changes 2018-11-11 16:24:54 +02:00
e57660ff88 Add initial pytest configuration
From: https://github.com/kennethreitz/responder/blob/master/pytest.ini
2018-11-11 16:24:54 +02:00
5c8756bede Add pytest to pipenv development packages 2018-11-11 16:24:54 +02:00
bae9fb80e4 Add initial API tests
Test the basic assumptions of the API like response codes and types.
2018-11-11 16:24:54 +02:00
8a65d99e08 .travis.yml: Don't limit builds to master
This is good in theory but it means we can't trigger builds for other
branches on the fly from the Travis web interface.
2018-11-11 16:21:48 +02:00
d479b7dc6c CHANGELOG.md: Syntax fixes 2018-11-11 00:08:44 +02:00
40aac8bf89 Merge pull request #5 from ilri/database-error-handling
Database error handling
2018-11-11 00:07:25 +02:00
53ba6f2936 CHANGELOG.md: Add database try/except to unreleased changes 2018-11-11 00:05:49 +02:00
140cc4cb07 README.md: Remove TODO for database try/except
Now database connection errors are properly excepted and raised.
2018-11-11 00:04:28 +02:00
d5d2d2149b dspace_statistics_api/database.py: Raise HTTP 500 on error
Properly except on database connection error and raise an HTTP 500
instead of spamming the console/log with twenty lines of text.
2018-11-10 23:58:58 +02:00
4c51d12eb4 CHANGELOG.md: Move unreleased changes to version 0.7.0 2018-11-07 17:55:01 +02:00
a6ce44e852 Merge pull request #4 from ilri/database-refactor
Database refactor
2018-11-07 17:54:04 +02:00
f6e866a589 dspace_statistics_api/indexer.py: Remove debug code 2018-11-07 17:51:24 +02:00
eb5c187d41 CHANGELOG.md: Add note about database re-factor 2018-11-07 17:50:46 +02:00
b06c82bb16 README.md: Remove TODO about closing database connection
Now I'm using a database manager class with Python's "with" context
blocks to automatically and concisely open and close connections.
2018-11-07 17:47:59 +02:00
2f342be948 Refactor database code to use a context manager
Instead of opening one global persistent database connection when
the application I am now abstracting it to a class that I can use
in combination with Python's "with" context. Both connections and
cursors are kept for the context of each "with" block and closed
automatically when exiting.

See: https://alysivji.github.io/managing-resources-with-context-managers-pythonic.html
See: http://initd.org/psycopg/docs/connection.html#connection.close
2018-11-07 17:41:21 +02:00
e39f2b260c Merge pull request #3 from ilri/ipython
Add ipython to pipenv dev packages
2018-11-07 17:09:09 +02:00
60ad474b88 Add ipython to pipenv dev packages
This is very useful for debugging Python code interactively.
2018-11-07 17:07:14 +02:00
888f85d19e README.md: Adjust installation for pipenv
It's nicer to manager module versions using pipenv, and I can still
generate a requirements.txt for deploying the exact versions on the
production server.
2018-11-04 16:07:27 +02:00
df7de93964 CHANGELOG.md: Add pipenv to unreleased changes 2018-11-04 15:59:11 +02:00
7218631cc4 requirements.txt: Regenerate
Created from pipenv with the following command:

    $ pipenv lock -r > requirements.txt
2018-11-04 15:58:31 +02:00
085e525b2f Regenerate pipenv
Uses 'kazoo-2.5.0' branch name for installing SolrClient instead of
the commit hash and adds flake8 as a dev package. This means that I
can track dependencies for production and development and still end
up with a requirements.txt for produciton.
2018-11-04 15:55:28 +02:00
e1580df12f requirements.txt: Update packages
Generated from pipenv with:

    $ pipenv run pip freeze > requirements.txt
2018-11-04 15:39:09 +02:00
be18779ff9 Add flake8 to pipenv 2018-11-04 15:38:51 +02:00
60cfd8f23b Add Pipfile for pipenv
Eventually I'd like to be able to use pipenv instead of plain pip.
For now I'll just keep using pipenv and generating requirements.txt
like this:

    $ pipenv run pip freeze > requirements.txt

Then I can kinda have the best of both worlds, where I use pipenv
on my local machine and pip with requirements.txt on the server.
2018-11-04 15:35:47 +02:00
87fd117d77 .hound.yml: Set pull requests to failed if build fails 2018-11-04 00:53:37 +02:00
f262ebdca2 CHANGELOG.md: Add unreleased changes 2018-11-04 00:50:46 +02:00
64d7f1a3b2 Enable Flake8 validation in Hound CI
Will check all pull requests in the project to make sure they don't
violate PEP 8 style (except the E501 for long lines because I think
it makes code hard to read).
2018-11-04 00:48:06 +02:00
a238a727d2 CHANGELOG.md: Add unreleased changes 2018-11-04 00:05:16 +02:00
cc5ce3ab98 Correct issues highlighted by Flake8
Flake8 validates code style against PEP 8 in order to encourage the
writing of idiomatic Python. For reference, I am currently ignoring
errors about line length (E501) because I feel it makes code harder
to read.

This is the invocation I am using:

    $ flake8 --ignore E501 dspace_statistics_api
2018-11-04 00:04:27 +02:00
70dfcb93c5 dspace_statistics_api/database.py: Don't quote host in connect() 2018-11-03 22:43:05 +02:00
69bcd1b5e4 CHANGELOG.md: Add unreleased changes 2018-11-03 22:42:08 +02:00
5f3bd61998 Allow configuration of PostgreSQL port
Defaults to port 5432, but can be overridden with DATABASE_PORT.
2018-11-03 22:40:45 +02:00
e54dd8888f README.md: Update requirements 2018-11-01 16:31:36 +02:00
2ba09f8693 README.md: Improve introduction
Obsessed with the text presentation and line length in GitHub!
2018-11-01 16:28:01 +02:00
a468a87a5a README.md: Improve introduction 2018-11-01 15:58:12 +02:00
6a30b6550d README.md: Improve introduction 2018-11-01 15:45:53 +02:00
18f013bfa0 README.md: Add Falcon to introduction 2018-11-01 10:24:37 +02:00
78900b5d85 CHANGELOG.md: Add changes for v0.6.1 2018-11-01 00:39:12 +02:00
eb08832bf8 Sync API documentation HTML with README.md 2018-11-01 00:37:52 +02:00
c2ec780ad9 README.md: Improve API documentation 2018-11-01 00:37:40 +02:00
df8ebc8bf1 README.md: Improve API endpoint documentation 2018-11-01 00:31:16 +02:00
0d4be5f4c8 README.md: Add API documentation endpoint 2018-11-01 00:22:16 +02:00
30dc7f1939 Add basic API documentation on root (/)
I had imagined plugging in an interactive Swagger or OpenAPI instance
here, but that's actually much more involved in Falcon than I want to
deal with right now.
2018-11-01 00:19:39 +02:00
77194707fd README.md: Improve introduction 2018-11-01 00:08:24 +02:00
10c1f8bdcc README.md: Update Travis CI badge 2018-10-31 23:14:38 +02:00
da74943da2 README.md: Update introduction 2018-10-31 22:40:36 +02:00
fc8348ab29 README.md: Add acknoledgement about the Solr queries 2018-10-31 19:36:50 +02:00
15c3299b99 CHANGELOG.md: Add changes for v0.6.0 2018-10-31 19:26:45 +02:00
d36be5ee50 contrib: Update systemd unit files for refactor 2018-10-28 11:14:21 +02:00
2f45d27554 dspace_statistics_api/app.py: remove unused code
This was added accidentally when I refactored. I was trying to see
if I could use Falcon's on_exit() hook.
2018-10-28 11:14:21 +02:00
b8356f7a87 Add "application" alias to API object
By default gunicorn looks for an "application" object to run, so this
saves us having to type api:app.
2018-10-28 11:14:21 +02:00
2136dc79ce Remove shebang from indexer.py
This is run as a Python module now so does not need a shebang.
2018-10-28 11:14:21 +02:00
ed60120cef Remove executable bit from indexer.py
Now it is run as a Python module.
2018-10-28 11:14:21 +02:00
c027f01b48 Refactor project structure
This follows guidance from several well-known Python best practices
guides. Basically, the idea is create a package for the application
that is comprised of several re-usable modules.

See: https://docs.python-guide.org/writing/structure/
See: https://realpython.com/python-application-layouts/
2018-10-28 11:14:21 +02:00
28 changed files with 5139 additions and 325 deletions

24
.build.yml Normal file
View File

@ -0,0 +1,24 @@
image: archlinux
packages:
- python-pipenv
- postgresql
sources:
- https://git.sr.ht/~alanorth/dspace-statistics-api
tasks:
- setup: |
id
psql --version
sudo su - postgres -c "initdb --locale en_US.UTF-8 -E UTF8 -D '/var/lib/postgres/data'"
sudo systemctl start postgresql
createuser -U postgres dspacestatistics
psql -U postgres -c "ALTER USER dspacestatistics WITH PASSWORD 'dspacestatistics'"
createdb -U postgres -O dspacestatistics --encoding=UNICODE dspacestatistics
cd dspace-statistics-api
psql -U postgres -d dspacestatistics < tests/dspacestatistics.sql
pipenv install --dev
- test: |
cd dspace-statistics-api
pipenv run pytest
environment:
PIPENV_NOSPIN: 'True'
PIPENV_HIDE_EMOJIS: 'True'

2
.flake8 Normal file
View File

@ -0,0 +1,2 @@
[flake8]
ignore = E501

4
.hound.yml Normal file
View File

@ -0,0 +1,4 @@
flake8:
enabled: true
config_file: .flake8
fail_on_violations: true

View File

@ -1,11 +1,24 @@
dist: bionic
language: python
python:
- "3.5"
- "3.6"
- "3.7-dev"
script: pip install -r requirements.txt
branches:
only:
- master
- "3.7"
- "3.8"
- "3.8-dev" # 3.8 development branch
jobs:
allow_failures:
- python: "3.8-dev"
addons:
postgresql: "10"
before_script:
- psql --version
- createuser -U postgres dspacestatistics
- psql -U postgres -c "ALTER USER dspacestatistics WITH PASSWORD 'dspacestatistics'"
- createdb -U postgres -O dspacestatistics --encoding=UNICODE dspacestatistics
- psql -U postgres -d dspacestatistics < tests/dspacestatistics.sql
install:
- "pip install -r requirements.txt"
- "pip install -r requirements-dev.txt"
script: pytest
# vim: ts=2 sw=2 et

View File

@ -4,26 +4,111 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
### [0.5.2] - 2018-10-28
## Changed
## [1.2.1] - 2020-03-02
### Changed
- Help text in API docs should reference UUIDs
- Sample SQL file for tests should use UUIDs
## [1.2.0] - 2020-03-02
### Changed
- Remove Python 3.5 from TravisCI because black requires Python >= 3.6
- Adapt API for DSpace 6+ UUIDs
- This requires dropping the statistics database and re-indexing
### Updated
- Run pipenv update, bringing requests 2.23.0 and pytest 5.3.5
## [1.1.1] - 2019-11-27
### Added
- Configuration for automatic sorting of imports with isort
- Configuration for automatic code formatting with black
### Updated
- Run pipenv update, bringing psycopg2 2.8.4, requests 2.22.0, pytest 5.3.1,
and gunicorn 20.0.4
### Changed
- Use Ubuntu 18.04 "Bionic" for TravisCI builds
- Use Python 3.8.0 for pipenv
- Minor syntax issues highlighted by flake8
## [1.1.0] - 2019-05-05
## Updated
- Falcon 2.0.0 (@alanorth)
## [1.0.0] - 2019-04-15
### Added
- Build configuration for build.sr.ht
### Updated
- Run pipenv update, bringing pytest version 4.4.0, psycopg-binary 2.8.2, etc
- sr.ht and TravisCI configuration to disable emojis and animation to keep logs clean
### Changed
- Use vanilla requests library instead of SolrClient
- Use one-based paging in indexer output (for human readability)
## [0.9.0] - 2019-01-22
### Updated
- pytest version 4.0.0
- Fix indexing of sharded statistics cores ([#10))
- Handle case of missing views/downloads gracefully
## [0.8.1] - 2018-11-14
### Changed
- README.md to recommend using vanilla Python virtual environments and pip instead of pipenv
- Regenerate pipenv environment to capture only direct dependencies
### Added
- `requirements-dev.txt` for installing development packages with pip
## [0.8.0] - 2018-11-11
### Changed
- Properly handle database connection errors
### Added
- API tests with pytest
## [0.7.0] - 2018-11-07
### Added
- Ability to configure PostgreSQL database port with DATABASE_PORT environment variable (defaults to 5432)
- Hound CI configuration to validate pull requests against PEP 8 code style with Flake8
- Configuration for [pipenv](https://pipenv.readthedocs.io/en/latest/)
### Changed
- Use a database management class with Python context management to automatically open/close connections and cursors
### Changed
- Validate code against PEP 8 style guide with Flake8
## [0.6.1] - 2018-10-31
### Added
- API documentation at root path (/)
## [0.6.0] - 2018-10-31
### Changed
- Refactor project structure (note breaking changes to API and indexing invocation, see contrib and README.md)
## [0.5.2] - 2018-10-28
### Changed
- Update library versions in requirements.txt
### [0.5.1] - 2018-10-24
## Changed
## [0.5.1] - 2018-10-24
### Changed
- Use Python's native json instead of ujson
### [0.5.0] - 2018-10-24
## Added
## [0.5.0] - 2018-10-24
### Added
- Example nginx configuration to README.md
## Changed
### Changed
- Don't initialize Solr connection in API
### [0.4.3] - 2018-10-17
## Changed
## [0.4.3] - 2018-10-17
### Changed
- Use pip install as script for Travis CI
## Improved
### Improved
- Documentation for deployment and testing
## [0.4.2] - 2018-10-04
@ -37,7 +122,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [0.4.1] - 2018-09-26
### Changed
- Use execute_values() to batch insert records to PostgreSQL
- Use `execute_values()` to batch insert records to PostgreSQL
## [0.4.0] - 2018-09-25
### Fixed

21
Pipfile Normal file
View File

@ -0,0 +1,21 @@
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
gunicorn = "*"
falcon = "==2.0.0"
"psycopg2-binary" = "*"
requests = "*"
[dev-packages]
ipython = "*"
"flake8" = "*"
pytest = "*"
isort = "*"
black = "==19.10b0"
pytest-clarity = "==0.3.0a0"
[requires]
python_version = "3.8"

419
Pipfile.lock generated Normal file
View File

@ -0,0 +1,419 @@
{
"_meta": {
"hash": {
"sha256": "be968d3927117f9ac14b9a6f60d6147b2d57ce55f694f34ed6e53abcd2197823"
},
"pipfile-spec": 6,
"requires": {
"python_version": "3.8"
},
"sources": [
{
"name": "pypi",
"url": "https://pypi.org/simple",
"verify_ssl": true
}
]
},
"default": {
"certifi": {
"hashes": [
"sha256:017c25db2a153ce562900032d5bc68e9f191e44e9a0f762f373977de9df1fbb3",
"sha256:25b64c7da4cd7479594d035c08c2d809eb4aab3a26e5a990ea98cc450c320f1f"
],
"version": "==2019.11.28"
},
"chardet": {
"hashes": [
"sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae",
"sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691"
],
"version": "==3.0.4"
},
"falcon": {
"hashes": [
"sha256:18157af2a4fc3feedf2b5dcc6196f448639acf01c68bc33d4d5a04c3ef87f494",
"sha256:24adcd2b29a8ffa9d552dc79638cd21736a3fb04eda7d102c6cebafdaadb88ad",
"sha256:54f2cb4b687035b2a03206dbfc538055cc48b59a953187b0458aa1b574d47b53",
"sha256:59d1e8c993b9a37ea06df9d72cf907a46cc8063b30717cdac2f34d1658b6f936",
"sha256:733033ec80c896e30a43ab3e776856096836787197a44eb21022320a61311983",
"sha256:74cf1d18207381c665b9e6292d65100ce146d958707793174b03869dc6e614f4",
"sha256:95bf6ce986c1119aef12c9b348f4dee9c6dcc58391bdd0bc2b0bf353c2b15986",
"sha256:9712975adcf8c6e12876239085ad757b8fdeba223d46d23daef82b47658f83a9",
"sha256:a5ebb22a04c9cc65081938ee7651b4e3b4d2a28522ea8ec04c7bdd2b3e9e8cd8",
"sha256:aa184895d1ad4573fbfaaf803563d02f019ebdf4790e41cc568a330607eae439",
"sha256:e3782b7b92fefd46a6ad1fd8fe63fe6c6f1b7740a95ca56957f48d1aee34b357",
"sha256:e9efa0791b5d9f9dd9689015ea6bce0a27fcd5ecbcd30e6d940bffa4f7f03389",
"sha256:eea593cf466b9c126ce667f6d30503624ef24459f118c75594a69353b6c3d5fc",
"sha256:f93351459f110b4c1ee28556aef9a791832df6f910bea7b3f616109d534df06b"
],
"index": "pypi",
"version": "==2.0.0"
},
"gunicorn": {
"hashes": [
"sha256:1904bb2b8a43658807108d59c3f3d56c2b6121a701161de0ddf9ad140073c626",
"sha256:cd4a810dd51bf497552cf3f863b575dabd73d6ad6a91075b65936b151cbf4f9c"
],
"index": "pypi",
"version": "==20.0.4"
},
"idna": {
"hashes": [
"sha256:7588d1c14ae4c77d74036e8c22ff447b26d0fde8f007354fd48a7814db15b7cb",
"sha256:a068a21ceac8a4d63dbfd964670474107f541babbd2250d61922f029858365fa"
],
"version": "==2.9"
},
"psycopg2-binary": {
"hashes": [
"sha256:040234f8a4a8dfd692662a8308d78f63f31a97e1c42d2480e5e6810c48966a29",
"sha256:086f7e89ec85a6704db51f68f0dcae432eff9300809723a6e8782c41c2f48e03",
"sha256:18ca813fdb17bc1db73fe61b196b05dd1ca2165b884dd5ec5568877cabf9b039",
"sha256:19dc39616850342a2a6db70559af55b22955f86667b5f652f40c0e99253d9881",
"sha256:2166e770cb98f02ed5ee2b0b569d40db26788e0bf2ec3ae1a0d864ea6f1d8309",
"sha256:3a2522b1d9178575acee4adf8fd9f979f9c0449b00b4164bb63c3475ea6528ed",
"sha256:3aa773580f85a28ffdf6f862e59cb5a3cc7ef6885121f2de3fca8d6ada4dbf3b",
"sha256:3b5deaa3ee7180585a296af33e14c9b18c218d148e735c7accf78130765a47e3",
"sha256:407af6d7e46593415f216c7f56ba087a9a42bd6dc2ecb86028760aa45b802bd7",
"sha256:4c3c09fb674401f630626310bcaf6cd6285daf0d5e4c26d6e55ca26a2734e39b",
"sha256:4c6717962247445b4f9e21c962ea61d2e884fc17df5ddf5e35863b016f8a1f03",
"sha256:50446fae5681fc99f87e505d4e77c9407e683ab60c555ec302f9ac9bffa61103",
"sha256:5057669b6a66aa9ca118a2a860159f0ee3acf837eda937bdd2a64f3431361a2d",
"sha256:5dd90c5438b4f935c9d01fcbad3620253da89d19c1f5fca9158646407ed7df35",
"sha256:659c815b5b8e2a55193ede2795c1e2349b8011497310bb936da7d4745652823b",
"sha256:69b13fdf12878b10dc6003acc8d0abf3ad93e79813fd5f3812497c1c9fb9be49",
"sha256:7a1cb80e35e1ccea3e11a48afe65d38744a0e0bde88795cc56a4d05b6e4f9d70",
"sha256:7e6e3c52e6732c219c07bd97fff6c088f8df4dae3b79752ee3a817e6f32e177e",
"sha256:7f42a8490c4fe854325504ce7a6e4796b207960dabb2cbafe3c3959cb00d1d7e",
"sha256:84156313f258eafff716b2961644a4483a9be44a5d43551d554844d15d4d224e",
"sha256:8578d6b8192e4c805e85f187bc530d0f52ba86c39172e61cd51f68fddd648103",
"sha256:890167d5091279a27e2505ff0e1fb273f8c48c41d35c5b92adbf4af80e6b2ed6",
"sha256:98e10634792ac0e9e7a92a76b4991b44c2325d3e7798270a808407355e7bb0a1",
"sha256:9aadff9032e967865f9778485571e93908d27dab21d0fdfdec0ca779bb6f8ad9",
"sha256:9f24f383a298a0c0f9b3113b982e21751a8ecde6615494a3f1470eb4a9d70e9e",
"sha256:a73021b44813b5c84eda4a3af5826dd72356a900bac9bd9dd1f0f81ee1c22c2f",
"sha256:afd96845e12638d2c44d213d4810a08f4dc4a563f9a98204b7428e567014b1cd",
"sha256:b73ddf033d8cd4cc9dfed6324b1ad2a89ba52c410ef6877998422fcb9c23e3a8",
"sha256:b8f490f5fad1767a1331df1259763b3bad7d7af12a75b950c2843ba319b2415f",
"sha256:dbc5cd56fff1a6152ca59445178652756f4e509f672e49ccdf3d79c1043113a4",
"sha256:eac8a3499754790187bb00574ab980df13e754777d346f85e0ff6df929bcd964",
"sha256:eaed1c65f461a959284649e37b5051224f4db6ebdc84e40b5e65f2986f101a08"
],
"index": "pypi",
"version": "==2.8.4"
},
"requests": {
"hashes": [
"sha256:43999036bfa82904b6af1d99e4882b560e5e2c68e5c4b0aa03b655f3d7d73fee",
"sha256:b3f43d496c6daba4493e7c431722aeb7dbc6288f52a6e04e7b6023b0247817e6"
],
"index": "pypi",
"version": "==2.23.0"
},
"urllib3": {
"hashes": [
"sha256:2f3db8b19923a873b3e5256dc9c2dedfa883e33d87c690d9c7913e1f40673cdc",
"sha256:87716c2d2a7121198ebcb7ce7cccf6ce5e9ba539041cfbaeecfb641dc0bf6acc"
],
"version": "==1.25.8"
}
},
"develop": {
"appdirs": {
"hashes": [
"sha256:9e5896d1372858f8dd3344faf4e5014d21849c756c8d5701f78f8a103b372d92",
"sha256:d8b24664561d0d34ddfaec54636d502d7cea6e29c3eaf68f3df6180863e2166e"
],
"version": "==1.4.3"
},
"attrs": {
"hashes": [
"sha256:08a96c641c3a74e44eb59afb61a24f2cb9f4d7188748e76ba4bb5edfa3cb7d1c",
"sha256:f7b7ce16570fe9965acd6d30101a28f62fb4a7f9e926b3bbc9b61f8b04247e72"
],
"version": "==19.3.0"
},
"backcall": {
"hashes": [
"sha256:38ecd85be2c1e78f77fd91700c76e14667dc21e2713b63876c0eb901196e01e4",
"sha256:bbbf4b1e5cd2bdb08f915895b51081c041bac22394fdfcfdfbe9f14b77c08bf2"
],
"version": "==0.1.0"
},
"black": {
"hashes": [
"sha256:1b30e59be925fafc1ee4565e5e08abef6b03fe455102883820fe5ee2e4734e0b",
"sha256:c2edb73a08e9e0e6f65a0e6af18b059b8b1cdd5bef997d7a0b181df93dc81539"
],
"index": "pypi",
"version": "==19.10b0"
},
"click": {
"hashes": [
"sha256:2335065e6395b9e67ca716de5f7526736bfa6ceead690adf616d925bdc622b13",
"sha256:5b94b49521f6456670fdb30cd82a4eca9412788a93fa6dd6df72c94d5a8ff2d7"
],
"version": "==7.0"
},
"decorator": {
"hashes": [
"sha256:41fa54c2a0cc4ba648be4fd43cff00aedf5b9465c9bf18d64325bc225f08f760",
"sha256:e3a62f0520172440ca0dcc823749319382e377f37f140a0b99ef45fecb84bfe7"
],
"version": "==4.4.2"
},
"entrypoints": {
"hashes": [
"sha256:589f874b313739ad35be6e0cd7efde2a4e9b6fea91edcc34e58ecbb8dbe56d19",
"sha256:c70dd71abe5a8c85e55e12c19bd91ccfeec11a6e99044204511f9ed547d48451"
],
"version": "==0.3"
},
"flake8": {
"hashes": [
"sha256:45681a117ecc81e870cbf1262835ae4af5e7a8b08e40b944a8a6e6b895914cfb",
"sha256:49356e766643ad15072a789a20915d3c91dc89fd313ccd71802303fd67e4deca"
],
"index": "pypi",
"version": "==3.7.9"
},
"ipython": {
"hashes": [
"sha256:ca478e52ae1f88da0102360e57e528b92f3ae4316aabac80a2cd7f7ab2efb48a",
"sha256:eb8d075de37f678424527b5ef6ea23f7b80240ca031c2dd6de5879d687a65333"
],
"index": "pypi",
"version": "==7.13.0"
},
"ipython-genutils": {
"hashes": [
"sha256:72dd37233799e619666c9f639a9da83c34013a73e8bbc79a7a6348d93c61fab8",
"sha256:eb2e116e75ecef9d4d228fdc66af54269afa26ab4463042e33785b887c628ba8"
],
"version": "==0.2.0"
},
"isort": {
"hashes": [
"sha256:54da7e92468955c4fceacd0c86bd0ec997b0e1ee80d97f67c35a78b719dccab1",
"sha256:6e811fcb295968434526407adb8796944f1988c5b65e8139058f2014cbe100fd"
],
"index": "pypi",
"version": "==4.3.21"
},
"jedi": {
"hashes": [
"sha256:b4f4052551025c6b0b0b193b29a6ff7bdb74c52450631206c262aef9f7159ad2",
"sha256:d5c871cb9360b414f981e7072c52c33258d598305280fef91c6cae34739d65d5"
],
"version": "==0.16.0"
},
"mccabe": {
"hashes": [
"sha256:ab8a6258860da4b6677da4bd2fe5dc2c659cff31b3ee4f7f5d64e79735b80d42",
"sha256:dd8d182285a0fe56bace7f45b5e7d1a6ebcbf524e8f3bd87eb0f125271b8831f"
],
"version": "==0.6.1"
},
"more-itertools": {
"hashes": [
"sha256:5dd8bcf33e5f9513ffa06d5ad33d78f31e1931ac9a18f33d37e77a180d393a7c",
"sha256:b1ddb932186d8a6ac451e1d95844b382f55e12686d51ca0c68b6f61f2ab7a507"
],
"version": "==8.2.0"
},
"packaging": {
"hashes": [
"sha256:170748228214b70b672c581a3dd610ee51f733018650740e98c7df862a583f73",
"sha256:e665345f9eef0c621aa0bf2f8d78cf6d21904eef16a93f020240b704a57f1334"
],
"version": "==20.1"
},
"parso": {
"hashes": [
"sha256:0c5659e0c6eba20636f99a04f469798dca8da279645ce5c387315b2c23912157",
"sha256:8515fc12cfca6ee3aa59138741fc5624d62340c97e401c74875769948d4f2995"
],
"version": "==0.6.2"
},
"pathspec": {
"hashes": [
"sha256:163b0632d4e31cef212976cf57b43d9fd6b0bac6e67c26015d611a647d5e7424",
"sha256:562aa70af2e0d434367d9790ad37aed893de47f1693e4201fd1d3dca15d19b96"
],
"version": "==0.7.0"
},
"pexpect": {
"hashes": [
"sha256:0b48a55dcb3c05f3329815901ea4fc1537514d6ba867a152b581d69ae3710937",
"sha256:fc65a43959d153d0114afe13997d439c22823a27cefceb5ff35c2178c6784c0c"
],
"markers": "sys_platform != 'win32'",
"version": "==4.8.0"
},
"pickleshare": {
"hashes": [
"sha256:87683d47965c1da65cdacaf31c8441d12b8044cdec9aca500cd78fc2c683afca",
"sha256:9649af414d74d4df115d5d718f82acb59c9d418196b7b4290ed47a12ce62df56"
],
"version": "==0.7.5"
},
"pluggy": {
"hashes": [
"sha256:15b2acde666561e1298d71b523007ed7364de07029219b604cf808bfa1c765b0",
"sha256:966c145cd83c96502c3c3868f50408687b38434af77734af1e9ca461a4081d2d"
],
"version": "==0.13.1"
},
"prompt-toolkit": {
"hashes": [
"sha256:a402e9bf468b63314e37460b68ba68243d55b2f8c4d0192f85a019af3945050e",
"sha256:c93e53af97f630f12f5f62a3274e79527936ed466f038953dfa379d4941f651a"
],
"version": "==3.0.3"
},
"ptyprocess": {
"hashes": [
"sha256:923f299cc5ad920c68f2bc0bc98b75b9f838b93b599941a6b63ddbc2476394c0",
"sha256:d7cc528d76e76342423ca640335bd3633420dc1366f258cb31d05e865ef5ca1f"
],
"version": "==0.6.0"
},
"py": {
"hashes": [
"sha256:5e27081401262157467ad6e7f851b7aa402c5852dbcb3dae06768434de5752aa",
"sha256:c20fdd83a5dbc0af9efd622bee9a5564e278f6380fffcacc43ba6f43db2813b0"
],
"version": "==1.8.1"
},
"pycodestyle": {
"hashes": [
"sha256:95a2219d12372f05704562a14ec30bc76b05a5b297b21a5dfe3f6fac3491ae56",
"sha256:e40a936c9a450ad81df37f549d676d127b1b66000a6c500caa2b085bc0ca976c"
],
"version": "==2.5.0"
},
"pyflakes": {
"hashes": [
"sha256:17dbeb2e3f4d772725c777fabc446d5634d1038f234e77343108ce445ea69ce0",
"sha256:d976835886f8c5b31d47970ed689944a0262b5f3afa00a5a7b4dc81e5449f8a2"
],
"version": "==2.1.1"
},
"pygments": {
"hashes": [
"sha256:2a3fe295e54a20164a9df49c75fa58526d3be48e14aceba6d6b1e8ac0bfd6f1b",
"sha256:98c8aa5a9f778fcd1026a17361ddaf7330d1b7c62ae97c3bb0ae73e0b9b6b0fe"
],
"version": "==2.5.2"
},
"pyparsing": {
"hashes": [
"sha256:4c830582a84fb022400b85429791bc551f1f4871c33f23e44f353119e92f969f",
"sha256:c342dccb5250c08d45fd6f8b4a559613ca603b57498511740e65cd11a2e7dcec"
],
"version": "==2.4.6"
},
"pytest": {
"hashes": [
"sha256:0d5fe9189a148acc3c3eb2ac8e1ac0742cb7618c084f3d228baaec0c254b318d",
"sha256:ff615c761e25eb25df19edddc0b970302d2a9091fbce0e7213298d85fb61fef6"
],
"index": "pypi",
"version": "==5.3.5"
},
"pytest-clarity": {
"hashes": [
"sha256:5cc99e3d9b7969dfe17e5f6072d45a917c59d363b679686d3c958a1ded2e4dcf"
],
"index": "pypi",
"version": "==0.3.0a0"
},
"regex": {
"hashes": [
"sha256:01b2d70cbaed11f72e57c1cfbaca71b02e3b98f739ce33f5f26f71859ad90431",
"sha256:046e83a8b160aff37e7034139a336b660b01dbfe58706f9d73f5cdc6b3460242",
"sha256:113309e819634f499d0006f6200700c8209a2a8bf6bd1bdc863a4d9d6776a5d1",
"sha256:200539b5124bc4721247a823a47d116a7a23e62cc6695744e3eb5454a8888e6d",
"sha256:25f4ce26b68425b80a233ce7b6218743c71cf7297dbe02feab1d711a2bf90045",
"sha256:269f0c5ff23639316b29f31df199f401e4cb87529eafff0c76828071635d417b",
"sha256:5de40649d4f88a15c9489ed37f88f053c15400257eeb18425ac7ed0a4e119400",
"sha256:7f78f963e62a61e294adb6ff5db901b629ef78cb2a1cfce3cf4eeba80c1c67aa",
"sha256:82469a0c1330a4beb3d42568f82dffa32226ced006e0b063719468dcd40ffdf0",
"sha256:8c2b7fa4d72781577ac45ab658da44c7518e6d96e2a50d04ecb0fd8f28b21d69",
"sha256:974535648f31c2b712a6b2595969f8ab370834080e00ab24e5dbb9d19b8bfb74",
"sha256:99272d6b6a68c7ae4391908fc15f6b8c9a6c345a46b632d7fdb7ef6c883a2bbb",
"sha256:9b64a4cc825ec4df262050c17e18f60252cdd94742b4ba1286bcfe481f1c0f26",
"sha256:9e9624440d754733eddbcd4614378c18713d2d9d0dc647cf9c72f64e39671be5",
"sha256:9ff16d994309b26a1cdf666a6309c1ef51ad4f72f99d3392bcd7b7139577a1f2",
"sha256:b33ebcd0222c1d77e61dbcd04a9fd139359bded86803063d3d2d197b796c63ce",
"sha256:bba52d72e16a554d1894a0cc74041da50eea99a8483e591a9edf1025a66843ab",
"sha256:bed7986547ce54d230fd8721aba6fd19459cdc6d315497b98686d0416efaff4e",
"sha256:c7f58a0e0e13fb44623b65b01052dae8e820ed9b8b654bb6296bc9c41f571b70",
"sha256:d58a4fa7910102500722defbde6e2816b0372a4fcc85c7e239323767c74f5cbc",
"sha256:f1ac2dc65105a53c1c2d72b1d3e98c2464a133b4067a51a3d2477b28449709a0"
],
"version": "==2020.2.20"
},
"six": {
"hashes": [
"sha256:236bdbdce46e6e6a3d61a337c0f8b763ca1e8717c03b369e87a7ec7ce1319c0a",
"sha256:8f3cd2e254d8f793e7f3d6d9df77b92252b52637291d0f0da013c76ea2724b6c"
],
"version": "==1.14.0"
},
"termcolor": {
"hashes": [
"sha256:1d6d69ce66211143803fbc56652b41d73b4a400a2891d7bf7a1cdf4c02de613b"
],
"version": "==1.1.0"
},
"toml": {
"hashes": [
"sha256:229f81c57791a41d65e399fc06bf0848bab550a9dfd5ed66df18ce5f05e73d5c",
"sha256:235682dd292d5899d361a811df37e04a8828a5b1da3115886b73cf81ebc9100e"
],
"version": "==0.10.0"
},
"traitlets": {
"hashes": [
"sha256:70b4c6a1d9019d7b4f6846832288f86998aa3b9207c6821f3578a6a6a467fe44",
"sha256:d023ee369ddd2763310e4c3eae1ff649689440d4ae59d7485eb4cfbbe3e359f7"
],
"version": "==4.3.3"
},
"typed-ast": {
"hashes": [
"sha256:0666aa36131496aed8f7be0410ff974562ab7eeac11ef351def9ea6fa28f6355",
"sha256:0c2c07682d61a629b68433afb159376e24e5b2fd4641d35424e462169c0a7919",
"sha256:249862707802d40f7f29f6e1aad8d84b5aa9e44552d2cc17384b209f091276aa",
"sha256:24995c843eb0ad11a4527b026b4dde3da70e1f2d8806c99b7b4a7cf491612652",
"sha256:269151951236b0f9a6f04015a9004084a5ab0d5f19b57de779f908621e7d8b75",
"sha256:4083861b0aa07990b619bd7ddc365eb7fa4b817e99cf5f8d9cf21a42780f6e01",
"sha256:498b0f36cc7054c1fead3d7fc59d2150f4d5c6c56ba7fb150c013fbc683a8d2d",
"sha256:4e3e5da80ccbebfff202a67bf900d081906c358ccc3d5e3c8aea42fdfdfd51c1",
"sha256:6daac9731f172c2a22ade6ed0c00197ee7cc1221aa84cfdf9c31defeb059a907",
"sha256:715ff2f2df46121071622063fc7543d9b1fd19ebfc4f5c8895af64a77a8c852c",
"sha256:73d785a950fc82dd2a25897d525d003f6378d1cb23ab305578394694202a58c3",
"sha256:8c8aaad94455178e3187ab22c8b01a3837f8ee50e09cf31f1ba129eb293ec30b",
"sha256:8ce678dbaf790dbdb3eba24056d5364fb45944f33553dd5869b7580cdbb83614",
"sha256:aaee9905aee35ba5905cfb3c62f3e83b3bec7b39413f0a7f19be4e547ea01ebb",
"sha256:bcd3b13b56ea479b3650b82cabd6b5343a625b0ced5429e4ccad28a8973f301b",
"sha256:c9e348e02e4d2b4a8b2eedb48210430658df6951fa484e59de33ff773fbd4b41",
"sha256:d205b1b46085271b4e15f670058ce182bd1199e56b317bf2ec004b6a44f911f6",
"sha256:d43943ef777f9a1c42bf4e552ba23ac77a6351de620aa9acf64ad54933ad4d34",
"sha256:d5d33e9e7af3b34a40dc05f498939f0ebf187f07c385fd58d591c533ad8562fe",
"sha256:fc0fea399acb12edbf8a628ba8d2312f583bdbdb3335635db062fa98cf71fca4",
"sha256:fe460b922ec15dd205595c9b5b99e2f056fd98ae8f9f56b888e7a17dc2b757e7"
],
"version": "==1.4.1"
},
"wcwidth": {
"hashes": [
"sha256:8fd29383f539be45b20bd4df0dc29c20ba48654a41e661925e612311e9f3c603",
"sha256:f28b3e8a6483e5d49e7f8949ac1a78314e740333ae305b4ba5defd3e74fb37a8"
],
"version": "==0.1.8"
}
}
}

View File

@ -1,19 +1,30 @@
# DSpace Statistics API [![Build Status](https://travis-ci.org/alanorth/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/alanorth/dspace-statistics-api)
A simple REST API to expose Solr view and download statistics for items in a DSpace repository. This project contains a standalone indexing component and a WSGI application.
# DSpace Statistics API [![Build Status](https://travis-ci.org/ilri/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/ilri/dspace-statistics-api) [![builds.sr.ht status](https://builds.sr.ht/~alanorth/dspace-statistics-api.svg)](https://builds.sr.ht/~alanorth/dspace-statistics-api?)
DSpace stores item view and download events in a Solr "statistics" core. This information is available for use in the various DSpace user interfaces, but is not exposed externally via any APIs. The DSpace 4/5/6 [REST API](https://wiki.lyrasis.org/display/DSDOC5x/REST+API), for example, only exposes information about communities, collections, item metadata, and bitstreams.
- If your DSpace is version 4 or 5, use [dspace-statistics-api v1.1.1](https://github.com/ilri/dspace-statistics-api/releases/tag/v1.1.1)
- If your DSpace is version 6+, use [dspace-statistics-api v1.2.0 or greater](https://github.com/ilri/dspace-statistics-api/releases/tag/v1.2.0)
This project contains an indexer and a [Falcon-based](https://falcon.readthedocs.io/) web application to make the statistics available via a simple REST API. You can read more about the Solr queries used to gather the item view and download statistics on the [DSpace wiki](https://wiki.lyrasis.org/display/DSPACE/Solr).
If you use the DSpace Statistics API please cite:
*Orth, A. 2018. DSpace statistics API. Nairobi, Kenya: ILRI. https://hdl.handle.net/10568/99143.*
## Requirements
- Python 3.5+
- Python 3.6+
- PostgreSQL version 9.5+ (due to [`UPSERT` support](https://wiki.postgresql.org/wiki/UPSERT))
- DSpace 4+ with [Solr usage statistics enabled](https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics)
- DSpace with [Solr usage statistics enabled](https://wiki.lyrasis.org/display/DSDOC5x/SOLR+Statistics) (tested with 5.x)
## Installation and Testing
## Installation
Create a Python virtual environment and install the dependencies:
$ python -m venv venv
$ . venv/bin/activate
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
## Running
Set up the environment variables for Solr and PostgreSQL:
$ export SOLR_SERVER=http://localhost:8080/solr
@ -24,16 +35,25 @@ Set up the environment variables for Solr and PostgreSQL:
Index the Solr statistics core to populate the PostgreSQL database:
$ ./indexer.py
$ python -m dspace_statistics_api.indexer
Run the REST API:
$ gunicorn app:api
$ gunicorn dspace_statistics_api.app
Test to see if there are any statistics:
$ curl 'http://localhost:8000/items?limit=1'
## Testing
Install development packages using pip:
$ pip install -r requirements-dev.txt
Run tests:
$ pytest
## Deployment
There are example systemd service and timer units in the `contrib` directory. The API service listens on localhost by default so you will need to expose it publicly using a web server like nginx.
@ -59,21 +79,24 @@ This would expose the API at `/rest/statistics`.
## Using the API
The API exposes the following endpoints:
- GET `/items`return views and downloads for all items that Solr knows about¹. Accepts `limit` and `page` query parameters for pagination of results.
- GET `/item/id`return views and downloads for a single item (*id* must be a positive integer). Returns HTTP 404 if an item id is not found.
- GET `/`return a basic API documentation page.
- GET `/items`return views and downloads for all items that Solr knows about¹. Accepts `limit` and `page` query parameters for pagination of results (`limit` must be an integer between 1 and 100, and `page` must be an integer greater than or equal to 0).
- GET `/item/id`return views and downloads for a single item (`id` must be a UUID). Returns HTTP 404 if an item id is not found.
¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads.
The item id is the *internal* uuid for an item. You can get these from the standard DSpace REST API.
¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads. If an item is not present here you can assume it has zero views and zero downloads, but not necessarily that it does not exist in the repository.
## Todo
- Add API documentation
- Close DB connection when gunicorn shuts down gracefully
- Better logging
- Tests
- Check if database exists (try/except)
- Version API
- Use JSON in PostgreSQL
- Switch to [Python 3.6+ f-string syntax](https://realpython.com/python-f-strings/)
- Add top items endpoint, perhaps `/top/items` or `/items/top`?
- Make community and collection stats available
- Check IDs in database to see if they are deleted...
## License
This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).
The license allows you to use and modify the work for personal and commercial purposes, but if you distribute the work you must provide users with a means to access the source code for the version you are distributing. Read more about the [GPLv3 at TL;DR Legal](https://tldrlegal.com/license/gnu-general-public-license-v3-(gpl-3)).

73
app.py
View File

@ -1,73 +0,0 @@
from database import database_connection
import falcon
db = database_connection()
db.set_session(readonly=True)
class AllItemsResource:
def on_get(self, req, resp):
"""Handles GET requests"""
# Return HTTPBadRequest if id parameter is not present and valid
limit = req.get_param_as_int("limit", min=0, max=100) or 100
page = req.get_param_as_int("page", min=0) or 0
offset = limit * page
cursor = db.cursor()
# get total number of items so we can estimate the pages
cursor.execute('SELECT COUNT(id) FROM items')
pages = round(cursor.fetchone()[0] / limit)
# get statistics, ordered by id, and use limit and offset to page through results
cursor.execute('SELECT id, views, downloads FROM items ORDER BY id ASC LIMIT {} OFFSET {}'.format(limit, offset))
# create a list to hold dicts of item stats
statistics = list()
# iterate over results and build statistics object
for item in cursor:
statistics.append({ 'id': item['id'], 'views': item['views'], 'downloads': item['downloads'] })
cursor.close()
message = {
'currentPage': page,
'totalPages': pages,
'limit': limit,
'statistics': statistics
}
resp.media = message
class ItemResource:
def on_get(self, req, resp, item_id):
"""Handles GET requests"""
cursor = db.cursor()
cursor.execute('SELECT views, downloads FROM items WHERE id={}'.format(item_id))
if cursor.rowcount == 0:
raise falcon.HTTPNotFound(
title='Item not found',
description='The item with id "{}" was not found.'.format(item_id)
)
else:
results = cursor.fetchone()
statistics = {
'id': item_id,
'views': results['views'],
'downloads': results['downloads']
}
resp.media = statistics
cursor.close()
def on_exit(api):
print("Shutting down DB")
api = falcon.API()
api.add_route('/items', AllItemsResource())
api.add_route('/item/{item_id:int}', ItemResource())
# vim: set sw=4 ts=4 expandtab:

View File

@ -1,11 +0,0 @@
import os
# Check if Solr connection information was provided in the environment
SOLR_SERVER = os.environ.get('SOLR_SERVER', 'http://localhost:8080/solr')
DATABASE_NAME = os.environ.get('DATABASE_NAME', 'dspacestatistics')
DATABASE_USER = os.environ.get('DATABASE_USER', 'dspacestatistics')
DATABASE_PASS = os.environ.get('DATABASE_PASS', 'dspacestatistics')
DATABASE_HOST = os.environ.get('DATABASE_HOST', 'localhost')
# vim: set sw=4 ts=4 expandtab:

View File

@ -12,7 +12,7 @@ Group=nogroup
WorkingDirectory=/var/lib/dspace-statistics-api
ExecStart=/var/lib/dspace-statistics-api/venv/bin/gunicorn \
--bind 127.0.0.1:5000 \
app:api
dspace_statistics_api.app
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID

View File

@ -11,7 +11,7 @@ Environment=DATABASE_HOST=localhost
User=nobody
Group=nogroup
WorkingDirectory=/var/lib/dspace-statistics-api
ExecStart=/var/lib/dspace-statistics-api/venv/bin/python indexer.py
ExecStart=/var/lib/dspace-statistics-api/venv/bin/python -m dspace_statistics_api.indexer
[Install]
WantedBy=multi-user.target

View File

@ -1,12 +0,0 @@
from config import DATABASE_NAME
from config import DATABASE_USER
from config import DATABASE_PASS
from config import DATABASE_HOST
import psycopg2, psycopg2.extras
def database_connection():
connection = psycopg2.connect("dbname={} user={} password={} host='{}'".format(DATABASE_NAME, DATABASE_USER, DATABASE_PASS, DATABASE_HOST), cursor_factory=psycopg2.extras.DictCursor)
return connection
# vim: set sw=4 ts=4 expandtab:

View File

View File

@ -0,0 +1,99 @@
import falcon
from .database import DatabaseManager
class RootResource:
def on_get(self, req, resp):
resp.status = falcon.HTTP_200
resp.content_type = "text/html"
with open("dspace_statistics_api/docs/index.html", "r") as f:
resp.body = f.read()
class AllItemsResource:
def on_get(self, req, resp):
"""Handles GET requests"""
# Return HTTPBadRequest if id parameter is not present and valid
limit = req.get_param_as_int("limit", min_value=0, max_value=100) or 100
page = req.get_param_as_int("page", min_value=0) or 0
offset = limit * page
with DatabaseManager() as db:
db.set_session(readonly=True)
with db.cursor() as cursor:
# get total number of items so we can estimate the pages
cursor.execute("SELECT COUNT(id) FROM items")
pages = round(cursor.fetchone()[0] / limit)
# get statistics and use limit and offset to page through results
cursor.execute(
"SELECT id, views, downloads FROM items LIMIT %s OFFSET %s",
[limit, offset],
)
# create a list to hold dicts of item stats
statistics = list()
# iterate over results and build statistics object
for item in cursor:
statistics.append(
{
"id": str(item["id"]),
"views": item["views"],
"downloads": item["downloads"],
}
)
message = {
"currentPage": page,
"totalPages": pages,
"limit": limit,
"statistics": statistics,
}
resp.media = message
class ItemResource:
def on_get(self, req, resp, item_id):
"""Handles GET requests"""
import psycopg2.extras
# Adapt Pythons uuid.UUID type to PostgreSQLs uuid
# See: https://www.psycopg.org/docs/extras.html
psycopg2.extras.register_uuid()
with DatabaseManager() as db:
db.set_session(readonly=True)
with db.cursor() as cursor:
cursor = db.cursor()
cursor.execute(
"SELECT views, downloads FROM items WHERE id=%s", [str(item_id)]
)
if cursor.rowcount == 0:
raise falcon.HTTPNotFound(
title="Item not found",
description=f'The item with id "{str(item_id)}" was not found.',
)
else:
results = cursor.fetchone()
statistics = {
"id": str(item_id),
"views": results["views"],
"downloads": results["downloads"],
}
resp.media = statistics
api = application = falcon.API()
api.add_route("/", RootResource())
api.add_route("/items", AllItemsResource())
api.add_route("/item/{item_id:uuid}", ItemResource())
# vim: set sw=4 ts=4 expandtab:

View File

@ -0,0 +1,12 @@
import os
# Check if Solr connection information was provided in the environment
SOLR_SERVER = os.environ.get("SOLR_SERVER", "http://localhost:8080/solr")
DATABASE_NAME = os.environ.get("DATABASE_NAME", "dspacestatistics")
DATABASE_USER = os.environ.get("DATABASE_USER", "dspacestatistics")
DATABASE_PASS = os.environ.get("DATABASE_PASS", "dspacestatistics")
DATABASE_HOST = os.environ.get("DATABASE_HOST", "localhost")
DATABASE_PORT = os.environ.get("DATABASE_PORT", "5432")
# vim: set sw=4 ts=4 expandtab:

View File

@ -0,0 +1,36 @@
import falcon
import psycopg2
import psycopg2.extras
from .config import (
DATABASE_HOST,
DATABASE_NAME,
DATABASE_PASS,
DATABASE_PORT,
DATABASE_USER,
)
class DatabaseManager:
"""Manage database connection."""
def __init__(self):
self._connection_uri = f"dbname={DATABASE_NAME} user={DATABASE_USER} password={DATABASE_PASS} host={DATABASE_HOST} port={DATABASE_PORT}"
def __enter__(self):
try:
self._connection = psycopg2.connect(
self._connection_uri, cursor_factory=psycopg2.extras.DictCursor
)
except psycopg2.OperationalError:
title = "500 Internal Server Error"
description = "Could not connect to database"
raise falcon.HTTPInternalServerError(title, description)
return self._connection
def __exit__(self, exc_type, exc_value, exc_traceback):
self._connection.close()
# vim: set sw=4 ts=4 expandtab:

View File

@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>DSpace Statistics API</title>
</head>
<body>
<h1>DSpace Statistics API</h1>
<p>This site is running the <a href="https://github.com/ilri/dspace-statistics-api" title="DSpace Statistics API project">DSpace Statistics API</a>. The following endpoints are available:</p>
<ul>
<li>GET <code>/</code>return a basic API documentation page.</li>
<li>GET <code>/items</code>return views and downloads for all items that Solr knows about¹. Accepts <code>limit</code> and <code>page</code> query parameters for pagination of results (<code>limit</code> must be an integer between 1 and 100, and <code>page</code> must be an integer greater than or equal to 0).</li>
<li>GET <code>/item/id</code>return views and downloads for a single item (<code>id</code> must be a UUID). Returns HTTP 404 if an item id is not found.</li>
</ul>
<p>The item id is the <em>internal</em> uuid for an item. You can get these from the standard DSpace REST API.</p>
<p>¹ We are querying the Solr statistics core, which technically only knows about items that have either views or downloads. If an item is not present here you can assume it has zero views and zero downloads, but not necessarily that it does not exist in the repository.</code>
</body>
</html>

View File

@ -0,0 +1,274 @@
#
# indexer.py
#
# Copyright 2018 Alan Orth.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# ---
#
# Connects to a DSpace Solr statistics core and ingests item views and downloads
# into a PostgreSQL database for use by other applications (like an API).
#
# This script is written for Python 3.5+ and requires several modules that you
# can install with pip (I recommend using a Python virtual environment):
#
# $ pip install SolrClient psycopg2-binary
#
# See: https://solrclient.readthedocs.io/en/latest/SolrClient.html
# See: https://wiki.duraspace.org/display/DSPACE/Solr
import re
import psycopg2.extras
import requests
from .config import SOLR_SERVER
from .database import DatabaseManager
# Enumerate the cores in Solr to determine if statistics have been sharded into
# yearly shards by DSpace's stats-util or not (for example: statistics-2018).
def get_statistics_shards():
# Initialize an empty list for statistics core years
statistics_core_years = []
# URL for Solr status to check active cores
solr_query_params = {"action": "STATUS", "wt": "json"}
solr_url = SOLR_SERVER + "/admin/cores"
res = requests.get(solr_url, params=solr_query_params)
if res.status_code == requests.codes.ok:
data = res.json()
# Iterate over active cores from Solr's STATUS response (cores are in
# the status array of this response).
for core in data["status"]:
# Pattern to match, for example: statistics-2018
pattern = re.compile("^statistics-[0-9]{4}$")
if not pattern.match(core):
continue
# Append current core to list
statistics_core_years.append(core)
# Initialize a string to hold our shards (may end up being empty if the Solr
# core has not been processed by stats-util).
shards = str()
if len(statistics_core_years) > 0:
# Begin building a string of shards starting with the default one
shards = f"{SOLR_SERVER}/statistics"
for core in statistics_core_years:
# Create a comma-separated list of shards to pass to our Solr query
#
# See: https://wiki.apache.org/solr/DistributedSearch
shards += f",{SOLR_SERVER}/{core}"
# Return the string of shards, which may actually be empty. Solr doesn't
# seem to mind if the shards query parameter is empty and I haven't seen
# any negative performance impact so this should be fine.
return shards
def index_views():
# get total number of distinct facets for items with a minimum of 1 view,
# otherwise Solr returns all kinds of weird ids that are actually not in
# the database. Also, stats are expensive, but we need stats.calcdistinct
# so we can get the countDistinct summary.
#
# see: https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
solr_query_params = {
"q": "type:2",
"fq": "isBot:false AND statistics_type:view",
"facet": "true",
"facet.field": "id",
"facet.mincount": 1,
"facet.limit": 1,
"facet.offset": 0,
"stats": "true",
"stats.field": "id",
"stats.calcdistinct": "true",
"shards": shards,
"rows": 0,
"wt": "json",
}
solr_url = SOLR_SERVER + "/statistics/select"
res = requests.get(solr_url, params=solr_query_params)
try:
# get total number of distinct facets (countDistinct)
results_totalNumFacets = res.json()["stats"]["stats_fields"]["id"][
"countDistinct"
]
except TypeError:
print("No item views to index, exiting.")
exit(0)
# divide results into "pages" (cast to int to effectively round down)
results_per_page = 100
results_num_pages = int(results_totalNumFacets / results_per_page)
results_current_page = 0
with DatabaseManager() as db:
with db.cursor() as cursor:
# create an empty list to store values for batch insertion
data = []
while results_current_page <= results_num_pages:
# "pages" are zero based, but one based is more human readable
print(
f"Indexing item views (page {results_current_page + 1} of {results_num_pages + 1})"
)
solr_query_params = {
"q": "type:2",
"fq": "isBot:false AND statistics_type:view",
"facet": "true",
"facet.field": "id",
"facet.mincount": 1,
"facet.limit": results_per_page,
"facet.offset": results_current_page * results_per_page,
"shards": shards,
"rows": 0,
"wt": "json",
"json.nl": "map", # return facets as a dict instead of a flat list
}
solr_url = SOLR_SERVER + "/statistics/select"
res = requests.get(solr_url, params=solr_query_params)
# Solr returns facets as a dict of dicts (see json.nl parameter)
views = res.json()["facet_counts"]["facet_fields"]
# iterate over the 'id' dict and get the item ids and views
for item_id, item_views in views["id"].items():
data.append((item_id, item_views))
# do a batch insert of values from the current "page" of results
sql = "INSERT INTO items(id, views) VALUES %s ON CONFLICT(id) DO UPDATE SET views=excluded.views"
psycopg2.extras.execute_values(cursor, sql, data, template="(%s, %s)")
db.commit()
# clear all items from the list so we can populate it with the next batch
data.clear()
results_current_page += 1
def index_downloads():
# get the total number of distinct facets for items with at least 1 download
solr_query_params = {
"q": "type:0",
"fq": "isBot:false AND statistics_type:view AND bundleName:ORIGINAL",
"facet": "true",
"facet.field": "owningItem",
"facet.mincount": 1,
"facet.limit": 1,
"facet.offset": 0,
"stats": "true",
"stats.field": "owningItem",
"stats.calcdistinct": "true",
"shards": shards,
"rows": 0,
"wt": "json",
}
solr_url = SOLR_SERVER + "/statistics/select"
res = requests.get(solr_url, params=solr_query_params)
try:
# get total number of distinct facets (countDistinct)
results_totalNumFacets = res.json()["stats"]["stats_fields"]["owningItem"][
"countDistinct"
]
except TypeError:
print("No item downloads to index, exiting.")
exit(0)
# divide results into "pages" (cast to int to effectively round down)
results_per_page = 100
results_num_pages = int(results_totalNumFacets / results_per_page)
results_current_page = 0
with DatabaseManager() as db:
with db.cursor() as cursor:
# create an empty list to store values for batch insertion
data = []
while results_current_page <= results_num_pages:
# "pages" are zero based, but one based is more human readable
print(
f"Indexing item downloads (page {results_current_page + 1} of {results_num_pages + 1})"
)
solr_query_params = {
"q": "type:0",
"fq": "isBot:false AND statistics_type:view AND bundleName:ORIGINAL",
"facet": "true",
"facet.field": "owningItem",
"facet.mincount": 1,
"facet.limit": results_per_page,
"facet.offset": results_current_page * results_per_page,
"shards": shards,
"rows": 0,
"wt": "json",
"json.nl": "map", # return facets as a dict instead of a flat list
}
solr_url = SOLR_SERVER + "/statistics/select"
res = requests.get(solr_url, params=solr_query_params)
# Solr returns facets as a dict of dicts (see json.nl parameter)
downloads = res.json()["facet_counts"]["facet_fields"]
# iterate over the 'owningItem' dict and get the item ids and downloads
for item_id, item_downloads in downloads["owningItem"].items():
data.append((item_id, item_downloads))
# do a batch insert of values from the current "page" of results
sql = "INSERT INTO items(id, downloads) VALUES %s ON CONFLICT(id) DO UPDATE SET downloads=excluded.downloads"
psycopg2.extras.execute_values(cursor, sql, data, template="(%s, %s)")
db.commit()
# clear all items from the list so we can populate it with the next batch
data.clear()
results_current_page += 1
with DatabaseManager() as db:
with db.cursor() as cursor:
# create table to store item views and downloads
cursor.execute(
"""CREATE TABLE IF NOT EXISTS items
(id UUID PRIMARY KEY, views INT DEFAULT 0, downloads INT DEFAULT 0)"""
)
# commit the table creation before closing the database connection
db.commit()
shards = get_statistics_shards()
index_views()
index_downloads()
# vim: set sw=4 ts=4 expandtab:

View File

@ -1,173 +0,0 @@
#!/usr/bin/env python
#
# indexer.py
#
# Copyright 2018 Alan Orth.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# ---
#
# Connects to a DSpace Solr statistics core and ingests item views and downloads
# into a PostgreSQL database for use by other applications (like an API).
#
# This script is written for Python 3.5+ and requires several modules that you
# can install with pip (I recommend using a Python virtual environment):
#
# $ pip install SolrClient psycopg2-binary
#
# See: https://solrclient.readthedocs.io/en/latest/SolrClient.html
# See: https://wiki.duraspace.org/display/DSPACE/Solr
from database import database_connection
import json
import psycopg2.extras
from solr import solr_connection
def index_views():
# get total number of distinct facets for items with a minimum of 1 view,
# otherwise Solr returns all kinds of weird ids that are actually not in
# the database. Also, stats are expensive, but we need stats.calcdistinct
# so we can get the countDistinct summary.
#
# see: https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
res = solr.query('statistics', {
'q':'type:2',
'fq':'isBot:false AND statistics_type:view',
'facet':True,
'facet.field':'id',
'facet.mincount':1,
'facet.limit':1,
'facet.offset':0,
'stats':True,
'stats.field':'id',
'stats.calcdistinct':True
}, rows=0)
# get total number of distinct facets (countDistinct)
results_totalNumFacets = json.loads(res.get_json())['stats']['stats_fields']['id']['countDistinct']
# divide results into "pages" (cast to int to effectively round down)
results_per_page = 100
results_num_pages = int(results_totalNumFacets / results_per_page)
results_current_page = 0
cursor = db.cursor()
# create an empty list to store values for batch insertion
data = []
while results_current_page <= results_num_pages:
print('Indexing item views (page {} of {})'.format(results_current_page, results_num_pages))
res = solr.query('statistics', {
'q':'type:2',
'fq':'isBot:false AND statistics_type:view',
'facet':True,
'facet.field':'id',
'facet.mincount':1,
'facet.limit':results_per_page,
'facet.offset':results_current_page * results_per_page
}, rows=0)
# SolrClient's get_facets() returns a dict of dicts
views = res.get_facets()
# in this case iterate over the 'id' dict and get the item ids and views
for item_id, item_views in views['id'].items():
data.append((item_id, item_views))
# do a batch insert of values from the current "page" of results
sql = 'INSERT INTO items(id, views) VALUES %s ON CONFLICT(id) DO UPDATE SET views=excluded.views'
psycopg2.extras.execute_values(cursor, sql, data, template='(%s, %s)')
db.commit()
# clear all items from the list so we can populate it with the next batch
data.clear()
results_current_page += 1
cursor.close()
def index_downloads():
# get the total number of distinct facets for items with at least 1 download
res = solr.query('statistics', {
'q':'type:0',
'fq':'isBot:false AND statistics_type:view AND bundleName:ORIGINAL',
'facet':True,
'facet.field':'owningItem',
'facet.mincount':1,
'facet.limit':1,
'facet.offset':0,
'stats':True,
'stats.field':'owningItem',
'stats.calcdistinct':True
}, rows=0)
# get total number of distinct facets (countDistinct)
results_totalNumFacets = json.loads(res.get_json())['stats']['stats_fields']['owningItem']['countDistinct']
# divide results into "pages" (cast to int to effectively round down)
results_per_page = 100
results_num_pages = int(results_totalNumFacets / results_per_page)
results_current_page = 0
cursor = db.cursor()
# create an empty list to store values for batch insertion
data = []
while results_current_page <= results_num_pages:
print('Indexing item downloads (page {} of {})'.format(results_current_page, results_num_pages))
res = solr.query('statistics', {
'q':'type:0',
'fq':'isBot:false AND statistics_type:view AND bundleName:ORIGINAL',
'facet':True,
'facet.field':'owningItem',
'facet.mincount':1,
'facet.limit':results_per_page,
'facet.offset':results_current_page * results_per_page
}, rows=0)
# SolrClient's get_facets() returns a dict of dicts
downloads = res.get_facets()
# in this case iterate over the 'owningItem' dict and get the item ids and downloads
for item_id, item_downloads in downloads['owningItem'].items():
data.append((item_id, item_downloads))
# do a batch insert of values from the current "page" of results
sql = 'INSERT INTO items(id, downloads) VALUES %s ON CONFLICT(id) DO UPDATE SET downloads=excluded.downloads'
psycopg2.extras.execute_values(cursor, sql, data, template='(%s, %s)')
db.commit()
# clear all items from the list so we can populate it with the next batch
data.clear()
results_current_page += 1
cursor.close()
db = database_connection()
solr = solr_connection()
# create table to store item views and downloads
cursor = db.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS items
(id INT PRIMARY KEY, views INT DEFAULT 0, downloads INT DEFAULT 0)''')
index_views()
index_downloads()
db.close()
# vim: set sw=4 ts=4 expandtab:

4
pytest.ini Normal file
View File

@ -0,0 +1,4 @@
[pytest]
addopts= -rsxX -s -v --strict
filterwarnings =
error::UserWarning

37
requirements-dev.txt Normal file
View File

@ -0,0 +1,37 @@
-i https://pypi.org/simple
appdirs==1.4.3
attrs==19.3.0
backcall==0.1.0
black==19.10b0
click==7.0
decorator==4.4.2
entrypoints==0.3
flake8==3.7.9
ipython-genutils==0.2.0
ipython==7.13.0
isort==4.3.21
jedi==0.16.0
mccabe==0.6.1
more-itertools==8.2.0
packaging==20.1
parso==0.6.2
pathspec==0.7.0
pexpect==4.8.0 ; sys_platform != 'win32'
pickleshare==0.7.5
pluggy==0.13.1
prompt-toolkit==3.0.3
ptyprocess==0.6.0
py==1.8.1
pycodestyle==2.5.0
pyflakes==2.1.1
pygments==2.5.2
pyparsing==2.4.6
pytest-clarity==0.3.0a0
pytest==5.3.5
regex==2020.2.20
six==1.14.0
termcolor==1.1.0
toml==0.10.0
traitlets==4.3.3
typed-ast==1.4.1
wcwidth==0.1.8

View File

@ -1,12 +1,9 @@
certifi==2018.10.15
-i https://pypi.org/simple
certifi==2019.11.28
chardet==3.0.4
falcon==1.4.1
gunicorn==19.9.0
idna==2.7
kazoo==2.5.0
psycopg2-binary==2.7.5
python-mimeparse==1.6.0
requests==2.20.0
six==1.11.0
-e git://github.com/alanorth/SolrClient.git@c629e3475be37c82770b2be61748be7e29882648#egg=SolrClient
urllib3==1.24
falcon==2.0.0
gunicorn==20.0.4
idna==2.9
psycopg2-binary==2.8.4
requests==2.23.0
urllib3==1.25.8

6
setup.cfg Normal file
View File

@ -0,0 +1,6 @@
[isort]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
line_length=88

View File

@ -1,9 +0,0 @@
from config import SOLR_SERVER
from SolrClient import SolrClient
def solr_connection():
connection = SolrClient(SOLR_SERVER)
return connection
# vim: set sw=4 ts=4 expandtab:

0
tests/__init__.py Normal file
View File

3949
tests/dspacestatistics.sql Normal file

File diff suppressed because it is too large Load Diff

67
tests/test_api.py Normal file
View File

@ -0,0 +1,67 @@
from falcon import testing
import json
import pytest
from dspace_statistics_api.app import api
@pytest.fixture
def client():
return testing.TestClient(api)
def test_get_docs(client):
"""Test requesting the documentation at the root."""
response = client.simulate_get("/")
assert isinstance(response.content, bytes)
assert response.status_code == 200
def test_get_item(client):
"""Test requesting a single item."""
response = client.simulate_get("/item/c3910974-c3a5-4053-9dce-104aa7bb1621")
response_doc = json.loads(response.text)
assert isinstance(response_doc["downloads"], int)
assert isinstance(response_doc["id"], str)
assert isinstance(response_doc["views"], int)
assert response.status_code == 200
def test_get_missing_item(client):
"""Test requesting a single non-existing item."""
response = client.simulate_get("/item/c3910974-c3a5-4053-9dce-104aa7bb1620")
assert response.status_code == 404
def test_get_items(client):
"""Test requesting 100 items."""
response = client.simulate_get("/items", query_string="limit=100")
response_doc = json.loads(response.text)
assert isinstance(response_doc["currentPage"], int)
assert isinstance(response_doc["totalPages"], int)
assert isinstance(response_doc["statistics"], list)
assert response.status_code == 200
def test_get_items_invalid_limit(client):
"""Test requesting 100 items with an invalid limit parameter."""
response = client.simulate_get("/items", query_string="limit=101")
assert response.status_code == 400
def test_get_items_invalid_page(client):
"""Test requesting 100 items with an invalid page parameter."""
response = client.simulate_get("/items", query_string="page=-1")
assert response.status_code == 400