1
0
mirror of https://github.com/ilri/dspace-statistics-api.git synced 2025-05-10 15:16:02 +02:00

Compare commits

...

41 Commits

Author SHA1 Message Date
2b8aba5835 CHANGELOG.md: Move unreleased changes to v1.0.0 2019-04-15 10:39:48 +03:00
9eb30a98e3 Update requirements
Generated using pipenv:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-04-15 10:31:19 +03:00
622e9a86f1 CHANGELOG.md: Add notes about Python updates 2019-04-15 10:30:29 +03:00
2acd08e0ab Use one-based paging in indexer output
It is easier for humans to understand one-based paging output like
"page 1 of 3" than "page 0 of 2" in the indexer.
2019-04-15 10:25:54 +03:00
f75bcf292c README.md: Remove TODO about SolrClient
I switched to using the vanilla requests library.
2019-04-15 10:24:24 +03:00
8f46ceb8d8 Refactor to use vanilla requests library
The SolrClient library is unmaintained, which is starting to cause
problems due to the moving Python ecosystem. Switching to requests
does not change my code in any meaningful way and makes maintenance
easier.
2019-04-15 10:19:50 +03:00
18e1e1a227 README.md: Add TODO about checking IDs in the database
Theoretically some items could be deleted and we should remove them
from the database.
2019-04-04 18:33:45 +03:00
fd46041698 README.md: Add build badge for sourcehut (sr.ht) 2019-03-17 23:45:33 +02:00
4ce7231ece CHANGELOG.md: Add unreleased changes 2019-03-17 23:40:51 +02:00
60689d9014 Disable emojis and animated output in CI
Makes for cleaner logs.

See: https://docs.travis-ci.com/user/environment-variables/
See: https://man.sr.ht/builds.sr.ht/manifest.md
2019-03-17 23:39:38 +02:00
7bca32189a .travis.yml: Use PostgreSQL 9.6
This matches what we're using in production.
2019-03-17 23:28:06 +02:00
94c5d91d3c CHANGELOG.md: Add unreleased changes 2019-03-17 22:51:39 +02:00
a640f734c8 Pipfile.lock: run pipenv update 2019-03-17 22:46:39 +02:00
d56a3420f7 README.md: Add TODO about SolrClient
SolrClient works, but hasn't been updated in some time and this is
starting to cause issues with some of its dependencies (kazoo). We
can probably get by with using Python requests library and getting
JSON directly from Solr.
2019-02-19 13:54:34 -08:00
7add0d6164 README.md: Add TODO about top items endpoint
This might be something useful that would be trivial to provide from
the data we already have in PostgreSQL.
2019-02-10 14:20:09 +02:00
c86bec4d8f .travis.yml: Use Ubuntu 16.04 xenial image
This is a newer userland and allows us to use Python 3.7, for example.

See: https://docs.travis-ci.com/user/reference/xenial/
2019-02-07 17:41:36 +02:00
5429fe5cc8 Update requirements
Generated from pipenv with:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-02-07 17:39:50 +02:00
f8a4cfd3da CHANGELOG.md: Add notes about updated python modules 2019-02-07 17:30:08 +02:00
be94c94433 Pipfile.lock: Run pipenv update 2019-02-07 17:29:47 +02:00
ba49b78a25 CHANGELOG.md: Add build configuration for build.sr.ht
See: https://man.sr.ht/builds.sr.ht/
2019-02-07 17:28:41 +02:00
842f80036f .build.yml: Fix PostgreSQL import
When building on sr.ht the default environment is the home directory
so we need to change to the source directory before trying to import
the SQL file.
2019-02-07 17:25:19 +02:00
f738b8029b Rename sr.ht build.yml to .build.yml
This means git.sr.ht will trigger builds automatically on push.

See: https://man.sr.ht/builds.sr.ht/
2019-02-07 17:09:48 +02:00
d08c43f3d5 build.yml: Functioning build
Finally got this working after testing the manifest manually a few
times on the web UI.
2019-02-07 17:09:48 +02:00
819f8e6b0d Add build.yml for sr.ht
Trying to figure out how to run builds on this new platform.

See: https://man.sr.ht/builds.sr.ht/#build-manifests
2019-02-07 17:09:48 +02:00
c79e50a364 README.md: Add TODO about DSpace 6 UUIDs
I'm not sure how this will affect us, especially if we want to keep
support for DSpace 4, 5, and 6 in the same code base. At least the
REST API endpoint will have to change from an integer, our database
schema will have to change depending on whether the repository is
using IDs or UUIDs, and maybe even the Solr queries will change.
2019-02-07 16:52:36 +02:00
71006d8bbf README.md: Add citation 2019-01-23 16:19:58 +02:00
b7d723ef7c README.md: Fix sentence 2019-01-22 14:23:13 +02:00
914ec52fbb CHANGELOG.md: Move unrelease changes to 0.9.0 2019-01-22 09:02:29 +02:00
5524066656 CHANGELOG.md: Add note about catching errors 2019-01-22 09:01:54 +02:00
043d897cef dspace_statistics_api/indexer.py: Catch case of no views/downloads
Don't fail with an exception when there are no views or downloads,
for example on a new DSpace installation.
2019-01-22 09:00:22 +02:00
bd28353cda README.md: Remove TODO for fixing querying of shards 2019-01-22 08:41:39 +02:00
e23d66c2a2 CHANGELOG.md: Add note about fixing querying of sharded cores 2019-01-22 08:41:31 +02:00
40e284dac0 dspace_statistics_api/indexer.py: Query multiple shards
DSpace's stats-util script splits the Solr statistics core into yearly
shards. We need to use Solr's `shards` query parameter in order to get
the statistics for previous years. This commit adds a helper function
to enumerate the active Solr cores to find yearly shards matching the
statistics-YYYY pattern and add them to the query.
2019-01-22 08:39:36 +02:00
934fa9db9b README.md: Add TODO about sharded statistics cores 2019-01-21 12:55:43 +02:00
1fabb72b58 Update requirements
Generated from pipenv with:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-01-16 12:34:50 +02:00
c7f95f0b60 README.md: Update TODO
I think it might be possible to compute community and collection
statistics from Solr and make them available at new endpoints:

  - /communities
  - /community/id
  - /collections
  - /collection/id
2019-01-16 09:59:29 +02:00
c95a98dd2d Pipfile.lock: update dependencies
Updated with `pipenv update`.
2019-01-15 10:22:46 +02:00
3f70f94a10 Pipfile.lock: Run pipenv update 2018-11-26 11:53:37 +02:00
9b8ad9defd Merge pull request #9 from ilri/pipenv-update
Pipenv update
2018-11-19 23:50:44 +02:00
d69ab20220 CHANGELOG.md: pytest version 4.0.0 2018-11-19 23:46:03 +02:00
378f56ddc2 Pipfile.lock: Run pipenv update 2018-11-19 23:34:34 +02:00
10 changed files with 339 additions and 160 deletions

24
.build.yml Normal file
View File

@ -0,0 +1,24 @@
image: archlinux
packages:
- python-pipenv
- postgresql
sources:
- https://git.sr.ht/~alanorth/dspace-statistics-api
tasks:
- setup: |
id
psql --version
sudo su - postgres -c "initdb --locale en_US.UTF-8 -E UTF8 -D '/var/lib/postgres/data'"
sudo systemctl start postgresql
createuser -U postgres dspacestatistics
psql -U postgres -c "ALTER USER dspacestatistics WITH PASSWORD 'dspacestatistics'"
createdb -U postgres -O dspacestatistics --encoding=UNICODE dspacestatistics
cd dspace-statistics-api
psql -U postgres -d dspacestatistics < tests/dspacestatistics.sql
pipenv install --dev
- test: |
cd dspace-statistics-api
pipenv run pytest
environment:
PIPENV_NOSPIN: 'True'
PIPENV_HIDE_EMOJIS: 'True'

View File

@ -1,10 +1,11 @@
dist: xenial
language: python
python:
- "3.5"
- "3.6"
- "3.7-dev"
- "3.7"
addons:
postgresql: "9.5"
postgresql: "9.6"
before_script:
- psql --version
- createuser -U postgres dspacestatistics
@ -15,5 +16,8 @@ install:
- "pip install pipenv --upgrade-strategy=only-if-needed"
- "pipenv install --dev"
script: pytest
env:
- PIPENV_NOSPIN=True
- PIPENV_HIDE_EMOJIS=True
# vim: ts=2 sw=2 et

View File

@ -4,6 +4,24 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.0.0] - 2019-04-15
### Added
- Build configuration for build.sr.ht
### Updated
- Run pipenv update, bringing pytest version 4.4.0, psycopg-binary 2.8.2, etc
- sr.ht and TravisCI configuration to disable emojis and animation to keep logs clean
### Changed
- Use vanilla requests library instead of SolrClient
- Use one-based paging in indexer output (for human readability)
## [0.9.0] - 2019-01-22
### Updated
- pytest version 4.0.0
- Fix indexing of sharded statistics cores ([#10))
- Handle case of missing views/downloads gracefully
## [0.8.1] - 2018-11-14
### Changed
- README.md to recommend using vanilla Python virtual environments and pip instead of pipenv

View File

@ -7,7 +7,7 @@ name = "pypi"
gunicorn = "*"
falcon = "*"
"psycopg2-binary" = "*"
solrclient = {ref = "kazoo-2.5.0", git = "https://github.com/alanorth/SolrClient.git"}
requests = "*"
[dev-packages]
ipython = "*"

219
Pipfile.lock generated
View File

@ -1,7 +1,7 @@
{
"_meta": {
"hash": {
"sha256": "a846fdab4de5765a7e7fc19424a97a6196248e29f87285cf81fd76e8e9ae3e28"
"sha256": "d18369b55f85594f6acfa49779909334a794a4002656508fc85acea43df2521d"
},
"pipfile-spec": 6,
"requires": {
@ -16,6 +16,20 @@
]
},
"default": {
"certifi": {
"hashes": [
"sha256:59b7658e26ca9c7339e00f8f4636cdfe59d34fa37b9b04f6f9e9926b3cece1a5",
"sha256:b26104d6835d1f5e49452a26eb2ff87fe7090b89dfcaee5ea2212697e1e1d7ae"
],
"version": "==2019.3.9"
},
"chardet": {
"hashes": [
"sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae",
"sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691"
],
"version": "==3.0.4"
},
"falcon": {
"hashes": [
"sha256:0a66b33458fab9c1e400a9be1a68056abda178eb02a8cb4b8f795e9df20b053b",
@ -32,41 +46,46 @@
"index": "pypi",
"version": "==19.9.0"
},
"idna": {
"hashes": [
"sha256:c357b3f628cf53ae2c4c05627ecc484553142ca23264e593d327bcde5e9c3407",
"sha256:ea8b7f6188e6fa117537c3df7da9fc686d485087abf6ac197f9c46432f7e4a3c"
],
"version": "==2.8"
},
"psycopg2-binary": {
"hashes": [
"sha256:036bcb198a7cc4ce0fe43344f8c2c9a8155aefa411633f426c8c6ed58a6c0426",
"sha256:1d770fcc02cdf628aebac7404d56b28a7e9ebec8cfc0e63260bd54d6edfa16d4",
"sha256:1fdc6f369dcf229de6c873522d54336af598b9470ccd5300e2f58ee506f5ca13",
"sha256:21f9ddc0ff6e07f7d7b6b484eb9da2c03bc9931dd13e36796b111d631f7135a3",
"sha256:247873cda726f7956f745a3e03158b00de79c4abea8776dc2f611d5ba368d72d",
"sha256:3aa31c42f29f1da6f4fd41433ad15052d5ff045f2214002e027a321f79d64e2c",
"sha256:475f694f87dbc619010b26de7d0fc575a4accf503f2200885cc21f526bffe2ad",
"sha256:4b5e332a24bf6e2fda1f51ca2a57ae1083352293a08eeea1fa1112dc7dd542d1",
"sha256:570d521660574aca40be7b4d532dfb6f156aad7b16b5ed62d1534f64f1ef72d8",
"sha256:59072de7def0690dd13112d2bdb453e20570a97297070f876fbbb7cbc1c26b05",
"sha256:5f0b658989e918ef187f8a08db0420528126f2c7da182a7b9f8bf7f85144d4e4",
"sha256:649199c84a966917d86cdc2046e03d536763576c0b2a756059ae0b3a9656bc20",
"sha256:6645fc9b4705ae8fbf1ef7674f416f89ae1559deec810f6dd15197dfa52893da",
"sha256:6872dd54d4e398d781efe8fe2e2d7eafe4450d61b5c4898aced7610109a6df75",
"sha256:6ce34fbc251fc0d691c8d131250ba6f42fd2b28ef28558d528ba8c558cb28804",
"sha256:73920d167a0a4d1006f5f3b9a3efce6f0e5e883a99599d38206d43f27697df00",
"sha256:8a671732b87ae423e34b51139628123bc0306c2cb85c226e71b28d3d57d7e42a",
"sha256:8d517e8fda2efebca27c2018e14c90ed7dc3f04d7098b3da2912e62a1a5585fe",
"sha256:9475a008eb7279e20d400c76471843c321b46acacc7ee3de0b47233a1e3fa2cf",
"sha256:96947b8cd7b3148fb0e6549fcb31258a736595d6f2a599f8cd450e9a80a14781",
"sha256:abf229f24daa93f67ac53e2e17c8798a71a01711eb9fcdd029abba8637164338",
"sha256:b1ab012f276df584beb74f81acb63905762c25803ece647016613c3d6ad4e432",
"sha256:b22b33f6f0071fe57cb4e9158f353c88d41e739a3ec0d76f7b704539e7076427",
"sha256:b3b2d53274858e50ad2ffdd6d97ce1d014e1e530f82ec8b307edd5d4c921badf",
"sha256:bab26a729befc7b9fab9ded1bba9c51b785188b79f8a2796ba03e7e734269e2e",
"sha256:daa1a593629aa49f506eddc9d23dc7f89b35693b90e1fbcd4480182d1203ea90",
"sha256:dd111280ce40e89fd17b19c1269fd1b74a30fce9d44a550840e86edb33924eb8",
"sha256:e0b86084f1e2e78c451994410de756deba206884d6bed68d5a3d7f39ff5fea1d",
"sha256:eb86520753560a7e89639500e2a254bb6f683342af598088cb72c73edcad21e6",
"sha256:ff18c5c40a38d41811c23e2480615425c97ea81fd7e9118b8b899c512d97c737"
"sha256:007ca0df127b1862fc010125bc4100b7a630efc6841047bd11afceadb4754611",
"sha256:03c49e02adf0b4d68f422fdbd98f7a7c547beb27e99a75ed02298f85cb48406a",
"sha256:0a1232cdd314e08848825edda06600455ad2a7adaa463ebfb12ece2d09f3370e",
"sha256:131c80d0958c89273d9720b9adf9df1d7600bb3120e16019a7389ab15b079af5",
"sha256:2de34cc3b775724623f86617d2601308083176a495f5b2efc2bbb0da154f483a",
"sha256:2eddc31500f73544a2a54123d4c4b249c3c711d31e64deddb0890982ea37397a",
"sha256:484f6c62bdc166ee0e5be3aa831120423bf399786d1f3b0304526c86180fbc0b",
"sha256:4c2d9369ed40b4a44a8ccd6bc3a7db6272b8314812d2d1091f95c4c836d92e06",
"sha256:70f570b5fa44413b9f30dbc053d17ef3ce6a4100147a10822f8662e58d473656",
"sha256:7a2b5b095f3bd733aab101c89c0e1a3f0dfb4ebdc26f6374805c086ffe29d5b2",
"sha256:804914a669186e2843c1f7fbe12b55aad1b36d40a28274abe6027deffad9433d",
"sha256:8520c03172da18345d012949a53617a963e0191ccb3c666f23276d5326af27b5",
"sha256:90da901fc33ea393fc644607e4a3916b509387e9339ec6ebc7bfded45b7a0ae9",
"sha256:a582416ad123291a82c300d1d872bdc4136d69ad0b41d57dc5ca3df7ef8e3088",
"sha256:ac8c5e20309f4989c296d62cac20ee456b69c41fd1bc03829e27de23b6fa9dd0",
"sha256:b2cf82f55a619879f8557fdaae5cec7a294fac815e0087c4f67026fdf5259844",
"sha256:b59d6f8cfca2983d8fdbe457bf95d2192f7b7efdb2b483bf5fa4e8981b04e8b2",
"sha256:be08168197021d669b9964bd87628fa88f910b1be31e7010901070f2540c05fd",
"sha256:be0f952f1c365061041bad16e27e224e29615d4eb1fb5b7e7760a1d3d12b90b6",
"sha256:c1c9a33e46d7c12b9c96cf2d4349d783e3127163fd96254dcd44663cf0a1d438",
"sha256:d18c89957ac57dd2a2724ecfe9a759912d776f96ecabba23acb9ecbf5c731035",
"sha256:d7e7b0ff21f39433c50397e60bf0995d078802c591ca3b8d99857ea18a7496ee",
"sha256:da0929b2bf0d1f365345e5eb940d8713c1d516312e010135b14402e2a3d2404d",
"sha256:de24a4962e361c512d3e528ded6c7480eab24c655b8ca1f0b761d3b3650d2f07",
"sha256:e45f93ff3f7dae2202248cf413a87aeb330821bf76998b3cf374eda2fc893dd7",
"sha256:f046aeae1f7a845041b8661bb7a52449202b6c5d3fb59eb4724e7ca088811904",
"sha256:f1dc2b7b2748084b890f5d05b65a47cd03188824890e9a60818721fd492249fb",
"sha256:fcbe7cf3a786572b73d2cd5f34ed452a5f5fac47c9c9d1e0642c457a148f9f88"
],
"index": "pypi",
"version": "==2.7.6.1"
"version": "==2.8.2"
},
"python-mimeparse": {
"hashes": [
@ -75,32 +94,43 @@
],
"version": "==1.6.0"
},
"requests": {
"hashes": [
"sha256:502a824f31acdacb3a35b6690b5fbf0bc41d63a24a45c4004352b0242707598e",
"sha256:7bf2a778576d825600030a110f3c0e3e8edc51dfaafe1c146e39a2027784957b"
],
"index": "pypi",
"version": "==2.21.0"
},
"six": {
"hashes": [
"sha256:70e8a77beed4562e7f14fe23a786b54f6296e34344c23bc42f07b15018ff98e9",
"sha256:832dc0e10feb1aa2c68dcc57dbb658f1c7e65b9b61af69048abc87a2db00a0eb"
"sha256:3350809f0555b11f552448330d0b52d5f24c91a322ea4a15ef22629740f3761c",
"sha256:d16a0141ec1a18405cd4ce8b4613101da75da0e9a7aec5bdd4fa804d0e0eba73"
],
"version": "==1.11.0"
"version": "==1.12.0"
},
"solrclient": {
"git": "https://github.com/alanorth/SolrClient.git",
"ref": "c629e3475be37c82770b2be61748be7e29882648"
"urllib3": {
"hashes": [
"sha256:61bf29cada3fc2fbefad4fdf059ea4bd1b4a86d2b6d15e1c7c0b582b9752fe39",
"sha256:de9529817c93f27c8ccbfead6985011db27bd0ddfcdb2d86f3f663385c6a9c22"
],
"version": "==1.24.1"
}
},
"develop": {
"atomicwrites": {
"hashes": [
"sha256:0312ad34fcad8fac3704d441f7b317e50af620823353ec657a53e981f92920c0",
"sha256:ec9ae8adaae229e4f8446952d204a3e4b5fdd2d099f9be3aaf556120135fb3ee"
"sha256:03472c30eb2c5d1ba9227e4c2ca66ab8287fbfbbda3888aa93dc2e28fc6811b4",
"sha256:75a9445bac02d8d058d5e1fe689654ba5a6556a1dfd8ce6ec55a0ed79866cfa6"
],
"version": "==1.2.1"
"version": "==1.3.0"
},
"attrs": {
"hashes": [
"sha256:10cbf6e27dbce8c30807caf056c8eb50917e0eaafe86347671b57254006c3e69",
"sha256:ca4be454458f9dec299268d472aaa5a11f67a4ff70093396e1ceae9c76cf4bbb"
"sha256:69c0dbf2ed392de1cb5ec704444b08a5ef81680a61cb899dc08127123af36a79",
"sha256:f0b870f674851ecbfbbbd364d6b5cbdff9dcedbc7f3f5e18a6891057f21fe399"
],
"version": "==18.2.0"
"version": "==19.1.0"
},
"backcall": {
"hashes": [
@ -111,26 +141,33 @@
},
"decorator": {
"hashes": [
"sha256:2c51dff8ef3c447388fe5e4453d24a2bf128d3a4c32af3fabef1f01c6851ab82",
"sha256:c39efa13fbdeb4506c476c9b3babf6a718da943dab7811c206005a4a956c080c"
"sha256:86156361c50488b84a3f148056ea716ca587df2f0de1d34750d35c21312725de",
"sha256:f069f3a01830ca754ba5258fde2278454a0b5b79e0d7f5c13b3b97e57d4acff6"
],
"version": "==4.3.0"
"version": "==4.4.0"
},
"entrypoints": {
"hashes": [
"sha256:589f874b313739ad35be6e0cd7efde2a4e9b6fea91edcc34e58ecbb8dbe56d19",
"sha256:c70dd71abe5a8c85e55e12c19bd91ccfeec11a6e99044204511f9ed547d48451"
],
"version": "==0.3"
},
"flake8": {
"hashes": [
"sha256:6a35f5b8761f45c5513e3405f110a86bea57982c3b75b766ce7b65217abe1670",
"sha256:c01f8a3963b3571a8e6bd7a4063359aff90749e160778e03817cd9b71c9e07d2"
"sha256:859996073f341f2670741b51ec1e67a01da142831aa1fdc6242dbf88dffbe661",
"sha256:a796a115208f5c03b18f332f7c11729812c8c3ded6c46319c59b53efd3819da8"
],
"index": "pypi",
"version": "==3.6.0"
"version": "==3.7.7"
},
"ipython": {
"hashes": [
"sha256:a5781d6934a3341a1f9acb4ea5acdc7ea0a0855e689dbe755d070ca51e995435",
"sha256:b10a7ddd03657c761fc503495bc36471c8158e3fc948573fb9fe82a7029d8efd"
"sha256:b038baa489c38f6d853a3cfc4c635b0cda66f2864d136fe8f40c1a6e334e2a6b",
"sha256:f5102c1cd67e399ec8ea66bcebe6e3968ea25a8977e53f012963e5affeb1fe38"
],
"index": "pypi",
"version": "==7.1.1"
"version": "==7.4.0"
},
"ipython-genutils": {
"hashes": [
@ -141,10 +178,10 @@
},
"jedi": {
"hashes": [
"sha256:0191c447165f798e6a730285f2eee783fff81b0d3df261945ecb80983b5c3ca7",
"sha256:b7493f73a2febe0dc33d51c99b474547f7f6c0b2c8fb2b21f453eef204c12148"
"sha256:2bb0603e3506f708e792c7f4ad8fc2a7a9d9c2d292a358fbbd58da531695595b",
"sha256:2c6bcd9545c7d6440951b12b44d373479bf18123a401a52025cf98563fbd826c"
],
"version": "==0.13.1"
"version": "==0.13.3"
},
"mccabe": {
"hashes": [
@ -155,26 +192,26 @@
},
"more-itertools": {
"hashes": [
"sha256:c187a73da93e7a8acc0001572aebc7e3c69daf7bf6881a2cea10650bd4420092",
"sha256:c476b5d3a34e12d40130bc2f935028b5f636df8f372dc2c1c01dc19681b2039e",
"sha256:fcbfeaea0be121980e15bc97b3817b5202ca73d0eae185b4550cbfce2a3ebb3d"
"sha256:2112d2ca570bb7c3e53ea1a35cd5df42bb0fd10c45f0fb97178679c3c03d64c7",
"sha256:c3e4748ba1aad8dba30a4886b0b1a2004f9a863837b8654e7059eebf727afa5a"
],
"version": "==4.3.0"
"markers": "python_version > '2.7'",
"version": "==7.0.0"
},
"parso": {
"hashes": [
"sha256:35704a43a3c113cce4de228ddb39aab374b8004f4f2407d070b6a2ca784ce8a2",
"sha256:895c63e93b94ac1e1690f5fdd40b65f07c8171e3e53cbd7793b5b96c0e0a7f24"
"sha256:17cc2d7a945eb42c3569d4564cdf49bde221bc2b552af3eca9c1aad517dcdd33",
"sha256:2e9574cb12e7112a87253e14e2c380ce312060269d04bd018478a3c92ea9a376"
],
"version": "==0.3.1"
"version": "==0.4.0"
},
"pexpect": {
"hashes": [
"sha256:2a8e88259839571d1251d278476f3eec5db26deb73a70be5ed5dc5435e418aba",
"sha256:3fbd41d4caf27fa4a377bfd16fef87271099463e6fa73e92a52f92dfee5d425b"
"sha256:2094eefdfcf37a1fdbfb9aa090862c1a4878e5c7e0e7e7088bdb511c558e5cd1",
"sha256:9e2c1fd0e6ee3a49b28f95d4b33bc389c89b20af6a1255906e90ff1262ce62eb"
],
"markers": "sys_platform != 'win32'",
"version": "==4.6.0"
"version": "==4.7.0"
},
"pickleshare": {
"hashes": [
@ -185,18 +222,18 @@
},
"pluggy": {
"hashes": [
"sha256:447ba94990e8014ee25ec853339faf7b0fc8050cdc3289d4d71f7f410fb90095",
"sha256:bde19360a8ec4dfd8a20dcb811780a30998101f078fc7ded6162f0076f50508f"
"sha256:19ecf9ce9db2fce065a7a0586e07cfb4ac8614fe96edf628a264b1c70116cf8f",
"sha256:84d306a647cc805219916e62aab89caa97a33a1dd8c342e87a37f91073cd4746"
],
"version": "==0.8.0"
"version": "==0.9.0"
},
"prompt-toolkit": {
"hashes": [
"sha256:c1d6aff5252ab2ef391c2fe498ed8c088066f66bc64a8d5c095bbf795d9fec34",
"sha256:d4c47f79b635a0e70b84fdb97ebd9a274203706b1ee5ed44c10da62755cf3ec9",
"sha256:fd17048d8335c1e6d5ee403c3569953ba3eb8555d710bfc548faf0712666ea39"
"sha256:11adf3389a996a6d45cc277580d0d53e8a5afd281d0c9ec71b28e6f121463780",
"sha256:2519ad1d8038fd5fc8e770362237ad0364d16a7650fb5724af6997ed5515e3c1",
"sha256:977c6583ae813a37dc1c2e1b715892461fcbdaa57f6fc62f33a528c4886c8f55"
],
"version": "==2.0.7"
"version": "==2.0.9"
},
"ptyprocess": {
"hashes": [
@ -207,46 +244,46 @@
},
"py": {
"hashes": [
"sha256:bf92637198836372b520efcba9e020c330123be8ce527e535d185ed4b6f45694",
"sha256:e76826342cefe3c3d5f7e8ee4316b80d1dd8a300781612ddbc765c17ba25a6c6"
"sha256:64f65755aee5b381cea27766a3a147c3f15b9b6b9ac88676de66ba2ae36793fa",
"sha256:dc639b046a6e2cff5bbe40194ad65936d6ba360b52b3c3fe1d08a82dd50b5e53"
],
"version": "==1.7.0"
"version": "==1.8.0"
},
"pycodestyle": {
"hashes": [
"sha256:cbc619d09254895b0d12c2c691e237b2e91e9b2ecf5e84c26b35400f93dcfb83",
"sha256:cbfca99bd594a10f674d0cd97a3d802a1fdef635d4361e1a2658de47ed261e3a"
"sha256:95a2219d12372f05704562a14ec30bc76b05a5b297b21a5dfe3f6fac3491ae56",
"sha256:e40a936c9a450ad81df37f549d676d127b1b66000a6c500caa2b085bc0ca976c"
],
"version": "==2.4.0"
"version": "==2.5.0"
},
"pyflakes": {
"hashes": [
"sha256:9a7662ec724d0120012f6e29d6248ae3727d821bba522a0e6b356eff19126a49",
"sha256:f661252913bc1dbe7fcfcbf0af0db3f42ab65aabd1a6ca68fe5d466bace94dae"
"sha256:17dbeb2e3f4d772725c777fabc446d5634d1038f234e77343108ce445ea69ce0",
"sha256:d976835886f8c5b31d47970ed689944a0262b5f3afa00a5a7b4dc81e5449f8a2"
],
"version": "==2.0.0"
"version": "==2.1.1"
},
"pygments": {
"hashes": [
"sha256:78f3f434bcc5d6ee09020f92ba487f95ba50f1e3ef83ae96b9d5ffa1bab25c5d",
"sha256:dbae1046def0efb574852fab9e90209b23f556367b5a320c0bcb871c77c3e8cc"
"sha256:5ffada19f6203563680669ee7f53b64dabbeb100eb51b61996085e99c03b284a",
"sha256:e8218dd399a61674745138520d0d4cf2621d7e032439341bc3f647bff125818d"
],
"version": "==2.2.0"
"version": "==2.3.1"
},
"pytest": {
"hashes": [
"sha256:3f193df1cfe1d1609d4c583838bea3d532b18d6160fd3f55c9447fdca30848ec",
"sha256:e246cf173c01169b9617fc07264b7b1316e78d7a650055235d6d897bc80d9660"
"sha256:13c5e9fb5ec5179995e9357111ab089af350d788cbc944c628f3cde72285809b",
"sha256:f21d2f1fb8200830dcbb5d8ec466a9c9120e20d8b53c7585d180125cce1d297a"
],
"index": "pypi",
"version": "==3.10.1"
"version": "==4.4.0"
},
"six": {
"hashes": [
"sha256:70e8a77beed4562e7f14fe23a786b54f6296e34344c23bc42f07b15018ff98e9",
"sha256:832dc0e10feb1aa2c68dcc57dbb658f1c7e65b9b61af69048abc87a2db00a0eb"
"sha256:3350809f0555b11f552448330d0b52d5f24c91a322ea4a15ef22629740f3761c",
"sha256:d16a0141ec1a18405cd4ce8b4613101da75da0e9a7aec5bdd4fa804d0e0eba73"
],
"version": "==1.11.0"
"version": "==1.12.0"
},
"traitlets": {
"hashes": [

View File

@ -1,7 +1,11 @@
# DSpace Statistics API [![Build Status](https://travis-ci.org/ilri/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/ilri/dspace-statistics-api)
DSpace stores item view and download events in a Solr "statistics" core. This information is available for use in the various DSpace user interfaces, but is not exposed externally via any APIs. The DSpace 4+ [REST API](https://wiki.duraspace.org/display/DSDOC5x/REST+API), for example, only exposes information about communities, collections, item metadata, and bitstreams.
# DSpace Statistics API [![Build Status](https://travis-ci.org/ilri/dspace-statistics-api.svg?branch=master)](https://travis-ci.org/ilri/dspace-statistics-api) [![builds.sr.ht status](https://builds.sr.ht/~alanorth/dspace-statistics-api.svg)](https://builds.sr.ht/~alanorth/dspace-statistics-api?)
DSpace stores item view and download events in a Solr "statistics" core. This information is available for use in the various DSpace user interfaces, but is not exposed externally via any APIs. The DSpace 4/5 [REST API](https://wiki.duraspace.org/display/DSDOC5x/REST+API), for example, only exposes information about communities, collections, item metadata, and bitstreams.
This project contains an indexer and a [Falcon-based](https://falcon.readthedocs.io/) web application to make the statistics available via simple REST API. You can read more about the Solr queries used to gather the item view and download statistics on the [DSpace wiki](https://wiki.duraspace.org/display/DSPACE/Solr).
This project contains an indexer and a [Falcon-based](https://falcon.readthedocs.io/) web application to make the statistics available via a simple REST API. You can read more about the Solr queries used to gather the item view and download statistics on the [DSpace wiki](https://wiki.duraspace.org/display/DSPACE/Solr).
If you use the DSpace Statistics API please cite:
*Orth, A. 2018. DSpace statistics API. Nairobi, Kenya: ILRI. https://hdl.handle.net/10568/99143.*
## Requirements
@ -85,7 +89,11 @@ The item id is the *internal* id for an item. You can get these from the standar
- Better logging
- Version API
- Use JSON in PostgreSQL
- Add top items endpoint, perhaps `/top/items` or `/items/top`?
- Make community and collection stats available
- Support [DSpace 6 UUIDs](https://jira.duraspace.org/browse/DS-1782)
- Switch to [Python 3.6+ f-string syntax](https://realpython.com/python-f-strings/)
- Check IDs in database to see if they are deleted...
## License
This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).

View File

@ -29,10 +29,61 @@
# See: https://solrclient.readthedocs.io/en/latest/SolrClient.html
# See: https://wiki.duraspace.org/display/DSPACE/Solr
from .config import SOLR_SERVER
from .database import DatabaseManager
import json
import psycopg2.extras
from .solr import solr_connection
import re
import requests
# Enumerate the cores in Solr to determine if statistics have been sharded into
# yearly shards by DSpace's stats-util or not (for example: statistics-2018).
def get_statistics_shards():
# Initialize an empty list for statistics core years
statistics_core_years = []
# URL for Solr status to check active cores
solr_query_params = {
'action': 'STATUS',
'wt': 'json'
}
solr_url = SOLR_SERVER + '/admin/cores'
res = requests.get(solr_url, params=solr_query_params)
if res.status_code == requests.codes.ok:
data = res.json()
# Iterate over active cores from Solr's STATUS response (cores are in
# the status array of this response).
for core in data['status']:
# Pattern to match, for example: statistics-2018
pattern = re.compile('^statistics-[0-9]{4}$')
if not pattern.match(core):
continue
# Append current core to list
statistics_core_years.append(core)
# Initialize a string to hold our shards (may end up being empty if the Solr
# core has not been processed by stats-util).
shards = str()
if len(statistics_core_years) > 0:
# Begin building a string of shards starting with the default one
shards = '{}/statistics'.format(SOLR_SERVER)
for core in statistics_core_years:
# Create a comma-separated list of shards to pass to our Solr query
#
# See: https://wiki.apache.org/solr/DistributedSearch
shards += ',{}/{}'.format(SOLR_SERVER, core)
# Return the string of shards, which may actually be empty. Solr doesn't
# seem to mind if the shards query parameter is empty and I haven't seen
# any negative performance impact so this should be fine.
return shards
def index_views():
@ -42,21 +93,33 @@ def index_views():
# so we can get the countDistinct summary.
#
# see: https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
res = solr.query('statistics', {
solr_query_params = {
'q': 'type:2',
'fq': 'isBot:false AND statistics_type:view',
'facet': True,
'facet': 'true',
'facet.field': 'id',
'facet.mincount': 1,
'facet.limit': 1,
'facet.offset': 0,
'stats': True,
'stats': 'true',
'stats.field': 'id',
'stats.calcdistinct': True
}, rows=0)
'stats.calcdistinct': 'true',
'shards': shards,
'rows': 0,
'wt': 'json'
}
# get total number of distinct facets (countDistinct)
results_totalNumFacets = json.loads(res.get_json())['stats']['stats_fields']['id']['countDistinct']
solr_url = SOLR_SERVER + '/statistics/select'
res = requests.get(solr_url, params=solr_query_params)
try:
# get total number of distinct facets (countDistinct)
results_totalNumFacets = res.json()['stats']['stats_fields']['id']['countDistinct']
except TypeError:
print('No item views to index, exiting.')
exit(0)
# divide results into "pages" (cast to int to effectively round down)
results_per_page = 100
@ -69,21 +132,30 @@ def index_views():
data = []
while results_current_page <= results_num_pages:
print('Indexing item views (page {} of {})'.format(results_current_page, results_num_pages))
# "pages" are zero based, but one based is more human readable
print('Indexing item views (page {} of {})'.format(results_current_page + 1, results_num_pages + 1))
res = solr.query('statistics', {
solr_query_params = {
'q': 'type:2',
'fq': 'isBot:false AND statistics_type:view',
'facet': True,
'facet': 'true',
'facet.field': 'id',
'facet.mincount': 1,
'facet.limit': results_per_page,
'facet.offset': results_current_page * results_per_page
}, rows=0)
'facet.offset': results_current_page * results_per_page,
'shards': shards,
'rows': 0,
'wt': 'json',
'json.nl': 'map' # return facets as a dict instead of a flat list
}
# SolrClient's get_facets() returns a dict of dicts
views = res.get_facets()
# in this case iterate over the 'id' dict and get the item ids and views
solr_url = SOLR_SERVER + '/statistics/select'
res = requests.get(solr_url, params=solr_query_params)
# Solr returns facets as a dict of dicts (see json.nl parameter)
views = res.json()['facet_counts']['facet_fields']
# iterate over the 'id' dict and get the item ids and views
for item_id, item_views in views['id'].items():
data.append((item_id, item_views))
@ -100,21 +172,33 @@ def index_views():
def index_downloads():
# get the total number of distinct facets for items with at least 1 download
res = solr.query('statistics', {
solr_query_params= {
'q': 'type:0',
'fq': 'isBot:false AND statistics_type:view AND bundleName:ORIGINAL',
'facet': True,
'facet': 'true',
'facet.field': 'owningItem',
'facet.mincount': 1,
'facet.limit': 1,
'facet.offset': 0,
'stats': True,
'stats': 'true',
'stats.field': 'owningItem',
'stats.calcdistinct': True
}, rows=0)
'stats.calcdistinct': 'true',
'shards': shards,
'rows': 0,
'wt': 'json'
}
# get total number of distinct facets (countDistinct)
results_totalNumFacets = json.loads(res.get_json())['stats']['stats_fields']['owningItem']['countDistinct']
solr_url = SOLR_SERVER + '/statistics/select'
res = requests.get(solr_url, params=solr_query_params)
try:
# get total number of distinct facets (countDistinct)
results_totalNumFacets = res.json()['stats']['stats_fields']['owningItem']['countDistinct']
except TypeError:
print('No item downloads to index, exiting.')
exit(0)
# divide results into "pages" (cast to int to effectively round down)
results_per_page = 100
@ -127,21 +211,30 @@ def index_downloads():
data = []
while results_current_page <= results_num_pages:
print('Indexing item downloads (page {} of {})'.format(results_current_page, results_num_pages))
# "pages" are zero based, but one based is more human readable
print('Indexing item downloads (page {} of {})'.format(results_current_page + 1, results_num_pages + 1))
res = solr.query('statistics', {
solr_query_params = {
'q': 'type:0',
'fq': 'isBot:false AND statistics_type:view AND bundleName:ORIGINAL',
'facet': True,
'facet': 'true',
'facet.field': 'owningItem',
'facet.mincount': 1,
'facet.limit': results_per_page,
'facet.offset': results_current_page * results_per_page
}, rows=0)
'facet.offset': results_current_page * results_per_page,
'shards': shards,
'rows': 0,
'wt': 'json',
'json.nl': 'map' # return facets as a dict instead of a flat list
}
# SolrClient's get_facets() returns a dict of dicts
downloads = res.get_facets()
# in this case iterate over the 'owningItem' dict and get the item ids and downloads
solr_url = SOLR_SERVER + '/statistics/select'
res = requests.get(solr_url, params=solr_query_params)
# Solr returns facets as a dict of dicts (see json.nl parameter)
downloads = res.json()['facet_counts']['facet_fields']
# iterate over the 'owningItem' dict and get the item ids and downloads
for item_id, item_downloads in downloads['owningItem'].items():
data.append((item_id, item_downloads))
@ -156,8 +249,6 @@ def index_downloads():
results_current_page += 1
solr = solr_connection()
with DatabaseManager() as db:
with db.cursor() as cursor:
# create table to store item views and downloads
@ -167,6 +258,8 @@ with DatabaseManager() as db:
# commit the table creation before closing the database connection
db.commit()
shards = get_statistics_shards()
index_views()
index_downloads()

View File

@ -1,10 +0,0 @@
from .config import SOLR_SERVER
from SolrClient import SolrClient
def solr_connection():
connection = SolrClient(SOLR_SERVER)
return connection
# vim: set sw=4 ts=4 expandtab:

View File

@ -1,25 +1,26 @@
-i https://pypi.org/simple
atomicwrites==1.2.1
attrs==18.2.0
atomicwrites==1.3.0
attrs==19.1.0
backcall==0.1.0
decorator==4.3.0
flake8==3.6.0
decorator==4.4.0
entrypoints==0.3
flake8==3.7.7
ipython-genutils==0.2.0
ipython==7.1.1
jedi==0.13.1
ipython==7.4.0
jedi==0.13.3
mccabe==0.6.1
more-itertools==4.3.0
parso==0.3.1
pexpect==4.6.0 ; sys_platform != 'win32'
more-itertools==7.0.0 ; python_version > '2.7'
parso==0.4.0
pexpect==4.7.0 ; sys_platform != 'win32'
pickleshare==0.7.5
pluggy==0.8.0
prompt-toolkit==2.0.7
pluggy==0.9.0
prompt-toolkit==2.0.9
ptyprocess==0.6.0
py==1.7.0
pycodestyle==2.4.0
pyflakes==2.0.0
pygments==2.2.0
pytest==3.10.1
six==1.11.0
py==1.8.0
pycodestyle==2.5.0
pyflakes==2.1.1
pygments==2.3.1
pytest==4.4.0
six==1.12.0
traitlets==4.3.2
wcwidth==0.1.7

View File

@ -1,7 +1,11 @@
-i https://pypi.org/simple
certifi==2019.3.9
chardet==3.0.4
falcon==1.4.1
git+https://github.com/alanorth/SolrClient.git@c629e3475be37c82770b2be61748be7e29882648#egg=solrclient
gunicorn==19.9.0
psycopg2-binary==2.7.6.1
idna==2.8
psycopg2-binary==2.8.2
python-mimeparse==1.6.0
six==1.11.0
requests==2.21.0
six==1.12.0
urllib3==1.24.1