Commit Graph

91 Commits

Author SHA1 Message Date
Alan Orth e70a7a9675
Apply fixes from fixit
RewriteToLiteral: It's slower to call list() than using the empty literal
2023-12-09 13:57:55 +03:00
Alan Orth fe9f98bcc0
dspace_statistics_api/util.py: format with black
continuous-integration/drone/push Build is passing Details
continuous-integration/drone Build is passing Details
2023-05-30 16:04:18 +03:00
Alan Orth c3b9a541b7
Bump version to 1.4.4-dev
continuous-integration/drone/push Build is passing Details
2022-03-26 19:10:47 +03:00
Alan Orth 1a1a14a25f
Version 1.4.3 2022-03-26 19:08:56 +03:00
Alan Orth a524068cf6
Bump version to 1.4.3-dev
continuous-integration/drone/push Build is passing Details
2021-04-15 14:44:44 +03:00
Alan Orth 964d5dff06
Version 1.4.2 2021-04-15 14:23:07 +03:00
Alan Orth 2f8e4f8a0a Changes for Falcon 3.0.0
Mostly it seems we just need to use resp.text instead of resp.body,
including in falcon-swagger-ui (I forked the upstream one to make
this change).

See: https://falcon.readthedocs.io/en/latest/changes/3.0.0.html
2021-04-06 08:30:28 +03:00
Alan Orth 0650c5985e
Add SPDX short license identifier to all Python files
continuous-integration/drone/push Build is passing Details
See: https://spdx.github.io/spdx-spec/appendix-V-using-SPDX-short-identifiers-in-source-files/
2021-03-22 13:42:42 +02:00
Alan Orth 80a11ead97
Version 1.4.1
continuous-integration/drone/push Build is passing Details
2021-01-14 14:19:50 +02:00
Alan Orth 49751b53f0
dspace_statistics_api/indexer.py: Limit to UUIDs
We need to make sure that the indexer only tries to index UUIDs, as
opposed to legacy IDs that may have been left over from a migration
from earlier DSpace versions. For example, "98110-unmigrated", "-1"
etc.

For matching the UUIDs in Solr I decided that it is sufficient for
our use case to simply match thirty-six characters, where a UUID is
composed of thirty-two hexadecimal characters and four dashes. We
don't need to do any verification of "real" UUIDs because it would
be needlessly complex in our case.

See: https://github.com/ilri/dspace-statistics-api/issues/12
2021-01-05 12:30:27 +02:00
Alan Orth 33dc210452
dspace_statistics_api/docs/openapi.json: Minor edit
Better to leave the version in there because Swagger Editor doesn't
like it without. Also, change the example page parameter for POSTing
to /items and /collections, as it doesn't make sense to start on a
later page if we have less items than our limit.
2020-12-27 13:53:59 +02:00
Alan Orth 282d5f644a
Move unreleased change to v1.4.0 2020-12-27 12:52:24 +02:00
Alan Orth 05e0e8bdca
openapi.json: Set the API version from config
We don't need to hard code this in the JSON anymore since we are
reading and modifying it now for the server config anyways.
2020-12-27 12:48:13 +02:00
Alan Orth 2567bb8604
dspace_statistics_api/app.py: Format with black 2020-12-27 12:27:01 +02:00
Alan Orth 4f8cd1097b
Rework paging
The "totalPages" value in our response is calculated incorrectly.
Instead of casting to int and rounding, we should rather round up
to the next integer with math.ceil. This is a more correct way to
get the value.

Also update the indexer to use the same logic, although there the
values are printed with +1 so they are more readable.
2020-12-27 12:22:07 +02:00
Alan Orth d1229c2387
Adjust docs at root
Don't use a static HTML file anymore. Now I simply print an XHTML
page from the Falcon resource. This way I can use variables to add
in the API version as well as a link to the Swagger UI.

The list of API calls is still present on the README.md, though in
the long run I might move them to some dedicated documentation or
a GitHub wiki.
2020-12-23 16:12:50 +02:00
Alan Orth be83514de1
Re-work Swagger UI configuration
continuous-integration/drone/push Build is passing Details
It turns out that Swagger UI mostly does the "right" thing for our
use cases here, but it assumes that API paths are relative to the
root of the host where it is being served. This works in the local
development environment because we are serving on "/", but it does
not work in production where the API is deployed beneath the DSpace
REST API, for example at "/rest/statistics".

The solution here is to allow configuration of the DSpace Statistics
API path and use that when registering the Swagger UI as well as in
a new "server" block in the OpenAPI JSON schema.

By default it is configured to work out of the box in a development
environment. Set the DSPACE_STATISTICS_API_URL environment variable
to something like "/rest/statistics" when running in production.
2020-12-23 13:25:17 +02:00
Alan Orth 70b2ba83ba
Allow configuration of Swagger and OpenAPI JSON URL
continuous-integration/drone/push Build is passing Details
When running in production your statistics API might be deployed to
a path like /rest/statistics instead of at the root.
2020-12-22 12:50:03 +02:00
Alan Orth 68418ea053
dspace_statistics_api/docs/openapi.json: Add /status
Add a /status to the Swagger UI schema.
2020-12-22 11:41:47 +02:00
Alan Orth 6bbee7919e
Bump version to 1.4.0-dev 2020-12-22 11:31:46 +02:00
Alan Orth 4b1398c67f
Add /status route
Currently this only prints the API version.
2020-12-22 11:30:09 +02:00
Alan Orth a35ecf2394
Add Swagger UI on /swagger
This includes a Swagger UI with an OpenAPI 3.0 JSON schema for easy
interactive demonstration and testing of the API. The JSON schema
was created with the standalone swagger-editor. Includes tests to
make sure that the /swagger and /docs/openapi.json paths are acce-
ssible.
2020-12-22 11:18:47 +02:00
Alan Orth ab82e90773
dspace_statistics_api/stats.py: Use -isBot:true
continuous-integration/drone/push Build is passing Details
Minor change to bot filtering. We should use a negated match for
documents that have `isBot:true` rather than looking for documents
that are tagged with `isBot:false` (the distinction is subtle, but
important).
2020-12-20 16:56:03 +02:00
Alan Orth 8a1244d2d0
Update changelog and docs 2020-12-20 16:45:49 +02:00
Alan Orth 04f0756c7f
dspace_statistics_api/util.py: Add vim modeline 2020-12-20 16:31:52 +02:00
Alan Orth 830e4415f5
dspace_statistics_api/app.py: Run isort 2020-12-20 16:29:35 +02:00
Alan Orth 47b4eb3df7
Rename items.py to stats.py
It is no longer used only for item-related statistics functions.
2020-12-20 16:28:56 +02:00
Alan Orth 3339bf8d9c
Add communities and collections support to API
The basic logic is similar to items, where you can request single
item statistics with a UUID, all item statistics, and item statis-
tics for a list of items (optionally with a date range). Most of
the item code was re-purposed to work on "elements", which can be
items, communities, or collections depending on the request, with
the use of Falcon's `before` hooks to set the statistics scope so
we know how to behave for the current request.

Other than the minor difference in facet fields, another issue I
had with communities and collections is that the owningComm and
owningColl fields are multi-valued (unlike items' id field). This
means that, when you facet the results of your query, Solr returns
ids that seem unrelated, but are actually present in the field, so
I had to make sure I checked all returned ids to see if they were
in the user's POSTed elements list.

TODO:
  - Add tests
  - Revise docstrings
  - Refactor items.py as it is now generic
2020-12-20 16:14:46 +02:00
Alan Orth 20c8ba0cf8 indexer.py: Add support for communities and collections
The logic to get views and downloads is very similar to that used
for items, but we facet by different fields. This uses a generic
function for indexing that takes an "indexType" and a "facetField"
parameter. The indexType parameter controls which database table
to insert into, and the facetField parameter indicates which field
to facet by in Solr.
2020-12-18 22:53:16 +02:00
Alan Orth b486f51dd7 indexer.py: Rename index functions for items
Start making plans for indexing communities and collections.
2020-12-18 22:53:16 +02:00
Alan Orth 9e6fcf279b
dspace_statistics_api/items.py: Format with black 2020-12-18 22:45:39 +02:00
Alan Orth 4dbf734a4b
Move all imports to top of file
A few months ago I had an issue setting up mocking because I was
trying to be clever importing these libraries only when I needed
them rather than at the global scope. Someone pointed out to me
that if the imports are at the top of the file Falcon will load
them once when the WSGI server starts, whereas if they are in the
on_get() or on_post() they will load for every request! Also, it
seems that PEP8 recommends keeping imports at the top of the file
anyways, so I will just do that.

Imports sorted with isort.

See: https://www.python.org/dev/peps/pep-0008/#imports
2020-12-18 22:42:06 +02:00
Alan Orth a0d0a47150
items.py: Add fl paramter to Solr queries
I forgot to add the fl parameter here as well.
2020-12-18 16:12:34 +02:00
Alan Orth 4bbbaa4af3
dspace_statistics_api/indexer.py: Use `fl` parameter
continuous-integration/drone/push Build is passing Details
I forgot to add the fl parameter to the downloads function.
2020-12-18 10:44:02 +02:00
Alan Orth 2407aeec70
dspace_statistics_api/indexer.py: Use `fl` parameter
When indexing item views and downloads the only field we need is the
the id. The `fl` parameter tells Solr which fields to return in the
search results. This should theoretically be more efficient, though
I don't have any time to figure out how to measure it right now.
2020-12-17 12:25:28 +02:00
Alan Orth 4590fc8708
dspace_statistics_api/app.py: Use ORDER BY in /items
Since we are paging through the results by limit/offset we need to
be sure that we are returning results deterministically.
2020-12-17 10:10:40 +02:00
Alan Orth 930250352a
Update docs about POST /items 2020-12-13 20:09:20 +02:00
Alan Orth 3125e96a16
Bump version to 1.3.2 2020-11-18 22:01:18 +02:00
Alan Orth 88a8db6c78
Make sure limit is between 1 and 100
We were not properly checking whether the limit was actually less
than or equal to 100.
2020-11-18 21:55:54 +02:00
Alan Orth 810508d038
dspace_statistics_api/indexer.py: Use -isBot:true
Minor change to bot filtering. We should use a negated match for
documents that have `isBot:true` rather than looking for documents
that are tagged with `isBot:false` (the distinction is subtle, but
important).
2020-11-17 17:40:08 +02:00
Alan Orth 2d6520fc97
Fix limit in docs 2020-11-02 22:14:08 +02:00
Alan Orth ca1582a8b6
Make sure limit is between 1 and 100
We were not properly checking whether the limit was greater than 0
in all cases.
2020-11-02 21:59:20 +02:00
Alan Orth 549b8bf1a7
dspace_statistics_api/docs/index.html: Fix version
We need to print it in the body, not the title.
2020-10-06 22:22:11 +03:00
Alan Orth 899a79b2e7
Version 1.3.1 2020-10-06 22:15:52 +03:00
Alan Orth 4e9064329d
Bump version to 1.3.0 2020-10-06 21:33:38 +03:00
Alan Orth 5acd927210
dspace_statistics_api: Sort imports with isort 2020-10-06 15:12:13 +03:00
Alan Orth 630fa0d5fb
dspace_statistics_api/util.py: Fix f-strings
flake8 raised this warning:

    F541 f-string is missing placeholders
2020-10-06 15:11:12 +03:00
Alan Orth 58d2b8d4ed
dspace_statistics_api/items.py: Move util import
Move util import from global scope because it causes tests to fail.
We don't need the set up the Solr connection unless we're actually
trying to use the get_views and get_downloads methods, either when
running the API in production or during tests where the connection
has been set up.
2020-10-06 15:07:00 +03:00
Alan Orth d4518d62ad
dspace_statistics_api/app.py: Refactor for testability
I thought it was clever to only import these in the on_post handler
because they aren't needed elsewhere, but it turns out that this is
not a common pattern and even causes problems with testability.

First, if the imports are at the top of the file as PEP8 recommends,
then the WSGI server will import them once when it loads the app and
they remain in memory for the lifecycle of the app. If the imports
are in the on_post handler they would be re-imported on every request!

Second, this pattern of importing in a method makes it tricky to use
object patching in mocks.

See: https://www.python.org/dev/peps/pep-0008/#imports
2020-10-05 20:43:50 +03:00
Alan Orth 3a98de78e3
dspace_statistics_api/items.py: Remove executable bit
We don't need to execute this on the command line.
2020-10-05 14:33:36 +03:00