dspace-statistics-api

mirror of https://github.com/ilri/dspace-statistics-api.git synced 2024-11-14 10:27:04 +01:00

Author	SHA1	Message	Date
Alan Orth	e70a7a9675	Apply fixes from fixit RewriteToLiteral: It's slower to call list() than using the empty literal	2023-12-09 13:57:55 +03:00
Alan Orth	2f8e4f8a0a	Changes for Falcon 3.0.0 Mostly it seems we just need to use resp.text instead of resp.body, including in falcon-swagger-ui (I forked the upstream one to make this change). See: https://falcon.readthedocs.io/en/latest/changes/3.0.0.html	2021-04-06 08:30:28 +03:00
Alan Orth	0650c5985e	Add SPDX short license identifier to all Python files See: https://spdx.github.io/spdx-spec/appendix-V-using-SPDX-short-identifiers-in-source-files/	2021-03-22 13:42:42 +02:00
Alan Orth	05e0e8bdca	openapi.json: Set the API version from config We don't need to hard code this in the JSON anymore since we are reading and modifying it now for the server config anyways.	2020-12-27 12:48:13 +02:00
Alan Orth	2567bb8604	dspace_statistics_api/app.py: Format with black	2020-12-27 12:27:01 +02:00
Alan Orth	4f8cd1097b	Rework paging The "totalPages" value in our response is calculated incorrectly. Instead of casting to int and rounding, we should rather round up to the next integer with math.ceil. This is a more correct way to get the value. Also update the indexer to use the same logic, although there the values are printed with +1 so they are more readable.	2020-12-27 12:22:07 +02:00
Alan Orth	d1229c2387	Adjust docs at root Don't use a static HTML file anymore. Now I simply print an XHTML page from the Falcon resource. This way I can use variables to add in the API version as well as a link to the Swagger UI. The list of API calls is still present on the README.md, though in the long run I might move them to some dedicated documentation or a GitHub wiki.	2020-12-23 16:12:50 +02:00
Alan Orth	be83514de1	Re-work Swagger UI configuration It turns out that Swagger UI mostly does the "right" thing for our use cases here, but it assumes that API paths are relative to the root of the host where it is being served. This works in the local development environment because we are serving on "/", but it does not work in production where the API is deployed beneath the DSpace REST API, for example at "/rest/statistics". The solution here is to allow configuration of the DSpace Statistics API path and use that when registering the Swagger UI as well as in a new "server" block in the OpenAPI JSON schema. By default it is configured to work out of the box in a development environment. Set the DSPACE_STATISTICS_API_URL environment variable to something like "/rest/statistics" when running in production.	2020-12-23 13:25:17 +02:00
Alan Orth	70b2ba83ba	Allow configuration of Swagger and OpenAPI JSON URL When running in production your statistics API might be deployed to a path like /rest/statistics instead of at the root.	2020-12-22 12:50:03 +02:00
Alan Orth	4b1398c67f	Add /status route Currently this only prints the API version.	2020-12-22 11:30:09 +02:00
Alan Orth	a35ecf2394	Add Swagger UI on /swagger This includes a Swagger UI with an OpenAPI 3.0 JSON schema for easy interactive demonstration and testing of the API. The JSON schema was created with the standalone swagger-editor. Includes tests to make sure that the /swagger and /docs/openapi.json paths are accessible.	2020-12-22 11:18:47 +02:00
Alan Orth	830e4415f5	dspace_statistics_api/app.py: Run isort	2020-12-20 16:29:35 +02:00
Alan Orth	47b4eb3df7	Rename items.py to stats.py It is no longer used only for item-related statistics functions.	2020-12-20 16:28:56 +02:00
Alan Orth	3339bf8d9c	Add communities and collections support to API The basic logic is similar to items, where you can request single item statistics with a UUID, all item statistics, and item statistics for a list of items (optionally with a date range). Most of the item code was re-purposed to work on "elements", which can be items, communities, or collections depending on the request, with the use of Falcon's `before` hooks to set the statistics scope so we know how to behave for the current request. Other than the minor difference in facet fields, another issue I had with communities and collections is that the owningComm and owningColl fields are multi-valued (unlike items' id field). This means that, when you facet the results of your query, Solr returns ids that seem unrelated, but are actually present in the field, so I had to make sure I checked all returned ids to see if they were in the user's POSTed elements list. TODO: - Add tests - Revise docstrings - Refactor items.py as it is now generic	2020-12-20 16:14:46 +02:00
Alan Orth	4dbf734a4b	Move all imports to top of file A few months ago I had an issue setting up mocking because I was trying to be clever importing these libraries only when I needed them rather than at the global scope. Someone pointed out to me that if the imports are at the top of the file Falcon will load them once when the WSGI server starts, whereas if they are in the on_get() or on_post() they will load for every request! Also, it seems that PEP8 recommends keeping imports at the top of the file anyways, so I will just do that. Imports sorted with isort. See: https://www.python.org/dev/peps/pep-0008/#imports	2020-12-18 22:42:06 +02:00
Alan Orth	4590fc8708	dspace_statistics_api/app.py: Use ORDER BY in /items Since we are paging through the results by limit/offset we need to be sure that we are returning results deterministically.	2020-12-17 10:10:40 +02:00
Alan Orth	ca1582a8b6	Make sure limit is between 1 and 100 We were not properly checking whether the limit was greater than 0 in all cases.	2020-11-02 21:59:20 +02:00
Alan Orth	5acd927210	dspace_statistics_api: Sort imports with isort	2020-10-06 15:12:13 +03:00
Alan Orth	d4518d62ad	dspace_statistics_api/app.py: Refactor for testability I thought it was clever to only import these in the on_post handler because they aren't needed elsewhere, but it turns out that this is not a common pattern and even causes problems with testability. First, if the imports are at the top of the file as PEP8 recommends, then the WSGI server will import them once when it loads the app and they remain in memory for the lifecycle of the app. If the imports are in the on_post handler they would be re-imported on every request! Second, this pattern of importing in a method makes it tricky to use object patching in mocks. See: https://www.python.org/dev/peps/pep-0008/#imports	2020-10-05 20:43:50 +03:00
Alan Orth	5a53b57b3b	Refactor `/items` POST handler to use a before hook This allows us to do the dirty work of parsing, validating, and setting local variables from the POST parameters outside of the on_post function. We then share the parameters via the req.context object. Functionally it is the same, but readability is better and it's a neat trick that I could use elsewhere. See: https://falcon.readthedocs.io/en/stable/user/faq.html#how-can-i-pass-data-from-a-hook-to-a-responder-and-between-hooks	2020-09-26 18:40:52 +03:00
Alan Orth	946f0749e2	dspace_statistics_api/app.py: Use bounded_stream in on_post For reasons I don't quite understand, we need to use bounded_stream in the on_post request handler in order to use simulate_post() with the testing client in Falcon 2.0.0. Normal runtime operation via gunicorn does not have any issues with stream. See: https://github.com/falconry/falcon/issues/1720 See: https://github.com/falconry/falcon/issues/1554	2020-09-26 11:50:57 +03:00
Alan Orth	73c71fa8a0	dspace_statistics_api: Add support for date ranges to /items You can now POST a JSON request to /items with a list of items and a date range. This allows the possibility to get view and download statistics for arbitrary items and arbitrary date ranges. The JSON request should be in the following format: { "limit": 100, "page": 0, "dateFrom": "2020-01-01T00:00:00Z", "dateTo": "2020-09-09T00:00:00Z", "items": [ "f44cf173-2344-4eb2-8f00-ee55df32c76f", "2324aa41-e9de-4a2b-bc36-16241464683e", "8542f9da-9ce1-4614-abf4-f2e3fdb4b305", "0fe573e7-042a-4240-a4d9-753b61233908" ] } The limit, page, and date parameters are all optional. By default it will use a limit of 100, page 0, and [* TO *] Solr date range.	2020-09-25 12:21:11 +03:00
Alan Orth	0ef071a91d	dspace_statistics_api: Use f-strings instead of format() We had previously been avoiding the f-strings because we needed to run on Python 3.5 and they were only available in Python 3.6+, but now the black formatter requires Python 3.6 and all our systems are running Python 3.6+ anyways.	2020-03-02 11:24:29 +02:00
Alan Orth	9e7dd28156	dspace_statistics_api/app.py: Use parameterized SQL queries This is a better way to run SQL queries because psycopg2 takes care of the quoting for us.	2020-03-02 11:16:05 +02:00
Alan Orth	5955868b9a	dspace_statistics_api/app.py: Use UUID DSpace 6+ uses a UUID for item identifiers instead of an integer so we need to adapt our PostgreSQL queries to use those. Note that we can no longer sort results in the "all items" endpoint by ID. Also, we need to use parameterized psycopg2 queries instead of strings to support queries with UUIDs properly. To use the Python UUID objects elsewhere in the code we need to make sure that we cast them to str.	2020-03-02 11:06:48 +02:00
Alan Orth	cb3c3d37fa	Sort imports with isort	2019-11-27 12:31:04 +02:00
Alan Orth	4ff1fd4a22	Format code with black	2019-11-27 12:30:06 +02:00
Alan Orth	5a3b392a1d	dspace_statistics_api/app.py: Fix Falcon 2.0 syntax See: dspace_statistics_api/app.py	2019-04-18 09:57:18 +03:00
Alan Orth	2f342be948	Refactor database code to use a context manager Instead of opening one global persistent database connection when the application starts, I am now abstracting it to a class that I can use in combination with Python's "with" context. Both connections and cursors are kept for the context of each "with" block and closed automatically when exiting. See: https://alysivji.github.io/managing-resources-with-context-managers-pythonic.html See: http://initd.org/psycopg/docs/connection.html#connection.close	2018-11-07 17:41:21 +02:00
Alan Orth	cc5ce3ab98	Correct issues highlighted by Flake8 Flake8 validates code style against PEP 8 in order to encourage the writing of idiomatic Python. For reference, I am currently ignoring errors about line length (E501) because I feel it makes code harder to read. This is the invocation I am using: $ flake8 --ignore E501 dspace_statistics_api	2018-11-04 00:04:27 +02:00
Alan Orth	30dc7f1939	Add basic API documentation on root (/) I had imagined plugging in an interactive Swagger or OpenAPI instance here, but that's actually much more involved in Falcon than I want to deal with right now.	2018-11-01 00:19:39 +02:00
Alan Orth	2f45d27554	dspace_statistics_api/app.py: remove unused code This was added accidentally when I refactored. I was trying to see if I could use Falcon's on_exit() hook.	2018-10-28 11:14:21 +02:00
Alan Orth	b8356f7a87	Add "application" alias to API object By default gunicorn looks for an "application" object to run, so this saves us having to type api:app.	2018-10-28 11:14:21 +02:00
Alan Orth	c027f01b48	Refactor project structure This follows guidance from several well-known Python best practices guides. Basically, the idea is to create a package for the application that is comprised of several re-usable modules. See: https://docs.python-guide.org/writing/structure/ See: https://realpython.com/python-application-layouts/	2018-10-28 11:14:21 +02:00
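
Commit 73c71fa8a0 describes the JSON body accepted by POST /items and its defaults (limit 100, page 0, and a `[* TO *]` Solr date range). A minimal sketch of how such a body could be parsed is shown here. The function name `parse_items_request` and the returned dictionary keys are hypothetical and are not taken from the repository. Falling back to `*` when only one of `dateFrom` or `dateTo` is supplied is also an assumption.

```python
import json


def parse_items_request(body: str) -> dict:
    """Parse a POST /items JSON body, applying the defaults from the commit message.

    Hypothetical helper for illustration only; the real handler may differ.
    """
    doc = json.loads(body)

    # Defaults stated in commit 73c71fa8a0: limit 100, page 0, open date range.
    limit = doc.get("limit", 100)
    page = doc.get("page", 0)

    # Assumption: a missing bound falls back to "*" (open-ended in Solr syntax).
    date_from = doc.get("dateFrom", "*")
    date_to = doc.get("dateTo", "*")

    return {
        "limit": limit,
        "page": page,
        "solr_date_range": f"[{date_from} TO {date_to}]",
        "items": doc.get("items", []),
    }
```

For example, a body containing only an `items` list would yield the default limit, page, and open date range, while a body with both dates would yield a bounded range such as `[2020-01-01T00:00:00Z TO 2020-09-09T00:00:00Z]`.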

34 Commits
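
Commits 4f8cd1097b and ca1582a8b6 describe two small paging rules: compute the page count by rounding up with `math.ceil` rather than casting to `int`, and accept only limits between 1 and 100. The following is a minimal sketch of both rules. The function names `total_pages` and `limit_is_valid` are hypothetical and do not come from the repository.

```python
import math


def total_pages(total_items: int, limit: int) -> int:
    """Return how many pages are needed to list total_items at limit per page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Round up so that a partially filled last page is still counted.
    return math.ceil(total_items / limit)


def limit_is_valid(limit: int) -> bool:
    """Check that limit is between 1 and 100 inclusive, as in commit ca1582a8b6."""
    return 1 <= limit <= 100
```

With 250 items and a limit of 100, truncating with `int(250 / 100)` gives 2 and drops the final partial page, whereas `math.ceil` gives 3.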