dspace-statistics-api

mirror of https://github.com/ilri/dspace-statistics-api.git synced 2025-09-16 08:26:45 +02:00

Author	SHA1	Message	Date
Alan Orth	2f8e4f8a0a	Changes for Falcon 3.0.0. Mostly it seems we just need to use resp.text instead of resp.body, including in falcon-swagger-ui (I forked the upstream one to make this change). See: https://falcon.readthedocs.io/en/latest/changes/3.0.0.html	2021-04-06 08:30:28 +03:00
Alan Orth	0650c5985e	Add SPDX short license identifier to all Python files. See: https://spdx.github.io/spdx-spec/appendix-V-using-SPDX-short-identifiers-in-source-files/	2021-03-22 13:42:42 +02:00
Alan Orth	80a11ead97	Version 1.4.1	2021-01-14 14:19:50 +02:00
Alan Orth	49751b53f0	dspace_statistics_api/indexer.py: Limit to UUIDs. We need to make sure that the indexer only tries to index UUIDs, as opposed to legacy IDs that may have been left over from a migration from earlier DSpace versions, for example "98110-unmigrated", "-1", etc. For matching the UUIDs in Solr I decided that it is sufficient for our use case to simply match thirty-six characters, since a UUID is composed of thirty-two hexadecimal characters and four dashes. We don't need to verify "real" UUIDs because it would be needlessly complex in our case. See: https://github.com/ilri/dspace-statistics-api/issues/12	2021-01-05 12:30:27 +02:00
Alan Orth	33dc210452	dspace_statistics_api/docs/openapi.json: Minor edit. Better to leave the version in there because Swagger Editor doesn't like the schema without it. Also, change the example page parameter for POSTing to /items and /collections, as it doesn't make sense to start on a later page if we have fewer items than our limit.	2020-12-27 13:53:59 +02:00
Alan Orth	282d5f644a	Move unreleased change to v1.4.0	2020-12-27 12:52:24 +02:00
Alan Orth	05e0e8bdca	openapi.json: Set the API version from config. We don't need to hard code this in the JSON anymore since we are reading and modifying it for the server config anyway.	2020-12-27 12:48:13 +02:00
Alan Orth	2567bb8604	dspace_statistics_api/app.py: Format with black	2020-12-27 12:27:01 +02:00
Alan Orth	4f8cd1097b	Rework paging. The "totalPages" value in our response is calculated incorrectly. Instead of casting to int and rounding, we should round up to the next integer with math.ceil, which gives the correct value. Also update the indexer to use the same logic, although there the page numbers are printed with +1 so they are more readable.	2020-12-27 12:22:07 +02:00
Alan Orth	d1229c2387	Adjust docs at root. Don't use a static HTML file anymore. Now I simply print an XHTML page from the Falcon resource. This way I can use variables to add the API version as well as a link to the Swagger UI. The list of API calls is still present in the README.md, though in the long run I might move them to dedicated documentation or a GitHub wiki.	2020-12-23 16:12:50 +02:00
Alan Orth	be83514de1	Re-work Swagger UI configuration. It turns out that Swagger UI mostly does the "right" thing for our use cases here, but it assumes that API paths are relative to the root of the host where it is being served. This works in the local development environment because we are serving on "/", but it does not work in production where the API is deployed beneath the DSpace REST API, for example at "/rest/statistics". The solution here is to allow configuration of the DSpace Statistics API path and use that when registering the Swagger UI as well as in a new "server" block in the OpenAPI JSON schema. By default it is configured to work out of the box in a development environment. Set the DSPACE_STATISTICS_API_URL environment variable to something like "/rest/statistics" when running in production.	2020-12-23 13:25:17 +02:00
Alan Orth	70b2ba83ba	Allow configuration of Swagger and OpenAPI JSON URL. When running in production your statistics API might be deployed to a path like /rest/statistics instead of at the root.	2020-12-22 12:50:03 +02:00
Alan Orth	68418ea053	dspace_statistics_api/docs/openapi.json: Add /status. Add a /status route to the Swagger UI schema.	2020-12-22 11:41:47 +02:00
Alan Orth	6bbee7919e	Bump version to 1.4.0-dev	2020-12-22 11:31:46 +02:00
Alan Orth	4b1398c67f	Add /status route. Currently this only prints the API version.	2020-12-22 11:30:09 +02:00
Alan Orth	a35ecf2394	Add Swagger UI on /swagger. This includes a Swagger UI with an OpenAPI 3.0 JSON schema for easy interactive demonstration and testing of the API. The JSON schema was created with the standalone swagger-editor. Includes tests to make sure that the /swagger and /docs/openapi.json paths are accessible.	2020-12-22 11:18:47 +02:00
Alan Orth	ab82e90773	dspace_statistics_api/stats.py: Use -isBot:true. Minor change to bot filtering. We should use a negated match for documents that have `isBot:true` rather than looking for documents that are tagged with `isBot:false` (the distinction is subtle, but important).	2020-12-20 16:56:03 +02:00
Alan Orth	8a1244d2d0	Update changelog and docs	2020-12-20 16:45:49 +02:00
Alan Orth	04f0756c7f	dspace_statistics_api/util.py: Add vim modeline	2020-12-20 16:31:52 +02:00
Alan Orth	830e4415f5	dspace_statistics_api/app.py: Run isort	2020-12-20 16:29:35 +02:00
Alan Orth	47b4eb3df7	Rename items.py to stats.py. It is no longer used only for item-related statistics functions.	2020-12-20 16:28:56 +02:00
Alan Orth	3339bf8d9c	Add communities and collections support to API. The basic logic is similar to items, where you can request single item statistics with a UUID, all item statistics, and item statistics for a list of items (optionally with a date range). Most of the item code was re-purposed to work on "elements", which can be items, communities, or collections depending on the request, with the use of Falcon's `before` hooks to set the statistics scope so we know how to behave for the current request. Other than the minor difference in facet fields, another issue I had with communities and collections is that the owningComm and owningColl fields are multi-valued (unlike items' id field). This means that, when you facet the results of your query, Solr returns ids that seem unrelated but are actually present in the field, so I had to check all returned ids to see if they were in the user's POSTed elements list. TODO: add tests, revise docstrings, and refactor items.py as it is now generic.	2020-12-20 16:14:46 +02:00
Alan Orth	20c8ba0cf8	indexer.py: Add support for communities and collections. The logic to get views and downloads is very similar to that used for items, but we facet by different fields. This uses a generic indexing function that takes an "indexType" and a "facetField" parameter. The indexType parameter controls which database table to insert into, and the facetField parameter indicates which field to facet by in Solr.	2020-12-18 22:53:16 +02:00
Alan Orth	b486f51dd7	indexer.py: Rename index functions for items. Start making plans for indexing communities and collections.	2020-12-18 22:53:16 +02:00
Alan Orth	9e6fcf279b	dspace_statistics_api/items.py: Format with black	2020-12-18 22:45:39 +02:00
Alan Orth	4dbf734a4b	Move all imports to top of file. A few months ago I had an issue setting up mocking because I was trying to be clever by importing these libraries only when I needed them rather than at the global scope. Someone pointed out to me that if the imports are at the top of the file, Falcon will load them once when the WSGI server starts, whereas if they are in on_get() or on_post() they will load for every request! Also, PEP8 recommends keeping imports at the top of the file anyway, so I will just do that. Imports sorted with isort. See: https://www.python.org/dev/peps/pep-0008/#imports	2020-12-18 22:42:06 +02:00
Alan Orth	a0d0a47150	items.py: Add fl parameter to Solr queries. I forgot to add the fl parameter here as well.	2020-12-18 16:12:34 +02:00
Alan Orth	4bbbaa4af3	dspace_statistics_api/indexer.py: Use `fl` parameter. I forgot to add the fl parameter to the downloads function.	2020-12-18 10:44:02 +02:00
Alan Orth	2407aeec70	dspace_statistics_api/indexer.py: Use `fl` parameter. When indexing item views and downloads the only field we need is the id. The `fl` parameter tells Solr which fields to return in the search results. This should theoretically be more efficient, though I don't have time to figure out how to measure it right now.	2020-12-17 12:25:28 +02:00
Alan Orth	4590fc8708	dspace_statistics_api/app.py: Use ORDER BY in /items. Since we are paging through the results by limit/offset, we need to be sure that we are returning results deterministically.	2020-12-17 10:10:40 +02:00
Alan Orth	930250352a	Update docs about POST /items	2020-12-13 20:09:20 +02:00
Alan Orth	3125e96a16	Bump version to 1.3.2	2020-11-18 22:01:18 +02:00
Alan Orth	88a8db6c78	Make sure limit is between 1 and 100. We were not properly checking whether the limit was actually less than or equal to 100.	2020-11-18 21:55:54 +02:00
Alan Orth	810508d038	dspace_statistics_api/indexer.py: Use -isBot:true. Minor change to bot filtering. We should use a negated match for documents that have `isBot:true` rather than looking for documents that are tagged with `isBot:false` (the distinction is subtle, but important).	2020-11-17 17:40:08 +02:00
Alan Orth	2d6520fc97	Fix limit in docs	2020-11-02 22:14:08 +02:00
Alan Orth	ca1582a8b6	Make sure limit is between 1 and 100. We were not properly checking whether the limit was greater than 0 in all cases.	2020-11-02 21:59:20 +02:00
Alan Orth	549b8bf1a7	dspace_statistics_api/docs/index.html: Fix version. We need to print it in the body, not the title.	2020-10-06 22:22:11 +03:00
Alan Orth	899a79b2e7	Version 1.3.1	2020-10-06 22:15:52 +03:00
Alan Orth	4e9064329d	Bump version to 1.3.0	2020-10-06 21:33:38 +03:00
Alan Orth	5acd927210	dspace_statistics_api: Sort imports with isort	2020-10-06 15:12:13 +03:00
Alan Orth	630fa0d5fb	dspace_statistics_api/util.py: Fix f-strings. flake8 raised this warning: F541 f-string is missing placeholders	2020-10-06 15:11:12 +03:00
Alan Orth	58d2b8d4ed	dspace_statistics_api/items.py: Move util import. Move the util import out of the global scope because it causes tests to fail. We don't need to set up the Solr connection unless we're actually trying to use the get_views and get_downloads methods, either when running the API in production or during tests where the connection has been set up.	2020-10-06 15:07:00 +03:00
Alan Orth	d4518d62ad	dspace_statistics_api/app.py: Refactor for testability. I thought it was clever to only import these in the on_post handler because they aren't needed elsewhere, but it turns out that this is not a common pattern and even causes problems with testability. First, if the imports are at the top of the file as PEP8 recommends, then the WSGI server will import them once when it loads the app and they remain in memory for the lifecycle of the app. If the imports are in the on_post handler they would be re-imported on every request! Second, this pattern of importing in a method makes it tricky to use object patching in mocks. See: https://www.python.org/dev/peps/pep-0008/#imports	2020-10-05 20:43:50 +03:00
Alan Orth	3a98de78e3	dspace_statistics_api/items.py: Remove executable bit. We don't need to execute this on the command line.	2020-10-05 14:33:36 +03:00
Alan Orth	5a53b57b3b	Refactor `/items` POST handler to use a before hook. This allows us to do the dirty work of parsing, validating, and setting local variables from the POST parameters outside of the on_post function. We then share the parameters via the req.context object. Functionally it is the same, but readability is better and it's a neat trick that I could use elsewhere. See: https://falcon.readthedocs.io/en/stable/user/faq.html#how-can-i-pass-data-from-a-hook-to-a-responder-and-between-hooks	2020-09-26 18:40:52 +03:00
Alan Orth	3ceb9a6eb0	dspace_statistics_api/items.py: Fix flake8 warning. According to flake8 we need to use a different syntax for strings with backslash escape sequences: "As of Python 3.6, a backslash-character pair that is not a valid escape sequence now generates a DeprecationWarning. This will eventually become a SyntaxError." The warning was: W605 invalid escape sequence '\-'. See: https://www.flake8rules.com/rules/W605.html	2020-09-26 12:22:06 +03:00
Alan Orth	946f0749e2	dspace_statistics_api/app.py: Use bounded_stream in on_post. For reasons I don't quite understand, we need to use bounded_stream in the on_post request handler in order to use simulate_post() with the testing client in Falcon 2.0.0. Normal runtime operation via gunicorn does not have any issues with stream. See: https://github.com/falconry/falcon/issues/1720 See: https://github.com/falconry/falcon/issues/1554	2020-09-26 11:50:57 +03:00
Alan Orth	b06651d1ec	dspace_statistics_api/indexer.py: Fix Python comment	2020-09-25 13:35:05 +03:00
Alan Orth	a0ee181361	dspace_statistics_api/docs/index.html: Fix whitespace	2020-09-25 13:33:45 +03:00
Alan Orth	f58c209609	dspace_statistics_api/indexer.py: Update comment. I don't remember why we needed the stats, but it seems that it was because without them there is no way to know how many results were returned and therefore no way to know how many pages we'll need to iterate over. Having the total number allows us to use a limit and offset to page through them deterministically.	2020-09-25 13:25:34 +03:00


85 Commits
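
Commit 49751b53f0 limits indexing to identifiers of exactly thirty-six characters, the length of a UUID. The following is a minimal Python sketch of that length check; the helper name looks_like_uuid and the sample IDs are hypothetical, and this is not the project's actual Solr query.

```python
import re

# Hypothetical helper mirroring the "match thirty-six characters" rule.
# In Solr, a regular-expression query such as id:/.{36}/ could express a
# similar idea, but that is an assumption, not the project's actual query.
UUID_LIKE = re.compile(r".{36}")


def looks_like_uuid(value: str) -> bool:
    """Return True if value is exactly 36 characters long.

    A real UUID is 32 hexadecimal characters plus 4 dashes; like the commit,
    this deliberately does not validate the hex digits or dash positions.
    """
    return UUID_LIKE.fullmatch(value) is not None


ids = ["98110-unmigrated", "-1", "0e6b7c2a-3f1d-4e5a-9b8c-1d2e3f4a5b6c"]
print([i for i in ids if looks_like_uuid(i)])
# ['0e6b7c2a-3f1d-4e5a-9b8c-1d2e3f4a5b6c']
```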
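
Commits 4f8cd1097b and f58c209609 describe computing "totalPages" with math.ceil and paging through results with a limit and offset once the total is known. A small sketch of that arithmetic, with hypothetical function names, looks like this:

```python
import math


def total_pages(total_results: int, limit: int) -> int:
    """Round up so that a partial final page still counts as a page."""
    return math.ceil(total_results / limit)


def page_offsets(total_results: int, limit: int):
    """Yield (page, offset) pairs for limit/offset paging, starting at page 0."""
    for page in range(total_pages(total_results, limit)):
        yield page, page * limit


# 250 results at 100 per page: int(250 / 100) would give 2, but ceil gives 3.
for page, offset in page_offsets(250, 100):
    # Print page + 1 so the page numbers are 1-based and easier to read.
    print(f"Page {page + 1} of {total_pages(250, 100)}, offset {offset}")
```

This prints three lines: offsets 0, 100, and 200 for pages 1, 2, and 3.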
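
Commit be83514de1 reads the API path from the DSPACE_STATISTICS_API_URL environment variable and writes it into the OpenAPI schema so Swagger UI builds correct request URLs. The sketch below assumes an empty default meaning "serve from the root" and uses a hypothetical add_server_block helper and a placeholder schema; OpenAPI 3.0 calls this field "servers".

```python
import json
import os

# Assumed default: empty means "serve from the root" in development.
# In production this might be set to something like "/rest/statistics".
API_URL = os.environ.get("DSPACE_STATISTICS_API_URL", "")


def add_server_block(schema: dict, api_url: str) -> dict:
    """Return a shallow copy of the schema with an OpenAPI "servers" entry."""
    updated = dict(schema)
    # Fall back to "/" so the schema always has a usable base URL.
    updated["servers"] = [{"url": api_url or "/"}]
    return updated


# Placeholder schema; the real openapi.json has many more fields and paths.
schema = {
    "openapi": "3.0.0",
    "info": {"title": "DSpace Statistics API", "version": "1.4.0"},
    "paths": {},
}
print(json.dumps(add_server_block(schema, "/rest/statistics")["servers"]))
# [{"url": "/rest/statistics"}]
```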
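
Commits ab82e90773 and 810508d038 switch bot filtering from isBot:false to -isBot:true without spelling out why. A likely reason is that in Solr a document with no isBot value at all is matched by the negated query but not by isBot:false. The toy simulation below, with hypothetical in-memory documents standing in for Solr, shows that difference:

```python
# Toy documents: "b" has no isBot field at all (e.g. never tagged).
docs = [
    {"id": "a", "isBot": False},
    {"id": "b"},
    {"id": "c", "isBot": True},
]


def match_isbot_false(doc: dict) -> bool:
    # Like fq=isBot:false: only documents explicitly tagged false.
    return doc.get("isBot") is False


def match_not_isbot_true(doc: dict) -> bool:
    # Like fq=-isBot:true: everything except documents tagged true.
    return doc.get("isBot") is not True


print([d["id"] for d in docs if match_isbot_false(d)])     # ['a']
print([d["id"] for d in docs if match_not_isbot_true(d)])  # ['a', 'b']
```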
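
Commit 3339bf8d9c notes that owningComm and owningColl are multi-valued, so faceting returns ids the user never asked for and the results must be checked against the POSTed list. A minimal sketch of that filtering step follows. It assumes Solr's default flat JSON facet layout ([value, count, value, count, ...]); the helper name and sample ids are hypothetical.

```python
def filter_facets(facet_values: list, requested: list) -> dict:
    """Convert a flat Solr facet list into {id: count}, keeping only the
    ids present in the user's requested elements list."""
    wanted = set(requested)
    # Pair each facet value with the count that follows it.
    pairs = zip(facet_values[::2], facet_values[1::2])
    return {value: count for value, count in pairs if value in wanted}


# An item owned by several communities contributes to each of their counts,
# so "comm-2" shows up even though the user only asked for two communities.
facet = ["comm-1", 12, "comm-2", 7, "comm-3", 3]
print(filter_facets(facet, ["comm-1", "comm-3"]))  # {'comm-1': 12, 'comm-3': 3}
```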
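
Commits 20c8ba0cf8 and 2407aeec70 describe a generic indexing function parameterized by a facet field, with `fl` limiting which fields Solr returns. The sketch below shows how such request parameters might be built. The FACET_FIELDS mapping comes from the field names mentioned in these commits, but build_facet_params and the exact parameter set are assumptions, not the project's code. The q, fq, fl, facet, and facet.field parameters themselves are standard Solr parameters.

```python
# Hypothetical mapping from index type to the Solr field used for faceting,
# based on the fields named in the commit messages.
FACET_FIELDS = {
    "items": "id",
    "communities": "owningComm",
    "collections": "owningColl",
}


def build_facet_params(index_type: str, query: str) -> dict:
    """Build Solr request parameters that facet on the field for index_type."""
    facet_field = FACET_FIELDS[index_type]
    return {
        "q": query,
        # Exclude bot traffic with a negated match.
        "fq": "-isBot:true",
        # fl restricts which fields Solr returns for each matching document.
        "fl": facet_field,
        # Facet on the same field to get one count per element.
        "facet": "true",
        "facet.field": facet_field,
    }


print(build_facet_params("communities", "*:*"))
```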
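
Commit 4590fc8708 adds ORDER BY because limit/offset paging is only deterministic when the row order is fixed. The project uses its own database. This sketch uses an in-memory SQLite database as a stand-in, with a hypothetical items table, to show the pattern:

```python
import sqlite3

# In-memory stand-in database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, views INTEGER, downloads INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("c", 5, 1), ("a", 3, 0), ("b", 9, 2)],
)


def get_page(limit: int, page: int) -> list:
    # ORDER BY fixes the row order, so OFFSET skips the same rows on every
    # request. Without it, the database may return rows in any order.
    return conn.execute(
        "SELECT id, views, downloads FROM items ORDER BY id LIMIT ? OFFSET ?",
        (limit, page * limit),
    ).fetchall()


print(get_page(2, 0))  # [('a', 3, 0), ('b', 9, 2)]
print(get_page(2, 1))  # [('c', 5, 1)]
```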
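
Commits ca1582a8b6 and 88a8db6c78 tighten the check that limit falls between 1 and 100 inclusive. A minimal sketch of that bounds check follows. The function name is hypothetical, and raising ValueError is an assumption; an HTTP handler would more likely translate this into a 400 response.

```python
def validate_limit(limit: int) -> int:
    """Return limit unchanged if it is within 1..100 inclusive."""
    # A chained comparison covers both bugs fixed in the commits:
    # limit must be greater than 0 and less than or equal to 100.
    if not 1 <= limit <= 100:
        raise ValueError(f"limit must be between 1 and 100, got {limit}")
    return limit


print(validate_limit(100))  # 100
```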
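
Commit 3ceb9a6eb0 fixes flake8 W605, where "\-" is not a valid escape sequence. The sketch below shows that a raw string spells the same two characters without the warning. It also compiles the problematic literal to show that Python warns about it. The warning category depends on the Python version (DeprecationWarning from 3.6, SyntaxWarning from 3.12), so no exact output is given for that part.

```python
import warnings

# Both literals are the same two characters: a backslash followed by "-".
escaped_raw = r"\-"
escaped_doubled = "\\-"
print(escaped_raw == escaped_doubled, len(escaped_raw))  # True 2

# Compile source code containing the invalid escape "\-" and record warnings.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile('"\\-"', "<example>", "eval")

print([w.category.__name__ for w in caught])
```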