1
0
mirror of https://github.com/ilri/dspace-statistics-api.git synced 2024-06-29 09:33:47 +02:00
Commit Graph

69 Commits

Author SHA1 Message Date
ab82e90773
dspace_statistics_api/stats.py: Use -isBot:true
All checks were successful
continuous-integration/drone/push Build is passing
Minor change to bot filtering. We should use a negated match for
documents that have `isBot:true` rather than looking for documents
that are tagged with `isBot:false` (the distinction is subtle, but
important).
2020-12-20 16:56:03 +02:00
8a1244d2d0
Update changelog and docs 2020-12-20 16:45:49 +02:00
04f0756c7f
dspace_statistics_api/util.py: Add vim modeline 2020-12-20 16:31:52 +02:00
830e4415f5
dspace_statistics_api/app.py: Run isort 2020-12-20 16:29:35 +02:00
47b4eb3df7
Rename items.py to stats.py
It is no longer used only for item-related statistics functions.
2020-12-20 16:28:56 +02:00
3339bf8d9c
Add communities and collections support to API
The basic logic is similar to items, where you can request single
item statistics with a UUID, all item statistics, and item statis-
tics for a list of items (optionally with a date range). Most of
the item code was re-purposed to work on "elements", which can be
items, communities, or collections depending on the request, with
the use of Falcon's `before` hooks to set the statistics scope so
we know how to behave for the current request.

Other than the minor difference in facet fields, another issue I
had with communities and collections is that the owningComm and
owningColl fields are multi-valued (unlike items' id field). This
means that, when you facet the results of your query, Solr returns
ids that seem unrelated, but are actually present in the field, so
I had to make sure I checked all returned ids to see if they were
in the user's POSTed elements list.

TODO:
  - Add tests
  - Revise docstrings
  - Refactor items.py as it is now generic
2020-12-20 16:14:46 +02:00
20c8ba0cf8 indexer.py: Add support for communities and collections
The logic to get views and downloads is very similar to that used
for items, but we facet by different fields. This uses a generic
function for indexing that takes an "indexType" and a "facetField"
parameter. The indexType parameter controls which database table
to insert into, and the facetField parameter indicates which field
to facet by in Solr.
2020-12-18 22:53:16 +02:00
b486f51dd7 indexer.py: Rename index functions for items
Start making plans for indexing communities and collections.
2020-12-18 22:53:16 +02:00
9e6fcf279b
dspace_statistics_api/items.py: Format with black 2020-12-18 22:45:39 +02:00
4dbf734a4b
Move all imports to top of file
A few months ago I had an issue setting up mocking because I was
trying to be clever importing these libraries only when I needed
them rather than at the global scope. Someone pointed out to me
that if the imports are at the top of the file Falcon will load
them once when the WSGI server starts, whereas if they are in the
on_get() or on_post() they will load for every request! Also, it
seems that PEP8 recommends keeping imports at the top of the file
anyways, so I will just do that.

Imports sorted with isort.

See: https://www.python.org/dev/peps/pep-0008/#imports
2020-12-18 22:42:06 +02:00
a0d0a47150
items.py: Add fl paramter to Solr queries
I forgot to add the fl parameter here as well.
2020-12-18 16:12:34 +02:00
4bbbaa4af3
dspace_statistics_api/indexer.py: Use fl parameter
All checks were successful
continuous-integration/drone/push Build is passing
I forgot to add the fl parameter to the downloads function.
2020-12-18 10:44:02 +02:00
2407aeec70
dspace_statistics_api/indexer.py: Use fl parameter
When indexing item views and downloads the only field we need is the
the id. The `fl` parameter tells Solr which fields to return in the
search results. This should theoretically be more efficient, though
I don't have any time to figure out how to measure it right now.
2020-12-17 12:25:28 +02:00
4590fc8708
dspace_statistics_api/app.py: Use ORDER BY in /items
Since we are paging through the results by limit/offset we need to
be sure that we are returning results deterministically.
2020-12-17 10:10:40 +02:00
930250352a
Update docs about POST /items 2020-12-13 20:09:20 +02:00
3125e96a16
Bump version to 1.3.2 2020-11-18 22:01:18 +02:00
88a8db6c78
Make sure limit is between 1 and 100
We were not properly checking whether the limit was actually less
than or equal to 100.
2020-11-18 21:55:54 +02:00
810508d038
dspace_statistics_api/indexer.py: Use -isBot:true
Minor change to bot filtering. We should use a negated match for
documents that have `isBot:true` rather than looking for documents
that are tagged with `isBot:false` (the distinction is subtle, but
important).
2020-11-17 17:40:08 +02:00
2d6520fc97
Fix limit in docs 2020-11-02 22:14:08 +02:00
ca1582a8b6
Make sure limit is between 1 and 100
We were not properly checking whether the limit was greater than 0
in all cases.
2020-11-02 21:59:20 +02:00
549b8bf1a7
dspace_statistics_api/docs/index.html: Fix version
We need to print it in the body, not the title.
2020-10-06 22:22:11 +03:00
899a79b2e7
Version 1.3.1 2020-10-06 22:15:52 +03:00
4e9064329d
Bump version to 1.3.0 2020-10-06 21:33:38 +03:00
5acd927210
dspace_statistics_api: Sort imports with isort 2020-10-06 15:12:13 +03:00
630fa0d5fb
dspace_statistics_api/util.py: Fix f-strings
flake8 raised this warning:

    F541 f-string is missing placeholders
2020-10-06 15:11:12 +03:00
58d2b8d4ed
dspace_statistics_api/items.py: Move util import
Move util import from global scope because it causes tests to fail.
We don't need the set up the Solr connection unless we're actually
trying to use the get_views and get_downloads methods, either when
running the API in production or during tests where the connection
has been set up.
2020-10-06 15:07:00 +03:00
d4518d62ad
dspace_statistics_api/app.py: Refactor for testability
I thought it was clever to only import these in the on_post handler
because they aren't needed elsewhere, but it turns out that this is
not a common pattern and even causes problems with testability.

First, if the imports are at the top of the file as PEP8 recommends,
then the WSGI server will import them once when it loads the app and
they remain in memory for the lifecycle of the app. If the imports
are in the on_post handler they would be re-imported on every request!

Second, this pattern of importing in a method makes it tricky to use
object patching in mocks.

See: https://www.python.org/dev/peps/pep-0008/#imports
2020-10-05 20:43:50 +03:00
3a98de78e3
dspace_statistics_api/items.py: Remove executable bit
We don't need to execute this on the command line.
2020-10-05 14:33:36 +03:00
5a53b57b3b
Refactor /items POST handler to use a before hook
This allows us to do the dirty work of parsing, validating, and
setting local variables from the POST parameters outside of the
on_post function. We then share the parameters via the req.context
object. Functionally it is the same, but readability is better
and it's a neat trick that I could use elsewhere.

See: https://falcon.readthedocs.io/en/stable/user/faq.html#how-can-i-pass-data-from-a-hook-to-a-responder-and-between-hooks
2020-09-26 18:40:52 +03:00
3ceb9a6eb0
dspace_statistics_api/items.py: Fix flake8 warning
According to flake8 we need to use a different syntax for strings
with backslash escape sequences:

> As of Python 3.6, a backslash-character pair that is not a valid
> escape sequence now generates a DeprecationWarning. This will
> eventually become a SyntaxError.

The warning was:

    W605 invalid escape sequence '\-'

See: https://www.flake8rules.com/rules/W605.html
2020-09-26 12:22:06 +03:00
946f0749e2
dspace_statistics_api/app.py: Use bounded_stream in on_post
For reasons I don't quite understand, we need to use bounded_stream
in the on_post request handler in order to use simulate_post() with
the testing client in Falcon 2.0.0. Normal runtime operation via
gunicorn does not have any issues with stream.

See: https://github.com/falconry/falcon/issues/1720
See: https://github.com/falconry/falcon/issues/1554
2020-09-26 11:50:57 +03:00
b06651d1ec
dspace_statistics_api/indexer.py: Fix Python comment 2020-09-25 13:35:05 +03:00
a0ee181361
dspace_statistics_api/docs/index.html: Fix whitespace 2020-09-25 13:33:45 +03:00
f58c209609
dspace_statistics_api/indexer.py: Update comment
I don't remember why we needed the stats, but it seems that it was
because without them there is no way to know how many results were
returned and therefore no way to know how many pages we'll need to
iterate over. Having the total number allows us to use a limit and
and offset to page through them deterministically.
2020-09-25 13:25:34 +03:00
2201d3df4e
dspace_statistics_api/docs/index.html: Minor HTML syntax issue 2020-09-25 12:55:39 +03:00
2c0436f845
Update API docs HTML for /items POST functionality 2020-09-25 12:53:30 +03:00
f1e939481b
dspace_statistics_api/items.py: Remove shebang
This was originally a standalone script I was testing interactively.
2020-09-25 12:39:00 +03:00
4d8026a3d0
Add missing dspace_statistics_api/items.py
This was meant to be added with the new /items POST changes.
2020-09-25 12:30:06 +03:00
73c71fa8a0
dspace_statistics_api: Add support for date ranges to /items
You can now POST a JSON request to /items with a list of items and
a date range. This allows the possibility to get view and download
statistics for arbitrary items and arbitrary date ranges.

The JSON request should be in the following format:

    {
        "limit": 100,
        "page": 0,
        "dateFrom": "2020-01-01T00:00:00Z",
        "dateTo": "2020-09-09T00:00:00Z",
        "items": [
            "f44cf173-2344-4eb2-8f00-ee55df32c76f",
            "2324aa41-e9de-4a2b-bc36-16241464683e",
            "8542f9da-9ce1-4614-abf4-f2e3fdb4b305",
            "0fe573e7-042a-4240-a4d9-753b61233908"
        ]
    }

The limit, page, and date parameters are all optional. By default
it will use a limit of 100, page 0, and [* TO *] Solr date range.
2020-09-25 12:21:11 +03:00
21b500b4f7
dspace_statistics_api/util.py: Use docstring for get_statistics_shards
It seems better to use a docstring instead of a comment because it
can potentially be used by IDEs or documentation generators.
2020-09-24 12:07:31 +03:00
495386856b
Refactor indexer
Move the get_statistics_shards() method to a utility module so it
can be used by other things.
2020-09-24 12:03:12 +03:00
8e87f80e9a
dspace_statistics_api/indexer.py: Remove duplicate solr_url variable
This is declared twice and it never changes.
2020-09-24 11:54:31 +03:00
6ff95bb5f2
dspace_statistics_api/indexer.py: Remove SolrClient reference
We stopped using SolrClient in favor of vanilla requests.
2020-09-24 11:30:31 +03:00
998e833470 dspace_statistics_api/docs/index.html: Adjust help text 2020-03-02 14:30:16 +02:00
0ef071a91d dspace_statistics_api: Use f-strings instead of format()
We had previously been avoiding the f-strings because we needed to
run on Python 3.5 and they were only available in Python 3.6+, but
now the black formatter requires Python 3.6 and all our systems are
running Python 3.6+ anyways.
2020-03-02 11:24:29 +02:00
9e7dd28156 dspace_statistics_api/app.py: Use parameterized SQL queries
This is a better way to run SQL queries because psycopg2 takes care
of the quoting for us.
2020-03-02 11:16:05 +02:00
5955868b9a dspace_statistics_api/app.py: Use UUID
DSpace 6+ uses a UUID for item identifiers instead of an integer so
we need to adapt our PostgreSQL queries to use those. Note that we
can no longer sort results in the "all items" endpoint by ID. Also,
we need to use parameterized psycopg2 queries instead of strings to
support queries with UUIDs properly. To use the Python UUID objects
elsewhere in the code we need to make sure that we cast them to str.
2020-03-02 11:06:48 +02:00
250fd8164f dspace_statistics_api/indexer.py: Use UUID
DSpace 6+ uses a UUID for item identifiers instead of an integer so
we need to update the PostgreSQL schema accordingly. Solr still re-
fers to them as "id" in its schema so we don't need to change anyt-
hing there.
2020-03-01 21:22:10 +02:00
cb3c3d37fa
Sort imports with isort 2019-11-27 12:31:04 +02:00
4ff1fd4a22
Format code with black 2019-11-27 12:30:06 +02:00