Commit Graph

91 Commits

Author SHA1 Message Date
Alan Orth 5a53b57b3b
Refactor `/items` POST handler to use a before hook
This allows us to do the dirty work of parsing, validating, and
setting local variables from the POST parameters outside of the
on_post function. We then share the parameters via the req.context
object. Functionally it is the same, but readability is better
and it's a neat trick that I could use elsewhere.

See: https://falcon.readthedocs.io/en/stable/user/faq.html#how-can-i-pass-data-from-a-hook-to-a-responder-and-between-hooks
2020-09-26 18:40:52 +03:00
Alan Orth 3ceb9a6eb0
dspace_statistics_api/items.py: Fix flake8 warning
According to flake8 we need to use a different syntax for strings
with backslash escape sequences:

> As of Python 3.6, a backslash-character pair that is not a valid
> escape sequence now generates a DeprecationWarning. This will
> eventually become a SyntaxError.

The warning was:

    W605 invalid escape sequence '\-'

See: https://www.flake8rules.com/rules/W605.html
2020-09-26 12:22:06 +03:00
Alan Orth 946f0749e2
dspace_statistics_api/app.py: Use bounded_stream in on_post
For reasons I don't quite understand, we need to use bounded_stream
in the on_post request handler in order to use simulate_post() with
the testing client in Falcon 2.0.0. Normal runtime operation via
gunicorn does not have any issues with stream.

See: https://github.com/falconry/falcon/issues/1720
See: https://github.com/falconry/falcon/issues/1554
2020-09-26 11:50:57 +03:00
Alan Orth b06651d1ec
dspace_statistics_api/indexer.py: Fix Python comment 2020-09-25 13:35:05 +03:00
Alan Orth a0ee181361
dspace_statistics_api/docs/index.html: Fix whitespace 2020-09-25 13:33:45 +03:00
Alan Orth f58c209609
dspace_statistics_api/indexer.py: Update comment
I don't remember why we needed the stats, but it seems that it was
because without them there is no way to know how many results were
returned and therefore no way to know how many pages we'll need to
iterate over. Having the total number allows us to use a limit and
and offset to page through them deterministically.
2020-09-25 13:25:34 +03:00
Alan Orth 2201d3df4e
dspace_statistics_api/docs/index.html: Minor HTML syntax issue 2020-09-25 12:55:39 +03:00
Alan Orth 2c0436f845
Update API docs HTML for /items POST functionality 2020-09-25 12:53:30 +03:00
Alan Orth f1e939481b
dspace_statistics_api/items.py: Remove shebang
This was originally a standalone script I was testing interactively.
2020-09-25 12:39:00 +03:00
Alan Orth 4d8026a3d0
Add missing dspace_statistics_api/items.py
This was meant to be added with the new /items POST changes.
2020-09-25 12:30:06 +03:00
Alan Orth 73c71fa8a0
dspace_statistics_api: Add support for date ranges to /items
You can now POST a JSON request to /items with a list of items and
a date range. This allows the possibility to get view and download
statistics for arbitrary items and arbitrary date ranges.

The JSON request should be in the following format:

    {
        "limit": 100,
        "page": 0,
        "dateFrom": "2020-01-01T00:00:00Z",
        "dateTo": "2020-09-09T00:00:00Z",
        "items": [
            "f44cf173-2344-4eb2-8f00-ee55df32c76f",
            "2324aa41-e9de-4a2b-bc36-16241464683e",
            "8542f9da-9ce1-4614-abf4-f2e3fdb4b305",
            "0fe573e7-042a-4240-a4d9-753b61233908"
        ]
    }

The limit, page, and date parameters are all optional. By default
it will use a limit of 100, page 0, and [* TO *] Solr date range.
2020-09-25 12:21:11 +03:00
Alan Orth 21b500b4f7
dspace_statistics_api/util.py: Use docstring for get_statistics_shards
It seems better to use a docstring instead of a comment because it
can potentially be used by IDEs or documentation generators.
2020-09-24 12:07:31 +03:00
Alan Orth 495386856b
Refactor indexer
Move the get_statistics_shards() method to a utility module so it
can be used by other things.
2020-09-24 12:03:12 +03:00
Alan Orth 8e87f80e9a
dspace_statistics_api/indexer.py: Remove duplicate solr_url variable
This is declared twice and it never changes.
2020-09-24 11:54:31 +03:00
Alan Orth 6ff95bb5f2
dspace_statistics_api/indexer.py: Remove SolrClient reference
We stopped using SolrClient in favor of vanilla requests.
2020-09-24 11:30:31 +03:00
Alan Orth 998e833470 dspace_statistics_api/docs/index.html: Adjust help text 2020-03-02 14:30:16 +02:00
Alan Orth 0ef071a91d dspace_statistics_api: Use f-strings instead of format()
We had previously been avoiding the f-strings because we needed to
run on Python 3.5 and they were only available in Python 3.6+, but
now the black formatter requires Python 3.6 and all our systems are
running Python 3.6+ anyways.
2020-03-02 11:24:29 +02:00
Alan Orth 9e7dd28156 dspace_statistics_api/app.py: Use parameterized SQL queries
This is a better way to run SQL queries because psycopg2 takes care
of the quoting for us.
2020-03-02 11:16:05 +02:00
Alan Orth 5955868b9a dspace_statistics_api/app.py: Use UUID
DSpace 6+ uses a UUID for item identifiers instead of an integer so
we need to adapt our PostgreSQL queries to use those. Note that we
can no longer sort results in the "all items" endpoint by ID. Also,
we need to use parameterized psycopg2 queries instead of strings to
support queries with UUIDs properly. To use the Python UUID objects
elsewhere in the code we need to make sure that we cast them to str.
2020-03-02 11:06:48 +02:00
Alan Orth 250fd8164f dspace_statistics_api/indexer.py: Use UUID
DSpace 6+ uses a UUID for item identifiers instead of an integer so
we need to update the PostgreSQL schema accordingly. Solr still re-
fers to them as "id" in its schema so we don't need to change anyt-
hing there.
2020-03-01 21:22:10 +02:00
Alan Orth cb3c3d37fa
Sort imports with isort 2019-11-27 12:31:04 +02:00
Alan Orth 4ff1fd4a22
Format code with black 2019-11-27 12:30:06 +02:00
Alan Orth eeb8e6bba1
dspace_statistics_api/indexer.py: Fix minor issues raised by flake8 2019-11-27 12:12:05 +02:00
Alan Orth 5a3b392a1d dspace_statistics_api/app.py: Fix Falcon 2.0 syntax
See: dspace_statistics_api/app.py
2019-04-18 09:57:18 +03:00
Alan Orth 2acd08e0ab
Use one-based paging in indexer output
It is easier for humans to understand one-based paging output like
"page 1 of 3" than "page 0 of 2" in the indexer.
2019-04-15 10:25:54 +03:00
Alan Orth 8f46ceb8d8
Refactor to use vanilla requests library
The SolrClient library is unmaintained, which is starting to cause
problems due to the moving Python ecosystem. Switching to requests
does not change my code in any meaningful way and makes maintenance
easier.
2019-04-15 10:19:50 +03:00
Alan Orth 043d897cef
dspace_statistics_api/indexer.py: Catch case of no views/downloads
Don't fail with an exception when there are no views or downloads,
for example on a new DSpace installation.
2019-01-22 09:00:22 +02:00
Alan Orth 40e284dac0
dspace_statistics_api/indexer.py: Query multiple shards
DSpace's stats-util script splits the Solr statistics core into yearly
shards. We need to use Solr's `shards` query parameter in order to get
the statistics for previous years. This commit adds a helper function
to enumerate the active Solr cores to find yearly shards matching the
statistics-YYYY pattern and add them to the query.
2019-01-22 08:39:36 +02:00
Alan Orth d5d2d2149b
dspace_statistics_api/database.py: Raise HTTP 500 on error
Properly except on database connection error and raise an HTTP 500
instead of spamming the console/log with twenty lines of text.
2018-11-10 23:58:58 +02:00
Alan Orth f6e866a589
dspace_statistics_api/indexer.py: Remove debug code 2018-11-07 17:51:24 +02:00
Alan Orth 2f342be948
Refactor database code to use a context manager
Instead of opening one global persistent database connection when
the application I am now abstracting it to a class that I can use
in combination with Python's "with" context. Both connections and
cursors are kept for the context of each "with" block and closed
automatically when exiting.

See: https://alysivji.github.io/managing-resources-with-context-managers-pythonic.html
See: http://initd.org/psycopg/docs/connection.html#connection.close
2018-11-07 17:41:21 +02:00
Alan Orth cc5ce3ab98 Correct issues highlighted by Flake8
Flake8 validates code style against PEP 8 in order to encourage the
writing of idiomatic Python. For reference, I am currently ignoring
errors about line length (E501) because I feel it makes code harder
to read.

This is the invocation I am using:

    $ flake8 --ignore E501 dspace_statistics_api
2018-11-04 00:04:27 +02:00
Alan Orth 70dfcb93c5
dspace_statistics_api/database.py: Don't quote host in connect() 2018-11-03 22:43:05 +02:00
Alan Orth 5f3bd61998
Allow configuration of PostgreSQL port
Defaults to port 5432, but can be overridden with DATABASE_PORT.
2018-11-03 22:40:45 +02:00
Alan Orth eb08832bf8
Sync API documentation HTML with README.md 2018-11-01 00:37:52 +02:00
Alan Orth 30dc7f1939
Add basic API documentation on root (/)
I had imagined plugging in an interactive Swagger or OpenAPI instance
here, but that's actually much more involved in Falcon than I want to
deal with right now.
2018-11-01 00:19:39 +02:00
Alan Orth 2f45d27554 dspace_statistics_api/app.py: remove unused code
This was added accidentally when I refactored. I was trying to see
if I could use Falcon's on_exit() hook.
2018-10-28 11:14:21 +02:00
Alan Orth b8356f7a87 Add "application" alias to API object
By default gunicorn looks for an "application" object to run, so this
saves us having to type api:app.
2018-10-28 11:14:21 +02:00
Alan Orth 2136dc79ce Remove shebang from indexer.py
This is run as a Python module now so does not need a shebang.
2018-10-28 11:14:21 +02:00
Alan Orth ed60120cef Remove executable bit from indexer.py
Now it is run as a Python module.
2018-10-28 11:14:21 +02:00
Alan Orth c027f01b48 Refactor project structure
This follows guidance from several well-known Python best practices
guides. Basically, the idea is create a package for the application
that is comprised of several re-usable modules.

See: https://docs.python-guide.org/writing/structure/
See: https://realpython.com/python-application-layouts/
2018-10-28 11:14:21 +02:00