dspace-statistics-api

Mirror of https://github.com/ilri/dspace-statistics-api.git, synced 2024-11-29 01:18:19 +01:00

Author	SHA1	Message	Date
Alan Orth	963aa245c8	app.py: Don't initialize Solr connection. We only need Solr in the indexing component, not for the API itself.	2018-10-24 11:59:50 +03:00
Alan Orth	eaca5354d3	app.py: Iterate directly on cursor. We don't need to create an intermediate variable for the results of the SQL query because psycopg2's cursor is iterable. See: http://initd.org/psycopg/docs/cursor.html	2018-09-27 11:03:44 +03:00
Alan Orth	2850035a4c	Return HTTP 404 when an item id is not found	2018-09-25 13:12:53 +03:00
Alan Orth	4cf8656b35	Change / route to /items. I think it's more obvious if the "all items" route is plural. Also, this will allow me to eventually put documentation at the root.	2018-09-25 11:34:07 +03:00
Alan Orth	3160c44566	app.py: Remove comment. This comment was added when I first began the application, and the testing status is documented in the README now.	2018-09-25 02:20:51 +03:00
Alan Orth	4b72f626d9	Update string substitution format. Instead of doing numbered strings I will just depend on the order, at least to be consistent.	2018-09-25 02:19:29 +03:00
Alan Orth	8f7450f67a	Use PostgreSQL instead of SQLite. I was very surprised how easy, fast, and robust SQLite was, but in the end I realized that its UPSERT support only came in version 3.24 and both Ubuntu 16.04 and 18.04 have older versions than that! I did manage to install libsqlite3-0 from Ubuntu 18.10 (cosmic) on my xenial host, but that feels dirty. PostgreSQL has supported UPSERT since 9.5, not to mention the same nice LIMIT and OFFSET clauses.	2018-09-25 00:49:47 +03:00
Alan Orth	19a45f3f6f	app.py: Add route to page through all item statistics. This route exposes all item statistics and uses the limit and offset parameters to control paging through the result set. The logic here is extremely easy thanks to the brilliant LIMIT and OFFSET features of SQLite (of course the SQL query sorts the results by some unique field to ensure the order is always the same).	2018-09-24 16:07:26 +03:00
Alan Orth	1543cacc54	app.py: Update SQL logic to use single table. The indexer.py script was updated to use a single table because I learned about UPSERT. This simplifies the database schema and the Python logic, and makes it easier to page all views and downloads at once without complicated JOIN queries.	2018-09-24 14:28:00 +03:00
Alan Orth	a51422273c	Remove SOLR_CORE configuration variable. This parameter is not customizable. All DSpace instances use this name for the Solr statistics core.	2018-09-24 00:20:54 +03:00
Alan Orth	89621af85d	Split database access into RW and RO. The indexer needs to be able to write to the database, but the API only needs to read it.	2018-09-24 00:00:05 +03:00
Alan Orth	9e942736b1	app.py: Get item statistics from SQLite database. It is much more efficient to cache view and download statistics in a database than to query Solr on demand (not to mention that it is not possible to page easily with facets in Solr). I decided to use SQLite because it is fast, native in Python 3, and doesn't require any extra steps during provisioning (assuming permissions are ok).	2018-09-23 16:47:00 +03:00
Alan Orth	ea85393b13	app.py: Use parameterized URI instead of query for /item. Falcon's get_param_as_int() is really nice in that it gets a query parameter and does validation for you, but I really wanted to have cleaner URIs for API routes, so I am now using a route URI template with a field converter. This is cleaner, but means that parameters not matching the template will return HTTP 404. See: https://falcon.readthedocs.io/en/stable/api/routing.html#field-converters	2018-09-23 16:23:33 +03:00
Alan Orth	b0d81a543c	Refactor Solr components. This makes it so we only need to define and connect once, and then we can re-use the connection everywhere else.	2018-09-23 13:24:30 +03:00
Alan Orth	84801a4ab5	Add vim modeline to all Python files. Uses four spaces for tab and shift widths, and turns on expansion of tabs to spaces.	2018-09-23 11:33:26 +03:00
Alan Orth	a263996582	app.py: Fix Solr queries for item views. According to dspace-api's Constants.java, items are type 2 and they use a unique ID field of `id` instead of `owningItem`. There is no need to check the bundleName for item types. Also, I decided to use the main Solr query for item IDs because the filter query parameter (fq) stores results in the filterCache and can be quite expensive with cores storing tens of millions of documents (we currently have 149 million docs!). It makes sense to use the filter query parameter to reduce the result set returned by the main Solr query.	2018-09-20 17:37:13 +03:00
Alan Orth	ed9d25294e	app.py: Use SolrClient's rows parameter. Instead of putting this in the raw query we can just use SolrClient's native rows parameter.	2018-09-19 12:48:28 +03:00
Alan Orth	8e29fd8a43	app.py: Use rows=0 for Solr queries. There is no need to return any rows of the result because I am only interested in the numFound.	2018-09-19 01:48:35 +03:00
Alan Orth	a87aaba812	app.py: Simplify Solr query for bitstream downloads. This whole business with negative query ranges is confusing as hell and I'll definitely forget it in the future. In DSpace's Solr terminology a "download" is a view of some bitstream that lives in the ORIGINAL bundle. This is where bitstreams that are uploaded during the item submission process go, versus generated thumbnails, etc.	2018-09-19 00:24:23 +03:00
Alan Orth	06ab254017	Refactor configuration into separate module. There is a good example of this in the Project Weekend GitHub profile. See: https://github.com/projectweekend/Falcon-PostgreSQL-API-Seed	2018-09-18 16:59:28 +03:00
Alan Orth	4b4a959a1c	Add ability to get Solr parameters from environment. You can use the SOLR_SERVER and SOLR_CORE variables to make deployment via systemd, etc. easier.	2018-09-18 15:34:25 +03:00
Alan Orth	36633e405a	Initial commit. Add first working version of the statistics API.	2018-09-18 14:03:15 +03:00

22 Commits