dspace-statistics-api

mirror of https://github.com/ilri/dspace-statistics-api.git synced 2025-09-12 06:37:03 +02:00

Author	SHA1	Message	Date
Alan Orth	6e4bc630f7	database.py: Use psycopg2.extras.DictCursor This allows us to access records using their column name. I didn't notice that this was not working, as I had been testing the wrong server! See: http://initd.org/psycopg/docs/extras.html	2018-09-25 02:06:29 +03:00
Alan Orth	44884140e5	CHANGELOG.md: Add new unreleased changes	2018-09-25 01:11:37 +03:00
Alan Orth	74ff86ee3b	contrib: Update environment settings in system units	2018-09-25 01:10:14 +03:00
Alan Orth	3327884f21	Update docs to remove SQLite stuff I've decided to use PostgreSQL instead of SQLite because the UPSERT support is available in versions of PostgreSQL we're already running, whereas SQLite needs a VERY new (3.24.0) version that is not available on any recent long-term support Ubuntu releases. v0.2.0	2018-09-25 00:56:01 +03:00
Alan Orth	8f7450f67a	Use PostgreSQL instead of SQLite I was very surprised how easy and fast and robust SQLite was, but in the end I realized that its UPSERT support only came in version 3.24 and both Ubuntu 16.04 and 18.04 have older versions than that! I did manage to install libsqlite3-0 from Ubuntu 18.10 cosmic on my xenial host, but that feels dirty. PostgreSQL has support for UPSERT since 9.5, not to mention the same nice LIMIT and OFFSET clauses.	2018-09-25 00:49:47 +03:00
Alan Orth	28d61fb041	README.md: Add notes about Python and SQLite versions	2018-09-24 17:26:48 +03:00
Alan Orth	cbc98991b4	CHANGELOG.md: Move unreleased notes to version 0.1.0 v0.1.0	2018-09-24 16:14:14 +03:00
Alan Orth	6c28be0463	README.md: Add note about route for all items	2018-09-24 16:13:26 +03:00
Alan Orth	42e8f17305	CHANGELOG.md: Add note about route for all items	2018-09-24 16:13:05 +03:00
Alan Orth	19a45f3f6f	app.py: Add route to page through all item statistics This route exposes all item statistics and uses the limit and offset parameters to control paging through the result set. The logic here is extremely easy thanks to the brilliant LIMIT and OFFSET features of SQLite (of course the SQL query sorts the results by some unique field to ensure the order is already the same).	2018-09-24 16:07:26 +03:00
Alan Orth	505ef31101	CHANGELOG.md: Add note about UPSERT	2018-09-24 14:31:05 +03:00
Alan Orth	1543cacc54	app.py: Update SQL logic to use single table The indexer.py script was updated to use a single table because I learned about UPSERT. This simplifies the database schema and the Python logic, and makes it easier to page all views and downloads at once without complicated JOIN queries.	2018-09-24 14:28:00 +03:00
Alan Orth	2cab456f16	indexer.py: Use single items table with UPSERT I was using two separate tables for item views and downloads without realizing that SQLite didn't support FULL OUTER JOIN, which would be needed to get views and downloads for a given item in a single query. Instead I can use one table with a default value of 0 for both views and downloads, and then use "UPSERT" to populate the statistics. This is a newish SQL concept that allows you to attempt an INSERT and then specify an action to perform in case of conflict. This works well in SQLite and actually simplifies my Python logic greatly! Note that the "excluded" table qualifier is a special keyword that allows you to reference the value that would have been inserted. See: https://www.sqlite.org/lang_UPSERT.html	2018-09-24 14:19:50 +03:00
Alan Orth	53615dea2d	indexer.py: Add license and documentation	2018-09-24 09:18:50 +03:00
Alan Orth	2d8d1e6833	README.md: Add TODO for nonexistent items	2018-09-24 00:48:02 +03:00
Alan Orth	e26e595ea1	README.md: Add more TODOs	2018-09-24 00:35:00 +03:00
Alan Orth	a9151b5bbf	CHANGELOG.md: Update unreleased notes	2018-09-24 00:30:58 +03:00
Alan Orth	76833d6f5f	contrib: Update some old CGSpace references to DSpace	2018-09-24 00:30:26 +03:00
Alan Orth	a51422273c	Remove SOLR_CORE configuration variable This parameter is not customizable. All DSpace instances use this name for the Solr statistics core.	2018-09-24 00:20:54 +03:00
Alan Orth	89621af85d	Split database access into RW and RO The indexer needs to be able to write to the database, but the API only needs to read it.	2018-09-24 00:00:05 +03:00
Alan Orth	c554404d7f	CHANGELOG.md: Add systemd units for indexer	2018-09-23 23:15:27 +03:00
Alan Orth	90d7a452bd	contrib: Add systemd units for indexer An example systemd service unit for the indexer and an accompanying timer unit.	2018-09-23 23:13:43 +03:00
Alan Orth	431a1c9d64	CHANGELOG.md: Add unreleased changes	2018-09-23 23:04:01 +03:00
Alan Orth	e1b9d1284f	Rename project to DSpace Statistics API At first I called it "CGSpace" because I was making it specifically for our CGSpace DSpace repository, but the potential here is bigger than that!	2018-09-23 23:02:21 +03:00
Alan Orth	bac764a0a4	CHANGELOG.md: Move entries to version 0.0.4 v0.0.4	2018-09-23 16:49:25 +03:00
Alan Orth	1a650e57c0	CHANGELOG.md: Update unreleased features	2018-09-23 16:48:39 +03:00
Alan Orth	2db5e02be9	Add indexer.py Standalone script to ingest item views and downloads from Solr into SQLite.	2018-09-23 16:47:48 +03:00
Alan Orth	9e942736b1	app.py: Get item statistics from SQLite database It is much more efficient to cache view and download statistics in a database than to query Solr on demand (not to mention that it is not possible to page easily with facets in Solr). I decided to use SQLite because it is fast, native in Python 3, and doesn't require any extra steps during provisioning (assuming permissions are ok).	2018-09-23 16:47:00 +03:00
Alan Orth	ea85393b13	app.py: Use parameterized URI instead of query for /item Falcon's get_param_as_int() is really nice in that it gets a query parameter and does validation for you, but I really wanted to have cleaner URIs for API routes so I am now using a route URI template with a field converter. This is cleaner, but means that parameters not matching the template will return HTTP 404. See: https://falcon.readthedocs.io/en/stable/api/routing.html#field-converters	2018-09-23 16:23:33 +03:00
Alan Orth	cbeb7c89a7	CHANGELOG.md: Add note about Solr connection refactor	2018-09-23 13:27:43 +03:00
Alan Orth	b0d81a543c	Refactor Solr components This makes it so we only need to define and connect once and then we can re-use the connection everywhere else.	2018-09-23 13:24:30 +03:00
Alan Orth	84801a4ab5	Add vim modeline to all Python files Uses four spaces for tab and shift widths, and turns on expansion of tabs to spaces.	2018-09-23 11:33:26 +03:00
Alan Orth	4e8621e3d9	README.md: Add TODO about API documentation	2018-09-23 09:52:36 +03:00
Alan Orth	2c8430171d	CHANGELOG.md: Add note about systemd unit file	2018-09-23 07:58:15 +03:00
Alan Orth	fb60133713	Add example systemd unit for statistics API	2018-09-23 07:50:04 +03:00
Alan Orth	9e01a80011	CHANGELOG.md: Move changes to version 0.0.3 v0.0.3	2018-09-20 17:41:47 +03:00
Alan Orth	a263996582	app.py: Fix Solr queries for item views According to dspace-api's Constants.java, items are type 2 and they use a unique ID field of `id` instead of `owningItem`. There is no need to check the bundleName for item types. Also, I decided to use the main Solr query for item IDs because the filter query parameter (fq) stores results in the filterCache and can be quite expensive with cores storing tens of millions of documents (we currently have 149 million docs!). It makes sense to use the filter query parameter to reduce the result set returned by the main Solr query.	2018-09-20 17:37:13 +03:00
Alan Orth	ed9d25294e	app.py: Use SolrClient's rows parameter Instead of putting this in the raw query we can just use SolrClient's native rows parameter.	2018-09-19 12:48:28 +03:00
Alan Orth	5e165d2e88	CHANGELOG.md: Add note about using rows=0 in Solr queries	2018-09-19 01:50:14 +03:00
Alan Orth	8e29fd8a43	app.py: Use rows=0 for Solr queries There is no need to return any rows of the result because I am only interested in the numFound.	2018-09-19 01:48:35 +03:00
Alan Orth	24af83b03f	CHANGELOG.md: Add note about simplified Solr query	2018-09-19 00:30:28 +03:00
Alan Orth	a87aaba812	app.py: Simplify Solr query for bitstream downloads This whole business with negative query ranges is confusing as hell and I'll definitely forget it in the future. In DSpace's Solr terminology a "download" is a view to some bitstream that lives in the ORIGINAL bundle. This is where bitstreams that are uploaded during the item submission process go, versus generated thumbnails, etc.	2018-09-19 00:24:23 +03:00
Alan Orth	57faec59c8	CHANGELOG.md: Add note about config refactor	2018-09-18 17:01:24 +03:00
Alan Orth	06ab254017	Refactor configuration into separate module There is a good example of this in the Project Weekend GitHub profile. See: https://github.com/projectweekend/Falcon-PostgreSQL-API-Seed	2018-09-18 16:59:28 +03:00
Alan Orth	5b5cab8b34	README.md: Update todo	2018-09-18 15:59:27 +03:00
Alan Orth	40ce3c72a9	CHANGELOG.md: Update for version 0.0.2 v0.0.2	2018-09-18 15:36:56 +03:00
Alan Orth	ea2283355b	Add CHANGELOG.md See: https://keepachangelog.com/en/1.0.0/	2018-09-18 15:35:42 +03:00
Alan Orth	4b4a959a1c	Add ability to get Solr parameters from environment You can use the SOLR_SERVER and SOLR_CORE variables to make deployment via systemd, etc easier.	2018-09-18 15:34:25 +03:00
Alan Orth	1e16beed30	README.md: Add todo list	2018-09-18 14:19:14 +03:00
Alan Orth	182e13efca	Add GPLv3 license	2018-09-18 14:16:07 +03:00
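
Commit 6e4bc630f7 switches database.py to psycopg2.extras.DictCursor (passed to psycopg2 as `connection.cursor(cursor_factory=psycopg2.extras.DictCursor)`) so that records can be read by column name. Running that requires a PostgreSQL server, so the sketch below illustrates the same access-by-name idea with the standard library's sqlite3.Row as a stand-in. It is not the project's code, and the items table with id/views/downloads columns is an assumption.

```python
import sqlite3

# Stand-in for psycopg2.extras.DictCursor: sqlite3.Row also lets you
# read a column by name, without needing a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Hypothetical table layout, for illustration only.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, views INTEGER, downloads INTEGER)")
conn.execute("INSERT INTO items VALUES (?, ?, ?)", (1, 10, 3))

row = conn.execute("SELECT id, views, downloads FROM items WHERE id = ?", (1,)).fetchone()

# Columns are addressable by name, and still by position.
print(row["views"], row["downloads"])
print(row[0])
```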
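
Commit 19a45f3f6f pages through all item statistics with LIMIT and OFFSET, sorting by a unique field so the order is stable between requests. A minimal sketch of that query using the standard library's sqlite3 module (the backend at that point in the history); the items table layout and the get_page helper are hypothetical. The same LIMIT ... OFFSET ... clause works unchanged in PostgreSQL.

```python
import sqlite3

def get_page(conn, limit, offset):
    """Return one page of item statistics (hypothetical helper).

    Ordering by the primary key keeps the result order stable, so
    consecutive offsets neither skip nor repeat rows.
    """
    cursor = conn.execute(
        "SELECT id, views, downloads FROM items ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, views INTEGER DEFAULT 0, downloads INTEGER DEFAULT 0)"
)
conn.executemany("INSERT INTO items (id, views) VALUES (?, ?)", [(i, i * 10) for i in range(1, 8)])

# Walk all seven rows, three per page.
limit = 3
for page in range(3):
    print(page, get_page(conn, limit, page * limit))
```

An offset past the end simply returns an empty list, which a client can treat as the end of the result set.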
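
Commits 2cab456f16 and 8f7450f67a rely on UPSERT: attempt an INSERT and, on a primary-key conflict, update the existing row instead, using the special `excluded` qualifier to refer to the values that would have been inserted. A sketch with sqlite3 (needs SQLite 3.24.0 or newer, per the commit messages); the table layout and the upsert_views/upsert_downloads helpers are assumptions, not code from indexer.py. PostgreSQL 9.5+ accepts the same `ON CONFLICT (...) DO UPDATE SET col = excluded.col` syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table; both counters default to 0 so either can be filled in first.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, views INTEGER DEFAULT 0, downloads INTEGER DEFAULT 0)"
)

def upsert_views(conn, item_id, views):
    # Insert a new row, or on an id conflict update only the views column.
    # "excluded" is the row that the INSERT would have created.
    conn.execute(
        "INSERT INTO items (id, views) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET views = excluded.views",
        (item_id, views),
    )

def upsert_downloads(conn, item_id, downloads):
    # Same pattern for the downloads column.
    conn.execute(
        "INSERT INTO items (id, downloads) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET downloads = excluded.downloads",
        (item_id, downloads),
    )

upsert_views(conn, 42, 100)     # new row: views=100, downloads=0
upsert_downloads(conn, 42, 7)   # existing row: only downloads changes
upsert_views(conn, 42, 150)     # existing row: views refreshed

print(conn.execute("SELECT id, views, downloads FROM items").fetchall())
```

Because each counter defaults to 0, views and downloads can be written in separate passes into one table, which is what removes the need for a FULL OUTER JOIN across two tables.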
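
Commit 89621af85d splits database access into read-write (indexer) and read-only (API) connections. How the project implements the split is not shown here; the sketch below is one possible approach for the SQLite backend of that period, using SQLite's `mode=ro` URI parameter. The temporary database path and all helper names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical on-disk database in a fresh temporary directory.
db_path = Path(tempfile.mkdtemp()) / "statistics.db"

def connect_rw(path: Path) -> sqlite3.Connection:
    """Read-write connection, as the indexer would need."""
    return sqlite3.connect(str(path))

def connect_ro(path: Path) -> sqlite3.Connection:
    """Read-only connection for the API, via SQLite's URI mode=ro."""
    return sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)

def try_write(conn: sqlite3.Connection) -> bool:
    """Return True if an UPDATE succeeds on this connection."""
    try:
        conn.execute("UPDATE items SET views = views + 1")
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()  # end the failed transaction so no lock lingers
        return False

# The indexer creates and fills the table...
rw = connect_rw(db_path)
rw.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, views INTEGER DEFAULT 0, downloads INTEGER DEFAULT 0)")
rw.execute("INSERT INTO items (id, views) VALUES (1, 5)")
rw.commit()

# ...while the API can read the data but not modify it.
ro = connect_ro(db_path)
print(ro.execute("SELECT views FROM items WHERE id = 1").fetchone())
print(try_write(ro))
```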
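
Commit a263996582 counts item views by querying Solr for documents of type 2 whose `id` is the item ID, keeping the per-item ID in the main query (q) rather than in fq to avoid filling the filterCache. A sketch of the resulting request parameters; placing `type:2` in fq, the item_views_params helper, the item ID and the localhost URL are all assumptions. `statistics` is the standard name of DSpace's Solr statistics core.

```python
from urllib.parse import urlencode

def item_views_params(item_id: int) -> dict:
    """Solr parameters that count views of one item (hypothetical helper)."""
    return {
        # Per-item ID in the main query, so one-off IDs do not churn fq's cache.
        "q": f"id:{item_id}",
        # Small, reusable restriction: items are type 2 (dspace-api Constants.java).
        "fq": "type:2",
        # Only numFound is needed, not the documents themselves.
        "rows": 0,
    }

# Hypothetical Solr location; adjust to your own deployment.
solr_url = "http://localhost:8080/solr/statistics/select"
print(solr_url + "?" + urlencode(item_views_params(1)))
```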
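
Commit 8e29fd8a43 uses `rows=0` because only the hit count matters. With Solr's JSON response writer, that count is `response.numFound`, present even when `docs` is empty. A minimal parsing sketch; the sample response is abbreviated and made up for illustration.

```python
import json

def num_found(solr_json: str) -> int:
    """Extract the total hit count from a Solr JSON response."""
    return json.loads(solr_json)["response"]["numFound"]

# Made-up response to a rows=0 query: no docs, but numFound holds the total.
sample = (
    '{"responseHeader": {"status": 0},'
    ' "response": {"numFound": 1234, "start": 0, "docs": []}}'
)
print(num_found(sample))
```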
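
Commit a87aaba812 defines a "download" as a view of a bitstream in the ORIGINAL bundle. A sketch of matching request parameters, assuming bitstream documents carry `type:0` (BITSTREAM in dspace-api's Constants.java) and are tied to their item through `owningItem`, as the later item-views commit implies. The split between q and fq and the item_downloads_params helper are assumptions.

```python
from urllib.parse import urlencode

def item_downloads_params(item_id: int) -> dict:
    """Solr parameters that count downloads of one item (hypothetical helper)."""
    return {
        # Bitstream hits are attributed to their parent item via owningItem.
        "q": f"owningItem:{item_id}",
        # Bitstreams only, and only user-submitted files (not thumbnails etc.).
        "fq": ["type:0", "bundleName:ORIGINAL"],
        "rows": 0,
    }

# doseq=True expands the fq list into repeated fq= parameters.
print(urlencode(item_downloads_params(1), doseq=True))
```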
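
Commits 4b4a959a1c and 06ab254017 read the Solr settings from environment variables in a separate configuration module, which suits systemd units. A sketch of such a module; the load_config function and the default server URL are hypothetical. SOLR_CORE is shown because that commit introduced it, although a later commit (a51422273c) removed it since every DSpace instance names the core `statistics`.

```python
import os
from typing import Mapping

def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read Solr settings from the environment, with fallback defaults."""
    return {
        # Hypothetical default; override with SOLR_SERVER in deployment.
        "SOLR_SERVER": env.get("SOLR_SERVER", "http://localhost:8080/solr"),
        # DSpace's Solr statistics core is always named "statistics".
        "SOLR_CORE": env.get("SOLR_CORE", "statistics"),
    }

print(load_config({"SOLR_SERVER": "http://solr.example.org:8983/solr"}))
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.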


154 Commits