mirror of https://github.com/ilri/dspace-statistics-api.git synced 2025-05-10 15:16:02 +02:00

Compare commits


10 Commits

SHA1 Message Date
9e01a80011 CHANGELOG.md: Move changes to version 0.0.3 2018-09-20 17:41:47 +03:00
a263996582 app.py: Fix Solr queries for item views 2018-09-20 17:37:13 +03:00
According to dspace-api's Constants.java, items are type 2 and they
use a unique ID field of `id` instead of `owningItem`. There is no
need to check the bundleName for item types.

Also, I decided to use the main Solr query for item IDs because the
filter query parameter (fq) stores results in the filterCache and
can be quite expensive with cores storing tens of millions of
documents (we currently have 149 million docs!). It makes sense to use
the filter query parameter to reduce the result set returned by the
main Solr query.

ed9d25294e app.py: Use SolrClient's rows parameter 2018-09-19 12:48:28 +03:00
Instead of putting this in the raw query we can just use SolrClient's
native rows parameter.

5e165d2e88 CHANGELOG.md: Add note about using rows=0 in Solr queries 2018-09-19 01:50:14 +03:00
8e29fd8a43 app.py: Use rows=0 for Solr queries 2018-09-19 01:48:35 +03:00
There is no need to return any rows of the result because I am only
interested in the numFound.

24af83b03f CHANGELOG.md: Add note about simplified Solr query 2018-09-19 00:30:28 +03:00
a87aaba812 app.py: Simplify Solr query for bitstream downloads 2018-09-19 00:24:23 +03:00
This whole business with negative query ranges is confusing as hell
and I'll definitely forget it in the future. In DSpace's Solr
terminology a "download" is a view to some bitstream that lives in the
ORIGINAL bundle. This is where bitstreams that are uploaded during
the item submission process go, versus generated thumbnails, etc.

57faec59c8 CHANGELOG.md: Add note about config refactor 2018-09-18 17:01:24 +03:00
06ab254017 Refactor configuration into separate module 2018-09-18 16:59:28 +03:00
There is a good example of this in the Project Weekend GitHub profile.

See: https://github.com/projectweekend/Falcon-PostgreSQL-API-Seed

5b5cab8b34 README.md: Update todo 2018-09-18 15:59:27 +03:00
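Commit a263996582 argues that the per-item clause belongs in the main query (`q`) rather than in `fq`, because every distinct `fq` string gets its own filterCache entry. The toy model below is a sketch of that reasoning only: `ToyFilterCache`, `old_view_fq`, and `new_view_params` are hypothetical names, the 512-entry limit is an arbitrary value (the real filterCache size is configured in `solrconfig.xml`), and Solr's actual cache stores document sets rather than a flag.

```python
from collections import OrderedDict


class ToyFilterCache:
    """Hypothetical LRU cache keyed by the fq string.

    Stands in for Solr's filterCache only to show that each distinct
    filter query string occupies its own cache entry.
    """

    def __init__(self, max_entries=512):
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.misses = 0

    def lookup(self, fq):
        if fq in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(fq)
        else:
            # Miss: Solr would compute and store a doc set at this point.
            self.misses += 1
            self.entries[fq] = True
            if len(self.entries) > self.max_entries:
                self.entries.popitem(last=False)  # evict least recently used


def old_view_fq(item_id):
    # Before a263996582: the per-item clause lived inside fq.
    return ('owningItem:{0} AND isBot:false AND statistics_type:view '
            'AND -bundleName:ORIGINAL').format(item_id)


def new_view_params(item_id):
    # After a263996582: the per-item clause is in q, so fq never changes.
    return {
        'q': 'type:2 AND id:{0}'.format(item_id),
        'fq': 'isBot:false AND statistics_type:view',
    }


old_cache = ToyFilterCache()
new_cache = ToyFilterCache()
for item_id in range(1, 1001):
    old_cache.lookup(old_view_fq(item_id))
    new_cache.lookup(new_view_params(item_id)['fq'])

print(len(old_cache.entries), old_cache.misses)  # 512 1000
print(len(new_cache.entries), new_cache.misses)  # 1 1
```

In this model the old scheme misses on every item and churns the cache, while the new scheme computes its single shared filter once and reuses it for every subsequent item.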
4 changed files with 24 additions and 14 deletions

CHANGELOG.md

@@ -4,6 +4,13 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.0.3] - 2018-09-20
+### Changed
+- Refactor environment variables into config module
+- Simplify Solr query for "downloads"
+- Optimize Solr query by using rows=0
+- Fix Solr queries for item views
+
 ## [0.0.2] - 2018-09-18
 ### Added
 - Ability to get Solr parameters from environment (`SOLR_SERVER` and `SOLR_CORE`)
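The "Optimize Solr query by using rows=0" entry relies on Solr still reporting the total hit count in `response.numFound` while returning an empty `docs` list. The sketch below builds such a `/select` URL with the standard library and reads `numFound` from a JSON body. It deliberately bypasses SolrClient and is not how that library builds requests; `build_select_url` and `num_found` are hypothetical helpers, `wt=json` is an assumption, and the response body is a hand-written sample, not real output.

```python
import json
from urllib.parse import urlencode


def build_select_url(server, core, params):
    """Build a Solr /select URL; rows=0 asks for the count without any docs."""
    query = dict(params, rows=0, wt='json')
    return '{0}/{1}/select?{2}'.format(server.rstrip('/'), core, urlencode(query))


def num_found(body):
    """Extract the total hit count from a Solr JSON response body."""
    return json.loads(body)['response']['numFound']


url = build_select_url('http://localhost:8080/solr', 'statistics', {
    'q': 'type:0 AND owningItem:123',
    'fq': 'isBot:false AND statistics_type:view AND bundleName:ORIGINAL',
})
print(url)

# Hand-written sample shaped like a Solr JSON response (not real data).
sample = ('{"responseHeader": {"status": 0}, '
          '"response": {"numFound": 42, "start": 0, "docs": []}}')
print(num_found(sample))  # 42
```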

README.md

@@ -13,7 +13,7 @@ Create a virtual environment and run it:
## Todo
- Take a list of items (POST in JSON?)
- Ability to return a paginated list of items (on a different route?)
## License
This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).

app.py

@@ -2,13 +2,11 @@
 # See DSpace Solr docs for tips about parameters
 # https://wiki.duraspace.org/display/DSPACE/Solr
+from config import SOLR_SERVER
+from config import SOLR_CORE
 import falcon
 import os
 from SolrClient import SolrClient
-# Check if Solr connection information was provided in the environment
-solr_server = os.environ.get('SOLR_SERVER', 'http://localhost:8080/solr')
-solr_core = os.environ.get('SOLR_CORE', 'statistics')
 class ItemResource:
     def on_get(self, req, resp):
@@ -16,21 +14,21 @@ class ItemResource:
         # Return HTTPBadRequest if id parameter is not present and valid
         item_id = req.get_param_as_int("id", required=True, min=0)
-        solr = SolrClient(solr_server)
+        solr = SolrClient(SOLR_SERVER)
         # Get views
-        res = solr.query(solr_core, {
-            'q':'type:0',
-            'fq':'owningItem:{0} AND isBot:false AND statistics_type:view AND -bundleName:ORIGINAL'.format(item_id)
-        })
+        res = solr.query(SOLR_CORE, {
+            'q':'type:2 AND id:{0}'.format(item_id),
+            'fq':'isBot:false AND statistics_type:view'
+        }, rows=0)
         views = res.get_num_found()
         # Get downloads
-        res = solr.query(solr_core, {
-            'q':'type:0',
-            'fq':'owningItem:{0} AND isBot:false AND statistics_type:view AND -(bundleName:[* TO *] -bundleName:ORIGINAL)'.format(item_id)
-        })
+        res = solr.query(SOLR_CORE, {
+            'q':'type:0 AND owningItem:{0}'.format(item_id),
+            'fq':'isBot:false AND statistics_type:view AND bundleName:ORIGINAL'
+        }, rows=0)
         downloads = res.get_num_found()
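The old and new download filters in the app.py diff are not strictly identical. As a sketch, the in-memory model below evaluates only the bundle clauses against a few hypothetical documents; it assumes `bundleName` is single-valued and ignores the `type`, `owningItem`, `isBot`, and `statistics_type` clauses. The function names are illustrative, not part of the project.

```python
# Hypothetical in-memory model of the two bundle clauses from the app.py diff.
# A missing 'bundleName' key means the field is absent on that Solr document.
# bundleName is treated as single-valued here, which is a simplifying assumption.


def old_download_filter(doc):
    # -(bundleName:[* TO *] -bundleName:ORIGINAL):
    # exclude docs that HAVE a bundleName which is not ORIGINAL.
    has_bundle = 'bundleName' in doc
    return not (has_bundle and doc['bundleName'] != 'ORIGINAL')


def new_download_filter(doc):
    # bundleName:ORIGINAL: keep only docs whose bundle is ORIGINAL.
    return doc.get('bundleName') == 'ORIGINAL'


docs = [
    {'id': 'a', 'bundleName': 'ORIGINAL'},   # file uploaded during submission
    {'id': 'b', 'bundleName': 'THUMBNAIL'},  # generated thumbnail
    {'id': 'c'},                             # no bundleName recorded
]

print([d['id'] for d in docs if old_download_filter(d)])  # ['a', 'c']
print([d['id'] for d in docs if new_download_filter(d)])  # ['a']
```

Under these assumptions the two filters differ only for bitstream views that carry no `bundleName`: the old negative-range form counted them, while the new positive form does not, which matches the commit's definition of a download as a view of a bitstream in the ORIGINAL bundle.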

config.py (new file)

@@ -0,0 +1,5 @@
+import os
+
+# Check if Solr connection information was provided in the environment
+SOLR_SERVER = os.environ.get('SOLR_SERVER', 'http://localhost:8080/solr')
+SOLR_CORE = os.environ.get('SOLR_CORE', 'statistics')
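config.py reads the environment once, at import time. A possible variant, sketched here under the assumption that testability matters, accepts the environment as a mapping so it can be exercised without touching `os.environ`. `load_solr_config` and `DEFAULTS` are hypothetical names, not part of the project; the default values are copied from config.py.

```python
import os

# Defaults copied from config.py; DEFAULTS itself is a hypothetical name.
DEFAULTS = {
    'SOLR_SERVER': 'http://localhost:8080/solr',
    'SOLR_CORE': 'statistics',
}


def load_solr_config(environ=None):
    """Return Solr settings, preferring values present in `environ`.

    `environ` defaults to os.environ; passing a plain dict lets tests
    exercise the lookup without modifying the real process environment.
    """
    if environ is None:
        environ = os.environ
    return {key: environ.get(key, default) for key, default in DEFAULTS.items()}


print(load_solr_config({}))
print(load_solr_config({'SOLR_SERVER': 'http://solr.example.org:8983/solr'}))
```

The trade-off is one extra function call at startup in exchange for configuration that can be checked in isolation; the module-level constants in config.py remain simpler when that is not needed.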