# SPDX-License-Identifier: GPL-3.0-only

import requests

from .config import SOLR_SERVER
from .util import get_statistics_shards

Add communities and collections support to API

The basic logic is similar to items, where you can request single
item statistics with a UUID, all item statistics, and item statistics
for a list of items (optionally with a date range). Most of
the item code was re-purposed to work on "elements", which can be
items, communities, or collections depending on the request, with
the use of Falcon's `before` hooks to set the statistics scope so
we know how to behave for the current request.

Other than the minor difference in facet fields, another issue I
had with communities and collections is that the owningComm and
owningColl fields are multi-valued (unlike items' id field). This
means that, when you facet the results of your query, Solr returns
ids that seem unrelated, but are actually present in the field, so
I had to make sure I checked all returned ids to see if they were
in the user's POSTed elements list.

TODO:
- Add tests
- Revise docstrings
- Refactor items.py as it is now generic
2020-12-20 15:14:46 +01:00
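
The multi-valued field issue described in the commit message can be illustrated with a minimal sketch. The helper name `filter_facets` and the sample ids below are hypothetical and not part of the module; the `facets` dict only stands in for the kind of id-to-count map that Solr returns for a facet field when `json.nl` is set to `map`.

```python
def filter_facets(facets: dict, elements: list) -> dict:
    """Keep facet counts only for the requested ids; missing ids get 0."""
    requested = set(elements)

    # Drop ids that Solr returned only because they share a multi-valued
    # field (for example owningComm) with one of the requested ids.
    data = {id_: count for id_, count in facets.items() if id_ in requested}

    # Requested ids that have no facet at all are reported as zero.
    for element_id in elements:
        data.setdefault(element_id, 0)

    return data


# Hypothetical facet map: "comm-parent" was not requested but appears
# because it is present in the same multi-valued field.
facets = {"comm-a": 12, "comm-parent": 40, "comm-b": 3}
result = filter_facets(facets, ["comm-a", "comm-b", "comm-c"])
print(result)
```

Using a set for the membership test is a choice made for this sketch; the module itself checks against the list directly.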

def get_views(solr_date_string: str, elements: list, facetField: str):
    """
    Get view statistics for a list of elements from Solr. Depending on the
    request this could be items, communities, or collections.

    :parameter solr_date_string (str): Solr date string, for example "[* TO *]"
    :parameter elements (list): a list of IDs
    :parameter facetField (str): Solr field to facet by, for example "id"
    :returns: A dict of IDs and views
    """
    shards = get_statistics_shards()

    # Join the UUIDs with "OR" and escape the hyphens for Solr
    solr_elements_string: str = " OR ".join(elements).replace("-", r"\-")

    solr_query_params = {
        "q": f"{facetField}:({solr_elements_string})",
        "fq": f"type:2 AND -isBot:true AND statistics_type:view AND time:{solr_date_string}",
        "fl": facetField,
        "facet": "true",
        "facet.field": facetField,
        "facet.mincount": 1,
        "shards": shards,
        "rows": 0,
        "wt": "json",
        "json.nl": "map",  # return facets as a dict instead of a flat list
    }

    solr_url = SOLR_SERVER + "/statistics/select"
    res = requests.get(solr_url, params=solr_query_params)

    # Create an empty dict to store views
    data = {}

    # Solr returns facets as a dict of dicts (see the json.nl parameter)
    facets = res.json()["facet_counts"]["facet_fields"]

    # Iterate over the facetField dict and get the ids and views
    for id_, views in facets[facetField].items():
        # For items we can rely on Solr returning facets for *only* the ids
        # in our query, but for communities and collections, the owningComm and
        # owningColl fields are multi-value so Solr will return facets with the
        # values in our query as well as *any others* that happen to be present
        # in the field (which looks like Solr returning unrelated results until
        # you realize that the field is multi-value and this is correct).
        #
        # To work around this I make sure that each id in the returned dict is
        # present in the elements list POSTed by the user.
        if id_ in elements:
            data[id_] = views

    # Check if any ids have missing stats so we can set them to 0
    if len(data) < len(elements):
        for element_id in elements:
            if element_id not in data:
                data[element_id] = 0

    return data


def get_downloads(solr_date_string: str, elements: list, facetField: str):
    """
    Get download statistics for a list of elements from Solr. Depending on the
    request this could be items, communities, or collections.

    :parameter solr_date_string (str): Solr date string, for example "[* TO *]"
    :parameter elements (list): a list of IDs
    :parameter facetField (str): Solr field to facet by, for example "id"
    :returns: A dict of IDs and downloads
    """
    shards = get_statistics_shards()

    # Join the UUIDs with "OR" and escape the hyphens for Solr
    solr_elements_string: str = " OR ".join(elements).replace("-", r"\-")

    solr_query_params = {
        "q": f"{facetField}:({solr_elements_string})",
        "fq": f"type:0 AND -isBot:true AND statistics_type:view AND bundleName:ORIGINAL AND time:{solr_date_string}",
        "fl": facetField,
        "facet": "true",
        "facet.field": facetField,
        "facet.mincount": 1,
        "shards": shards,
        "rows": 0,
        "wt": "json",
        "json.nl": "map",  # return facets as a dict instead of a flat list
    }

    solr_url = SOLR_SERVER + "/statistics/select"
    res = requests.get(solr_url, params=solr_query_params)

    # Create an empty dict to store downloads
    data = {}

    # Solr returns facets as a dict of dicts (see the json.nl parameter)
    facets = res.json()["facet_counts"]["facet_fields"]

    # Iterate over the facetField dict and get the ids and downloads
    for id_, downloads in facets[facetField].items():
        # Make sure that each id in the returned dict is present in the
        # elements list POSTed by the user.
        if id_ in elements:
            data[id_] = downloads

    # Check if any elements have missing stats so we can set them to 0
    if len(data) < len(elements):
        for element_id in elements:
            if element_id not in data:
                data[element_id] = 0

    return data
# vim: set sw=4 ts=4 expandtab: