From 817f470888f47313edc6a50023053e1134439f1b Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Tue, 18 Sep 2018 01:16:21 +0300
Subject: [PATCH] Update notes for 2018-09-17

---
 content/posts/2018-09.md | 40 +++++++++++++++++++++++++++++
 docs/2018-09/index.html  | 54 +++++++++++++++++++++++++++++++++++++---
 docs/sitemap.xml         | 10 ++++----
 3 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/content/posts/2018-09.md b/content/posts/2018-09.md
index 5e5caeed1..b043492a8 100644
--- a/content/posts/2018-09.md
+++ b/content/posts/2018-09.md
@@ -294,5 +294,45 @@ https://cgspace.cgiar.org/rest/statlets?handle=10568/97103
 - Check if it's possible to have items deposited via REST use a workflow so we can perhaps tell ICARDA to use that from MEL
 - Agree that we'll publicize AReS explorer on the week before the Big Data Platform workshop
 - Put a link and or picture on the CGSpace homepage saying "Visualized CGSpace research" or something, and post a message on Yammer
+- I want to explore creating a thin API to make the item view and download stats available from Solr so CodeObia can use them in the AReS explorer
+- Currently CodeObia is exploring using the Atmire statlets internal API, but I don't really like that...
+- There are some example queries on the [DSpace Solr wiki](https://wiki.duraspace.org/display/DSPACE/Solr)
+- For example, this query returns 1655 rows for item [10568/10630](https://cgspace.cgiar.org/handle/10568/10630):
+
+```
+$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false'
+```
+
+- The id in the Solr query is the item's database id (get it from the REST API or something)
+- Next, I adapted a query to get the downloads and it shows 889, which is similar to the number Atmire's statlet shows, though the query logic here is confusing:
+
+```
+$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=-(bundleName:[*+TO+*]-bundleName:ORIGINAL)&fq=-(statistics_type:[*+TO+*]+-statistics_type:view)'
+```
+
+- According to the [SolrQuerySyntax](https://wiki.apache.org/solr/SolrQuerySyntax) page on the Apache wiki, the `[* TO *]` syntax just selects a range (in this case all values for a field)
+- So it seems to be:
+  - `type:0` is for bitstreams according to the DSpace Solr documentation
+  - `-(bundleName:[*+TO+*]-bundleName:ORIGINAL)` seems to be a [negative query starting with all documents](https://wiki.apache.org/solr/NegativeQueryProblems), subtracting those with `bundleName:ORIGINAL`, and then negating the whole thing... meaning only documents from `bundleName:ORIGINAL`?
+- What the shit, I think I'm right: the simplified logic in *this* query returns the same 889:
+
+```
+$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=bundleName:ORIGINAL&fq=-(statistics_type:[*+TO+*]+-statistics_type:view)'
+```
+
+- And if I simplify the `statistics_type` logic the same way, it still returns the same 889!
+
+```
+$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=bundleName:ORIGINAL&fq=statistics_type:view'
+```
+
+- As for item views, I suppose that's just the same query, with the `bundleName:ORIGINAL` filter negated:
+
+```
+$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=-bundleName:ORIGINAL&fq=statistics_type:view'
+```
+
+- That one returns 766, which is exactly 1655 minus 889...
+- Also, Solr's `fq` is similar to the regular `q` query parameter, but its results are cached separately (in Solr's filter cache), so it should be faster across multiple queries
diff --git a/docs/2018-09/index.html b/docs/2018-09/index.html
index a8b297cdf..4811fa239 100644
--- a/docs/2018-09/index.html
+++ b/docs/2018-09/index.html
@@ -18,7 +18,7 @@ I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
 " />
-
+
  • Put a link and or picture on the CGSpace homepage saying “Visualized CGSpace research” or something, and post a message on Yammer
+  • I want to explore creating a thin API to make the item view and download stats available from Solr so CodeObia can use them in the AReS explorer
+  • Currently CodeObia is exploring using the Atmire statlets internal API, but I don’t really like that…
+  • There are some example queries on the DSpace Solr wiki
+  • For example, this query returns 1655 rows for item 10568/10630:
+
    $ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false'
    +
    + + + +
    $ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=-(bundleName:[*+TO+*]-bundleName:ORIGINAL)&fq=-(statistics_type:[*+TO+*]+-statistics_type:view)'
    +
    + + + +
    $ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=bundleName:ORIGINAL&fq=-(statistics_type:[*+TO+*]+-statistics_type:view)'
    +
    + + + +
    $ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=bundleName:ORIGINAL&fq=statistics_type:view'
    +
    + + + +
    $ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=-bundleName:ORIGINAL&fq=statistics_type:view'
    +
    +
    +
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index d5a1b7115..00d5dbda6 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2018-09/
- 2018-09-17T17:34:48+03:00
+ 2018-09-17T19:53:08+03:00
@@ -184,7 +184,7 @@
 https://alanorth.github.io/cgspace-notes/
- 2018-09-17T17:34:48+03:00
+ 2018-09-17T19:53:08+03:00
 0
@@ -195,7 +195,7 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
- 2018-09-17T17:34:48+03:00
+ 2018-09-17T19:53:08+03:00
 0
@@ -207,13 +207,13 @@
 https://alanorth.github.io/cgspace-notes/posts/
- 2018-09-17T17:34:48+03:00
+ 2018-09-17T19:53:08+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
- 2018-09-17T17:34:48+03:00
+ 2018-09-17T19:53:08+03:00
 0
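
Note on the "thin API" idea from the 2018-09-17 entry: a minimal sketch of how the two simplified Solr queries could be built programmatically. It is not part of the patch. The function name `build_stats_url` and the `downloads` flag are hypothetical, the base URL is copied from the `http` commands in the notes, and treating `-bundleName:ORIGINAL` as "views" follows the notes' own supposition rather than anything confirmed.

```python
from urllib.parse import urlencode

# Assumption: same local Solr endpoint as in the http commands in the notes.
SOLR_SELECT = "http://localhost:3000/solr/statistics/select"


def build_stats_url(item_id, downloads):
    """Build a Solr select URL that counts statistics hits for one item.

    downloads=True mirrors the simplified query with fq=bundleName:ORIGINAL;
    downloads=False mirrors the query with fq=-bundleName:ORIGINAL, which the
    notes suppose corresponds to item views.
    """
    bundle_filter = "bundleName:ORIGINAL" if downloads else "-bundleName:ORIGINAL"
    params = [
        ("indent", "on"),
        ("rows", "0"),  # only the hit count is wanted, not the documents
        ("q", f"type:0 owningItem:{item_id}"),
        ("fq", "isBot:false"),
        ("fq", bundle_filter),
        ("fq", "statistics_type:view"),
    ]
    # urlencode percent-encodes ':' and turns the space in q into '+'.
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_stats_url(11576, downloads=True))
    print(build_stats_url(11576, downloads=False))
```

The item id `11576` is the database id used throughout the notes. The generated URLs percent-encode the colons, which Solr decodes to the same query strings as the hand-written commands.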