Update notes for 2018-09-17

Alan Orth 2018-09-18 01:16:21 +03:00
parent 4cfa9aa101
commit 817f470888
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 96 additions and 8 deletions

@ -294,5 +294,45 @@ https://cgspace.cgiar.org/rest/statlets?handle=10568/97103
- Check if it's possible to have items deposited via REST use a workflow so we can perhaps tell ICARDA to use that from MEL
- Agree that we'll publicize AReS explorer on the week before the Big Data Platform workshop
- Put a link and or picture on the CGSpace homepage saying "Visualized CGSpace research" or something, and post a message on Yammer
- I want to explore creating a thin API to make the item view and download stats available from Solr so CodeObia can use them in the AReS explorer
- Currently CodeObia is exploring using the Atmire statlets internal API, but I don't really like that...
- There are some example queries on the [DSpace Solr wiki](https://wiki.duraspace.org/display/DSPACE/Solr)
- For example, this query matches 1655 records for item [10568/10630](https://cgspace.cgiar.org/handle/10568/10630) (with `rows=0`, Solr returns only the count):
```
$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false'
```
- The id in the Solr query is the item's database id (get it from the REST API or something)
- Next, I adapted a query to get the downloads and it shows 889, which is similar to the number Atmire's statlet shows, though the query logic here is confusing:
```
$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=-(bundleName:[*+TO+*]-bundleName:ORIGINAL)&fq=-(statistics_type:[*+TO+*]+-statistics_type:view)'
```
- According to the [SolrQuerySyntax](https://wiki.apache.org/solr/SolrQuerySyntax) page on the Apache wiki, the `[* TO *]` syntax just selects a range (in this case all values for a field)
- So it seems to be:
- `type:0` is for bitstreams according to the DSpace Solr documentation
- `-(bundleName:[*+TO+*]-bundleName:ORIGINAL)` seems to be a [negative query starting with all documents](https://wiki.apache.org/solr/NegativeQueryProblems), subtracting those with `bundleName:ORIGINAL`, and then negating the whole thing... meaning only documents from `bundleName:ORIGINAL`?
- What the shit, I think I'm right: the simplified logic in *this* query returns the same 889:
```
$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=bundleName:ORIGINAL&fq=-(statistics_type:[*+TO+*]+-statistics_type:view)'
```
- And if I simplify the `statistics_type` logic the same way, it still returns the same 889!
```
$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=bundleName:ORIGINAL&fq=statistics_type:view'
```
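- The double-negation reasoning above can be checked with plain set logic. This is only a sketch: the four records are made up, and it assumes Solr evaluates a purely negative filter against all documents. It also suggests a caveat: the double-negated form keeps records that have no `bundleName` at all, so the two filters agree only when every matching record has the field (the identical 889 counts suggest that is the case here):

```python
# Hypothetical records standing in for Solr statistics documents.
docs = [
    {"id": 1, "bundleName": "ORIGINAL"},
    {"id": 2, "bundleName": "THUMBNAIL"},
    {"id": 3, "bundleName": "ORIGINAL"},
    {"id": 4},  # made-up record with no bundleName field at all
]

all_ids = {d["id"] for d in docs}

# bundleName:[* TO *] -> every record that has any bundleName value
has_bundle = {d["id"] for d in docs if "bundleName" in d}

# bundleName:ORIGINAL
is_original = {d["id"] for d in docs if d.get("bundleName") == "ORIGINAL"}

# -(bundleName:[* TO *] -bundleName:ORIGINAL)
# i.e. everything, minus (records with a bundle that is not ORIGINAL)
double_negated = all_ids - (has_bundle - is_original)

print("double negated:", sorted(double_negated))
print("simplified:    ", sorted(is_original))
print("difference:    ", sorted(double_negated - is_original))
```

- In this toy data the only difference between the two forms is the record without a `bundleName`.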
- As for item views, I suppose that's just the same query, but with `bundleName:ORIGINAL` negated:
```
$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=-bundleName:ORIGINAL&fq=statistics_type:view'
```
- That one returns 766, which is exactly 1655 minus 889...
- Also, Solr's `fq` is similar to the regular `q` query parameter, but each filter query is cached separately in Solr's filter cache, so it should be faster when the same filters are reused across multiple queries
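- A minimal sketch of what the thin API's core could look like, built only from the queries above. The function names (`build_url`, `fetch_count`, `item_stats`) and the shape of the returned dictionary are hypothetical, the base URL is the local one used in these notes, and the "views" number follows my supposition above rather than anything confirmed:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the commands in these notes; adjust for your setup.
SOLR_SELECT = "http://localhost:3000/solr/statistics/select"


def build_url(item_id, extra_filters=()):
    """Build a count-only Solr URL for one item's bitstream events."""
    params = [
        ("q", f"type:0 owningItem:{item_id}"),
        ("fq", "isBot:false"),
        ("fq", "statistics_type:view"),
    ]
    params += [("fq", f) for f in extra_filters]
    # rows=0 because only the total count is needed; wt=json for parsing
    params += [("rows", "0"), ("wt", "json")]
    return SOLR_SELECT + "?" + urlencode(params)


def fetch_count(url):
    """Fetch the URL and return numFound from Solr's JSON response."""
    with urlopen(url) as response:
        return json.load(response)["response"]["numFound"]


def item_stats(item_id, count=fetch_count):
    """Return download and (supposed) view counts for one item.

    `count` is injectable so the logic can be exercised without Solr.
    """
    downloads = count(build_url(item_id, ["bundleName:ORIGINAL"]))
    views = count(build_url(item_id, ["-bundleName:ORIGINAL"]))
    return {"id": item_id, "downloads": downloads, "views": views}
```

- Calling `item_stats(11576)` would issue the two simplified queries from above against the local Solr; passing a fake `count` function makes it possible to test the logic offline.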
<!-- vim: set sw=2 ts=2: -->

@ -18,7 +18,7 @@ I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-09/" /><meta property="article:published_time" content="2018-09-02T09:55:54&#43;03:00"/>
-<meta property="article:modified_time" content="2018-09-17T17:34:48&#43;03:00"/>
+<meta property="article:modified_time" content="2018-09-17T19:53:08&#43;03:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="September, 2018"/>
<meta name="twitter:description" content="2018-09-02
@ -41,9 +41,9 @@ I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
"@type": "BlogPosting",
"headline": "September, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-09/",
-"wordCount": "2107",
+"wordCount": "2386",
"datePublished": "2018-09-02T09:55:54&#43;03:00",
-"dateModified": "2018-09-17T17:34:48&#43;03:00",
+"dateModified": "2018-09-17T19:53:08&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -440,6 +440,54 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
<ul>
<li>Put a link and or picture on the CGSpace homepage saying &ldquo;Visualized CGSpace research&rdquo; or something, and post a message on Yammer</li>
</ul></li>
<li>I want to explore creating a thin API to make the item view and download stats available from Solr so CodeObia can use them in the AReS explorer</li>
<li>Currently CodeObia is exploring using the Atmire statlets internal API, but I don&rsquo;t really like that&hellip;</li>
<li>There are some example queries on the <a href="https://wiki.duraspace.org/display/DSPACE/Solr">DSpace Solr wiki</a></li>
<li>For example, this query matches 1655 records for item <a href="https://cgspace.cgiar.org/handle/10568/10630"><sup>10568</sup>&frasl;<sub>10630</sub></a> (with <code>rows=0</code>, Solr returns only the count):</li>
</ul>
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:0+owningItem:11576&amp;fq=isBot:false'
</code></pre>
<ul>
<li>The id in the Solr query is the item&rsquo;s database id (get it from the REST API or something)</li>
<li>Next, I adapted a query to get the downloads and it shows 889, which is similar to the number Atmire&rsquo;s statlet shows, though the query logic here is confusing:</li>
</ul>
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:0+owningItem:11576&amp;fq=isBot:false&amp;fq=-(bundleName:[*+TO+*]-bundleName:ORIGINAL)&amp;fq=-(statistics_type:[*+TO+*]+-statistics_type:view)'
</code></pre>
<ul>
<li>According to the <a href="https://wiki.apache.org/solr/SolrQuerySyntax">SolrQuerySyntax</a> page on the Apache wiki, the <code>[* TO *]</code> syntax just selects a range (in this case all values for a field)</li>
<li>So it seems to be:
<ul>
<li><code>type:0</code> is for bitstreams according to the DSpace Solr documentation</li>
<li><code>-(bundleName:[*+TO+*]-bundleName:ORIGINAL)</code> seems to be a <a href="https://wiki.apache.org/solr/NegativeQueryProblems">negative query starting with all documents</a>, subtracting those with <code>bundleName:ORIGINAL</code>, and then negating the whole thing&hellip; meaning only documents from <code>bundleName:ORIGINAL</code>?</li>
</ul></li>
<li>What the shit, I think I&rsquo;m right: the simplified logic in <em>this</em> query returns the same 889:</li>
</ul>
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:0+owningItem:11576&amp;fq=isBot:false&amp;fq=bundleName:ORIGINAL&amp;fq=-(statistics_type:[*+TO+*]+-statistics_type:view)'
</code></pre>
<ul>
<li>And if I simplify the <code>statistics_type</code> logic the same way, it still returns the same 889!</li>
</ul>
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:0+owningItem:11576&amp;fq=isBot:false&amp;fq=bundleName:ORIGINAL&amp;fq=statistics_type:view'
</code></pre>
<ul>
<li>As for item views, I suppose that&rsquo;s just the same query, but with <code>bundleName:ORIGINAL</code> negated:</li>
</ul>
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:0+owningItem:11576&amp;fq=isBot:false&amp;fq=-bundleName:ORIGINAL&amp;fq=statistics_type:view'
</code></pre>
<ul>
<li>That one returns 766, which is exactly 1655 minus 889&hellip;</li>
<li>Also, Solr&rsquo;s <code>fq</code> is similar to the regular <code>q</code> query parameter, but each filter query is cached separately in Solr&rsquo;s filter cache, so it should be faster when the same filters are reused across multiple queries</li>
</ul>
<!-- vim: set sw=2 ts=2: -->

@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-09/</loc>
-<lastmod>2018-09-17T17:34:48+03:00</lastmod>
+<lastmod>2018-09-17T19:53:08+03:00</lastmod>
</url>
<url>
@ -184,7 +184,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2018-09-17T17:34:48+03:00</lastmod>
+<lastmod>2018-09-17T19:53:08+03:00</lastmod>
<priority>0</priority>
</url>
@ -195,7 +195,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2018-09-17T17:34:48+03:00</lastmod>
+<lastmod>2018-09-17T19:53:08+03:00</lastmod>
<priority>0</priority>
</url>
@ -207,13 +207,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2018-09-17T17:34:48+03:00</lastmod>
+<lastmod>2018-09-17T19:53:08+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2018-09-17T17:34:48+03:00</lastmod>
+<lastmod>2018-09-17T19:53:08+03:00</lastmod>
<priority>0</priority>
</url>