diff --git a/content/posts/2019-01.md b/content/posts/2019-01.md
index b85d249a2..3896c5605 100644
--- a/content/posts/2019-01.md
+++ b/content/posts/2019-01.md
@@ -704,5 +704,38 @@ print(results.hits)
 ```
 
 - So I guess I need to figure out how to use join queries and maybe even switch to using raw Python requests with JSON
+- This enumerates the Solr cores and returns the result in JSON format:
+
+```
+http://localhost:3000/solr/admin/cores?action=STATUS&wt=json
+```
+
+- I think I figured out how to search across shards: I needed to give the full URL of each of the other cores
+- Now I get more results when I start adding the other statistics cores:
+
+```
+$ http 'http://localhost:3000/solr/statistics/select?&indent=on&rows=0&q=*:*' | grep numFound
+<result name="response" numFound="2061320" start="0">
+$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018&indent=on&rows=0&q=*:*' | grep numFound
+<result name="response" numFound="16280292" start="0" maxScore="1.0">
+$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017&indent=on&rows=0&q=*:*' | grep numFound
+<result name="response" numFound="25606142" start="0" maxScore="1.0">
+$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017,localhost:8081/solr/statistics-2016&indent=on&rows=0&q=*:*' | grep numFound
+<result name="response" numFound="31532212" start="0" maxScore="1.0">
+```
+
+- I should be able to modify the dspace-statistics-api to check the shards via the Solr core status, then add the `shards` parameter to each query to make the search distributed among the cores
+- I implemented a proof of concept to query the Solr STATUS for active cores and to add them with a `shards` query string
+- A few things I noticed:
+  - Solr doesn't mind if you use an empty `shards` parameter
+  - Solr doesn't mind if you have an extra comma at the end of the `shards` parameter
+  - If you are searching multiple cores, you need to include the base core in the `shards` parameter as well
+  - For example, compare the following two queries, first including the base core and the shard in the `shards` parameter, and then only including the shard:
+
+```
+$ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view&shards=localhost:8081/solr/statistics,localhost:8081/solr/statistics-2018' | grep numFound
+<result name="response" numFound="275" start="0" maxScore="12.205825">
+$ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view&shards=localhost:8081/solr/statistics-2018' | grep numFound
+<result name="response" numFound="241" start="0" maxScore="12.205825">
+```
diff --git a/docs/2019-01/index.html b/docs/2019-01/index.html
index 7d275a82a..a075d4dc1 100644
--- a/docs/2019-01/index.html
+++ b/docs/2019-01/index.html
@@ -27,7 +27,7 @@ I don’t see anything interesting in the web server logs around that time t
 " />
-
+
@@ -60,9 +60,9 @@ I don’t see anything interesting in the web server logs around that time t
   "@type": "BlogPosting",
   "headline": "January, 2019",
   "url": "https://alanorth.github.io/cgspace-notes/2019-01/",
-  "wordCount": "3266",
+  "wordCount": "3507",
   "datePublished": "2019-01-02T09:48:30+02:00",
-  "dateModified": "2019-01-21T12:54:29+02:00",
+  "dateModified": "2019-01-21T14:16:56+02:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -957,8 +957,45 @@ print(results.hits)
  • So I guess I need to figure out how to use join queries and maybe even switch to using raw Python requests with JSON
  • +
  • This enumerates the Solr cores and returns the result in JSON format:
+
http://localhost:3000/solr/admin/cores?action=STATUS&wt=json
+
+ +
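The core list returned by that STATUS call can also be read programmatically. A minimal Python sketch, assuming the response follows Solr's usual layout with a top-level `status` object keyed by core name; the function name `core_names` and the sample payload are illustrative and not taken from the dspace-statistics-api:

```python
import json


def core_names(status_response: dict) -> list[str]:
    """Return the core names from a parsed CoreAdmin STATUS response.

    Assumes the JSON has a top-level "status" object keyed by core name.
    """
    return sorted(status_response.get("status", {}))


# Hypothetical, trimmed-down payload for illustration only.
sample = json.loads(
    '{"responseHeader": {"status": 0},'
    ' "status": {"statistics": {"name": "statistics"},'
    ' "statistics-2018": {"name": "statistics-2018"}}}'
)
print(core_names(sample))
```

In practice the dictionary would come from the JSON body of the STATUS request rather than from a literal string.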
    +
  • I think I figured out how to search across shards: I needed to give the full URL of each of the other cores
  • +
  • Now I get more results when I start adding the other statistics cores:
  • +
+ +
$ http 'http://localhost:3000/solr/statistics/select?&indent=on&rows=0&q=*:*' | grep numFound
+<result name="response" numFound="2061320" start="0">
+$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018&indent=on&rows=0&q=*:*' | grep numFound
+<result name="response" numFound="16280292" start="0" maxScore="1.0">
+$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017&indent=on&rows=0&q=*:*' | grep numFound
+<result name="response" numFound="25606142" start="0" maxScore="1.0">
+$ http 'http://localhost:3000/solr/statistics/select?&shards=localhost:8081/solr/statistics-2018,localhost:8081/solr/statistics-2017,localhost:8081/solr/statistics-2016&indent=on&rows=0&q=*:*' | grep numFound
+<result name="response" numFound="31532212" start="0" maxScore="1.0">
+
+ +
    +
  • I should be able to modify the dspace-statistics-api to check the shards via the Solr core status, then add the shards parameter to each query to make the search distributed among the cores
  • +
  • I implemented a proof of concept to query the Solr STATUS for active cores and to add them with a shards query string
  • +
  • A few things I noticed: + +
      +
    • Solr doesn’t mind if you use an empty shards parameter
    • +
    • Solr doesn’t mind if you have an extra comma at the end of the shards parameter
    • +
    • If you are searching multiple cores, you need to include the base core in the shards parameter as well
    • +
    • For example, compare the following two queries, first including the base core and the shard in the shards parameter, and then only including the shard:
    • +
  • +
+ +
$ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view&shards=localhost:8081/solr/statistics,localhost:8081/solr/statistics-2018' | grep numFound
+<result name="response" numFound="275" start="0" maxScore="12.205825">
+$ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=type:2+id:11576&fq=isBot:false&fq=statistics_type:view&shards=localhost:8081/solr/statistics-2018' | grep numFound
+<result name="response" numFound="241" start="0" maxScore="12.205825">
+
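The observations above suggest how a distributed query could be assembled. A rough Python sketch, assuming all cores live on one host (`localhost:8081`, as in the examples) and that the core names have already been read from the STATUS response; `build_shards` and `select_url` are hypothetical helpers, not the actual dspace-statistics-api code:

```python
from urllib.parse import urlencode


def build_shards(host: str, cores: list[str]) -> str:
    """Join cores into a Solr `shards` value such as host/solr/a,host/solr/b."""
    return ",".join(f"{host}/solr/{core}" for core in cores)


def select_url(host: str, base_core: str, cores: list[str], query: str) -> str:
    """Build a select URL on the base core that searches every core in `cores`.

    The base core is added to the shard list if it is missing, because a
    shards value that names only the other cores leaves the base core out.
    """
    if base_core not in cores:
        cores = [base_core, *cores]
    params = {"q": query, "rows": 0, "shards": build_shards(host, cores)}
    # safe=":/,*" keeps the URL readable; the percent-encoded form is equivalent.
    return f"http://{host}/solr/{base_core}/select?{urlencode(params, safe=':/,*')}"


print(select_url("localhost:8081", "statistics", ["statistics-2018"], "*:*"))
```

Adding the base core automatically is one way to avoid the lower count seen in the second query (241 instead of 275), where only the 2018 shard was searched.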
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index c0e7ec138..e49890709 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2019-01/
-2019-01-21T12:54:29+02:00
+2019-01-21T14:16:56+02:00
@@ -204,7 +204,7 @@
 https://alanorth.github.io/cgspace-notes/
-2019-01-21T12:54:29+02:00
+2019-01-21T14:16:56+02:00
 0
@@ -215,7 +215,7 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2019-01-21T12:54:29+02:00
+2019-01-21T14:16:56+02:00
 0
@@ -227,13 +227,13 @@
 https://alanorth.github.io/cgspace-notes/posts/
-2019-01-21T12:54:29+02:00
+2019-01-21T14:16:56+02:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2019-01-21T12:54:29+02:00
+2019-01-21T14:16:56+02:00
 0