diff --git a/content/posts/2019-04.md b/content/posts/2019-04.md
index 37d7dc62c..dd6b3af5f 100644
--- a/content/posts/2019-04.md
+++ b/content/posts/2019-04.md
@@ -688,4 +688,82 @@ sys 2m13.463s
 - Export IITA's community from CGSpace because they want to experiment with importing it into their internal DSpace for some testing or something
+
+## 2019-04-17
+
+- Reading an interesting [blog post about Solr caching](https://teaspoon-consulting.com/articles/solr-cache-tuning.html)
+- Did some tests of the dspace-statistics-api on my local DSpace instance with 28 million documents in a sharded statistics core (`statistics` and `statistics-2018`) and monitored the memory usage of Tomcat in VisualVM
+- 4GB heap, CMS GC, 512 filter cache, 512 query cache, with 28 million documents in two shards
+  - Run 1:
+    - Time: 3.11s user 0.44s system 0% cpu 13:45.07 total
+    - Tomcat (not Solr) max JVM heap usage: 2.04 GiB
+  - Run 2:
+    - Time: 3.23s user 0.43s system 0% cpu 13:46.10 total
+    - Tomcat (not Solr) max JVM heap usage: 2.06 GiB
+  - Run 3:
+    - Time: 3.23s user 0.42s system 0% cpu 13:14.70 total
+    - Tomcat (not Solr) max JVM heap usage: 2.13 GiB
+    - `filterCache` size: 482, `cumulative_lookups`: 7062712, `cumulative_hits`: 167903, `cumulative_hitratio`: 0.02
+    - `queryResultCache` size: 2
+- 4GB heap, CMS GC, 1024 filter cache, 512 query cache, with 28 million documents in two shards
+  - Run 1:
+    - Time: 2.92s user 0.39s system 0% cpu 12:33.08 total
+    - Tomcat (not Solr) max JVM heap usage: 2.16 GiB
+  - Run 2:
+    - Time: 3.10s user 0.39s system 0% cpu 12:25.32 total
+    - Tomcat (not Solr) max JVM heap usage: 2.07 GiB
+  - Run 3:
+    - Time: 3.29s user 0.36s system 0% cpu 11:53.47 total
+    - Tomcat (not Solr) max JVM heap usage: 2.08 GiB
+    - `filterCache` size: 951, `cumulative_lookups`: 7062712, `cumulative_hits`: 254379, `cumulative_hitratio`: 0.04
+- 4GB heap, CMS GC, 2048 filter cache, 512 query cache, with 28 million documents in two shards
+  - Run 1:
+    - Time: 2.90s user 0.48s system 0% cpu 10:37.31 total
+    - Tomcat max JVM heap usage: 1.96 GiB
+    - `filterCache` size: 1901, `cumulative_lookups`: 2354237, `cumulative_hits`: 180111, `cumulative_hitratio`: 0.08
+  - Run 2:
+    - Time: 2.97s user 0.39s system 0% cpu 10:40.06 total
+    - Tomcat max JVM heap usage: 2.09 GiB
+    - `filterCache` size: 1901, `cumulative_lookups`: 4708473, `cumulative_hits`: 360068, `cumulative_hitratio`: 0.08
+  - Run 3:
+    - Time: 3.28s user 0.37s system 0% cpu 10:49.56 total
+    - Tomcat max JVM heap usage: 2.05 GiB
+    - `filterCache` size: 1901, `cumulative_lookups`: 7062712, `cumulative_hits`: 540020, `cumulative_hitratio`: 0.08
+- 4GB heap, CMS GC, 4096 filter cache, 512 query cache, with 28 million documents in two shards
+  - Run 1:
+    - Time: 2.88s user 0.35s system 0% cpu 8:29.55 total
+    - Tomcat max JVM heap usage: 2.15 GiB
+    - `filterCache` size: 3770, `cumulative_lookups`: 2354237, `cumulative_hits`: 414512, `cumulative_hitratio`: 0.18
+  - Run 2:
+    - Time: 3.01s user 0.38s system 0% cpu 9:15.65 total
+    - Tomcat max JVM heap usage: 2.17 GiB
+    - `filterCache` size: 3945, `cumulative_lookups`: 4708473, `cumulative_hits`: 829093, `cumulative_hitratio`: 0.18
+  - Run 3:
+    - Time: 3.01s user 0.40s system 0% cpu 9:01.31 total
+    - Tomcat max JVM heap usage: 2.07 GiB
+    - `filterCache` size: 3770, `cumulative_lookups`: 7062712, `cumulative_hits`: 1243632, `cumulative_hitratio`: 0.18
+- The biggest takeaway I have is that this workload benefits from a larger `filterCache` (for the Solr `fq` parameter), but barely uses the `queryResultCache` (for the Solr `q` parameter) at all
+  - The number of hits goes up and the time taken decreases as we increase the `filterCache`, while total JVM heap memory doesn't seem to increase much at all (see the `solrconfig.xml` sketch below)
+  - I guess the `queryResultCache` size is always 2 because I'm only doing two queries: `type:0` and `type:2` (downloads and views, respectively)
+- Here is the general pattern of running three sequential indexing runs as seen in VisualVM while monitoring the Tomcat process:
+
+![VisualVM Tomcat 4096 filterCache](/cgspace-notes/2019/04/visualvm-solr-indexing-4096-filterCache.png)
+
+- I ran one test with a `filterCache` of 16384 to try to see if I could make the Tomcat JVM memory balloon, but it actually *drastically* improved the performance of the dspace-statistics-api indexer and *reduced* its memory usage
+- 4GB heap, CMS GC, 16384 filter cache, 512 query cache, with 28 million documents in two shards
+  - Run 1:
+    - Time: 2.85s user 0.42s system 2% cpu 2:28.92 total
+    - Tomcat max JVM heap usage: 1.90 GiB
+    - `filterCache` size: 14851, `cumulative_lookups`: 2354237, `cumulative_hits`: 2331186, `cumulative_hitratio`: 0.99
+  - Run 2:
+    - Time: 2.90s user 0.37s system 2% cpu 2:23.50 total
+    - Tomcat max JVM heap usage: 1.27 GiB
+    - `filterCache` size: 15834, `cumulative_lookups`: 4708476, `cumulative_hits`: 4664762, `cumulative_hitratio`: 0.99
+  - Run 3:
+    - Time: 2.93s user 0.39s system 2% cpu 2:26.17 total
+    - Tomcat max JVM heap usage: 1.05 GiB
+    - `filterCache` size: 15248, `cumulative_lookups`: 7062715, `cumulative_hits`: 6998267, `cumulative_hitratio`: 0.99
+- The JVM garbage collection graph is MUCH flatter, and memory usage is much lower (not to mention a drop in GC-related CPU usage)!
+
+![VisualVM Tomcat 16384 filterCache](/cgspace-notes/2019/04/visualvm-solr-indexing-16384-filterCache.png)
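+- For reference, these caches are configured per core in `solrconfig.xml`; here is a minimal sketch of the two entries I was adjusting, assuming Solr's stock cache classes (the `class` and `autowarmCount` values here are illustrative, not necessarily what DSpace ships):
+
+```xml
+<!-- filterCache: caches document sets for filter queries (fq); this is the
+     cache that the dspace-statistics-api indexer's faceting workload hits -->
+<filterCache class="solr.FastLRUCache"
+             size="16384"
+             initialSize="512"
+             autowarmCount="0"/>
+
+<!-- queryResultCache: caches ordered result lists for queries (q); barely
+     used by this workload, so it can stay small -->
+<queryResultCache class="solr.LRUCache"
+                  size="512"
+                  initialSize="512"
+                  autowarmCount="0"/>
+```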
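+- The cache statistics above (`size`, `cumulative_lookups`, `cumulative_hits`, `cumulative_hitratio`) come from Solr's admin mbeans endpoint; something like this should print them as JSON for the statistics core (assuming a local DSpace Tomcat on port 8080):
+
+```console
+$ curl -s 'http://localhost:8080/solr/statistics/admin/mbeans?stats=true&cat=CACHE&wt=json'
+```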