Add notes for 2019-04-17

parent 24473571ff, commit 4a4bd34e0e
@@ -688,4 +688,82 @@ sys 2m13.463s

- Export IITA's community from CGSpace because they want to experiment with importing it into their internal DSpace for some testing (a sketch of the export command is below)
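
A community export like this can be done with DSpace's AIP packager. A minimal sketch, assuming a hypothetical handle `10568/12345` for the IITA community and a hypothetical administrator account `admin@example.com`:

```
$ dspace packager --disseminate --all --type AIP --eperson admin@example.com \
  --identifier 10568/12345 /tmp/iita-community.zip
```

The `--all` option recursively disseminates the community's collections and items as a zip of AIPs, which can later be ingested on another DSpace instance using the packager's submit mode.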

## 2019-04-17

- Reading an interesting [blog post about Solr caching](https://teaspoon-consulting.com/articles/solr-cache-tuning.html)
- Did some tests of the dspace-statistics-api on my local DSpace instance with 28 million documents in a sharded statistics core (`statistics` and `statistics-2018`) and monitored the memory usage of Tomcat in VisualVM (the cache statistics below were read from Solr's cache mbeans; see the sketch after the first graph)
- 4GB heap, CMS GC, 512 filter cache, 512 query cache, with 28 million documents in two shards
  - Run 1:
    - Time: 3.11s user 0.44s system 0% cpu 13:45.07 total
    - Tomcat (not Solr) max JVM heap usage: 2.04 GiB
  - Run 2:
    - Time: 3.23s user 0.43s system 0% cpu 13:46.10 total
    - Tomcat (not Solr) max JVM heap usage: 2.06 GiB
  - Run 3:
    - Time: 3.23s user 0.42s system 0% cpu 13:14.70 total
    - Tomcat (not Solr) max JVM heap usage: 2.13 GiB
  - `filterCache` size: 482, `cumulative_lookups`: 7062712, `cumulative_hits`: 167903, `cumulative_hitratio`: 0.02
  - `queryResultCache` size: 2
- 4GB heap, CMS GC, 1024 filter cache, 512 query cache, with 28 million documents in two shards
  - Run 1:
    - Time: 2.92s user 0.39s system 0% cpu 12:33.08 total
    - Tomcat (not Solr) max JVM heap usage: 2.16 GiB
  - Run 2:
    - Time: 3.10s user 0.39s system 0% cpu 12:25.32 total
    - Tomcat (not Solr) max JVM heap usage: 2.07 GiB
  - Run 3:
    - Time: 3.29s user 0.36s system 0% cpu 11:53.47 total
    - Tomcat (not Solr) max JVM heap usage: 2.08 GiB
  - `filterCache` size: 951, `cumulative_lookups`: 7062712, `cumulative_hits`: 254379, `cumulative_hitratio`: 0.04
- 4GB heap, CMS GC, 2048 filter cache, 512 query cache, with 28 million documents in two shards
  - Run 1:
    - Time: 2.90s user 0.48s system 0% cpu 10:37.31 total
    - Tomcat max JVM heap usage: 1.96 GiB
    - `filterCache` size: 1901, `cumulative_lookups`: 2354237, `cumulative_hits`: 180111, `cumulative_hitratio`: 0.08
  - Run 2:
    - Time: 2.97s user 0.39s system 0% cpu 10:40.06 total
    - Tomcat max JVM heap usage: 2.09 GiB
    - `filterCache` size: 1901, `cumulative_lookups`: 4708473, `cumulative_hits`: 360068, `cumulative_hitratio`: 0.08
  - Run 3:
    - Time: 3.28s user 0.37s system 0% cpu 10:49.56 total
    - Tomcat max JVM heap usage: 2.05 GiB
    - `filterCache` size: 1901, `cumulative_lookups`: 7062712, `cumulative_hits`: 540020, `cumulative_hitratio`: 0.08
- 4GB heap, CMS GC, 4096 filter cache, 512 query cache, with 28 million documents in two shards
  - Run 1:
    - Time: 2.88s user 0.35s system 0% cpu 8:29.55 total
    - Tomcat max JVM heap usage: 2.15 GiB
    - `filterCache` size: 3770, `cumulative_lookups`: 2354237, `cumulative_hits`: 414512, `cumulative_hitratio`: 0.18
  - Run 2:
    - Time: 3.01s user 0.38s system 0% cpu 9:15.65 total
    - Tomcat max JVM heap usage: 2.17 GiB
    - `filterCache` size: 3945, `cumulative_lookups`: 4708473, `cumulative_hits`: 829093, `cumulative_hitratio`: 0.18
  - Run 3:
    - Time: 3.01s user 0.40s system 0% cpu 9:01.31 total
    - Tomcat max JVM heap usage: 2.07 GiB
    - `filterCache` size: 3770, `cumulative_lookups`: 7062712, `cumulative_hits`: 1243632, `cumulative_hitratio`: 0.18
- The biggest takeaway I have is that this workload benefits from a larger `filterCache` (used for Solr's `fq` parameter), but barely uses the `queryResultCache` (used for Solr's `q` parameter) at all
  - The number of hits goes up and the time taken decreases when we increase the `filterCache` size, and total JVM heap memory doesn't seem to increase much at all
  - I guess the `queryResultCache` size is always 2 because I'm only doing two queries: `type:0` and `type:2` (downloads and views, respectively)
- Here is the general pattern of running three sequential indexing runs as seen in VisualVM while monitoring the Tomcat process:

![VisualVM Tomcat 4096 filterCache](/cgspace-notes/2019/04/visualvm-solr-indexing-4096-filterCache.png)
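
The cache statistics above come from Solr's cache mbeans. A minimal sketch of reading them for the `statistics` core with curl, assuming Solr is reachable on localhost:8080 (adjust the host, port, and core name for your setup):

```
$ curl -s 'http://localhost:8080/solr/statistics/admin/mbeans?cat=CACHE&stats=true&wt=json&indent=true' \
  | grep -E '"size"|cumulative_(lookups|hits|hitratio)'
```

Note that `size` is the current number of cache entries while the `cumulative_*` counters accumulate from core startup, which is why `cumulative_lookups` roughly triples over three runs. The number of `filterCache` lookups per run (about 2.35 million) is also far larger than the handful of distinct queries the indexer sends, which suggests most lookups come from faceting, since enum-method faceting can consult the `filterCache` once per facet term.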

- I ran one test with a `filterCache` of 16384 to see if I could make the Tomcat JVM memory balloon, but it actually *drastically* increased the performance of the dspace-statistics-api indexer while *decreasing* memory usage (the `solrconfig.xml` setting is sketched after the graph below)
- 4GB heap, CMS GC, 16384 filter cache, 512 query cache, with 28 million documents in two shards
  - Run 1:
    - Time: 2.85s user 0.42s system 2% cpu 2:28.92 total
    - Tomcat max JVM heap usage: 1.90 GiB
    - `filterCache` size: 14851, `cumulative_lookups`: 2354237, `cumulative_hits`: 2331186, `cumulative_hitratio`: 0.99
  - Run 2:
    - Time: 2.90s user 0.37s system 2% cpu 2:23.50 total
    - Tomcat max JVM heap usage: 1.27 GiB
    - `filterCache` size: 15834, `cumulative_lookups`: 4708476, `cumulative_hits`: 4664762, `cumulative_hitratio`: 0.99
  - Run 3:
    - Time: 2.93s user 0.39s system 2% cpu 2:26.17 total
    - Tomcat max JVM heap usage: 1.05 GiB
    - `filterCache` size: 15248, `cumulative_lookups`: 7062715, `cumulative_hits`: 6998267, `cumulative_hitratio`: 0.99
- The JVM garbage collection graph is MUCH flatter, and memory usage is much lower (not to mention a drop in GC-related CPU usage)!

![VisualVM Tomcat 16384 filterCache](/cgspace-notes/2019/04/visualvm-solr-indexing-16384-filterCache.png)
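
For reference, the filter cache is configured per core in `solrconfig.xml`. A sketch of the setting under test, using common Solr defaults for the other attributes (the stock DSpace configuration may differ):

```
<!-- solr/statistics/conf/solrconfig.xml -->
<filterCache class="solr.FastLRUCache"
             size="16384"
             initialSize="512"
             autowarmCount="0"/>
```

Ballooning was plausible because a single `filterCache` entry can be stored as a bitset of one bit per document, roughly 3.3 MiB for a 28 million document core, so 16384 entries could in theory occupy tens of gigabytes; in practice Solr stores small result sets much more compactly, and here the heap actually shrank.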

<!-- vim: set sw=2 ts=2: -->

BIN docs/2019/04/visualvm-solr-indexing-16384-filterCache.png (new file, 80 KiB)
BIN docs/2019/04/visualvm-solr-indexing-4096-filterCache.png (new file, 93 KiB)
BIN static/2019/04/visualvm-solr-indexing-16384-filterCache.png (new file, 80 KiB)
BIN static/2019/04/visualvm-solr-indexing-4096-filterCache.png (new file, 93 KiB)