Update notes for 2018-09-25

Alan Orth 2018-09-26 03:29:56 +03:00
parent bc869de57d
commit 025016667d
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 12 additions and 16 deletions

@@ -513,9 +513,7 @@ http://localhost:8081/solr/statistics/update?commit=true&stream.body=<delete><qu
 - After a few hours the Solr statistics core is down to 44GB on CGSpace!
 - I did a *major* refactor and logic fix in the DSpace Statistics API's `indexer.py`
 - Basically, it turns out that using `facet.mincount=1` is really beneficial for me because it reduces the size of the Solr result set, reduces the amount of data we need to ingest into PostgreSQL, and the API returns HTTP 404 Not Found for items without views or downloads anyways
-- I deployed the new version on CGSpace and now there are only 92,000 pages of item views, which is half that it was before... but still seems like way too many (we only have about 74,000 items, which I'd assume would be 740 pages of 100)
-- Anyways, the indexing crashed kinda... I think
-- But systemd seems to have restarted it, and I now see:
+- I deployed the new version on CGSpace and now it looks pretty good!
 ```
 Indexing item views (page 28 of 753)
@@ -523,6 +521,6 @@ Indexing item views (page 28 of 753)
 Indexing item downloads (page 260 of 260)
 ```
-- So that looks really awesome! I wonder if the first crash was due to some caching shit in Solr?
+- And now it's fast as hell due to the muuuuch smaller Solr statistics core
 <!-- vim: set sw=2 ts=2: -->
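
The `facet.mincount=1` note and the page counts in the log output above can be sketched in Python as follows. This is not the actual `indexer.py`: the function names, the `type:2` query, and the `id` facet field are assumptions for illustration. Only the Solr facet parameter names and the page size of 100 come from Solr itself and from the notes.

```python
import math


def page_count(total_facets: int, per_page: int = 100) -> int:
    """Number of pages needed to walk every facet value, per_page at a time."""
    return math.ceil(total_facets / per_page)


def facet_params(page: int, per_page: int = 100) -> dict:
    """Build Solr query parameters for one page of per-item hit counts.

    Hypothetical helper: the query and facet field are assumed, not taken
    from the real indexer.
    """
    return {
        "q": "type:2",          # assumption: restrict to item-level hits
        "rows": 0,              # only the facet counts are needed, not documents
        "facet": "true",
        "facet.field": "id",    # assumption: facet on the item identifier
        "facet.mincount": 1,    # omit items with zero hits from the result set
        "facet.limit": per_page,
        "facet.offset": page * per_page,
        "wt": "json",
    }
```

With about 74,000 items and 100 facet values per page, `page_count(74000)` gives 740, which is in line with the 753 pages in the log. Because `facet.mincount=1` leaves out items with no hits, the number of pages tracks the number of items that actually have views or downloads.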

@@ -18,7 +18,7 @@ I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
 " />
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-09/" /><meta property="article:published_time" content="2018-09-02T09:55:54&#43;03:00"/>
-<meta property="article:modified_time" content="2018-09-25T23:54:29&#43;03:00"/>
+<meta property="article:modified_time" content="2018-09-26T03:24:46&#43;03:00"/>
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="September, 2018"/>
 <meta name="twitter:description" content="2018-09-02
@@ -41,9 +41,9 @@ I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
 "@type": "BlogPosting",
 "headline": "September, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-09/",
-"wordCount": "4129",
+"wordCount": "4073",
 "datePublished": "2018-09-02T09:55:54&#43;03:00",
-"dateModified": "2018-09-25T23:54:29&#43;03:00",
+"dateModified": "2018-09-26T03:24:46&#43;03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -698,9 +698,7 @@ dspacestatistics-&gt; (id INT PRIMARY KEY, views INT DEFAULT 0, downloads INT DE
 <li>After a few hours the Solr statistics core is down to 44GB on CGSpace!</li>
 <li>I did a <em>major</em> refactor and logic fix in the DSpace Statistics API&rsquo;s <code>indexer.py</code></li>
 <li>Basically, it turns out that using <code>facet.mincount=1</code> is really beneficial for me because it reduces the size of the Solr result set, reduces the amount of data we need to ingest into PostgreSQL, and the API returns HTTP 404 Not Found for items without views or downloads anyways</li>
-<li>I deployed the new version on CGSpace and now there are only 92,000 pages of item views, which is half that it was before&hellip; but still seems like way too many (we only have about 74,000 items, which I&rsquo;d assume would be 740 pages of 100)</li>
-<li>Anyways, the indexing crashed kinda&hellip; I think</li>
-<li>But systemd seems to have restarted it, and I now see:</li>
+<li>I deployed the new version on CGSpace and now it looks pretty good!</li>
 </ul>
 <pre><code>Indexing item views (page 28 of 753)
@@ -709,7 +707,7 @@ Indexing item downloads (page 260 of 260)
 </code></pre>
 <ul>
-<li>So that looks really awesome! I wonder if the first crash was due to some caching shit in Solr?</li>
+<li>And now it&rsquo;s fast as hell due to the muuuuch smaller Solr statistics core</li>
 </ul>
 <!-- vim: set sw=2 ts=2: -->

@@ -4,7 +4,7 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2018-09/</loc>
-<lastmod>2018-09-25T23:54:29+03:00</lastmod>
+<lastmod>2018-09-26T03:24:46+03:00</lastmod>
 </url>
 <url>
@@ -184,7 +184,7 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2018-09-25T23:54:29+03:00</lastmod>
+<lastmod>2018-09-26T03:24:46+03:00</lastmod>
 <priority>0</priority>
 </url>
@@ -195,7 +195,7 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2018-09-25T23:54:29+03:00</lastmod>
+<lastmod>2018-09-26T03:24:46+03:00</lastmod>
 <priority>0</priority>
 </url>
@@ -207,13 +207,13 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2018-09-25T23:54:29+03:00</lastmod>
+<lastmod>2018-09-26T03:24:46+03:00</lastmod>
 <priority>0</priority>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2018-09-25T23:54:29+03:00</lastmod>
+<lastmod>2018-09-26T03:24:46+03:00</lastmod>
 <priority>0</priority>
 </url>