Regenerate public

2025-01-27 05:49:12 +01:00 · 2019-03-01 12:17:17 +01:00
parent d257a40705
commit bb8206e0e9
75 changed files with 1318 additions and 825 deletions
--- a/docs/categories/page/2/index.html
+++ b/docs/categories/page/2/index.html
@@ -10,7 +10,7 @@
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />

-<meta property="og:updated_time" content="2018-04-01T16:13:54&#43;02:00"/>
+<meta property="og:updated_time" content="2018-05-01T16:43:54&#43;03:00"/>

 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Categories"/>
@@ -29,7 +29,7 @@
    "@type": "Person",
    "name": "Alan Orth"
  },
-  "dateModified": "2018-04-01T16:13:54&#43;02:00",
+  "dateModified": "2018-05-01T16:43:54&#43;03:00",
  "keywords": "notes,notes,",
  "description": "Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
 }
@@ -92,6 +92,35 @@



+<article class="blog-post">
+  <header>
+    <h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-05/">May, 2018</a></h2>
+    <p class="blog-post-meta"><time datetime="2018-05-01T16:43:54&#43;03:00">Tue May 01, 2018</time> by Alan Orth in 
+
+<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
+
+</p>
+  </header>
+  <h2 id="2018-05-01">2018-05-01</h2>
+
+<ul>
+<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
+
+<ul>
+<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</a></li>
+<li><a href="http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E">http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</a></li>
+</ul></li>
+<li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
+<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
+</ul>
+  <a href='https://alanorth.github.io/cgspace-notes/2018-05/'>Read more →</a>
+</article> 
+
+
+
+
+
+
 <article class="blog-post">
  <header>
    <h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-04/">April, 2018</a></h2>
@@ -396,46 +425,6 @@ COPY 54701



-
-<article class="blog-post">
-  <header>
-    <h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2017-08/">August, 2017</a></h2>
-    <p class="blog-post-meta"><time datetime="2017-08-01T11:51:52&#43;03:00">Tue Aug 01, 2017</time> by Alan Orth in 
-
-<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
-
-</p>
-  </header>
-  <h2 id="2017-08-01">2017-08-01</h2>
-
-<ul>
-<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
-<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
-<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
-<li>This means our Tomcat Crawler Session Valve is working</li>
-<li>But many of the bots are browsing dynamic URLs like:
-
-<ul>
-<li>/handle/10568/3353/discover</li>
-<li>/handle/10568/16510/browse</li>
-</ul></li>
-<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
-<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
-<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
-<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
-<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
-<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
-<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
-<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
-<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
-</ul>
-  <a href='https://alanorth.github.io/cgspace-notes/2017-08/'>Read more →</a>
-</article> 
-
-
-
-
-
 <nav class="blog-pagination">
  
  <a class="btn btn-outline-primary" href="/cgspace-notes/categories/" rel="prev" role="button">Previous page</a>
@@ -460,6 +449,8 @@ COPY 54701
    <ol class="list-unstyled">


+<li><a href="/cgspace-notes/2019-03/">March, 2019</a></li>
+
 <li><a href="/cgspace-notes/2019-02/">February, 2019</a></li>

 <li><a href="/cgspace-notes/2019-01/">January, 2019</a></li>
@@ -468,8 +459,6 @@ COPY 54701

 <li><a href="/cgspace-notes/2018-11/">November, 2018</a></li>

-<li><a href="/cgspace-notes/2018-10/">October, 2018</a></li>
-
    </ol>
  </section>