Add notes for 2017-11-19

2017-11-19 16:11:12 +02:00
parent 6fd723cfaa
commit 0f8c0fda83
3 changed files with 98 additions and 8 deletions

@@ -38,7 +38,7 @@ COPY 54701
<meta property="article:published_time" content="2017-11-02T09:37:54&#43;02:00"/>
<meta property="article:modified_time" content="2017-11-16T10:15:33&#43;02:00"/>
<meta property="article:modified_time" content="2017-11-17T12:35:53&#43;02:00"/>
@@ -86,9 +86,9 @@ COPY 54701
"@type": "BlogPosting",
"headline": "November, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-11/",
"wordCount": "4062",
"wordCount": "4282",
"datePublished": "2017-11-02T09:37:54&#43;02:00",
"dateModified": "2017-11-16T10:15:33&#43;02:00",
"dateModified": "2017-11-17T12:35:53&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -924,6 +924,53 @@ dspace6=# CREATE EXTENSION pgcrypto;
<p><img src="/cgspace-notes/2017/11/jconsole-sessions.png" alt="Jconsole sessions for XMLUI" /></p>
<h2 id="2017-11-19">2017-11-19</h2>
<ul>
<li>Linode sent an alert that CGSpace was using a lot of CPU between 4 and 6 AM</li>
<li>Looking in the nginx access logs I see the most active XMLUI users between 4 and 6 AM:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;19/Nov/2017:0[456]&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
111 66.249.66.155
171 5.9.6.51
188 54.162.241.40
229 207.46.13.23
233 207.46.13.137
247 40.77.167.6
251 207.46.13.36
275 68.180.229.254
325 104.196.152.243
1610 66.249.66.153
</code></pre>
<ul>
<li>66.249.66.153 appears to be Googlebot:</li>
</ul>
<pre><code>66.249.66.153 - - [19/Nov/2017:06:26:01 +0000] &quot;GET /handle/10568/2203 HTTP/1.1&quot; 200 6309 &quot;-&quot; &quot;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)&quot;
</code></pre>
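<ul>
<li>To check whether every request from an address carries the same user agent, the agents can be tallied per IP. This is only a rough sketch: it assumes nginx's default combined log format, where the user agent is the sixth field when splitting on double quotes, and <code>top_agents</code> is a hypothetical helper name, not something from the server:</li>
</ul>

```shell
# Hypothetical helper: tally user agents for one client IP.
# Assumes the nginx "combined" log format (user agent is the 6th
# field when the line is split on double quotes).
# Usage: top_agents 66.249.66.153 /var/log/nginx/access.log /var/log/nginx/access.log.1
top_agents() {
    ip="$1"; shift
    awk -F'"' -v ip="$ip" '
        # The client address is the first space-separated token of field 1
        { split($1, head, " ") }
        head[1] == ip { count[$6]++ }
        END { for (ua in count) printf "%d\t%s\n", count[ua], ua }
    ' "$@" | sort -rn | head
}
```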
<ul>
<li>We know Googlebot is persistent but behaves well, so I guess it was just a coincidence that it came at a time when we had other traffic and server activity</li>
<li>In related news, I see an Atmire update process that runs for many hours and is responsible for hundreds of thousands of log entries (about two thirds of the day's total):</li>
</ul>
<pre><code>$ wc -l dspace.log.2017-11-19
388472 dspace.log.2017-11-19
$ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
267494
</code></pre>
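<ul>
<li>With the counts above, 267494 of 388472 lines works out to roughly 69%. A small sketch for repeating the check on other days, where <code>log_share</code> is a hypothetical helper name and the percentage is rounded to one decimal place:</li>
</ul>

```shell
# Hypothetical helper: percentage of lines in a log file matching a pattern.
# Usage: log_share com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
log_share() {
    pattern="$1"
    file="$2"
    total=$(wc -l < "$file")
    matched=$(grep -c -- "$pattern" "$file")
    # Guard against an empty file to avoid dividing by zero
    awk -v m="$matched" -v t="$total" \
        'BEGIN { if (t > 0) printf "%.1f\n", 100 * m / t; else print "0.0" }'
}
```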
<ul>
<li>WTF is this process doing every day, and for so many hours?</li>
<li>In unrelated news, when I was looking at the DSpace logs I saw a bunch of errors like this:</li>
</ul>
<pre><code>2017-11-19 03:00:32,806 INFO org.apache.pdfbox.pdfparser.PDFParser @ Document is encrypted
2017-11-19 03:00:32,807 ERROR org.apache.pdfbox.filter.FlateFilter @ FlateFilter: stop reading corrupt stream due to a DataFormatException
</code></pre>