Add notes for 2017-11-19

Alan Orth 2017-11-19 16:11:12 +02:00
parent 6fd723cfaa
commit 0f8c0fda83
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 98 additions and 8 deletions


@ -693,3 +693,46 @@ $ jconsole -J-DsocksProxyHost=localhost -J-DsocksProxyPort=7777 service:jmx:rmi:
- Here is the Jconsole screen after looping `http --print Hh https://dspacetest.cgiar.org/handle/10568/1` for a few minutes:
![Jconsole sessions for XMLUI](/cgspace-notes/2017/11/jconsole-sessions.png)
## 2017-11-19
- Linode sent an alert that CGSpace was using a lot of CPU around 4–6 AM
- Looking in the nginx access logs I see the most active XMLUI users between 4 and 6 AM:
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "19/Nov/2017:0[456]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
111 66.249.66.155
171 5.9.6.51
188 54.162.241.40
229 207.46.13.23
233 207.46.13.137
247 40.77.167.6
251 207.46.13.36
275 68.180.229.254
325 104.196.152.243
1610 66.249.66.153
```
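- A rough sketch for breaking the same logs down by hour instead of by IP, assuming the combined log format seen in these logs where the fourth field looks like `[19/Nov/2017:06:26:01` (`requests_per_hour` is a hypothetical helper name, not an existing tool):

```shell
# Hypothetical helper: count requests per hour for one day.
# $1 is the date as it appears in the log (e.g. 19/Nov/2017),
# the remaining arguments are nginx access log files.
requests_per_hour() {
    day="$1"
    shift
    awk -v day="$day" '
        # Only look at lines whose timestamp field starts with "[<day>:"
        index($4, "[" day ":") == 1 {
            # The two-digit hour follows the "[", the date, and the ":"
            hour = substr($4, length(day) + 3, 2)
            count[hour]++
        }
        END { for (h in count) print h, count[h] }
    ' "$@" | sort
}
```

- For example `requests_per_hour 19/Nov/2017 /var/log/nginx/access.log /var/log/nginx/access.log.1` would print one line per hour with its request count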
- 66.249.66.153 appears to be Googlebot:
```
66.249.66.153 - - [19/Nov/2017:06:26:01 +0000] "GET /handle/10568/2203 HTTP/1.1" 200 6309 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```
- We know Googlebot is persistent but behaves well, so I guess it was just a coincidence that it came at a time when we had other traffic and server activity
- In related news, I see an Atmire update process that has been running for many hours and is responsible for hundreds of thousands of log entries (about two thirds of all log entries):
```
$ wc -l dspace.log.2017-11-19
388472 dspace.log.2017-11-19
$ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
267494
```
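- With the counts above, 267494 of 388472 lines is roughly 69% of the log. A minimal sketch for getting that share directly, where `log_share` is a hypothetical helper name and the pattern is treated as a basic regular expression by `grep`:

```shell
# Hypothetical helper: percentage of lines in a file that match a pattern.
# $1 is the grep pattern, $2 is the log file.
log_share() {
    pattern="$1"
    file="$2"
    # Total number of lines in the file
    total=$(wc -l < "$file")
    # Number of lines matching the pattern (prints 0 when nothing matches)
    matches=$(grep -c -- "$pattern" "$file")
    awk -v m="$matches" -v t="$total" 'BEGIN {
        if (t > 0) printf "%.1f\n", 100 * m / t
        else print "0.0"
    }'
}
```

- For example `log_share com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19` prints the share of that day's log taken up by the Atmire process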
- WTF is this process doing every day, and for so many hours?
- In unrelated news, when I was looking at the DSpace logs I saw a bunch of errors like this:
```
2017-11-19 03:00:32,806 INFO org.apache.pdfbox.pdfparser.PDFParser @ Document is encrypted
2017-11-19 03:00:32,807 ERROR org.apache.pdfbox.filter.FlateFilter @ FlateFilter: stop reading corrupt stream due to a DataFormatException
```


@ -38,7 +38,7 @@ COPY 54701
<meta property="article:published_time" content="2017-11-02T09:37:54&#43;02:00"/>
<meta property="article:modified_time" content="2017-11-16T10:15:33&#43;02:00"/>
<meta property="article:modified_time" content="2017-11-17T12:35:53&#43;02:00"/>
@ -86,9 +86,9 @@ COPY 54701
"@type": "BlogPosting",
"headline": "November, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-11/",
"wordCount": "4062",
"wordCount": "4282",
"datePublished": "2017-11-02T09:37:54&#43;02:00",
"dateModified": "2017-11-16T10:15:33&#43;02:00",
"dateModified": "2017-11-17T12:35:53&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -924,6 +924,53 @@ dspace6=# CREATE EXTENSION pgcrypto;
<p><img src="/cgspace-notes/2017/11/jconsole-sessions.png" alt="Jconsole sessions for XMLUI" /></p>
<h2 id="2017-11-19">2017-11-19</h2>
<ul>
<li>Linode sent an alert that CGSpace was using a lot of CPU around 4–6 AM</li>
<li>Looking in the nginx access logs I see the most active XMLUI users between 4 and 6 AM:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;19/Nov/2017:0[456]&quot; | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
111 66.249.66.155
171 5.9.6.51
188 54.162.241.40
229 207.46.13.23
233 207.46.13.137
247 40.77.167.6
251 207.46.13.36
275 68.180.229.254
325 104.196.152.243
1610 66.249.66.153
</code></pre>
<ul>
<li>66.249.66.153 appears to be Googlebot:</li>
</ul>
<pre><code>66.249.66.153 - - [19/Nov/2017:06:26:01 +0000] &quot;GET /handle/10568/2203 HTTP/1.1&quot; 200 6309 &quot;-&quot; &quot;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)&quot;
</code></pre>
<ul>
<li>We know Googlebot is persistent but behaves well, so I guess it was just a coincidence that it came at a time when we had other traffic and server activity</li>
<li>In related news, I see an Atmire update process that has been running for many hours and is responsible for hundreds of thousands of log entries (about two thirds of all log entries):</li>
</ul>
<pre><code>$ wc -l dspace.log.2017-11-19
388472 dspace.log.2017-11-19
$ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
267494
</code></pre>
<ul>
<li>WTF is this process doing every day, and for so many hours?</li>
<li>In unrelated news, when I was looking at the DSpace logs I saw a bunch of errors like this:</li>
</ul>
<pre><code>2017-11-19 03:00:32,806 INFO org.apache.pdfbox.pdfparser.PDFParser @ Document is encrypted
2017-11-19 03:00:32,807 ERROR org.apache.pdfbox.filter.FlateFilter @ FlateFilter: stop reading corrupt stream due to a DataFormatException
</code></pre>


@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2017-11/</loc>
<lastmod>2017-11-16T10:15:33+02:00</lastmod>
<lastmod>2017-11-17T12:35:53+02:00</lastmod>
</url>
<url>
@ -134,7 +134,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2017-11-16T10:15:33+02:00</lastmod>
<lastmod>2017-11-17T12:35:53+02:00</lastmod>
<priority>0</priority>
</url>
@ -145,7 +145,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2017-11-16T10:15:33+02:00</lastmod>
<lastmod>2017-11-17T12:35:53+02:00</lastmod>
<priority>0</priority>
</url>
@ -157,13 +157,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2017-11-16T10:15:33+02:00</lastmod>
<lastmod>2017-11-17T12:35:53+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2017-11-16T10:15:33+02:00</lastmod>
<lastmod>2017-11-17T12:35:53+02:00</lastmod>
<priority>0</priority>
</url>