mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-26 08:28:18 +01:00
Add notes for 2017-11-23
parent
a8cb05a2de
commit
c2b15214a8
@ -780,3 +780,49 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
|
|||||||
- In other news, it looks like the JVM garbage collection pattern is back to its standard jigsaw pattern after switching back to CMS a few days ago:
|
- In other news, it looks like the JVM garbage collection pattern is back to its standard jigsaw pattern after switching back to CMS a few days ago:
|
||||||
|
|
||||||
![Tomcat JVM with CMS GC](/cgspace-notes/2017/11/tomcat-jvm-cms.png)
|
![Tomcat JVM with CMS GC](/cgspace-notes/2017/11/tomcat-jvm-cms.png)
|
||||||
|
|
||||||
|
## 2017-11-23
|
||||||
|
|
||||||
|
- Linode alerted again that CPU usage was high on CGSpace from 4:13 to 6:13 AM
|
||||||
|
- I see a lot of Googlebot (66.249.66.90) in the XMLUI access logs
|
||||||
|
|
||||||
|
```
|
||||||
|
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "23/Nov/2017:0[456]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
||||||
|
88 66.249.66.91
|
||||||
|
140 68.180.229.254
|
||||||
|
155 54.196.2.131
|
||||||
|
182 54.224.164.166
|
||||||
|
301 157.55.39.79
|
||||||
|
315 207.46.13.36
|
||||||
|
331 207.46.13.23
|
||||||
|
358 207.46.13.137
|
||||||
|
565 104.196.152.243
|
||||||
|
1570 66.249.66.90
|
||||||
|
```
|
||||||
|
|
||||||
|
- ... and the usual REST scrapers from CIAT (45.5.184.196) and CCAFS (70.32.83.92):
|
||||||
|
|
||||||
|
```
|
||||||
|
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E "23/Nov/2017:0[456]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
||||||
|
5 190.120.6.219
|
||||||
|
6 104.198.9.108
|
||||||
|
14 104.196.152.243
|
||||||
|
21 112.134.150.6
|
||||||
|
22 157.55.39.79
|
||||||
|
22 207.46.13.137
|
||||||
|
23 207.46.13.36
|
||||||
|
26 207.46.13.23
|
||||||
|
942 45.5.184.196
|
||||||
|
3995 70.32.83.92
|
||||||
|
```
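
The two pipelines above differ only in the log files they read, so they can be folded into one small function. This is a minimal sketch, not something from the original notes: the name `top_ips` is hypothetical, and it assumes the client IP is the first field of each log line, as in nginx's stock `combined` format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ten busiest client IPs whose log lines
# match a timestamp pattern.
# Usage: top_ips PATTERN LOGFILE...
top_ips() {
    local pattern=$1
    shift
    # grep -h suppresses file name prefixes when several logs are given.
    # awk keeps the first field, assumed here to be the client IP.
    grep -h -E "$pattern" "$@" \
        | awk '{print $1}' \
        | sort \
        | uniq -c \
        | sort -n \
        | tail -n 10
}
```

For example, `top_ips "23/Nov/2017:0[456]" /var/log/nginx/rest.log /var/log/nginx/rest.log.1` should give the same kind of listing as the REST command above, with the busiest IP on the last line.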
|
||||||
|
|
||||||
|
- These IPs crawling the REST API don't specify user agents and I'd assume they are creating many Tomcat sessions
|
||||||
|
- I would catch them in nginx to assign a "bot" user agent to them so that the Tomcat Crawler Session Manager valve could deal with them, but they don't seem to create any, really, at least not in the dspace.log:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ grep 70.32.83.92 dspace.log.2017-11-23 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
|
||||||
|
2
|
||||||
|
```
|
||||||
|
|
||||||
|
- I'm wondering if REST works differently, or just doesn't log these sessions?
|
||||||
|
- I wonder if they are measurable via JMX MBeans?
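
To repeat the session check above for each of the heavy REST clients rather than one IP at a time, the grep pipeline can be wrapped in a function. This is only a sketch under assumptions: `count_sessions` is a hypothetical name, and it counts only the session IDs that DSpace actually writes to the log, so it cannot tell whether Tomcat holds sessions that are never logged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct session IDs logged for one IP.
# Usage: count_sessions IP LOGFILE
count_sessions() {
    local ip=$1 log=$2
    # -F treats the IP as a fixed string, so dots are not regex wildcards.
    # -w requires word boundaries, so 70.32.83.92 does not match 170.32.83.92.
    grep -F -w -- "$ip" "$log" \
        | grep -o -E 'session_id=[A-Z0-9]{32}' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}
```

Calling `count_sessions 70.32.83.92 dspace.log.2017-11-23` should reproduce the count from the command above, and the same call with 45.5.184.196 covers the CIAT scraper.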
|
||||||
|
@ -86,7 +86,7 @@ COPY 54701
|
|||||||
"@type": "BlogPosting",
|
"@type": "BlogPosting",
|
||||||
"headline": "November, 2017",
|
"headline": "November, 2017",
|
||||||
"url": "https://alanorth.github.io/cgspace-notes/2017-11/",
|
"url": "https://alanorth.github.io/cgspace-notes/2017-11/",
|
||||||
"wordCount": "4549",
|
"wordCount": "4773",
|
||||||
"datePublished": "2017-11-02T09:37:54+02:00",
|
"datePublished": "2017-11-02T09:37:54+02:00",
|
||||||
"dateModified": "2017-11-22T10:20:44+02:00",
|
"dateModified": "2017-11-22T10:20:44+02:00",
|
||||||
"author": {
|
"author": {
|
||||||
@ -1025,6 +1025,57 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
|
|||||||
|
|
||||||
<p><img src="/cgspace-notes/2017/11/tomcat-jvm-cms.png" alt="Tomcat JVM with CMS GC" /></p>
|
<p><img src="/cgspace-notes/2017/11/tomcat-jvm-cms.png" alt="Tomcat JVM with CMS GC" /></p>
|
||||||
|
|
||||||
|
<h2 id="2017-11-23">2017-11-23</h2>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>Linode alerted again that CPU usage was high on CGSpace from 4:13 to 6:13 AM</li>
|
||||||
|
<li>I see a lot of Googlebot (66.249.66.90) in the XMLUI access logs</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "23/Nov/2017:0[456]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
||||||
|
88 66.249.66.91
|
||||||
|
140 68.180.229.254
|
||||||
|
155 54.196.2.131
|
||||||
|
182 54.224.164.166
|
||||||
|
301 157.55.39.79
|
||||||
|
315 207.46.13.36
|
||||||
|
331 207.46.13.23
|
||||||
|
358 207.46.13.137
|
||||||
|
565 104.196.152.243
|
||||||
|
1570 66.249.66.90
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>… and the usual REST scrapers from CIAT (45.5.184.196) and CCAFS (70.32.83.92):</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code># cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E "23/Nov/2017:0[456]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
||||||
|
5 190.120.6.219
|
||||||
|
6 104.198.9.108
|
||||||
|
14 104.196.152.243
|
||||||
|
21 112.134.150.6
|
||||||
|
22 157.55.39.79
|
||||||
|
22 207.46.13.137
|
||||||
|
23 207.46.13.36
|
||||||
|
26 207.46.13.23
|
||||||
|
942 45.5.184.196
|
||||||
|
3995 70.32.83.92
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>These IPs crawling the REST API don’t specify user agents and I’d assume they are creating many Tomcat sessions</li>
|
||||||
|
<li>I would catch them in nginx to assign a “bot” user agent to them so that the Tomcat Crawler Session Manager valve could deal with them, but they don’t seem to create any, really, at least not in the dspace.log:</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code>$ grep 70.32.83.92 dspace.log.2017-11-23 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
|
||||||
|
2
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>I’m wondering if REST works differently, or just doesn’t log these sessions?</li>
|
||||||
|
<li>I wonder if they are measurable via JMX MBeans?</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|