Update notes for 2018-07-09

Alan Orth 2018-07-10 00:22:48 +03:00
parent 5ad4a8f80e
commit df5896d076
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 80 additions and 8 deletions

@ -179,5 +179,40 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
```
- But not sure what caused that...
- I got a message from Linode tonight that CPU usage was high on CGSpace for the past few hours around 8PM GMT
- Looking in the nginx logs I see the top ten IP addresses active today:
```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "09/Jul/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1691 40.77.167.84
1701 40.77.167.69
1718 50.116.102.77
1872 137.108.70.6
2172 157.55.39.234
2190 207.46.13.47
2848 178.154.200.38
4367 35.227.26.162
4387 70.32.83.92
4738 95.108.181.88
```
- Of those, *all* except `70.32.83.92` and `50.116.102.77` are *NOT* re-using their Tomcat sessions, for example from the XMLUI logs:
```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-07-09
4435
```
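- Note that `grep -c` counts matching log lines rather than distinct sessions. A minimal sketch of counting the unique session IDs for one IP, assuming the same `session_id=...:ip_addr=...` log format as above (the `count_unique_sessions` name is a hypothetical helper, not part of DSpace or Tomcat):

```shell
# Hypothetical helper: count distinct session IDs logged for one IP.
# Assumes relevant log lines contain "session_id=<32 chars>:ip_addr=<IP>".
count_unique_sessions() {
    ip="$1"
    logfile="$2"
    # Escape the dots so they match literally in the regular expression
    ip_regex=$(printf '%s' "$ip" | sed 's/\./\\./g')
    # Require a non-digit (or end of line) after the IP so that
    # 1.2.3.4 does not also match 1.2.3.40, then keep only the session ID
    grep -o -E "session_id=[A-Z0-9]{32}:ip_addr=${ip_regex}([^0-9]|\$)" "$logfile" \
        | cut -d: -f1 | sort -u | wc -l | tr -d ' '
}
```

- For example `count_unique_sessions 95.108.181.88 dspace.log.2018-07-09`; a result close to the number of requests would suggest that nearly every request opened a new session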
- `95.108.181.88` appears to be Yandex, so I dunno why it's creating so many sessions, as its user agent should match Tomcat's Crawler Session Manager Valve
- `70.32.83.92` is on MediaTemple but I'm not sure who it is. They are mostly hitting REST so I guess that's fine
- `35.227.26.162` doesn't declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx
- `178.154.200.38` is Yandex again
- `207.46.13.47` is Bing
- `157.55.39.234` is Bing
- `137.108.70.6` is our old friend CORE bot
- `50.116.102.77` doesn't declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that's fine
- `40.77.167.84` is Bing again
- Interestingly, the first time I saw `35.227.26.162` was on 2018-06-08
- I've added `35.227.26.162` to the bot tagging logic in the nginx vhost
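- The user agent and first-seen checks mentioned above can be sketched roughly as follows, assuming nginx's default `combined` log format (client IP as the first field, user agent as the third quoted string); both function names are hypothetical:

```shell
# Hypothetical helpers; both assume nginx's "combined" log format.

# Tally the user agents sent by one IP ("-" means none was declared).
user_agents_for_ip() {
    ip="$1"; shift
    # Splitting on double quotes makes the user agent field number 6
    zcat --force "$@" \
        | awk -v ip="$ip" -F'"' 'index($0, ip " ") == 1 { print $6 }' \
        | sort | uniq -c | sort -n
}

# Print the date (dd/Mon/yyyy) of the first request from one IP.
# Log files must be passed oldest first, since lines are read in order.
first_seen() {
    ip="$1"; shift
    # Field 4 looks like "[08/Jun/2018:10:00:00", so skip the bracket
    zcat --force "$@" \
        | awk -v ip="$ip" '$1 == ip { print substr($4, 2, 11); exit }'
}
```

- For example `user_agents_for_ip 35.227.26.162 /var/log/nginx/*.log /var/log/nginx/*.log.1`, re-using the globs from the top ten command above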
<!-- vim: set sw=2 ts=2: -->

@ -30,7 +30,7 @@ There is insufficient memory for the Java Runtime Environment to continue.
<meta property="article:published_time" content="2018-07-01T12:56:54&#43;03:00"/>
<meta property="article:modified_time" content="2018-07-09T07:51:04&#43;03:00"/>
<meta property="article:modified_time" content="2018-07-09T16:45:50&#43;03:00"/>
@ -71,9 +71,9 @@ There is insufficient memory for the Java Runtime Environment to continue.
"@type": "BlogPosting",
"headline": "July, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-07/",
"wordCount": "1213",
"wordCount": "1454",
"datePublished": "2018-07-01T12:56:54&#43;03:00",
"dateModified": "2018-07-09T07:51:04&#43;03:00",
"dateModified": "2018-07-09T16:45:50&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -342,6 +342,43 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
<ul>
<li>But not sure what caused that&hellip;</li>
<li>I got a message from Linode tonight that CPU usage was high on CGSpace for the past few hours around 8PM GMT</li>
<li>Looking in the nginx logs I see the top ten IP addresses active today:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;09/Jul/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1691 40.77.167.84
1701 40.77.167.69
1718 50.116.102.77
1872 137.108.70.6
2172 157.55.39.234
2190 207.46.13.47
2848 178.154.200.38
4367 35.227.26.162
4387 70.32.83.92
4738 95.108.181.88
</code></pre>
<ul>
<li>Of those, <em>all</em> except <code>70.32.83.92</code> and <code>50.116.102.77</code> are <em>NOT</em> re-using their Tomcat sessions, for example from the XMLUI logs:</li>
</ul>
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-07-09
4435
</code></pre>
<ul>
<li><code>95.108.181.88</code> appears to be Yandex, so I dunno why it&rsquo;s creating so many sessions, as its user agent should match Tomcat&rsquo;s Crawler Session Manager Valve</li>
<li><code>70.32.83.92</code> is on MediaTemple but I&rsquo;m not sure who it is. They are mostly hitting REST so I guess that&rsquo;s fine</li>
<li><code>35.227.26.162</code> doesn&rsquo;t declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx</li>
<li><code>178.154.200.38</code> is Yandex again</li>
<li><code>207.46.13.47</code> is Bing</li>
<li><code>157.55.39.234</code> is Bing</li>
<li><code>137.108.70.6</code> is our old friend CORE bot</li>
<li><code>50.116.102.77</code> doesn&rsquo;t declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that&rsquo;s fine</li>
<li><code>40.77.167.84</code> is Bing again</li>
<li>Interestingly, the first time I saw <code>35.227.26.162</code> was on 2018-06-08</li>
<li>I&rsquo;ve added <code>35.227.26.162</code> to the bot tagging logic in the nginx vhost</li>
</ul>
<!-- vim: set sw=2 ts=2: -->

@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-07/</loc>
<lastmod>2018-07-09T07:51:04+03:00</lastmod>
<lastmod>2018-07-09T16:45:50+03:00</lastmod>
</url>
<url>
@ -174,7 +174,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-07-09T07:51:04+03:00</lastmod>
<lastmod>2018-07-09T16:45:50+03:00</lastmod>
<priority>0</priority>
</url>
@ -185,7 +185,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-07-09T07:51:04+03:00</lastmod>
<lastmod>2018-07-09T16:45:50+03:00</lastmod>
<priority>0</priority>
</url>
@ -197,13 +197,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-07-09T07:51:04+03:00</lastmod>
<lastmod>2018-07-09T16:45:50+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-07-09T07:51:04+03:00</lastmod>
<lastmod>2018-07-09T16:45:50+03:00</lastmod>
<priority>0</priority>
</url>