Update notes for 2017-10-31

Alan Orth 2017-11-01 12:16:17 +02:00
parent db726df881
commit 31dde1c16d
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
4 changed files with 47 additions and 14 deletions

@@ -338,3 +338,18 @@ WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Valve} Setting property
```
# goaccess /var/log/nginx/access.log --log-format=COMBINED
```
- According to Uptime Robot CGSpace went down and up a few times
- I had a look at goaccess and I saw that CORE was actively indexing
- Also, PostgreSQL connections were at 91 (with the max being 60 per web app, hmmm)
- I'm really starting to get annoyed with these guys, and thinking about blocking their IP address for a few days to see if CGSpace becomes more stable
- Actually, come to think of it, they aren't even obeying `robots.txt`, because we actually disallow `/discover` and `/search-filter` URLs but they are hitting those massively:
```
# grep "CORE/0.6" /var/log/nginx/access.log | grep -o -E "GET /(discover|search-filter)" | sort -n | uniq -c | sort -rn
158058 GET /discover
14260 GET /search-filter
```
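The `grep` pipeline above can also be written as a short script, which may be handier when the same count is needed for several agents or path prefixes. This is only a sketch: the function name `count_hits`, the regular expression, and the assumption that the log uses nginx's default `combined` format are illustrative and not part of the original notes.

```python
import re
from collections import Counter

# Assumed layout (nginx "combined" format):
#   ... "GET /path HTTP/1.1" status bytes "referer" "user-agent"
# Only GET requests are matched, mirroring the grep pattern.
LINE_RE = re.compile(
    r'"GET (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_hits(lines, agent_substring, prefixes):
    """Count GET requests per path prefix for one user agent.

    lines: iterable of access log lines
    agent_substring: text that must appear in the user-agent field
    prefixes: path prefixes to count, e.g. ["/discover", "/search-filter"]
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None or agent_substring not in match.group("agent"):
            continue
        path = match.group("path")
        for prefix in prefixes:
            if path.startswith(prefix):
                counts[prefix] += 1
                break  # count each request at most once
    return counts
```

Called as `count_hits(open("/var/log/nginx/access.log"), "CORE/0.6", ["/discover", "/search-filter"])`, it should give counts comparable to the `grep` output. Small differences are possible, because `grep` matches the pattern anywhere in a line while this sketch only looks at the request field.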
- I tested a URL of pattern `/discover` in Google's webmaster tools and it was indeed identified as blocked
- I will send feedback to the CORE bot team
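The `robots.txt` check can also be reproduced locally with Python's standard `urllib.robotparser`, without relying on Google's webmaster tools. The rules in `ASSUMED_RULES` are an assumption based on the statement above that `/discover` and `/search-filter` are disallowed; they are not a copy of CGSpace's real `robots.txt`, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration only; the real file may contain more entries.
ASSUMED_RULES = """\
User-agent: *
Disallow: /discover
Disallow: /search-filter
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# A well-behaved crawler would skip any path where this returns False.
blocked = [
    path
    for path in ("/discover", "/search-filter")
    if not is_allowed(ASSUMED_RULES, "CORE/0.6", path)
]
```

Under these assumed rules both paths end up in `blocked`, which matches the result reported by the webmaster tools test.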

@@ -28,7 +28,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CGI
<meta property="article:published_time" content="2017-10-01T08:07:54&#43;03:00"/>
-<meta property="article:modified_time" content="2017-10-31T13:35:56&#43;02:00"/>
+<meta property="article:modified_time" content="2017-10-31T15:38:27&#43;02:00"/>
@@ -66,9 +66,9 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CGI
"@type": "BlogPosting",
"headline": "October, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-10/",
-"wordCount": "2468",
+"wordCount": "2613",
"datePublished": "2017-10-01T08:07:54&#43;03:00",
-"dateModified": "2017-10-31T13:35:56&#43;02:00",
+"dateModified": "2017-10-31T15:38:27&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -522,6 +522,24 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
<pre><code># goaccess /var/log/nginx/access.log --log-format=COMBINED
</code></pre>
<ul>
<li>According to Uptime Robot CGSpace went down and up a few times</li>
<li>I had a look at goaccess and I saw that CORE was actively indexing</li>
<li>Also, PostgreSQL connections were at 91 (with the max being 60 per web app, hmmm)</li>
<li>I&rsquo;m really starting to get annoyed with these guys, and thinking about blocking their IP address for a few days to see if CGSpace becomes more stable</li>
<li>Actually, come to think of it, they aren&rsquo;t even obeying <code>robots.txt</code>, because we actually disallow <code>/discover</code> and <code>/search-filter</code> URLs but they are hitting those massively:</li>
</ul>
<pre><code># grep &quot;CORE/0.6&quot; /var/log/nginx/access.log | grep -o -E &quot;GET /(discover|search-filter)&quot; | sort -n | uniq -c | sort -rn
158058 GET /discover
14260 GET /search-filter
</code></pre>
<ul>
<li>I tested a URL of pattern <code>/discover</code> in Google&rsquo;s webmaster tools and it was indeed identified as blocked</li>
<li>I will send feedback to the CORE bot team</li>
</ul>

@@ -28,7 +28,7 @@ Disallow: /cgspace-notes/2015-12/
Disallow: /cgspace-notes/2015-11/
Disallow: /cgspace-notes/
Disallow: /cgspace-notes/categories/
-Disallow: /cgspace-notes/categories/notes/
Disallow: /cgspace-notes/tags/notes/
+Disallow: /cgspace-notes/categories/notes/
Disallow: /cgspace-notes/post/
Disallow: /cgspace-notes/tags/

@@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2017-10/</loc>
-<lastmod>2017-10-31T13:35:56+02:00</lastmod>
+<lastmod>2017-10-31T15:38:27+02:00</lastmod>
</url>
<url>
@@ -129,7 +129,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2017-10-31T13:35:56+02:00</lastmod>
+<lastmod>2017-10-31T15:38:27+02:00</lastmod>
<priority>0</priority>
</url>
@@ -138,27 +138,27 @@
<priority>0</priority>
</url>
+<url>
+<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
+<lastmod>2017-10-31T15:38:27+02:00</lastmod>
+<priority>0</priority>
+</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2017-09-28T12:00:49+03:00</lastmod>
<priority>0</priority>
</url>
-<url>
-<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2017-10-31T13:35:56+02:00</lastmod>
-<priority>0</priority>
-</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
-<lastmod>2017-10-31T13:35:56+02:00</lastmod>
+<lastmod>2017-10-31T15:38:27+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2017-10-31T13:35:56+02:00</lastmod>
+<lastmod>2017-10-31T15:38:27+02:00</lastmod>
<priority>0</priority>
</url>