mirror of https://github.com/alanorth/cgspace-notes.git
synced 2024-11-22 06:35:03 +01:00
Update notes for 2018-09-10
parent c3a3af4e9f
commit 6dd5e7850b
@@ -138,5 +138,64 @@ UPDATE 15
- Start working on adding metadata for access and usage rights that we started earlier in 2018 (and also in 2017)
- The current `cg.identifier.status` field will become "Access rights" and `dc.rights` will become "Usage rights"
- I have some work in progress on the [`5_x-rights` branch](https://github.com/alanorth/DSpace/tree/5_x-rights)
- Linode said that CGSpace (linode18) had a high CPU load earlier today
- When I looked, I saw it was the same Russian IP that I noticed last month:

```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Sep/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1459 157.55.39.202
1579 95.108.181.88
1615 157.55.39.147
1714 66.249.64.91
1924 50.116.102.77
3696 157.55.39.106
3763 157.55.39.148
4470 70.32.83.92
4724 35.237.175.180
14132 5.9.6.51
```
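
For reference, the pipeline above can be sketched in Python. This is only a rough equivalent, assuming the access log lines start with the client IP (as in Nginx's default `combined` format). The `top_clients` helper and the `SAMPLE` lines are made up for illustration and are not taken from the real logs:

```python
from collections import Counter

# Hypothetical helper: count requests per client IP for one day.
# Assumes the client address is the first whitespace-separated field.
def top_clients(lines, day, limit=10):
    counts = Counter()
    for line in lines:
        if day in line:  # same substring filter as grep "10/Sep/2018"
            fields = line.split()
            if fields:
                counts[fields[0]] += 1
    # Highest counts first (the shell pipeline lists them last)
    return counts.most_common(limit)

# Invented sample lines, only to show the expected input shape
SAMPLE = [
    '5.9.6.51 - - [10/Sep/2018:10:00:01 +0200] "GET / HTTP/1.1" 200 512 "-" "-"',
    '66.249.64.91 - - [10/Sep/2018:10:00:02 +0200] "GET / HTTP/1.1" 200 512 "-" "-"',
    '5.9.6.51 - - [10/Sep/2018:10:00:03 +0200] "GET / HTTP/1.1" 200 512 "-" "-"',
    '5.9.6.51 - - [09/Sep/2018:23:59:59 +0200] "GET / HTTP/1.1" 200 512 "-" "-"',
]

if __name__ == "__main__":
    for ip, hits in top_clients(SAMPLE, "10/Sep/2018"):
        print(hits, ip)
```

Unlike `zcat --force`, this sketch expects lines that are already decompressed.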

- And this bot is still creating more Tomcat sessions than Nginx requests (WTF?):

```
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-09-10
14133
```
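
Note that `grep -c` counts matching log lines, not distinct session IDs, so the number above may overstate the sessions if one session logs more than one line. A minimal sketch for checking both figures follows. The `session_stats` helper and the `SAMPLE` lines are hypothetical, and the log layout is assumed only to contain the `session_id=...:ip_addr=...` fragment used in the grep:

```python
import re

# Hypothetical helper: return (matching_lines, distinct_session_ids)
# for one client IP in DSpace log lines.
def session_stats(lines, ip):
    # re.escape() makes the dots in the IP literal, unlike the grep above
    pattern = re.compile(
        r"session_id=([A-Z0-9]{32}):ip_addr=" + re.escape(ip) + r"(?!\d)"
    )
    matching_lines = 0
    sessions = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            matching_lines += 1
            sessions.add(match.group(1))
    return matching_lines, len(sessions)

# Invented sample lines, only to show the expected input shape
SAMPLE = [
    "2018-09-10 10:00:01 INFO anonymous:session_id=" + "A" * 32 + ":ip_addr=5.9.6.51:view_item",
    "2018-09-10 10:00:02 INFO anonymous:session_id=" + "A" * 32 + ":ip_addr=5.9.6.51:view_item",
    "2018-09-10 10:00:03 INFO anonymous:session_id=" + "B" * 32 + ":ip_addr=5.9.6.51:view_item",
    "2018-09-10 10:00:04 INFO anonymous:session_id=" + "C" * 32 + ":ip_addr=66.249.64.91:view_item",
]

if __name__ == "__main__":
    print(session_stats(SAMPLE, "5.9.6.51"))
```

If the two numbers are close on the real log, the bot really is getting a new session on nearly every request.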

- The user agent is still the same:

```
Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
```

- I added `.*crawl.*` to the Tomcat Session Crawler Manager Valve, so I'm not sure why the bot is creating so many sessions...
- I just tested that user agent on CGSpace and it *does not* create a new session:

```
$ http --print Hh https://cgspace.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)'
GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)

HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Mon, 10 Sep 2018 20:43:04 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
```
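
As a sanity check on the pattern itself, here is a small sketch that tests whether `.*crawl.*` covers this user agent. It assumes the valve matches the whole `User-Agent` header against the expression, and that the expression is case sensitive. The `is_crawler` helper is hypothetical, and the real valve configuration may combine several patterns:

```python
import re

# The pattern I added; any other configured patterns are left out here
CRAWLER_PATTERN = re.compile(r".*crawl.*")

MEGAINDEX_UA = "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"

# Hypothetical helper: True if the whole header value matches the pattern
def is_crawler(user_agent, pattern=CRAWLER_PATTERN):
    return pattern.fullmatch(user_agent) is not None

if __name__ == "__main__":
    print(is_crawler(MEGAINDEX_UA))
    # Capital "C" only, so a case-sensitive pattern does not match this one
    print(is_crawler("ExampleCrawler/1.0"))
```

So the pattern does cover the MegaIndex user agent, which fits the test above where no new session was created. It does not explain the session count in the DSpace log.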

- I will have to keep an eye on it and perhaps add it to the list of "bad bots" that get rate limited

<!-- vim: set sw=2 ts=2: -->
@@ -29,7 +29,7 @@ I ran all system updates on DSpace Test and rebooted it
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-08/" /><meta property="article:published_time" content="2018-08-01T11:52:54+03:00"/>
<meta property="article:modified_time" content="2018-09-02T11:01:40+03:00"/>
<meta property="article:modified_time" content="2018-09-10T23:35:46+03:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="August, 2018"/>
<meta name="twitter:description" content="2018-08-01
@@ -65,7 +65,7 @@ I ran all system updates on DSpace Test and rebooted it
"url": "https://alanorth.github.io/cgspace-notes/2018-08/",
"wordCount": "2748",
"datePublished": "2018-08-01T11:52:54+03:00",
"dateModified": "2018-09-02T11:01:40+03:00",
"dateModified": "2018-09-10T23:35:46+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -340,7 +340,7 @@ sys 2m20.248s

<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep '19/Aug/2018' | grep -c 5.9.6.51
1553
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-08-19
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-08-19
1724
</code></pre>

@@ -18,7 +18,7 @@ I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-09/" /><meta property="article:published_time" content="2018-09-02T09:55:54+03:00"/>
<meta property="article:modified_time" content="2018-09-10T11:59:08+03:00"/>
<meta property="article:modified_time" content="2018-09-10T18:19:00+03:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="September, 2018"/>
<meta name="twitter:description" content="2018-09-02
@@ -41,9 +41,9 @@ I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
"@type": "BlogPosting",
"headline": "September, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-09/",
"wordCount": "996",
"wordCount": "1221",
"datePublished": "2018-09-02T09:55:54+03:00",
"dateModified": "2018-09-10T11:59:08+03:00",
"dateModified": "2018-09-10T18:19:00+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -257,6 +257,69 @@ UPDATE 15
<li>Start working on adding metadata for access and usage rights that we started earlier in 2018 (and also in 2017)</li>
<li>The current <code>cg.identifier.status</code> field will become “Access rights” and <code>dc.rights</code> will become “Usage rights”</li>
<li>I have some work in progress on the <a href="https://github.com/alanorth/DSpace/tree/5_x-rights"><code>5_x-rights</code> branch</a></li>
<li>Linode said that CGSpace (linode18) had a high CPU load earlier today</li>
<li>When I looked, I saw it was the same Russian IP that I noticed last month:</li>
</ul>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Sep/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1459 157.55.39.202
1579 95.108.181.88
1615 157.55.39.147
1714 66.249.64.91
1924 50.116.102.77
3696 157.55.39.106
3763 157.55.39.148
4470 70.32.83.92
4724 35.237.175.180
14132 5.9.6.51
</code></pre>

<ul>
<li>And this bot is still creating more Tomcat sessions than Nginx requests (WTF?):</li>
</ul>

<pre><code># grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-09-10
14133
</code></pre>

<ul>
<li>The user agent is still the same:</li>
</ul>

<pre><code>Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
</code></pre>

<ul>
<li>I added <code>.*crawl.*</code> to the Tomcat Session Crawler Manager Valve, so I’m not sure why the bot is creating so many sessions…</li>
<li>I just tested that user agent on CGSpace and it <em>does not</em> create a new session:</li>
</ul>

<pre><code>$ http --print Hh https://cgspace.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)'
GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)

HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Mon, 10 Sep 2018 20:43:04 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
</code></pre>

<ul>
<li>I will have to keep an eye on it and perhaps add it to the list of “bad bots” that get rate limited</li>
</ul>

<!-- vim: set sw=2 ts=2: -->
@@ -39,7 +39,7 @@ Disallow: /cgspace-notes/2015-12/
Disallow: /cgspace-notes/2015-11/
Disallow: /cgspace-notes/
Disallow: /cgspace-notes/categories/
Disallow: /cgspace-notes/tags/notes/
Disallow: /cgspace-notes/categories/notes/
Disallow: /cgspace-notes/tags/notes/
Disallow: /cgspace-notes/posts/
Disallow: /cgspace-notes/tags/
@@ -4,12 +4,12 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-09/</loc>
<lastmod>2018-09-10T11:59:08+03:00</lastmod>
<lastmod>2018-09-10T18:19:00+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-08/</loc>
<lastmod>2018-09-02T11:01:40+03:00</lastmod>
<lastmod>2018-09-10T23:35:46+03:00</lastmod>
</url>

<url>
@@ -184,7 +184,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-09-10T11:59:08+03:00</lastmod>
<lastmod>2018-09-10T18:19:00+03:00</lastmod>
<priority>0</priority>
</url>

@@ -193,27 +193,27 @@
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-09-10T11:59:08+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2018-03-09T22:10:33+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-09-10T18:19:00+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-09-10T11:59:08+03:00</lastmod>
<lastmod>2018-09-10T18:19:00+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-09-10T11:59:08+03:00</lastmod>
<lastmod>2018-09-10T18:19:00+03:00</lastmod>
<priority>0</priority>
</url>
