mirror of https://github.com/alanorth/cgspace-notes.git, synced 2024-11-22 06:35:03 +01:00
Update notes for 2018-10-03
parent 99b4ebbcab
commit 20db5ef775
@@ -53,5 +53,47 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752

- It appears to be Jim Lorenzen... I need to check that later!
- I merged the changes to the `5_x-prod` branch ([#390](https://github.com/ilri/DSpace/pull/390))
- Linode sent another alert about CPU usage on CGSpace (linode18) this evening
- It seems that Moayad is making quite a lot of requests today:

```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1594 157.55.39.160
1627 157.55.39.173
1774 136.243.6.84
4228 35.237.175.180
4497 70.32.83.92
4856 66.249.64.59
7120 50.116.102.77
12518 138.201.49.199
87646 34.218.226.147
111729 213.139.53.62
```
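
For readers who would rather script this tally than chain shell tools, the pipeline above can be sketched in Python. This is a minimal sketch, assuming nginx's default "combined" log format (client IP as the first whitespace-separated field). The function name `top_clients` and the sample lines are hypothetical and are not taken from the real CGSpace logs.

```python
from collections import Counter


def top_clients(lines, date, n=10):
    """Count requests per client IP among log lines that contain `date`.

    Assumes the client IP is the first whitespace-separated field, as in
    nginx's default "combined" format. Like `grep`, the date is matched
    anywhere in the line.
    """
    counts = Counter(
        line.split(maxsplit=1)[0] for line in lines if date in line
    )
    # Mirror `sort -n | tail -n 10`: the n busiest clients, ascending by count.
    return sorted(counts.most_common(n), key=lambda pair: pair[1])


# Hypothetical sample lines (documentation-range IPs, invented requests).
sample = [
    '192.0.2.1 - - [03/Oct/2018:10:00:00 +0300] "GET /handle/10568/1 HTTP/1.1" 200 512 "-" "curl/7.61.0"',
    '192.0.2.1 - - [03/Oct/2018:10:00:01 +0300] "GET /handle/10568/2 HTTP/1.1" 200 512 "-" "curl/7.61.0"',
    '198.51.100.7 - - [03/Oct/2018:10:00:02 +0300] "GET / HTTP/1.1" 200 1024 "-" "curl/7.61.0"',
    '192.0.2.1 - - [03/Oct/2018:10:00:03 +0300] "GET /handle/10568/3 HTTP/1.1" 200 512 "-" "curl/7.61.0"',
    '192.0.2.1 - - [02/Oct/2018:23:59:59 +0300] "GET /handle/10568/4 HTTP/1.1" 200 512 "-" "curl/7.61.0"',
]

for ip, hits in top_clients(sample, "03/Oct/2018"):
    print(f"{hits:7d} {ip}")
```

Reading rotated, gzip-compressed logs (what `zcat --force` handles in the shell version) is left out to keep the sketch short.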

- But in super positive news, he says they are using my new [dspace-statistics-api](https://github.com/alanorth/dspace-statistics-api) and it's MUCH faster than using Atmire CUA's internal "restlet" API
- I don't recognize the `138.201.49.199` IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:

```
# grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c
8324 GET /bitstream
4193 GET /handle
```

- Suspiciously, it's only grabbing the CGIAR System Office community (handle prefix 10947):

```
# grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c
7 GET /handle/10568
4186 GET /handle/10947
```
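
Both `grep -o` breakdowns above follow the same pattern: keep the lines from one client, extract a regex match, and tally the matches. A rough Python equivalent is sketched below. The helper name `count_matches` and the sample log lines are hypothetical. Note one difference: the IP is matched here as a literal substring, whereas plain `grep` treats the dots as regex wildcards.

```python
import re
from collections import Counter


def count_matches(lines, ip, pattern):
    """Tally every match of `pattern` in the lines that mention `ip`.

    Uses findall() so that, like `grep -o`, each non-overlapping match
    on a line is counted.
    """
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        if ip in line:
            counts.update(regex.findall(line))
    return counts


# Hypothetical sample lines (documentation-range IPs, invented requests).
sample_lines = [
    '203.0.113.5 - - [03/Oct/2018:12:00:00 +0300] "GET /handle/10947/22 HTTP/1.1" 200 100 "-" "ua"',
    '203.0.113.5 - - [03/Oct/2018:12:00:01 +0300] "GET /handle/10947/23 HTTP/1.1" 200 100 "-" "ua"',
    '203.0.113.5 - - [03/Oct/2018:12:00:02 +0300] "GET /bitstream/handle/10947/22/report.pdf HTTP/1.1" 200 900 "-" "ua"',
    '203.0.113.5 - - [03/Oct/2018:12:00:03 +0300] "GET /handle/10568/9 HTTP/1.1" 200 100 "-" "ua"',
    '198.51.100.7 - - [03/Oct/2018:12:00:04 +0300] "GET /handle/10568/10 HTTP/1.1" 200 100 "-" "ua"',
]

# First path segment, as in the 'GET /[a-z]+' breakdown.
print(count_matches(sample_lines, "203.0.113.5", r"GET /[a-z]+"))
# Five-digit handle prefix, as in the 'GET /handle/[0-9]{5}' breakdown.
print(count_matches(sample_lines, "203.0.113.5", r"GET /handle/[0-9]{5}"))
```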

- The user agent is suspicious too:

```
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
```
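
The notes do not say what makes this user agent suspicious. One plausible reading, which is an assumption here, is the Chrome version: Chrome 41 dates from early 2015, more than three years before these log entries. The sketch below pulls the major version out of a user agent string. The helper names and the cutoff value are hypothetical, and an old version on its own does not prove a client is a bot.

```python
import re

# Hypothetical cutoff: treat anything older than this major version as stale.
MIN_PLAUSIBLE_MAJOR = 60


def chrome_major_version(user_agent):
    """Return the Chrome major version as an int, or None if absent."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    return int(match.group(1)) if match else None


def looks_outdated(user_agent, min_major=MIN_PLAUSIBLE_MAJOR):
    """True when the string claims a Chrome version below `min_major`."""
    version = chrome_major_version(user_agent)
    return version is not None and version < min_major


ua = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
)
print(chrome_major_version(ua), looks_outdated(ua))
```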

- It's clearly a bot and it's not re-using its Tomcat session, so I will add its IP to the nginx bad bot list
- I looked in Solr's statistics core and these hits were actually all counted as `isBot:false` (of course)... hmmm

<!-- vim: set sw=2 ts=2: -->
@@ -9,7 +9,7 @@
<meta property="og:description" content="2018-10-01 Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now 2018-10-03 I see Moayad was busy collecting item views and downloads from CGSpace yesterday: # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Oct/2018&quot; | awk '{print $1} ' | sort | uniq -c | sort -n | tail -n 10 933 40." />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-10/" /><meta property="article:published_time" content="2018-10-01T22:31:54+03:00"/>
-<meta property="article:modified_time" content="2018-10-03T11:52:48+03:00"/>
+<meta property="article:modified_time" content="2018-10-03T17:54:58+03:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2018"/>
@@ -24,9 +24,9 @@
"@type": "BlogPosting",
"headline": "October, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-10/",
-"wordCount": "231",
+"wordCount": "460",
"datePublished": "2018-10-01T22:31:54+03:00",
-"dateModified": "2018-10-03T11:52:48+03:00",
+"dateModified": "2018-10-03T17:54:58+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -149,6 +149,52 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
<ul>
<li>It appears to be Jim Lorenzen… I need to check that later!</li>
<li>I merged the changes to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/390">#390</a>)</li>
<li>Linode sent another alert about CPU usage on CGSpace (linode18) this evening</li>
<li>It seems that Moayad is making quite a lot of requests today:</li>
</ul>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1594 157.55.39.160
1627 157.55.39.173
1774 136.243.6.84
4228 35.237.175.180
4497 70.32.83.92
4856 66.249.64.59
7120 50.116.102.77
12518 138.201.49.199
87646 34.218.226.147
111729 213.139.53.62
</code></pre>

<ul>
<li>But in super positive news, he says they are using my new <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and it’s MUCH faster than using Atmire CUA’s internal “restlet” API</li>
<li>I don’t recognize the <code>138.201.49.199</code> IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:</li>
</ul>

<pre><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c
8324 GET /bitstream
4193 GET /handle
</code></pre>

<ul>
<li>Suspiciously, it’s only grabbing the CGIAR System Office community (handle prefix 10947):</li>
</ul>

<pre><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c
7 GET /handle/10568
4186 GET /handle/10947
</code></pre>

<ul>
<li>The user agent is suspicious too:</li>
</ul>

<pre><code>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
</code></pre>

<ul>
<li>It’s clearly a bot and it’s not re-using its Tomcat session, so I will add its IP to the nginx bad bot list</li>
<li>I looked in Solr’s statistics core and these hits were actually all counted as <code>isBot:false</code> (of course)… hmmm</li>
</ul>

<!-- vim: set sw=2 ts=2: -->
@@ -40,7 +40,7 @@ Disallow: /cgspace-notes/2015-12/
Disallow: /cgspace-notes/2015-11/
Disallow: /cgspace-notes/
Disallow: /cgspace-notes/categories/
-Disallow: /cgspace-notes/tags/notes/
Disallow: /cgspace-notes/categories/notes/
+Disallow: /cgspace-notes/tags/notes/
Disallow: /cgspace-notes/posts/
Disallow: /cgspace-notes/tags/
@@ -4,7 +4,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-10/</loc>
-<lastmod>2018-10-03T11:52:48+03:00</lastmod>
+<lastmod>2018-10-03T17:54:58+03:00</lastmod>
</url>

<url>
@@ -189,7 +189,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2018-10-03T11:52:48+03:00</lastmod>
+<lastmod>2018-10-03T17:54:58+03:00</lastmod>
<priority>0</priority>
</url>

@@ -198,27 +198,27 @@
<priority>0</priority>
</url>

-<url>
-<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2018-10-03T11:52:48+03:00</lastmod>
-<priority>0</priority>
-</url>
-
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2018-03-09T22:10:33+02:00</lastmod>
<priority>0</priority>
</url>

+<url>
+<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
+<lastmod>2018-10-03T17:54:58+03:00</lastmod>
+<priority>0</priority>
+</url>
+
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2018-10-03T11:52:48+03:00</lastmod>
+<lastmod>2018-10-03T17:54:58+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2018-10-03T11:52:48+03:00</lastmod>
+<lastmod>2018-10-03T17:54:58+03:00</lastmod>
<priority>0</priority>
</url>
