mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-12-26 15:04:30 +01:00)
Update notes for 2019-02-19
parent 238ae1678f
commit 845ec58520
@@ -932,5 +932,51 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
![Usage stats](/cgspace-notes/2019/02/usage-stats.png)

- I need to follow up with the DSpace developers and Atmire to see how they classify which requests are bots so we can try to estimate the impact caused by these users and perhaps try to update the list to make the stats more accurate
- I found one IP address in Nigeria that has an Android user agent and has requested a bitstream from [10568/96140](https://cgspace.cgiar.org/handle/10568/96140) almost 200 times:

```
# grep 41.190.30.105 /var/log/nginx/access.log | grep -c 'acgg_progress_report.pdf'
185
```

- Wow, and another IP in Nigeria made a bunch more yesterday from the same user agent:

```
# grep 41.190.3.229 /var/log/nginx/access.log.1 | grep -c 'acgg_progress_report.pdf'
346
```

- In the last two days alone there were 1,000 requests for this PDF, mostly from Nigeria!

```
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep acgg_progress_report.pdf | grep -v 'upstream response is buffered' | awk '{print $1}' | sort | uniq -c | sort -n
      1 139.162.146.60
      1 157.55.39.159
      1 196.188.127.94
      1 196.190.127.16
      1 197.183.33.222
      1 66.249.66.221
      2 104.237.146.139
      2 175.158.209.61
      2 196.190.63.120
      2 196.191.127.118
      2 213.55.99.121
      2 82.145.223.103
      3 197.250.96.248
      4 196.191.127.125
      4 197.156.77.24
      5 105.112.75.237
    185 41.190.30.105
    346 41.190.3.229
    503 41.190.31.73
```
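
The per-address tally above can also be approximated in Python. This is only a minimal sketch, not something used in these notes: `count_requests_by_ip` is a hypothetical helper, it assumes the client address is the first whitespace-separated field of each line (as in nginx's default combined access log format), and the sample lines are made up using documentation-range addresses.

```python
from collections import Counter


def count_requests_by_ip(lines, needle, exclude="upstream response is buffered"):
    """Tally lines that mention `needle`, keyed by their first field.

    Assumes the client IP is the first whitespace-separated field.
    Lines containing `exclude` are skipped, mirroring the `grep -v`
    step of the shell pipeline.
    """
    counts = Counter()
    for line in lines:
        if needle not in line or exclude in line:
            continue
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    # Ascending by count (ties broken by address string)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Made-up sample lines, not taken from the real logs
sample = [
    '192.0.2.10 - - [19/Feb/2019:10:00:00 +0000] "GET /bitstream/acgg_progress_report.pdf HTTP/1.1" 200 123',
    '192.0.2.10 - - [19/Feb/2019:10:00:05 +0000] "GET /bitstream/acgg_progress_report.pdf HTTP/1.1" 200 123',
    '192.0.2.11 - - [19/Feb/2019:10:01:00 +0000] "GET /bitstream/acgg_progress_report.pdf HTTP/1.1" 200 123',
    '192.0.2.10 - - [19/Feb/2019:10:02:00 +0000] "GET /bitstream/other.pdf HTTP/1.1" 200 456',
    '2019/02/19 10:03:00 [warn] upstream response is buffered while reading acgg_progress_report.pdf',
]

for ip, count in count_requests_by_ip(sample, "acgg_progress_report.pdf"):
    print(f"{count:7d} {ip}")
```

To run it against real logs, the lines could be read from files opened with `gzip.open` or `open` as appropriate and passed in as one iterable.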

- That is so weird, they are all using this Android user agent:

```
Mozilla/5.0 (Linux; Android 7.0; TECNO Camon CX Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0.0 Mobile Safari/537.36
```

- I wrote a quick and dirty Python script called `resolve-addresses.py` to resolve IP addresses to their owning organization's name, ASN, and country using the [IPAPI.co API](https://ipapi.co)
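
The script itself is not shown in this diff. The following is a rough sketch of how such a lookup might work, and it should not be read as the actual `resolve-addresses.py`. It assumes ipapi.co serves JSON at `https://ipapi.co/<ip>/json/` with `org`, `asn`, and `country` fields; the function names and the `fetch` parameter are hypothetical, and the stub exists only so the example runs without network access.

```python
import json
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode its JSON body (performs a real HTTP request)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def resolve_address(ip, fetch=fetch_json):
    """Return the owning organization, ASN, and country for an IP address.

    The endpoint path and field names are assumptions about the ipapi.co
    API; check its documentation before relying on them.
    """
    data = fetch(f"https://ipapi.co/{ip}/json/")
    return {
        "ip": ip,
        "org": data.get("org"),
        "asn": data.get("asn"),
        "country": data.get("country"),
    }


def fake_fetch(url):
    """Stub standing in for the HTTP call; the values are placeholders."""
    return {"org": "Example Org", "asn": "AS64500", "country": "ZZ"}


# Offline demonstration using the stub and a documentation-range address
print(resolve_address("192.0.2.1", fetch=fake_fetch))
```

Calling `resolve_address("41.190.31.73")` without the `fetch` argument would query the live API, which may be rate limited.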

<!-- vim: set sw=2 ts=2: -->
@@ -42,7 +42,7 @@ sys 0m1.979s
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-02/" />
 <meta property="article:published_time" content="2019-02-01T21:37:30+02:00"/>
-<meta property="article:modified_time" content="2019-02-18T16:30:34-08:00"/>
+<meta property="article:modified_time" content="2019-02-19T12:42:33-08:00"/>

 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="February, 2019"/>
@@ -89,9 +89,9 @@ sys 0m1.979s
 "@type": "BlogPosting",
 "headline": "February, 2019",
 "url": "https://alanorth.github.io/cgspace-notes/2019-02/",
-"wordCount": "5236",
+"wordCount": "5473",
 "datePublished": "2019-02-01T21:37:30+02:00",
-"dateModified": "2019-02-18T16:30:34-08:00",
+"dateModified": "2019-02-19T12:42:33-08:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -1212,6 +1212,60 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i

<p><img src="/cgspace-notes/2019/02/usage-stats.png" alt="Usage stats" /></p>

<ul>
<li>I need to follow up with the DSpace developers and Atmire to see how they classify which requests are bots so we can try to estimate the impact caused by these users and perhaps try to update the list to make the stats more accurate</li>
<li>I found one IP address in Nigeria that has an Android user agent and has requested a bitstream from <a href="https://cgspace.cgiar.org/handle/10568/96140"><sup>10568</sup>⁄<sub>96140</sub></a> almost 200 times:</li>
</ul>

<pre><code># grep 41.190.30.105 /var/log/nginx/access.log | grep -c 'acgg_progress_report.pdf'
185
</code></pre>

<ul>
<li>Wow, and another IP in Nigeria made a bunch more yesterday from the same user agent:</li>
</ul>

<pre><code># grep 41.190.3.229 /var/log/nginx/access.log.1 | grep -c 'acgg_progress_report.pdf'
346
</code></pre>

<ul>
<li>In the last two days alone there were 1,000 requests for this PDF, mostly from Nigeria!</li>
</ul>

<pre><code># zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep acgg_progress_report.pdf | grep -v 'upstream response is buffered' | awk '{print $1}' | sort | uniq -c | sort -n
      1 139.162.146.60
      1 157.55.39.159
      1 196.188.127.94
      1 196.190.127.16
      1 197.183.33.222
      1 66.249.66.221
      2 104.237.146.139
      2 175.158.209.61
      2 196.190.63.120
      2 196.191.127.118
      2 213.55.99.121
      2 82.145.223.103
      3 197.250.96.248
      4 196.191.127.125
      4 197.156.77.24
      5 105.112.75.237
    185 41.190.30.105
    346 41.190.3.229
    503 41.190.31.73
</code></pre>

<ul>
<li>That is so weird, they are all using this Android user agent:</li>
</ul>

<pre><code>Mozilla/5.0 (Linux; Android 7.0; TECNO Camon CX Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0.0 Mobile Safari/537.36
</code></pre>

<ul>
<li>I wrote a quick and dirty Python script called <code>resolve-addresses.py</code> to resolve IP addresses to their owning organization’s name, ASN, and country using the <a href="https://ipapi.co">IPAPI.co API</a></li>
</ul>

<!-- vim: set sw=2 ts=2: -->
@@ -4,7 +4,7 @@

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2019-02/</loc>
-<lastmod>2019-02-18T16:30:34-08:00</lastmod>
+<lastmod>2019-02-19T12:42:33-08:00</lastmod>
 </url>

 <url>
@@ -209,7 +209,7 @@

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2019-02-18T16:30:34-08:00</lastmod>
+<lastmod>2019-02-19T12:42:33-08:00</lastmod>
 <priority>0</priority>
 </url>

@@ -220,7 +220,7 @@

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2019-02-18T16:30:34-08:00</lastmod>
+<lastmod>2019-02-19T12:42:33-08:00</lastmod>
 <priority>0</priority>
 </url>

@@ -232,13 +232,13 @@

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2019-02-18T16:30:34-08:00</lastmod>
+<lastmod>2019-02-19T12:42:33-08:00</lastmod>
 <priority>0</priority>
 </url>

 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2019-02-18T16:30:34-08:00</lastmod>
+<lastmod>2019-02-19T12:42:33-08:00</lastmod>
 <priority>0</priority>
 </url>