Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-12-01 19:08:18 +01:00)

Update notes for 2019-05-12

parent 96890358bd
commit f5e85561b5

@@ -322,11 +322,21 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
 - So this was definitely an attack of some sort... only God knows why
 - I noticed a few new bots that don't use the word "bot" in their user agent and therefore don't match Tomcat's Crawler Session Manager Valve:
   - `Blackboard Safeassign`
+  - `Unpaywall`
 
 ## 2019-05-12
 
+- I see that the Unpaywall bot is responsible for a few thousand XMLUI sessions every day (IP addresses come from nginx access.log):
+
+```
+$ cat dspace.log.2019-05-11 | grep -E 'ip_addr=(100.26.206.188|100.27.19.233|107.22.98.199|174.129.156.41|18.205.243.110|18.205.245.200|18.207.176.164|18.207.209.186|18.212.126.89|18.212.5.59|18.213.4.150|18.232.120.6|18.234.180.224|18.234.81.13|3.208.23.222|34.201.121.183|34.201.241.214|34.201.39.122|34.203.188.39|34.207.197.154|34.207.232.63|34.207.91.147|34.224.86.47|34.227.205.181|34.228.220.218|34.229.223.120|35.171.160.166|35.175.175.202|3.80.201.39|3.81.120.70|3.81.43.53|3.84.152.19|3.85.113.253|3.85.237.139|3.85.56.100|3.87.23.95|3.87.248.240|3.87.250.3|3.87.62.129|3.88.13.9|3.88.57.237|3.89.71.15|3.90.17.242|3.90.68.247|3.91.44.91|3.92.138.47|3.94.250.180|52.200.78.128|52.201.223.200|52.90.114.186|52.90.48.73|54.145.91.243|54.160.246.228|54.165.66.180|54.166.219.216|54.166.238.172|54.167.89.152|54.174.94.223|54.196.18.211|54.198.234.175|54.208.8.172|54.224.146.147|54.234.169.91|54.235.29.216|54.237.196.147|54.242.68.231|54.82.6.96|54.87.12.181|54.89.217.141|54.89.234.182|54.90.81.216|54.91.104.162)' | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
+2206
+```
+
+- I added "Unpaywall" to the list of bots in the Tomcat Crawler Session Manager Valve
 - Set up nginx to use TLS and proxy pass to NodeJS on the AReS development server (linode20)
 - Run all system updates on linode20 and reboot it
 - Also, there is 10 to 20% CPU steal on that VM, so I will ask Linode to move it to another host
+- Commit changes to the `resolve-addresses.py` script to add proper CSV output support
 
 <!-- vim: set sw=2 ts=2: -->

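Tomcat's `CrawlerSessionManagerValve` treats a request as coming from a crawler when the whole `User-Agent` header matches its `crawlerUserAgents` regular expression; the documented default is `.*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*`, which is why agents without "bot" in their name slip through. The sketch below approximates that full-string match with `grep -E -x` and checks a few user agents against the default and against an extended pattern with the two agents from these notes appended. The extended pattern and the sample user-agent strings are hypothetical illustrations, not the values deployed on CGSpace, and POSIX ERE is only an approximation of `java.util.regex` (adequate for patterns this simple).

```shell
#!/usr/bin/env bash
# Documented default crawlerUserAgents pattern of Tomcat's CrawlerSessionManagerValve
DEFAULT_RE='.*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*'
# Hypothetical extended pattern: the default plus the two agents from the notes
EXTENDED_RE="${DEFAULT_RE}|.*Blackboard Safeassign.*|.*Unpaywall.*"

# Print "crawler" if the entire user agent matches the pattern, else "other".
# grep -x anchors the whole alternation to the full line, mimicking Java's
# Matcher.matches() rather than a substring search.
classify_ua() {
  local ua=$1 pattern=$2
  if printf '%s\n' "$ua" | grep -q -E -x -- "$pattern"; then
    echo crawler
  else
    echo other
  fi
}

# Hypothetical sample user agents, for illustration only
for ua in 'Googlebot/2.1' 'Blackboard Safeassign' 'Unpaywall (hypothetical UA)'; do
  printf '%-30s default=%-8s extended=%s\n' "$ua" \
    "$(classify_ua "$ua" "$DEFAULT_RE")" "$(classify_ua "$ua" "$EXTENDED_RE")"
done
```

Note that "Blackboard" contains "boa", not "bot", so only an explicit entry in the pattern makes the valve treat it as a crawler.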
@@ -28,7 +28,7 @@ But after this I tried to delete the item from the XMLUI and it is still present
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-05/" />
 <meta property="article:published_time" content="2019-05-01T07:37:43+03:00"/>
-<meta property="article:modified_time" content="2019-05-10T17:27:11+03:00"/>
+<meta property="article:modified_time" content="2019-05-12T10:39:10+03:00"/>
 
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="May, 2019"/>

@@ -61,9 +61,9 @@ But after this I tried to delete the item from the XMLUI and it is still present
 "@type": "BlogPosting",
 "headline": "May, 2019",
 "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-05\/",
-"wordCount": "2254",
+"wordCount": "2323",
 "datePublished": "2019-05-01T07:37:43\x2b03:00",
-"dateModified": "2019-05-10T17:27:11\x2b03:00",
+"dateModified": "2019-05-12T10:39:10\x2b03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"

@@ -518,15 +518,28 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
 
 <ul>
 <li><code>Blackboard Safeassign</code></li>
+<li><code>Unpaywall</code></li>
 </ul></li>
 </ul>
 
 <h2 id="2019-05-12">2019-05-12</h2>
 
 <ul>
-<li>Set up nginx to use TLS and proxy pass to NodeJS on the AReS development server (linode20)</li>
-<li>Run all system updates on linode20 and reboot it</li>
-<li>Also, there is 10 to 20% CPU steal on that VM, so I will ask Linode to move it to another host</li>
+<li><p>I see that the Unpaywall bot is responsible for a few thousand XMLUI sessions every day (IP addresses come from nginx access.log):</p>
+
+<pre><code>$ cat dspace.log.2019-05-11 | grep -E 'ip_addr=(100.26.206.188|100.27.19.233|107.22.98.199|174.129.156.41|18.205.243.110|18.205.245.200|18.207.176.164|18.207.209.186|18.212.126.89|18.212.5.59|18.213.4.150|18.232.120.6|18.234.180.224|18.234.81.13|3.208.23.222|34.201.121.183|34.201.241.214|34.201.39.122|34.203.188.39|34.207.197.154|34.207.232.63|34.207.91.147|34.224.86.47|34.227.205.181|34.228.220.218|34.229.223.120|35.171.160.166|35.175.175.202|3.80.201.39|3.81.120.70|3.81.43.53|3.84.152.19|3.85.113.253|3.85.237.139|3.85.56.100|3.87.23.95|3.87.248.240|3.87.250.3|3.87.62.129|3.88.13.9|3.88.57.237|3.89.71.15|3.90.17.242|3.90.68.247|3.91.44.91|3.92.138.47|3.94.250.180|52.200.78.128|52.201.223.200|52.90.114.186|52.90.48.73|54.145.91.243|54.160.246.228|54.165.66.180|54.166.219.216|54.166.238.172|54.167.89.152|54.174.94.223|54.196.18.211|54.198.234.175|54.208.8.172|54.224.146.147|54.234.169.91|54.235.29.216|54.237.196.147|54.242.68.231|54.82.6.96|54.87.12.181|54.89.217.141|54.89.234.182|54.90.81.216|54.91.104.162)' | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
+2206
+</code></pre></li>
+
+<li><p>I added “Unpaywall” to the list of bots in the Tomcat Crawler Session Manager Valve</p></li>
+
+<li><p>Set up nginx to use TLS and proxy pass to NodeJS on the AReS development server (linode20)</p></li>
+
+<li><p>Run all system updates on linode20 and reboot it</p></li>
+
+<li><p>Also, there is 10 to 20% CPU steal on that VM, so I will ask Linode to move it to another host</p></li>
+
+<li><p>Commit changes to the <code>resolve-addresses.py</code> script to add proper CSV output support</p></li>
 </ul>
 
 <!-- vim: set sw=2 ts=2: -->

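The session-count command in the notes pipes whole matching log lines into `sort | uniq`, so as written it counts distinct log lines that contain a session ID rather than distinct session IDs. A minimal sketch of the whole workflow, collecting the bot's IPs from the nginx access log and then counting only the extracted session IDs with `grep -o`, could look like the following. It assumes nginx's default `combined` log format (client address in the first field) and DSpace log lines containing `ip_addr=<IP>` and `session_id=<32 characters>` as in the command above; the function names `ips_for_agent` and `count_sessions` are hypothetical.

```shell
#!/usr/bin/env bash

# Print the unique client IPs from access-log lines containing the given string
# (normally the user agent). Assumes nginx's default "combined" log format,
# where the client address is the first whitespace-separated field.
ips_for_agent() {
  local agent=$1 access_log=$2
  grep -F -- "$agent" "$access_log" | awk '{print $1}' | sort -u
}

# Count distinct DSpace session IDs for the IPs listed one per line in a file.
count_sessions() {
  local ip_file=$1 dspace_log=$2 ip_re
  # Drop blank lines, escape dots so they match literally, join with "|"
  ip_re=$(sed '/^$/d; s/\./\\./g' "$ip_file" | paste -sd '|' -)
  if [ -z "$ip_re" ]; then
    echo 0
    return
  fi
  # The trailing ([^0-9.]|$) keeps 1.2.3.4 from also matching 1.2.3.45;
  # grep -o emits only the session ID so sort -u deduplicates sessions, not lines
  grep -E "ip_addr=(${ip_re})([^0-9.]|\$)" "$dspace_log" \
    | grep -o -E 'session_id=[A-Z0-9]{32}' \
    | sort -u \
    | wc -l \
    | tr -d ' '
}
```

For example, `ips_for_agent Unpaywall /var/log/nginx/access.log > /tmp/unpaywall-ips.txt` followed by `count_sessions /tmp/unpaywall-ips.txt dspace.log.2019-05-11`; the nginx log path and the temporary file name are assumptions.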
@@ -4,30 +4,30 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2019-05-10T17:27:11+03:00</lastmod>
+<lastmod>2019-05-12T10:39:10+03:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2019-05/</loc>
-<lastmod>2019-05-10T17:27:11+03:00</lastmod>
+<lastmod>2019-05-12T10:39:10+03:00</lastmod>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2019-05-10T17:27:11+03:00</lastmod>
+<lastmod>2019-05-12T10:39:10+03:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2019-05-10T17:27:11+03:00</lastmod>
+<lastmod>2019-05-12T10:39:10+03:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2019-05-10T17:27:11+03:00</lastmod>
+<lastmod>2019-05-12T10:39:10+03:00</lastmod>
 <priority>0</priority>
 </url>
 

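CPU steal is time the hypervisor spent running other guests while this VM had runnable work. On Linux it is the eighth value on the aggregate `cpu` line of `/proc/stat` (after user, nice, system, idle, iowait, irq and softirq), so a rough percentage over an interval is the change in steal divided by the change in the sum of those eight fields; `guest` and `guest_nice` are left out because the kernel already counts them in user and nice. The sketch below computes that from two snapshots; the function name `steal_pct` is hypothetical, and tools like `top` (the `st` field) and `vmstat` (the `st` column) report the same figure.

```shell
#!/usr/bin/env bash

# Percentage of CPU time stolen between two aggregate "cpu" lines of /proc/stat.
# Fields after the "cpu" label: user nice system idle iowait irq softirq steal guest guest_nice
# Only the first eight are summed, since guest time is already included in user/nice.
steal_pct() {
  local before=$1 after=$2
  awk -v a="$before" -v b="$after" 'BEGIN {
    split(a, x, " "); split(b, y, " ")   # x[1], y[1] hold the "cpu" label
    total = 0
    for (i = 2; i <= 9; i++) total += y[i] - x[i]
    steal = y[9] - x[9]
    printf "%.1f\n", (total > 0 ? 100 * steal / total : 0)
  }'
}

# Sample the aggregate line twice, one second apart (Linux only)
if [ -r /proc/stat ]; then
  s1=$(grep '^cpu ' /proc/stat)
  sleep 1
  s2=$(grep '^cpu ' /proc/stat)
  echo "CPU steal over the last second: $(steal_pct "$s1" "$s2")%"
fi
```

A one-second window is noisy; sampling over a longer interval gives a steadier number when deciding whether a VM should be moved to another host.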