Add notes for 2021-11-07

Alan Orth 2021-11-07 11:26:32 +02:00
parent 2ca9096495
commit b3df4ff58f
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
26 changed files with 108 additions and 31 deletions

View File

@ -36,4 +36,41 @@ $ ./run.sh -s http://localhost:8081/solr/statistics-2019 -a import -o statistics
- I checked on CGSpace's and I can't find them there either, but I see them in Solr when I query in the admin UI
- I need to debug that, but it doesn't seem to be related to the sharding...
## 2021-11-04
- I spent a little bit of time debugging the Solr bug with the statistics-2019 shard but couldn't reproduce it for the few items I tested
- So that's good, it seems the sharding worked
- Linode alerted me to high CPU usage on CGSpace (linode18) yesterday
- Looking at the Solr hits from yesterday I see 91.213.50.11 making 2,300 requests
- According to AbuseIPDB.com this is owned by Registrarus LLC (registrarus.ru) and it has been reported for malicious activity by several users
- The ASN is 50340 (SELECTEL-MSK, RU)
- They are attempting SQL injection:
```console
91.213.50.11 - - [03/Nov/2021:06:47:20 +0100] "HEAD /bitstream/handle/10568/106239/U19ArtSimonikovaChromosomeInthomNodev.pdf?sequence=1%60%20WHERE%206158%3D6158%20AND%204894%3D4741--%20kIlq&isAllowed=y HTTP/1.1" 200 0 "https://cgspace.cgiar.org:443/bitstream/handle/10568/106239/U19ArtSimonikovaChromosomeInthomNodev.pdf" "Mozilla/5.0 (X11; U; Linux i686; en-CA; rv:1.8.0.10) Gecko/20070223 Fedora/1.5.0.10-1.fc5 Firefox/1.5.0.10"
```
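- For reference, a minimal Python sketch that percent-decodes the query string from the request above so the injected SQL fragment is readable; the helper name `decode_query` is made up for illustration and is not part of any CGSpace tooling:

```python
from urllib.parse import parse_qs, urlsplit


def decode_query(request_path):
    """Return the percent-decoded query parameters of a request path as a dict of lists."""
    return parse_qs(urlsplit(request_path).query)


# Request path copied from the nginx log line above
path = (
    "/bitstream/handle/10568/106239/U19ArtSimonikovaChromosomeInthomNodev.pdf"
    "?sequence=1%60%20WHERE%206158%3D6158%20AND%204894%3D4741--%20kIlq&isAllowed=y"
)

params = decode_query(path)
# The "sequence" parameter carries the injected WHERE clause
print(params["sequence"][0])
```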
- Another IP, 222.129.53.160, is in China, and they grabbed nearly 1,200 PDFs from the REST API in under an hour:
```console
# zgrep 222.129.53.160 /var/log/nginx/rest.log.2.gz | wc -l
1178
```
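- A rough Python equivalent for counting hits per client IP, in case more than one address needs checking at once. This is only a sketch: it assumes the client IP is the first whitespace-separated field of each log line (as in the log line quoted earlier), whereas `zgrep` matches the address anywhere in the line. The function names and the sample lines (using documentation-range IPs) are made up for illustration:

```python
import gzip
from collections import Counter


def count_hits_by_ip(lines):
    """Count log lines per client IP, taking the first field of each line as the IP."""
    hits = Counter()
    for line in lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            hits[fields[0]] += 1
    return hits


def count_hits_in_gzip(path):
    """Same as count_hits_by_ip, but reading a gzip-compressed log file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as log:
        return count_hits_by_ip(log)


# Hypothetical sample lines, not taken from the real logs
sample = [
    '192.0.2.10 - - [03/Nov/2021:06:47:20 +0100] "GET /rest/items HTTP/1.1" 200 0',
    '192.0.2.10 - - [03/Nov/2021:06:47:21 +0100] "GET /rest/items HTTP/1.1" 200 0',
    '192.0.2.20 - - [03/Nov/2021:06:47:22 +0100] "GET /rest/items HTTP/1.1" 200 0',
]
hits = count_hits_by_ip(sample)
print(hits.most_common(10))
```

- On the server this would be called as `count_hits_in_gzip("/var/log/nginx/rest.log.2.gz")`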
- I will continue to split the Solr statistics back into year-shards on DSpace Test (linode26)
- Today I did all 2018 stats...
- I want to see if there is a noticeable change in JVM memory, Solr response time, etc
## 2021-11-07
- Update all Docker images on AReS and rebuild OpenRXV:
```console
$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
$ docker-compose build
```
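- To spell out what the `grep | sed | cut` pipeline does: it drops the header row of `docker images` and joins the first two columns into `repository:tag` references for `docker pull`. A minimal Python sketch of the same transformation follows. The function name `image_refs` and the sample output are hypothetical, and skipping `<none>` entries is an addition that the shell pipeline does not do:

```python
def image_refs(docker_images_output):
    """Turn the text output of `docker images` into a list of repository:tag strings."""
    refs = []
    for line in docker_images_output.splitlines():
        fields = line.split()
        # Skip blank lines and the REPOSITORY/TAG header row
        if len(fields) < 2 or fields[0] == "REPOSITORY":
            continue
        repo, tag = fields[0], fields[1]
        # Addition to the original pipeline: dangling images cannot be pulled
        if "<none>" in (repo, tag):
            continue
        refs.append(f"{repo}:{tag}")
    return refs


# Hypothetical sample output, not the actual images on AReS
sample_output = """\
REPOSITORY   TAG       IMAGE ID       CREATED       SIZE
redis        6         0123456789ab   2 weeks ago   105MB
<none>       <none>    ba9876543210   3 weeks ago   1.2GB
"""
print(image_refs(sample_output))
```

- Splitting on whitespace rather than on `:` also keeps repository names that contain a registry port intact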
- Then restart the server and start a fresh harvest
- Continue splitting the Solr statistics into yearly shards on DSpace Test (doing 2017 today)
<!-- vim: set sw=2 ts=2: -->

View File

@ -18,7 +18,7 @@ $ zstd statistics-2019.json
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-11/" />
<meta property="article:published_time" content="2021-11-02T22:27:07+02:00" />
<meta property="article:modified_time" content="2021-11-01T10:49:21+02:00" />
<meta property="article:modified_time" content="2021-11-03T15:56:15+02:00" />
@ -42,9 +42,9 @@ $ zstd statistics-2019.json
"@type": "BlogPosting",
"headline": "November, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-11/",
"wordCount": "238",
"wordCount": "468",
"datePublished": "2021-11-02T22:27:07+02:00",
"dateModified": "2021-11-01T10:49:21+02:00",
"dateModified": "2021-11-03T15:56:15+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -149,6 +149,46 @@ $ ./run.sh -s http://localhost:8081/solr/statistics-2019 -a import -o statistics
</ul>
</li>
</ul>
<h2 id="2021-11-04">2021-11-04</h2>
<ul>
<li>I spent a little bit of time debugging the Solr bug with the statistics-2019 shard but couldn&rsquo;t reproduce it for the few items I tested
<ul>
<li>So that&rsquo;s good, it seems the sharding worked</li>
</ul>
</li>
<li>Linode alerted me to high CPU usage on CGSpace (linode18) yesterday
<ul>
<li>Looking at the Solr hits from yesterday I see 91.213.50.11 making 2,300 requests</li>
<li>According to AbuseIPDB.com this is owned by Registrarus LLC (registrarus.ru) and it has been reported for malicious activity by several users</li>
<li>The ASN is 50340 (SELECTEL-MSK, RU)</li>
<li>They are attempting SQL injection:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">91.213.50.11 - - [03/Nov/2021:06:47:20 +0100] &quot;HEAD /bitstream/handle/10568/106239/U19ArtSimonikovaChromosomeInthomNodev.pdf?sequence=1%60%20WHERE%206158%3D6158%20AND%204894%3D4741--%20kIlq&amp;isAllowed=y HTTP/1.1&quot; 200 0 &quot;https://cgspace.cgiar.org:443/bitstream/handle/10568/106239/U19ArtSimonikovaChromosomeInthomNodev.pdf&quot; &quot;Mozilla/5.0 (X11; U; Linux i686; en-CA; rv:1.8.0.10) Gecko/20070223 Fedora/1.5.0.10-1.fc5 Firefox/1.5.0.10&quot;
</code></pre><ul>
<li>Another IP, 222.129.53.160, is in China, and they grabbed nearly 1,200 PDFs from the REST API in under an hour:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># zgrep 222.129.53.160 /var/log/nginx/rest.log.2.gz | wc -l
1178
</code></pre><ul>
<li>I will continue to split the Solr statistics back into year-shards on DSpace Test (linode26)
<ul>
<li>Today I did all 2018 stats&hellip;</li>
<li>I want to see if there is a noticeable change in JVM memory, Solr response time, etc</li>
</ul>
</li>
</ul>
<h2 id="2021-11-07">2021-11-07</h2>
<ul>
<li>Update all Docker images on AReS and rebuild OpenRXV:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
$ docker-compose build
</code></pre><ul>
<li>Then restart the server and start a fresh harvest</li>
<li>Continue splitting the Solr statistics into yearly shards on DSpace Test (doing 2017 today)</li>
</ul>
<!-- raw HTML omitted -->

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2021-11-01T10:49:21+02:00" />
<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-11-01T10:49:21+02:00" />
<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-11-01T10:49:21+02:00" />
<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-11-01T10:49:21+02:00" />
<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />

View File

@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2021-11-01T10:49:21+02:00</lastmod>
<lastmod>2021-11-03T15:56:15+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2021-11-01T10:49:21+02:00</lastmod>
<lastmod>2021-11-03T15:56:15+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2021-11-01T10:49:21+02:00</lastmod>
<lastmod>2021-11-03T15:56:15+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2021-11/</loc>
<lastmod>2021-11-01T10:49:21+02:00</lastmod>
<lastmod>2021-11-03T15:56:15+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2021-11-01T10:49:21+02:00</lastmod>
<lastmod>2021-11-03T15:56:15+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2021-10/</loc>
<lastmod>2021-11-01T10:48:13+02:00</lastmod>