Compare commits

2 Commits

Author SHA1 Message Date
fc1e83e76d Add notes for 2022-07-04 2022-07-04 22:10:02 +03:00
9a5acf2e32 Add notes for 2022-07-04 2022-07-04 17:20:01 +03:00
38 changed files with 145 additions and 37 deletions

View File

@ -32,4 +32,54 @@ Time: 399.751 ms
- Start a harvest on AReS
## 2022-07-04
- Linode told me that CGSpace had high load yesterday
- I also got some up and down notices from UptimeRobot
- Looking now, I see there was a very high CPU and database pool load, but a mostly normal DSpace session count
![CPU load day](/cgspace-notes/2022/07/cpu-day.png)
![JDBC pool day](/cgspace-notes/2022/07/jmx_tomcat_dbpools-day.png)
- Seems we have some old database transactions since 2022-06-27:
![PostgreSQL locks week](/cgspace-notes/2022/07/postgres_locks_ALL-week.png)
![PostgreSQL query length week](/cgspace-notes/2022/07/postgres_querylength_ALL-week.png)
- Looking at the top connections to nginx yesterday:
```console
# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log.1 | sort | uniq -c | sort -h | tail
1132 64.124.8.34
1146 2a01:4f8:1c17:5550::1
1380 137.184.159.211
1533 64.124.8.59
4013 80.248.237.167
4776 54.195.118.125
10482 45.5.186.2
11177 172.104.229.92
15855 2a01:7e00::f03c:91ff:fe9a:3a37
22179 64.39.98.251
```
- And the total number of unique IPs:
```console
# awk '{print $1}' /var/log/nginx/{access,library-access,oai,rest}.log.1 | sort -u | wc -l
6952
```
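For reference, the two `awk` pipelines above can be approximated in Python. This is only a sketch: it assumes, like `awk '{print $1}'`, that the client address is the first whitespace-separated field of each log line, and the sample lines and the `count_clients` name are made up for illustration.

```python
from collections import Counter


def count_clients(lines):
    """Count requests per client, using the first whitespace-separated field as the address."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # ignore blank lines
            counts[fields[0]] += 1
    return counts


# Hypothetical sample lines; real input would be read from the nginx log files
sample = [
    '192.0.2.10 - - [03/Jul/2022:10:00:00 +0200] "GET / HTTP/1.1" 200 512',
    '192.0.2.10 - - [03/Jul/2022:10:00:01 +0200] "GET /discover HTTP/1.1" 200 2048',
    '2001:db8::1 - - [03/Jul/2022:10:00:02 +0200] "GET /rest/items HTTP/1.1" 200 128',
]

counts = count_clients(sample)

# Busiest clients first (the reverse of the order printed by `sort -h | tail`)
for address, hits in counts.most_common(10):
    print(f"{hits:7d} {address}")

# Number of unique clients, like `sort -u | wc -l`
print(len(counts))
```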
- This count seems low, so the load must have come from the request patterns of certain visitors
- 64.39.98.251 is Qualys, and I'm debating blocking [all their IPs](https://pci.qualys.com/static/help/merchant/getting_started/check_scanner_ip_addresses.htm) using a geo block in nginx (need to test)
- The top few are known ILRI and other CGIAR scrapers, but 80.248.237.167 is on InternetVikings in Sweden, using a normal user agent and scraping Discover
- 64.124.8.59 is making requests with a normal user agent and belongs to Castle Global or Zayo
- I ran all system updates and rebooted the server (could have just restarted PostgreSQL but I thought I might as well do everything)
- I implemented a geo mapping for the user agent mapping AND the nginx `limit_req_zone` by extracting the networks into an external file and including it in two different geo mapping blocks
- This is clever and relies on the fact that we can use defaults in both cases
- First, we map the user agent of requests from these networks to "bot" so that Tomcat and Solr handle them accordingly
- Second, we use this as a key in a `limit_req_zone`, which relies on a default mapping of '' (and nginx doesn't account requests with an empty key value)
- I noticed that CIP uploaded a number of Georgian presentations with `dcterms.language` set to English and Other so I changed them to "ka"
- Perhaps we need to update our list of languages to include all instead of the most common ones
- I wrote a script `ilri/iso-639-value-pairs.py` to extract the names and Alpha 2 codes for all ISO 639-1 languages from pycountry and added them to `input-forms.xml`
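A rough sketch of how such an extraction could look, not the actual `ilri/iso-639-value-pairs.py`. It assumes that pycountry language records expose `name` and, for ISO 639-1 languages only, `alpha_2`, and that the target is the `<pair>` / `<displayed-value>` / `<stored-value>` layout of a DSpace value-pairs list. The function names are hypothetical.

```python
from xml.sax.saxutils import escape


def iso_639_1_pairs():
    """Return sorted (name, alpha_2) tuples for languages that have a two-letter code."""
    import pycountry  # third-party; imported lazily so render_pairs() works without it

    pairs = []
    for language in pycountry.languages:
        # Languages without an ISO 639-1 code have no alpha_2 attribute
        alpha_2 = getattr(language, "alpha_2", None)
        if alpha_2:
            pairs.append((language.name, alpha_2))
    return sorted(pairs)


def render_pairs(pairs):
    """Format (name, code) tuples as <pair> elements for a value-pairs list."""
    lines = []
    for name, code in pairs:
        lines.append("<pair>")
        lines.append(f"  <displayed-value>{escape(name)}</displayed-value>")
        lines.append(f"  <stored-value>{escape(code)}</stored-value>")
        lines.append("</pair>")
    return "\n".join(lines)


# Formatting a single hand-written pair; does not require pycountry
print(render_pairs([("Georgian", "ka")]))
```

With pycountry installed, `print(render_pairs(iso_639_1_pairs()))` would emit one `<pair>` per two-letter language code, ready to be pasted into the relevant value-pairs list.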
<!-- vim: set sw=2 ts=2: -->

View File

@ -26,7 +26,7 @@ There seem to be many more of these:
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-06/" />
<meta property="article:published_time" content="2022-06-06T09:01:36+03:00" />
<meta property="article:modified_time" content="2022-06-30T16:48:03+03:00" />
<meta property="article:modified_time" content="2022-07-04T09:25:14+03:00" />
@ -60,7 +60,7 @@ There seem to be many more of these:
"url": "https://alanorth.github.io/cgspace-notes/2022-06/",
"wordCount": "1786",
"datePublished": "2022-06-06T09:01:36+03:00",
"dateModified": "2022-06-30T16:48:03+03:00",
"dateModified": "2022-07-04T09:25:14+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"

View File

@ -19,7 +19,7 @@ Also, the trgm functions I&rsquo;ve used before are case insensitive, but Levens
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-07/" />
<meta property="article:published_time" content="2022-07-02T14:07:36+03:00" />
<meta property="article:modified_time" content="2022-07-02T14:07:36+03:00" />
<meta property="article:modified_time" content="2022-07-04T17:20:01+03:00" />
@ -44,9 +44,9 @@ Also, the trgm functions I&rsquo;ve used before are case insensitive, but Levens
"@type": "BlogPosting",
"headline": "July, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-07/",
"wordCount": "164",
"wordCount": "532",
"datePublished": "2022-07-02T14:07:36+03:00",
"dateModified": "2022-07-02T14:07:36+03:00",
"dateModified": "2022-07-04T17:20:01+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -147,6 +147,64 @@ Also, the trgm functions I&rsquo;ve used before are case insensitive, but Levens
<ul>
<li>Start a harvest on AReS</li>
</ul>
<h2 id="2022-07-04">2022-07-04</h2>
<ul>
<li>Linode told me that CGSpace had high load yesterday
<ul>
<li>I also got some up and down notices from UptimeRobot</li>
<li>Looking now, I see there was a very high CPU and database pool load, but a mostly normal DSpace session count</li>
</ul>
</li>
</ul>
<p><img src="/cgspace-notes/2022/07/cpu-day.png" alt="CPU load day">
<img src="/cgspace-notes/2022/07/jmx_tomcat_dbpools-day.png" alt="JDBC pool day"></p>
<ul>
<li>Seems we have some old database transactions since 2022-06-27:</li>
</ul>
<p><img src="/cgspace-notes/2022/07/postgres_locks_ALL-week.png" alt="PostgreSQL locks week">
<img src="/cgspace-notes/2022/07/postgres_querylength_ALL-week.png" alt="PostgreSQL query length week"></p>
<ul>
<li>Looking at the top connections to nginx yesterday:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> /var/log/nginx/<span style="color:#f92672">{</span>access,library-access,oai,rest<span style="color:#f92672">}</span>.log.1 | sort | uniq -c | sort -h | tail
</span></span><span style="display:flex;"><span> 1132 64.124.8.34
</span></span><span style="display:flex;"><span> 1146 2a01:4f8:1c17:5550::1
</span></span><span style="display:flex;"><span> 1380 137.184.159.211
</span></span><span style="display:flex;"><span> 1533 64.124.8.59
</span></span><span style="display:flex;"><span> 4013 80.248.237.167
</span></span><span style="display:flex;"><span> 4776 54.195.118.125
</span></span><span style="display:flex;"><span> 10482 45.5.186.2
</span></span><span style="display:flex;"><span> 11177 172.104.229.92
</span></span><span style="display:flex;"><span> 15855 2a01:7e00::f03c:91ff:fe9a:3a37
</span></span><span style="display:flex;"><span> 22179 64.39.98.251
</span></span></code></pre></div><ul>
<li>And the total number of unique IPs:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> /var/log/nginx/<span style="color:#f92672">{</span>access,library-access,oai,rest<span style="color:#f92672">}</span>.log.1 | sort -u | wc -l
</span></span><span style="display:flex;"><span>6952
</span></span></code></pre></div><ul>
<li>This count seems low, so the load must have come from the request patterns of certain visitors
<ul>
<li>64.39.98.251 is Qualys, and I&rsquo;m debating blocking <a href="https://pci.qualys.com/static/help/merchant/getting_started/check_scanner_ip_addresses.htm">all their IPs</a> using a geo block in nginx (need to test)</li>
<li>The top few are known ILRI and other CGIAR scrapers, but 80.248.237.167 is on InternetVikings in Sweden, using a normal user agent and scraping Discover</li>
<li>64.124.8.59 is making requests with a normal user agent and belongs to Castle Global or Zayo</li>
</ul>
</li>
<li>I ran all system updates and rebooted the server (could have just restarted PostgreSQL but I thought I might as well do everything)</li>
<li>I implemented a geo mapping for the user agent mapping AND the nginx <code>limit_req_zone</code> by extracting the networks into an external file and including it in two different geo mapping blocks
<ul>
<li>This is clever and relies on the fact that we can use defaults in both cases</li>
<li>First, we map the user agent of requests from these networks to &ldquo;bot&rdquo; so that Tomcat and Solr handle them accordingly</li>
<li>Second, we use this as a key in a <code>limit_req_zone</code>, which relies on a default mapping of &rsquo;&rsquo; (and nginx doesn&rsquo;t account requests with an empty key value)</li>
</ul>
</li>
<li>I noticed that CIP uploaded a number of Georgian presentations with <code>dcterms.language</code> set to English and Other so I changed them to &ldquo;ka&rdquo;
<ul>
<li>Perhaps we need to update our list of languages to include all instead of the most common ones</li>
</ul>
</li>
<li>I wrote a script <code>ilri/iso-639-value-pairs.py</code> to extract the names and Alpha 2 codes for all ISO 639-1 languages from pycountry and added them to <code>input-forms.xml</code></li>
</ul>
<!-- raw HTML omitted -->

BIN
docs/2022/07/cpu-day.png Normal file

Binary file not shown. Size: 14 KiB

Binary file not shown. Size: 12 KiB

Binary file not shown. Size: 13 KiB

Binary file not shown. Size: 7.7 KiB

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2022-07-02T14:07:36+03:00" />
<meta property="og:updated_time" content="2022-07-04T17:20:01+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-07-02T14:07:36+03:00" />
<meta property="og:updated_time" content="2022-07-04T17:20:01+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-07-02T14:07:36+03:00" />
<meta property="og:updated_time" content="2022-07-04T17:20:01+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-07-02T14:07:36+03:00" />
<meta property="og:updated_time" content="2022-07-04T17:20:01+03:00" />

View File

@ -3,22 +3,22 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2022-07-02T14:07:36+03:00</lastmod>
<lastmod>2022-07-04T17:20:01+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2022-07-02T14:07:36+03:00</lastmod>
<lastmod>2022-07-04T17:20:01+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-07/</loc>
<lastmod>2022-07-02T14:07:36+03:00</lastmod>
<lastmod>2022-07-04T17:20:01+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2022-07-02T14:07:36+03:00</lastmod>
<lastmod>2022-07-04T17:20:01+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2022-07-02T14:07:36+03:00</lastmod>
<lastmod>2022-07-04T17:20:01+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-06/</loc>
<lastmod>2022-06-30T16:48:03+03:00</lastmod>
<lastmod>2022-07-04T09:25:14+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-05/</loc>
<lastmod>2022-05-30T16:00:02+03:00</lastmod>

BIN
static/2022/07/cpu-day.png Normal file

Binary file not shown. Size: 14 KiB

Binary file not shown. Size: 12 KiB

Binary file not shown. Size: 13 KiB

Binary file not shown. Size: 7.7 KiB