Add notes for 2022-03-04

2022-03-04 15:30:06 +03:00
parent 7453499827
commit 27acbac859
115 changed files with 6550 additions and 6444 deletions


@ -30,7 +30,7 @@ Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVO
localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
COPY 20994
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -120,17 +120,17 @@ COPY 20994
<ul>
<li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC, for Enrico:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= &gt; \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>COPY 20994
</span></span></code></pre></div><h2 id="2021-07-04">2021-07-04</h2>
<ul>
<li>Update all Docker containers on the AReS server (linode20) and rebuild OpenRXV:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ cd OpenRXV
</span></span><span style="display:flex;"><span>$ docker-compose -f docker/docker-compose.yml down
</span></span><span style="display:flex;"><span>$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">&#39;s/ \+/:/g&#39;</span> | cut -d: -f1,2 | xargs -L1 docker pull
</span></span><span style="display:flex;"><span>$ docker-compose -f docker/docker-compose.yml build
</span></span></code></pre></div><ul>
<li>Then run all system updates and reboot the server</li>
<li>After the server came back up I cloned the <code>openrxv-items-final</code> index to <code>openrxv-items-temp</code> and started the plugins
<ul>
@ -172,27 +172,27 @@ $ docker-compose -f docker/docker-compose.yml build
</ul>
</li>
</ul>
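<ul>
<li>For reference, the <code>docker images | grep -v ^REPO | sed ... | cut -d: -f1,2 | xargs -L1 docker pull</code> pipeline above drops the header row and joins the first two columns into <code>repository:tag</code> for <code>docker pull</code>. Here is a minimal Python sketch of just that parsing step. It assumes the default <code>docker images</code> column order (repository, then tag). The sample output is hypothetical, and nothing here talks to Docker:</li>
</ul>

```python
from typing import List


def image_refs(docker_images_output: str) -> List[str]:
    """Turn `docker images` table output into "repository:tag" strings.

    Like the grep/sed/cut pipeline, this drops the header row and joins the
    first two columns. Unlike the shell version, it also skips untagged
    "<none>" images, which cannot be pulled by name.
    """
    refs = []
    for line in docker_images_output.splitlines():
        if not line.strip() or line.startswith("REPO"):
            continue  # blank line or the REPOSITORY/TAG header row
        repository, tag = line.split()[:2]
        if "<none>" in (repository, tag):
            continue
        refs.append(f"{repository}:{tag}")
    return refs


# Hypothetical `docker images` output; image names are illustrative only.
sample = """REPOSITORY   TAG         IMAGE ID       CREATED       SIZE
node         14-alpine   0123456789ab   2 weeks ago   117MB
mongo        latest      abcdef012345   3 weeks ago   684MB
<none>       <none>      fedcba987654   5 weeks ago   1.1GB
"""

for ref in image_refs(sample):
    print(ref)  # each ref could then be passed to `docker pull`
```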
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-hits.sh -f /tmp/spiders -p
</span></span><span style="display:flex;"><span>Purging 95 hits from Drupal in statistics
</span></span><span style="display:flex;"><span>Purging 38 hits from DTS Agent in statistics
</span></span><span style="display:flex;"><span>Purging 601 hits from Microsoft Office Existence Discovery in statistics
</span></span><span style="display:flex;"><span>Purging 51 hits from Site24x7 in statistics
</span></span><span style="display:flex;"><span>Purging 62 hits from Trello in statistics
</span></span><span style="display:flex;"><span>Purging 13574 hits from WhatsApp in statistics
</span></span><span style="display:flex;"><span>Purging 144 hits from FlipboardProxy in statistics
</span></span><span style="display:flex;"><span>Purging 37 hits from LinkWalker in statistics
</span></span><span style="display:flex;"><span>Purging 1 hits from [Ll]ink.?[Cc]heck.? in statistics
</span></span><span style="display:flex;"><span>Purging 427 hits from WordPress in statistics
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 15030
</span></span></code></pre></div><ul>
<li>Meet with the CGIAR-AGROVOC task group to discuss the workflow for submitting new terms to AGROVOC</li>
<li>I extracted another list of all subjects to check against AGROVOC:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>\COPY (SELECT DISTINCT(LOWER(text_value)) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-06-all-subjects.csv WITH CSV HEADER;
</span></span><span style="display:flex;"><span>$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-07-06-all-subjects.csv | sed 1d &gt; /tmp/2021-07-06-all-subjects.txt
</span></span><span style="display:flex;"><span>$ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-06-agrovoc-results-all-subjects.csv -d
</span></span></code></pre></div><ul>
<li>Test <a href="https://github.com/DSpace/DSpace/pull/3162">Hrafn Malmquist&rsquo;s proposed DBCP2 changes</a> for DSpace 6.4 (DS-4574)
<ul>
<li>His changes reminded me that we could perhaps switch back to DBCP2 pooling instead of Tomcat 7&rsquo;s JDBC pooling via JNDI</li>
@ -205,84 +205,84 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-0
</ul>
</li>
</ul>
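<ul>
<li>The <code>csvcut -c 1 ... | sed 1d</code> step above keeps only the subject column and drops the CSV header before the AGROVOC lookup. Here is a minimal Python sketch of the same idea using the standard <code>csv</code> module, which copes with quoted values that contain commas. The sample rows are hypothetical:</li>
</ul>

```python
import csv
import io
from typing import List


def first_column_without_header(csv_text: str) -> List[str]:
    """Return column 1 of a CSV document, minus its header row.

    Similar in intent to `csvcut -c 1 subjects.csv | sed 1d`. One difference:
    a value containing a comma comes back unquoted here, whereas csvcut
    writes it back out as a quoted CSV field.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, e.g. "subject,count"
    return [row[0] for row in reader if row]  # `if row` skips blank lines


# Hypothetical excerpt with the same columns as the \COPY export (subject,count)
sample = 'subject,count\nclimate change,120\n"crops, staple",3\n'
print("\n".join(first_column_without_header(sample)))
```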
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># <span style="color:#66d9ef">for</span> num in <span style="color:#f92672">{</span>10..26<span style="color:#f92672">}</span>; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">&#34;2021-06-</span>$num<span style="color:#e6db74">&#34;</span>; zcat /var/log/nginx/access.log.*.gz /var/log/nginx/library-access.log.*.gz | grep <span style="color:#e6db74">&#34;</span>$num<span style="color:#e6db74">/Jun/2021&#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>2021-06-10
</span></span><span style="display:flex;"><span>10693
</span></span><span style="display:flex;"><span>2021-06-11
</span></span><span style="display:flex;"><span>10587
</span></span><span style="display:flex;"><span>2021-06-12
</span></span><span style="display:flex;"><span>7958
</span></span><span style="display:flex;"><span>2021-06-13
</span></span><span style="display:flex;"><span>7681
</span></span><span style="display:flex;"><span>2021-06-14
</span></span><span style="display:flex;"><span>12639
</span></span><span style="display:flex;"><span>2021-06-15
</span></span><span style="display:flex;"><span>15388
</span></span><span style="display:flex;"><span>2021-06-16
</span></span><span style="display:flex;"><span>12245
</span></span><span style="display:flex;"><span>2021-06-17
</span></span><span style="display:flex;"><span>11187
</span></span><span style="display:flex;"><span>2021-06-18
</span></span><span style="display:flex;"><span>9684
</span></span><span style="display:flex;"><span>2021-06-19
</span></span><span style="display:flex;"><span>7835
</span></span><span style="display:flex;"><span>2021-06-20
</span></span><span style="display:flex;"><span>7198
</span></span><span style="display:flex;"><span>2021-06-21
</span></span><span style="display:flex;"><span>10380
</span></span><span style="display:flex;"><span>2021-06-22
</span></span><span style="display:flex;"><span>10255
</span></span><span style="display:flex;"><span>2021-06-23
</span></span><span style="display:flex;"><span>15878
</span></span><span style="display:flex;"><span>2021-06-24
</span></span><span style="display:flex;"><span>9963
</span></span><span style="display:flex;"><span>2021-06-25
</span></span><span style="display:flex;"><span>9439
</span></span><span style="display:flex;"><span>2021-06-26
</span></span><span style="display:flex;"><span>7930
</span></span></code></pre></div><ul>
<li>Similarly, the number of unique IPs connecting to the REST API was around the average of the preceding weeks:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># <span style="color:#66d9ef">for</span> num in <span style="color:#f92672">{</span>10..26<span style="color:#f92672">}</span>; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">&#34;2021-06-</span>$num<span style="color:#e6db74">&#34;</span>; zcat /var/log/nginx/rest.*.gz | grep <span style="color:#e6db74">&#34;</span>$num<span style="color:#e6db74">/Jun/2021&#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>2021-06-10
</span></span><span style="display:flex;"><span>1183
</span></span><span style="display:flex;"><span>2021-06-11
</span></span><span style="display:flex;"><span>1074
</span></span><span style="display:flex;"><span>2021-06-12
</span></span><span style="display:flex;"><span>911
</span></span><span style="display:flex;"><span>2021-06-13
</span></span><span style="display:flex;"><span>892
</span></span><span style="display:flex;"><span>2021-06-14
</span></span><span style="display:flex;"><span>1320
</span></span><span style="display:flex;"><span>2021-06-15
</span></span><span style="display:flex;"><span>1257
</span></span><span style="display:flex;"><span>2021-06-16
</span></span><span style="display:flex;"><span>1208
</span></span><span style="display:flex;"><span>2021-06-17
</span></span><span style="display:flex;"><span>1119
</span></span><span style="display:flex;"><span>2021-06-18
</span></span><span style="display:flex;"><span>965
</span></span><span style="display:flex;"><span>2021-06-19
</span></span><span style="display:flex;"><span>985
</span></span><span style="display:flex;"><span>2021-06-20
</span></span><span style="display:flex;"><span>854
</span></span><span style="display:flex;"><span>2021-06-21
</span></span><span style="display:flex;"><span>1098
</span></span><span style="display:flex;"><span>2021-06-22
</span></span><span style="display:flex;"><span>1028
</span></span><span style="display:flex;"><span>2021-06-23
</span></span><span style="display:flex;"><span>1375
</span></span><span style="display:flex;"><span>2021-06-24
</span></span><span style="display:flex;"><span>1135
</span></span><span style="display:flex;"><span>2021-06-25
</span></span><span style="display:flex;"><span>969
</span></span><span style="display:flex;"><span>2021-06-26
</span></span><span style="display:flex;"><span>904
</span></span></code></pre></div><ul>
<li>According to goaccess, the traffic spike started at 2AM (remember that the first &ldquo;Pool empty&rdquo; error in dspace.log was at 4:01AM):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># zcat /var/log/nginx/access.log.1<span style="color:#f92672">[</span>45<span style="color:#f92672">]</span>.gz /var/log/nginx/library-access.log.1<span style="color:#f92672">[</span>45<span style="color:#f92672">]</span>.gz | grep -E <span style="color:#e6db74">&#39;23/Jun/2021&#39;</span> | goaccess --log-format<span style="color:#f92672">=</span>COMBINED -
</span></span></code></pre></div><ul>
<li>Moayad sent a fix for the &ldquo;add missing items&rdquo; plugin issue (<a href="https://github.com/ilri/OpenRXV/pull/107">#107</a>)
<ul>
<li>It works MUCH faster because it correctly identifies the missing handles in each repository</li>
@ -311,19 +311,19 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-0
</ul>
</li>
</ul>
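<ul>
<li>The per-day loops above count distinct client IPs by grepping for <code>DD/Jun/2021</code> and deduplicating the first field. Here is a minimal Python sketch of the same count, assuming nginx&rsquo;s default combined log format. <code>read_gz_logs()</code> and <code>unique_ips_per_day()</code> are hypothetical helpers, and the sample lines use documentation address ranges:</li>
</ul>

```python
import glob
import gzip
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator

# nginx "combined" lines start with the client IP, then two fields, then a
# timestamp such as [23/Jun/2021:02:00:01 +0300]; only the IP and date are used.
LOG_LINE = re.compile(r"^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):")


def read_gz_logs(pattern: str) -> Iterator[str]:
    """Yield lines from every gzipped log matching `pattern`, like `zcat`."""
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", errors="replace") as handle:
            yield from handle


def unique_ips_per_day(lines: Iterable[str]) -> Dict[str, int]:
    """Map "DD/Mon/YYYY" to the number of distinct client IPs seen that day."""
    ips_by_day = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            ip, day = match.groups()
            ips_by_day[day].add(ip)
    return {day: len(ips) for day, ips in ips_by_day.items()}


# Hypothetical log lines using documentation address ranges.
sample = [
    '203.0.113.5 - - [23/Jun/2021:02:00:01 +0300] "GET / HTTP/1.1" 200 512 "-" "curl"',
    '203.0.113.5 - - [23/Jun/2021:02:05:09 +0300] "GET /a HTTP/1.1" 200 512 "-" "curl"',
    '198.51.100.7 - - [23/Jun/2021:04:01:00 +0300] "GET / HTTP/1.1" 499 0 "-" "x"',
    '198.51.100.7 - - [24/Jun/2021:00:00:10 +0300] "GET / HTTP/1.1" 200 10 "-" "x"',
]
print(unique_ips_per_day(sample))
```

<ul>
<li>Against the real logs, this would be something like <code>unique_ips_per_day(read_gz_logs("/var/log/nginx/access.log.*.gz"))</code>. The regex only reads the timestamp field, so a date that appears in a URL or referrer is not counted. As a result, the numbers may differ slightly from the grep-based ones.</li>
</ul>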
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39; | wc -l
</span></span><span style="display:flex;"><span>2302
</span></span><span style="display:flex;"><span>postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39; | wc -l
</span></span><span style="display:flex;"><span>2564
</span></span><span style="display:flex;"><span>postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39; | wc -l
</span></span><span style="display:flex;"><span>2530
</span></span></code></pre></div><ul>
<li>Almost all of the locks are held by XMLUI, not the REST API or OAI:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39; | grep -o -E &#39;(dspaceWeb|dspaceApi)&#39; | sort | uniq -c | sort -n
</span></span><span style="display:flex;"><span> 57 dspaceApi
</span></span><span style="display:flex;"><span> 2671 dspaceWeb
</span></span></code></pre></div><ul>
<li>I ran all updates on the server (linode18) and restarted it, then DSpace came back up</li>
<li>I sent a message to Atmire, as I never heard back from them last week, when we blocked access to the REST API for two days so they could investigate the server issues</li>
<li>Clone the <code>openrxv-items-temp</code> index on AReS and re-run all the plugins, but most of the &ldquo;dspace_add_missing_items&rdquo; tasks failed, so I will just run a full re-harvest</li>
@ -338,31 +338,31 @@ postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_ac
</ul>
</li>
</ul>
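<ul>
<li>The <code>grep -o -E &#39;(dspaceWeb|dspaceApi)&#39; | sort | uniq -c | sort -n</code> tally above can be sketched in Python with <code>collections.Counter</code>. The psql output below is hypothetical and heavily trimmed. If those names live in <code>pg_stat_activity.application_name</code> (an assumption), adding <code>GROUP BY psa.application_name</code> to the query would be another way to get roughly the same numbers without text matching:</li>
</ul>

```python
import re
from collections import Counter
from typing import List, Tuple

# The same alternation the grep -o -E call uses.
APP_NAMES = re.compile(r"dspaceWeb|dspaceApi")


def tally_app_names(psql_output: str) -> List[Tuple[str, int]]:
    """Count every dspaceWeb/dspaceApi match, smallest count first.

    findall() returns each match on every line, just as `grep -o` does, and
    sorting by count mirrors the final `sort -n`.
    """
    counts = Counter(APP_NAMES.findall(psql_output))
    return sorted(counts.items(), key=lambda item: item[1])


# Hypothetical, heavily trimmed psql output.
sample = """ pid  | application_name
 1101 | dspaceWeb
 1101 | dspaceWeb
 2202 | dspaceApi
"""
for name, count in tally_app_names(sample):
    print(f"{count:7d} {name}")
```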
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># grepcidr 91.243.191.0/24 /var/log/nginx/access.log | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq -c | sort -n
</span></span><span style="display:flex;"><span> 32 91.243.191.124
</span></span><span style="display:flex;"><span> 33 91.243.191.129
</span></span><span style="display:flex;"><span> 33 91.243.191.200
</span></span><span style="display:flex;"><span> 34 91.243.191.115
</span></span><span style="display:flex;"><span> 34 91.243.191.154
</span></span><span style="display:flex;"><span> 34 91.243.191.234
</span></span><span style="display:flex;"><span> 34 91.243.191.56
</span></span><span style="display:flex;"><span> 35 91.243.191.187
</span></span><span style="display:flex;"><span> 35 91.243.191.91
</span></span><span style="display:flex;"><span> 36 91.243.191.58
</span></span><span style="display:flex;"><span> 37 91.243.191.209
</span></span><span style="display:flex;"><span> 39 91.243.191.119
</span></span><span style="display:flex;"><span> 39 91.243.191.144
</span></span><span style="display:flex;"><span> 39 91.243.191.55
</span></span><span style="display:flex;"><span> 40 91.243.191.112
</span></span><span style="display:flex;"><span> 40 91.243.191.182
</span></span><span style="display:flex;"><span> 40 91.243.191.57
</span></span><span style="display:flex;"><span> 40 91.243.191.98
</span></span><span style="display:flex;"><span> 41 91.243.191.106
</span></span><span style="display:flex;"><span> 44 91.243.191.79
</span></span><span style="display:flex;"><span> 45 91.243.191.151
</span></span><span style="display:flex;"><span> 46 91.243.191.103
</span></span><span style="display:flex;"><span> 56 91.243.191.172
</span></span></code></pre></div><ul>
<li>I found a few people complaining about these Russian attacks too:
<ul>
<li><a href="https://community.cloudflare.com/t/russian-ddos-completley-unmitigated-by-cloudflare/284578">https://community.cloudflare.com/t/russian-ddos-completley-unmitigated-by-cloudflare/284578</a></li>
@ -392,22 +392,22 @@ postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_ac
</ul>
</li>
</ul>
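<ul>
<li>The <code>grepcidr 91.243.191.0/24 ... | uniq -c | sort -n</code> count above can be approximated with Python&rsquo;s <code>ipaddress</code> module. Here is a minimal sketch, assuming the client IP is the first field of each log line (as in nginx&rsquo;s default format). <code>hits_in_network()</code> is a hypothetical helper, and the sample lines use documentation addresses:</li>
</ul>

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple


def hits_in_network(log_lines: Iterable[str], cidr: str) -> List[Tuple[str, int]]:
    """Count requests per client IP inside `cidr`, smallest count first.

    Only the first field of each line (the client IP in nginx's default log
    format) is checked. Ties are broken by the address string, roughly what
    `sort -n` falls back to when the counts are equal.
    """
    network = ipaddress.ip_network(cidr)
    counts = Counter()
    for line in log_lines:
        first_field = line.split(" ", 1)[0]
        try:
            address = ipaddress.ip_address(first_field)
        except ValueError:
            continue  # blank or malformed line
        if address in network:
            counts[str(address)] += 1
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical log lines using documentation address ranges.
sample = [
    '192.0.2.10 - - [06/Jul/2021:10:00:00 +0300] "GET / HTTP/1.1" 200 1 "-" "x"',
    '192.0.2.10 - - [06/Jul/2021:10:00:01 +0300] "GET / HTTP/1.1" 200 1 "-" "x"',
    '192.0.2.99 - - [06/Jul/2021:10:00:02 +0300] "GET / HTTP/1.1" 200 1 "-" "x"',
    '198.51.100.1 - - [06/Jul/2021:10:00:03 +0300] "GET / HTTP/1.1" 200 1 "-" "x"',
]
for ip, count in hits_in_network(sample, "192.0.2.0/24"):
    print(f"{count:7d} {ip}")
```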
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./asn -n 45.80.217.235
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>╭──────────────────────────────╮
</span></span><span style="display:flex;"><span>│ ASN lookup for 45.80.217.235 │
</span></span><span style="display:flex;"><span>╰──────────────────────────────╯
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> 45.80.217.235 ┌PTR -
</span></span><span style="display:flex;"><span> ├ASN 46844 (ST-BGP, US)
</span></span><span style="display:flex;"><span> ├ORG Sharktech
</span></span><span style="display:flex;"><span> ├NET 45.80.217.0/24 (TrafficTransitSolutionNet)
</span></span><span style="display:flex;"><span> ├ABU info@traffictransitsolution.us
</span></span><span style="display:flex;"><span> ├ROA ✓ VALID (1 ROA found)
</span></span><span style="display:flex;"><span> ├TYP Proxy host Hosting/DC
</span></span><span style="display:flex;"><span> ├GEO Los Angeles, California (US)
</span></span><span style="display:flex;"><span> └REP ✓ NONE
</span></span></code></pre></div><ul>
<li>Slowly, I manually built up a list of the IPs, ISP names, and network blocks, for example:</li>
</ul>
<pre tabindex="0"><code class="language-csv" data-lang="csv">IP, Organization, Website, Network
@ -496,56 +496,56 @@ postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_ac
</ul>
</li>
</ul>
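<ul>
<li>Building up the network blocks by hand starts from individual IPs. A rough first pass is to group the IPs by /24 prefix and look up the busiest blocks first. This assumes every address belongs to a /24, but real allocations (like the <code>NET</code> field in the ASN lookup) are often larger or smaller, so this only suggests candidates. <code>count_by_prefix()</code> is a hypothetical helper:</li>
</ul>

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple


def count_by_prefix(ips: Iterable[str], prefix_length: int = 24) -> List[Tuple[str, int]]:
    """Group IPv4 addresses into their enclosing /prefix_length network, busiest first."""
    counts = Counter()
    for ip in ips:
        # strict=False accepts a host address and returns its enclosing network
        network = ipaddress.ip_network(f"{ip}/{prefix_length}", strict=False)
        counts[str(network)] += 1
    return counts.most_common()


# Hypothetical IPs from documentation address ranges.
sample = ["192.0.2.10", "192.0.2.200", "198.51.100.7"]
print(count_by_prefix(sample))
```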
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># grep -v -E <span style="color:#e6db74">&#34;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&#34;</span> /var/log/nginx/access.log | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq &gt; /tmp/ips-sorted.txt
</span></span><span style="display:flex;"><span># wc -l /tmp/ips-sorted.txt
</span></span><span style="display:flex;"><span>10776 /tmp/ips-sorted.txt
</span></span></code></pre></div><ul>
<li>Then resolve them all:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/resolve-addresses-geoip2.py -i /tmp/ips-sorted.txt -o /tmp/out.csv
</code></pre><ul>
<li>Then get the top ten organizations and top ten ASNs:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c <span style="color:#ae81ff">2</span> /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span> 213 AMAZON-AES
</span></span><span style="display:flex;"><span> 218 ASN-QUADRANET-GLOBAL
</span></span><span style="display:flex;"><span> 246 Silverstar Invest Limited
</span></span><span style="display:flex;"><span> 347 Ethiopian Telecommunication Corporation
</span></span><span style="display:flex;"><span> 475 DEDIPATH-LLC
</span></span><span style="display:flex;"><span> 504 AS-COLOCROSSING
</span></span><span style="display:flex;"><span> 598 UAB Rakrejus
</span></span><span style="display:flex;"><span> 814 UGB Hosting OU
</span></span><span style="display:flex;"><span> 1010 ST-BGP
</span></span><span style="display:flex;"><span> 1757 Global Layer B.V.
</span></span><span style="display:flex;"><span>$ csvcut -c <span style="color:#ae81ff">3</span> /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span> 213 14618
</span></span><span style="display:flex;"><span> 218 8100
</span></span><span style="display:flex;"><span> 246 35624
</span></span><span style="display:flex;"><span> 347 24757
</span></span><span style="display:flex;"><span> 475 35913
</span></span><span style="display:flex;"><span> 504 36352
</span></span><span style="display:flex;"><span> 598 62282
</span></span><span style="display:flex;"><span> 814 206485
</span></span><span style="display:flex;"><span> 1010 46844
</span></span><span style="display:flex;"><span> 1757 49453
</span></span></code></pre></div><ul>
<li>I will download blocklists for all these except Ethiopian Telecom, Quadranet, and Amazon, though I&rsquo;m concerned about Global Layer because it&rsquo;s a huge ASN that seems to have legit hosts too&hellip;?</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ wget https://asn.ipinfo.app/api/text/nginx/AS49453
</span></span><span style="display:flex;"><span>$ wget https://asn.ipinfo.app/api/text/nginx/AS46844
</span></span><span style="display:flex;"><span>$ wget https://asn.ipinfo.app/api/text/nginx/AS206485
</span></span><span style="display:flex;"><span>$ wget https://asn.ipinfo.app/api/text/nginx/AS62282
</span></span><span style="display:flex;"><span>$ wget https://asn.ipinfo.app/api/text/nginx/AS36352
</span></span><span style="display:flex;"><span>$ wget https://asn.ipinfo.app/api/text/nginx/AS35624
</span></span><span style="display:flex;"><span>$ cat AS* | sort | uniq &gt; /tmp/abusive-networks.txt
</span></span><span style="display:flex;"><span>$ wc -l /tmp/abusive-networks.txt
</span></span><span style="display:flex;"><span>2276 /tmp/abusive-networks.txt
</span></span></code></pre></div><ul>
<li>Combining these with my existing rules and keeping only the unique entries:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ cat roles/dspace/templates/nginx/abusive-networks.conf.j2 /tmp/abusive-networks.txt | grep deny | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>2298
</span></span></code></pre></div><ul>
<li><a href="https://scamalytics.com/ip/isp/2021-06">According to Scamalytics, all of these are high-risk ISPs</a> (as recently as 2021-06), so I will just keep blocking them</li>
<li>I deployed the block list on CGSpace (linode18) and the load is down to 1.0, but I see there are still some DDoS IPs getting through&hellip; sigh</li>
<li>The next thing I need to do is purge all the IPs from Solr using grepcidr&hellip;</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ sudo zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 | grep -E <span style="color:#e6db74">&#34; (200|499) &#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq &gt; /tmp/all-ips.txt
</span></span><span style="display:flex;"><span>$ ./ilri/resolve-addresses-geoip2.py -i /tmp/all-ips.txt -o /tmp/all-ips-out.csv
</span></span><span style="display:flex;"><span>$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(206485|35624|36352|46844|49453|62282)$&#39;</span> /tmp/all-ips-out.csv | csvcut -c ip | sed 1d | sort | uniq &gt; /tmp/all-ips-to-block.txt
</span></span><span style="display:flex;"><span>$ wc -l /tmp/all-ips-to-block.txt
</span></span><span style="display:flex;"><span>5095 /tmp/all-ips-to-block.txt
</span></span></code></pre></div><ul>
<li>Then I added them to the normal ipset we are already using with firewalld
<ul>
<li>I will check again in a few hours and ban more</li>
</li>
<li>I decided to extract the networks from the GeoIP database with <code>resolve-addresses-geoip2.py</code> so I can block them more efficiently than using the 5,000 IPs in an ipset:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(206485|35624|36352|46844|49453|62282)$&#39;</span> /tmp/all-ips-out.csv | csvcut -c network | sed 1d | sort | uniq &gt; /tmp/all-networks-to-block.txt
</span></span><span style="display:flex;"><span>$ grep deny roles/dspace/templates/nginx/abusive-networks.conf.j2 | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>2354
</span></span></code></pre></div><ul>
<li>Combined with the previous networks, this brings about 200 more, for a total of 2,354 networks
<ul>
<li>I think I need to re-work the ipset stuff in my common Ansible role so that I can add such abusive networks as an iptables ipset / nftables set, and have a cron job to update them daily (from <a href="https://www.spamhaus.org/drop/">Spamhaus&rsquo;s DROP and EDROP lists</a>, for example)</li>
</li>
<li>Then I got a list of all the 5,095 IPs from above and used <code>check-spider-ip-hits.sh</code> to purge them from Solr:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ilri/check-spider-ip-hits.sh -f /tmp/all-ips-to-block.txt -p
</span></span><span style="display:flex;"><span>...
</span></span><span style="display:flex;"><span>Total number of bot hits purged: 197116
</span></span></code></pre></div><ul>
<li>I started a harvest on AReS and it finished in a few hours now that the load on CGSpace is back to a normal level</li>
</ul>
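<ul>
<li>One limitation of combining lists with <code>sort | uniq</code> is that it only drops exact duplicate lines. It can&rsquo;t see that two adjacent /25s form a /24, or that a network from one ASN list is already covered by a larger prefix from another. A minimal sketch of how that could be handled with Python&rsquo;s standard <code>ipaddress</code> module, assuming the <code>deny NETWORK;</code> line format of the asn.ipinfo.app nginx lists: parse the directives, merge overlapping and adjacent networks with <code>ipaddress.collapse_addresses()</code>, and render them back as nginx rules. The helper names and sample addresses are hypothetical, and this is not what is deployed on CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import ipaddress
import re

# Hypothetical pattern for nginx lines like "deny 192.0.2.0/24;" as in the
# asn.ipinfo.app nginx lists; comments, blank lines and braces never match
DENY_RE = re.compile(r"^\s*deny\s+([0-9A-Fa-f:./]+)\s*;")


def parse_deny_lines(lines):
    """Return ip_network objects for every 'deny NETWORK;' line."""
    networks = []
    for line in lines:
        match = DENY_RE.match(line)
        if match:
            # strict=False tolerates host bits being set, e.g. 192.0.2.7/24
            networks.append(ipaddress.ip_network(match.group(1), strict=False))
    return networks


def collapse(networks):
    """Merge overlapping and adjacent networks into the fewest CIDR blocks.

    collapse_addresses() raises TypeError when IPv4 and IPv6 are mixed,
    so each address family is collapsed separately.
    """
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    return list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))


def to_nginx(networks):
    """Render networks back into nginx deny directives."""
    return [f"deny {net};" for net in networks]


if __name__ == "__main__":
    # Hypothetical input using documentation address ranges
    sample = [
        "# AS64496 (documentation ASN)",
        "deny 192.0.2.0/25;",
        "deny 192.0.2.128/25;",
        "deny 192.0.2.64/26;",
        "deny 198.51.100.7;",
    ]
    for directive in to_nginx(collapse(parse_deny_lines(sample))):
        print(directive)
</code></pre></div>
<ul>
<li>For the sample input this prints <code>deny 192.0.2.0/24;</code> and <code>deny 198.51.100.7/32;</code>. The two /25s merge into one /24, and the /26 is dropped because the /24 already covers it.</li>
</ul>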
<h2 id="2021-07-20">2021-07-20</h2>
<ul>
<li>Looking again at the IPs from these seven ASNs making connections to CGSpace over the last few days, the number is much higher than I noticed yesterday:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(49453|46844|206485|62282|36352|35913|35624)$&#39;</span> /tmp/out.csv | csvcut -c ip | sed 1d | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>5643
</span></span></code></pre></div><ul>
<li>I purged 27,000 more hits from the Solr stats using this new list of IPs with my <code>check-spider-ip-hits.sh</code> script</li>
<li>Surprise, surprise: I checked the nginx logs from 2021-06-23, when we last had issues with thousands of XMLUI sessions and PostgreSQL connections, and I see IPs from the same ASNs!</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ sudo zcat --force /var/log/nginx/access.log.27.gz /var/log/nginx/access.log.28.gz | grep -E <span style="color:#e6db74">&#34; (200|499) &#34;</span> | grep -v -E <span style="color:#e6db74">&#34;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq &gt; /tmp/all-ips-june-23.txt
</span></span><span style="display:flex;"><span>$ ./ilri/resolve-addresses-geoip2.py -i /tmp/all-ips-june-23.txt -o /tmp/out.csv
</span></span><span style="display:flex;"><span>$ csvcut -c 2,4 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n <span style="color:#ae81ff">15</span>
</span></span><span style="display:flex;"><span> 265 GOOGLE,15169
</span></span><span style="display:flex;"><span> 277 Silverstar Invest Limited,35624
</span></span><span style="display:flex;"><span> 280 FACEBOOK,32934
</span></span><span style="display:flex;"><span> 288 SAFARICOM-LIMITED,33771
</span></span><span style="display:flex;"><span> 399 AMAZON-AES,14618
</span></span><span style="display:flex;"><span> 427 MICROSOFT-CORP-MSN-AS-BLOCK,8075
</span></span><span style="display:flex;"><span> 455 Opera Software AS,39832
</span></span><span style="display:flex;"><span> 481 MTN NIGERIA Communication limited,29465
</span></span><span style="display:flex;"><span> 502 DEDIPATH-LLC,35913
</span></span><span style="display:flex;"><span> 506 AS-COLOCROSSING,36352
</span></span><span style="display:flex;"><span> 602 UAB Rakrejus,62282
</span></span><span style="display:flex;"><span> 822 ST-BGP,46844
</span></span><span style="display:flex;"><span> 874 Ethiopian Telecommunication Corporation,24757
</span></span><span style="display:flex;"><span> 912 UGB Hosting OU,206485
</span></span><span style="display:flex;"><span> 1607 Global Layer B.V.,49453
</span></span></code></pre></div><ul>
<li>Again it was over 5,000 IPs:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(49453|46844|206485|62282|36352|35913|35624)$&#39;</span> /tmp/out.csv | csvcut -c ip | sed 1d | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>5228
</span></span></code></pre></div><ul>
<li>Interestingly, these seem to be five thousand <em>different</em> IP addresses from the ones in last weekend&rsquo;s attack, as there are over 10,000 unique ones if I combine them!</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ cat /tmp/ips-june23.txt /tmp/ips-jul16.txt | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>10458
</span></span></code></pre></div><ul>
<li>I purged all 26,000 hits from these new IP addresses from Solr as well</li>
<li>Looking back at my notes for the 2019-05 attack, I see that I had already identified most of these network providers (!)&hellip;
<ul>
</li>
<li>Adding QuadraNet brings the total number of networks seen during these two attacks to 262, and the number of unique IPs to 10,900:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 /var/log/nginx/access.log.3 /var/log/nginx/access.log.4 /var/log/nginx/access.log.5 /var/log/nginx/access.log.27.gz /var/log/nginx/access.log.28.gz | grep -E <span style="color:#e6db74">&#34; (200|499) &#34;</span> | grep -v -E <span style="color:#e6db74">&#34;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq &gt; /tmp/ddos-ips.txt
</span></span><span style="display:flex;"><span># wc -l /tmp/ddos-ips.txt
</span></span><span style="display:flex;"><span>54002 /tmp/ddos-ips.txt
</span></span><span style="display:flex;"><span>$ ./ilri/resolve-addresses-geoip2.py -i /tmp/ddos-ips.txt -o /tmp/ddos-ips.csv
</span></span><span style="display:flex;"><span>$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(49453|46844|206485|62282|36352|35913|35624|8100)$&#39;</span> /tmp/ddos-ips.csv | csvcut -c ip | sed 1d | sort | uniq &gt; /tmp/ddos-ips-to-purge.txt
</span></span><span style="display:flex;"><span>$ wc -l /tmp/ddos-ips-to-purge.txt
</span></span><span style="display:flex;"><span>10900 /tmp/ddos-ips-to-purge.txt
</span></span><span style="display:flex;"><span>$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(49453|46844|206485|62282|36352|35913|35624|8100)$&#39;</span> /tmp/ddos-ips.csv | csvcut -c network | sed 1d | sort | uniq &gt; /tmp/ddos-networks-to-block.txt
</span></span><span style="display:flex;"><span>$ wc -l /tmp/ddos-networks-to-block.txt
</span></span><span style="display:flex;"><span>262 /tmp/ddos-networks-to-block.txt
</span></span></code></pre></div><ul>
<li>The new total number of networks to block, including the network prefixes for these ASNs downloaded from asn.ipinfo.app, is 4,007:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ wget https://asn.ipinfo.app/api/text/nginx/AS49453 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>https://asn.ipinfo.app/api/text/nginx/AS46844 \
</span></span><span style="display:flex;"><span>https://asn.ipinfo.app/api/text/nginx/AS206485 \
</span></span><span style="display:flex;"><span>https://asn.ipinfo.app/api/text/nginx/AS62282 \
</span></span><span style="display:flex;"><span>https://asn.ipinfo.app/api/text/nginx/AS36352 \
</span></span><span style="display:flex;"><span>https://asn.ipinfo.app/api/text/nginx/AS35913 \
</span></span><span style="display:flex;"><span>https://asn.ipinfo.app/api/text/nginx/AS35624 \
</span></span><span style="display:flex;"><span>https://asn.ipinfo.app/api/text/nginx/AS8100
</span></span><span style="display:flex;"><span>$ cat AS* /tmp/ddos-networks-to-block.txt | sed -e <span style="color:#e6db74">&#39;/^$/d&#39;</span> -e <span style="color:#e6db74">&#39;/^#/d&#39;</span> -e <span style="color:#e6db74">&#39;/^{/d&#39;</span> -e <span style="color:#e6db74">&#39;s/deny //&#39;</span> -e <span style="color:#e6db74">&#39;s/;//&#39;</span> | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>4007
</span></span></code></pre></div><ul>
<li>I re-applied these networks to nginx on CGSpace (linode18) and DSpace Test (linode26), and purged 14,000 more Solr statistics hits from these IPs</li>
</ul>
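<ul>
<li>All of the <code>csvgrep</code> / <code>csvcut</code> / <code>sort | uniq</code> pipelines above do the same thing. They keep the rows of the resolved CSV whose ASN is in a fixed set, then take either the unique IPs (to purge from Solr) or the unique networks (to block in nginx). A minimal Python sketch of that filter, assuming the CSV written by <code>resolve-addresses-geoip2.py</code> has the <code>ip</code>, <code>asn</code>, and <code>network</code> columns that the commands above refer to by name; the function name and sample rows are hypothetical:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import csv
import io

# The eight ASNs from the csvgrep regex above: Global Layer, ST-BGP, UGB Hosting,
# UAB Rakrejus, ColoCrossing, DediPath, Silverstar Invest and QuadraNet
ABUSIVE_ASNS = {"49453", "46844", "206485", "62282", "36352", "35913", "35624", "8100"}


def select_abusive(csv_file, asns=ABUSIVE_ASNS):
    """Return (unique IPs, unique networks) for rows whose ASN is in asns.

    Comparing the whole ASN string against a set is equivalent to the
    anchored regex ^(...)$ used with csvgrep.
    """
    ips, networks = set(), set()
    for row in csv.DictReader(csv_file):
        if row["asn"].strip() in asns:
            ips.add(row["ip"])
            networks.add(row["network"])
    # Plain string sort, like `sort | uniq` in the shell version
    return sorted(ips), sorted(networks)


if __name__ == "__main__":
    # Hypothetical rows using documentation address ranges
    sample = io.StringIO(
        "ip,org,asn,network\n"
        "192.0.2.10,Global Layer B.V.,49453,192.0.2.0/24\n"
        "192.0.2.11,Global Layer B.V.,49453,192.0.2.0/24\n"
        "198.51.100.5,GOOGLE,15169,198.51.100.0/24\n"
        "203.0.113.9,Example Org,81000,203.0.113.0/24\n"
    )
    ips, networks = select_abusive(sample)
    print(ips)       # ['192.0.2.10', '192.0.2.11']
    print(networks)  # ['192.0.2.0/24']
</code></pre></div>
<ul>
<li>The anchors in the <code>csvgrep</code> regex matter. Without <code>^</code> and <code>$</code>, the pattern is a substring search, so <code>8100</code> would also match an ASN like <code>81000</code>. The set lookup in the sketch avoids that by construction.</li>
</ul>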
<h2 id="2021-07-22">2021-07-22</h2>