Add notes for 2021-11-08

2021-11-09 06:29:52 +02:00
parent b3df4ff58f
commit 9afe5c13f9
110 changed files with 1827 additions and 1737 deletions

@ -30,7 +30,7 @@ Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVO
localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
COPY 20994
"/>
<meta name="generator" content="Hugo 0.88.1" />
<meta name="generator" content="Hugo 0.89.2" />
@ -120,17 +120,17 @@ COPY 20994
<ul>
<li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
COPY 20994
</code></pre><h2 id="2021-07-04">2021-07-04</h2>
</code></pre></div><h2 id="2021-07-04">2021-07-04</h2>
<ul>
<li>Update all Docker containers on the AReS server (linode20) and rebuild OpenRXV:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cd OpenRXV
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cd OpenRXV
$ docker-compose -f docker/docker-compose.yml down
$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">&#39;s/ \+/:/g&#39;</span> | cut -d: -f1,2 | xargs -L1 docker pull
$ docker-compose -f docker/docker-compose.yml build
</code></pre><ul>
</code></pre></div><ul>
<li>Then run all system updates and reboot the server</li>
<li>After the server came back up I cloned the <code>openrxv-items-final</code> index to <code>openrxv-items-temp</code> and started the plugins
<ul>
@ -172,7 +172,7 @@ $ docker-compose -f docker/docker-compose.yml build
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/check-spider-hits.sh -f /tmp/spiders -p
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-hits.sh -f /tmp/spiders -p
Purging 95 hits from Drupal in statistics
Purging 38 hits from DTS Agent in statistics
Purging 601 hits from Microsoft Office Existence Discovery in statistics
@ -183,16 +183,16 @@ Purging 144 hits from FlipboardProxy in statistics
Purging 37 hits from LinkWalker in statistics
Purging 1 hits from [Ll]ink.?[Cc]heck.? in statistics
Purging 427 hits from WordPress in statistics
Total number of bot hits purged: 15030
</code></pre><ul>
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 15030
</code></pre></div><ul>
<li>Meet with the CGIAR-AGROVOC task group to discuss how we want to do the workflow for submitting new terms to AGROVOC</li>
<li>I extracted another list of all subjects to check against AGROVOC:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">\COPY (SELECT DISTINCT(LOWER(text_value)) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-06-all-subjects.csv WITH CSV HEADER;
$ csvcut -c 1 /tmp/2021-07-06-all-subjects.csv | sed 1d &gt; /tmp/2021-07-06-all-subjects.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">\COPY (SELECT DISTINCT(LOWER(text_value)) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-06-all-subjects.csv WITH CSV HEADER;
$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-07-06-all-subjects.csv | sed 1d &gt; /tmp/2021-07-06-all-subjects.txt
$ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-06-agrovoc-results-all-subjects.csv -d
</code></pre><ul>
</code></pre></div><ul>
<li>Test <a href="https://github.com/DSpace/DSpace/pull/3162">Hrafn Malmquist&rsquo;s proposed DBCP2 changes</a> for DSpace 6.4 (DS-4574)
<ul>
<li>His changes reminded me that we can perhaps switch back to using this pooling instead of Tomcat 7&rsquo;s JDBC pooling via JNDI</li>
@ -205,7 +205,7 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-0
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># for num in {10..26}; do echo &quot;2021-06-$num&quot;; zcat /var/log/nginx/access.log.*.gz /var/log/nginx/library-access.log.*.gz | grep &quot;$num/Jun/2021&quot; | awk '{print $1}' | sort | uniq | wc -l; done
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># <span style="color:#66d9ef">for</span> num in <span style="color:#f92672">{</span>10..26<span style="color:#f92672">}</span>; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">&#34;2021-06-</span>$num<span style="color:#e6db74">&#34;</span>; zcat /var/log/nginx/access.log.*.gz /var/log/nginx/library-access.log.*.gz | grep <span style="color:#e6db74">&#34;</span>$num<span style="color:#e6db74">/Jun/2021&#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span>
2021-06-10
10693
2021-06-11
@ -240,10 +240,10 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-0
9439
2021-06-26
7930
</code></pre><ul>
</code></pre></div><ul>
<li>Similarly, the number of connections to the REST API was around the average for the preceding weeks:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># for num in {10..26}; do echo &quot;2021-06-$num&quot;; zcat /var/log/nginx/rest.*.gz | grep &quot;$num/Jun/2021&quot; | awk '{print $1}' | sort | uniq | wc -l; done
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># <span style="color:#66d9ef">for</span> num in <span style="color:#f92672">{</span>10..26<span style="color:#f92672">}</span>; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">&#34;2021-06-</span>$num<span style="color:#e6db74">&#34;</span>; zcat /var/log/nginx/rest.*.gz | grep <span style="color:#e6db74">&#34;</span>$num<span style="color:#e6db74">/Jun/2021&#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span>
2021-06-10
1183
2021-06-11
@ -278,11 +278,11 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-0
969
2021-06-26
904
</code></pre><ul>
</code></pre></div><ul>
<li>According to goaccess, the traffic spike started at 2AM (remember that the first &ldquo;Pool empty&rdquo; error in dspace.log was at 4:01AM):</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># zcat /var/log/nginx/access.log.1[45].gz /var/log/nginx/library-access.log.1[45].gz | grep -E '23/Jun/2021' | goaccess --log-format=COMBINED -
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># zcat /var/log/nginx/access.log.1<span style="color:#f92672">[</span>45<span style="color:#f92672">]</span>.gz /var/log/nginx/library-access.log.1<span style="color:#f92672">[</span>45<span style="color:#f92672">]</span>.gz | grep -E <span style="color:#e6db74">&#39;23/Jun/2021&#39;</span> | goaccess --log-format<span style="color:#f92672">=</span>COMBINED -
</code></pre></div><ul>
<li>Moayad sent a fix for the add missing items plugins issue (<a href="https://github.com/ilri/OpenRXV/pull/107">#107</a>)
<ul>
<li>It works MUCH faster because it correctly identifies the missing handles in each repository</li>
@ -311,19 +311,19 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-0
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39; | wc -l
2302
postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39; | wc -l
2564
postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39; | wc -l
2530
</code></pre><ul>
</code></pre></div><ul>
<li>The locks are held by XMLUI, not REST API or OAI:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi)' | sort | uniq -c | sort -n
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c &#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39; | grep -o -E &#39;(dspaceWeb|dspaceApi)&#39; | sort | uniq -c | sort -n
57 dspaceApi
2671 dspaceWeb
</code></pre><ul>
</code></pre></div><ul>
<li>I ran all updates on the server (linode18) and restarted it, then DSpace came back up</li>
<li>I sent a message to Atmire, as I never heard from them last week when we blocked access to the REST API for two days for them to investigate the server issues</li>
<li>Clone the <code>openrxv-items-temp</code> index on AReS and re-run all the plugins, but most of the &ldquo;dspace_add_missing_items&rdquo; tasks failed so I will just run a full re-harvest</li>
@ -338,7 +338,7 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># grepcidr 91.243.191.0/24 /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -n
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># grepcidr 91.243.191.0/24 /var/log/nginx/access.log | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq -c | sort -n
32 91.243.191.124
33 91.243.191.129
33 91.243.191.200
@ -362,7 +362,7 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
45 91.243.191.151
46 91.243.191.103
56 91.243.191.172
</code></pre><ul>
</code></pre></div><ul>
<li>I found a few people complaining about these Russian attacks too:
<ul>
<li><a href="https://community.cloudflare.com/t/russian-ddos-completley-unmitigated-by-cloudflare/284578">https://community.cloudflare.com/t/russian-ddos-completley-unmitigated-by-cloudflare/284578</a></li>
@ -392,13 +392,13 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./asn -n 45.80.217.235
╭──────────────────────────────╮
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./asn -n 45.80.217.235
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>╭──────────────────────────────╮
│ ASN lookup for 45.80.217.235 │
╰──────────────────────────────╯
45.80.217.235 ┌PTR -
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span> 45.80.217.235 ┌PTR -
├ASN 46844 (ST-BGP, US)
├ORG Sharktech
├NET 45.80.217.0/24 (TrafficTransitSolutionNet)
@ -407,7 +407,7 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
├TYP Proxy host Hosting/DC
├GEO Los Angeles, California (US)
└REP ✓ NONE
</code></pre><ul>
</code></pre></div><ul>
<li>Slowly slowly I manually built up a list of the IPs, ISP names, and network blocks, for example:</li>
</ul>
<pre tabindex="0"><code class="language-csv" data-lang="csv">IP, Organization, Website, Network
@ -496,17 +496,17 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># grep -v -E &quot;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&quot; /var/log/nginx/access.log | awk '{print $1}' | sort | uniq &gt; /tmp/ips-sorted.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># grep -v -E <span style="color:#e6db74">&#34;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&#34;</span> /var/log/nginx/access.log | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq &gt; /tmp/ips-sorted.txt
# wc -l /tmp/ips-sorted.txt
10776 /tmp/ips-sorted.txt
</code></pre><ul>
</code></pre></div><ul>
<li>Then resolve them all:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/resolve-addresses-geoip2.py -i /tmp/ips-sorted.txt -o /tmp/out.csv
</code></pre><ul>
<li>Then get the top ten organizations and top ten ASNs:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvcut -c 2 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c <span style="color:#ae81ff">2</span> /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n <span style="color:#ae81ff">10</span>
213 AMAZON-AES
218 ASN-QUADRANET-GLOBAL
246 Silverstar Invest Limited
@ -517,7 +517,7 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
814 UGB Hosting OU
1010 ST-BGP
1757 Global Layer B.V.
$ csvcut -c 3 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
$ csvcut -c <span style="color:#ae81ff">3</span> /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n <span style="color:#ae81ff">10</span>
213 14618
218 8100
246 35624
@ -528,10 +528,10 @@ $ csvcut -c 3 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
814 206485
1010 46844
1757 49453
</code></pre><ul>
</code></pre></div><ul>
<li>I will download blocklists for all these except Ethiopian Telecom, Quadranet, and Amazon, though I&rsquo;m concerned about Global Layer because it&rsquo;s a huge ASN that seems to have legit hosts too&hellip;?</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ wget https://asn.ipinfo.app/api/text/nginx/AS49453
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ wget https://asn.ipinfo.app/api/text/nginx/AS49453
$ wget https://asn.ipinfo.app/api/text/nginx/AS46844
$ wget https://asn.ipinfo.app/api/text/nginx/AS206485
$ wget https://asn.ipinfo.app/api/text/nginx/AS62282
@ -540,12 +540,12 @@ $ wget https://asn.ipinfo.app/api/text/nginx/AS35624
$ cat AS* | sort | uniq &gt; /tmp/abusive-networks.txt
$ wc -l /tmp/abusive-networks.txt
2276 /tmp/abusive-networks.txt
</code></pre><ul>
</code></pre></div><ul>
<li>Combining with my existing rules and filtering uniques:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat roles/dspace/templates/nginx/abusive-networks.conf.j2 /tmp/abusive-networks.txt | grep deny | sort | uniq | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat roles/dspace/templates/nginx/abusive-networks.conf.j2 /tmp/abusive-networks.txt | grep deny | sort | uniq | wc -l
2298
</code></pre><ul>
</code></pre></div><ul>
<li><a href="https://scamalytics.com/ip/isp/2021-06">According to Scamalytics all these are high risk ISPs</a> (as recently as 2021-06) so I will just keep blocking them</li>
<li>I deployed the block list on CGSpace (linode18) and the load is down to 1.0 but I see there are still some DDoS IPs getting through&hellip; sigh</li>
<li>The next thing I need to do is purge all the IPs from Solr using grepcidr&hellip;</li>
@ -558,12 +558,12 @@ $ wc -l /tmp/abusive-networks.txt
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ sudo zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 | grep -E &quot; (200|499) &quot; | awk '{print $1}' | sort | uniq &gt; /tmp/all-ips.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ sudo zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 | grep -E <span style="color:#e6db74">&#34; (200|499) &#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq &gt; /tmp/all-ips.txt
$ ./ilri/resolve-addresses-geoip2.py -i /tmp/all-ips.txt -o /tmp/all-ips-out.csv
$ csvgrep -c asn -r '^(206485|35624|36352|46844|49453|62282)$' /tmp/all-ips-out.csv | csvcut -c ip | sed 1d | sort | uniq &gt; /tmp/all-ips-to-block.txt
$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(206485|35624|36352|46844|49453|62282)$&#39;</span> /tmp/all-ips-out.csv | csvcut -c ip | sed 1d | sort | uniq &gt; /tmp/all-ips-to-block.txt
$ wc -l /tmp/all-ips-to-block.txt
5095 /tmp/all-ips-to-block.txt
</code></pre><ul>
</code></pre></div><ul>
<li>Then I added them to the normal ipset we are already using with firewalld
<ul>
<li>I will check again in a few hours and ban more</li>
@ -571,10 +571,10 @@ $ wc -l /tmp/all-ips-to-block.txt
</li>
<li>I decided to extract the networks from the GeoIP database with <code>resolve-addresses-geoip2.py</code> so I can block them more efficiently than using the 5,000 IPs in an ipset:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvgrep -c asn -r '^(206485|35624|36352|46844|49453|62282)$' /tmp/all-ips-out.csv | csvcut -c network | sed 1d | sort | uniq &gt; /tmp/all-networks-to-block.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(206485|35624|36352|46844|49453|62282)$&#39;</span> /tmp/all-ips-out.csv | csvcut -c network | sed 1d | sort | uniq &gt; /tmp/all-networks-to-block.txt
$ grep deny roles/dspace/templates/nginx/abusive-networks.conf.j2 | sort | uniq | wc -l
2354
</code></pre><ul>
</code></pre></div><ul>
<li>Combined with the previous networks this brings about 200 more for a total of 2,354 networks
<ul>
<li>I think I need to re-work the ipset stuff in my common Ansible role so that I can add such abusive networks as an iptables ipset / nftables set, and have a cron job to update them daily (from <a href="https://www.spamhaus.org/drop/">Spamhaus&rsquo;s DROP and EDROP lists</a>, for example)</li>
@ -582,25 +582,25 @@ $ grep deny roles/dspace/templates/nginx/abusive-networks.conf.j2 | sort | uniq
</li>
<li>Then I got a list of all the 5,095 IPs from above and used <code>check-spider-ip-hits.sh</code> to purge them from Solr:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ilri/check-spider-ip-hits.sh -f /tmp/all-ips-to-block.txt -p
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ilri/check-spider-ip-hits.sh -f /tmp/all-ips-to-block.txt -p
...
Total number of bot hits purged: 197116
</code></pre><ul>
</code></pre></div><ul>
<li>I started a harvest on AReS and it finished in a few hours now that the load on CGSpace is back to a normal level</li>
</ul>
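<p>The log-filtering pipeline used above (<code>grep -E " (200|499) " | grep -v -E "(...)" | awk '{print $1}' | sort | uniq</code>) can also be sketched in Python. This is only an illustrative equivalent, not one of the <code>ilri</code> scripts: the function name <code>unique_client_ips</code> and the sample log lines (using documentation-only IP ranges) are assumptions made for the example.</p>

```python
import re

# Same patterns as the grep calls in the notes above.
STATUS_RE = re.compile(r" (200|499) ")
EXCLUDE_RE = re.compile(
    r"(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)"
)


def unique_client_ips(lines):
    """Return the sorted unique first fields (client IPs) of matching log lines.

    Hypothetical helper: keeps lines containing " 200 " or " 499 " and
    drops lines matching the bot/agent pattern, like grep and grep -v.
    """
    ips = set()
    for line in lines:
        if not STATUS_RE.search(line):
            continue
        if EXCLUDE_RE.search(line):
            continue
        fields = line.split()
        if fields:
            ips.add(fields[0])  # awk '{print $1}'
    return sorted(ips)  # sort | uniq


if __name__ == "__main__":
    # Made-up sample lines in nginx "combined" format.
    sample = [
        '192.0.2.10 - - [19/Jul/2021:10:00:00 +0200] "GET /handle/10568/1 HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
        '192.0.2.10 - - [19/Jul/2021:10:00:01 +0200] "GET /handle/10568/2 HTTP/1.1" 499 0 "-" "Mozilla/5.0"',
        '198.51.100.7 - - [19/Jul/2021:10:00:02 +0200] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
        '203.0.113.5 - - [19/Jul/2021:10:00:03 +0200] "GET /missing HTTP/1.1" 404 162 "-" "Mozilla/5.0"',
    ]
    print(unique_client_ips(sample))
```

<p>Like the shell version, this matches the status pattern anywhere in the line rather than parsing the status field, so it shares the same rough edges.</p>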
<h2 id="2021-07-20">2021-07-20</h2>
<ul>
<li>Looking again at the IPs making connections to CGSpace over the last few days from these seven ASNs, the number is much higher than I noticed yesterday:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvgrep -c asn -r '^(49453|46844|206485|62282|36352|35913|35624)$' /tmp/out.csv | csvcut -c ip | sed 1d | sort | uniq | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(49453|46844|206485|62282|36352|35913|35624)$&#39;</span> /tmp/out.csv | csvcut -c ip | sed 1d | sort | uniq | wc -l
5643
</code></pre><ul>
</code></pre></div><ul>
<li>I purged 27,000 more hits from the Solr stats using this new list of IPs with my <code>check-spider-ip-hits.sh</code> script</li>
<li>Surprise surprise, I checked the nginx logs from 2021-06-23 when we last had issues with thousands of XMLUI sessions and PostgreSQL connections and I see IPs from the same ASNs!</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ sudo zcat --force /var/log/nginx/access.log.27.gz /var/log/nginx/access.log.28.gz | grep -E &quot; (200|499) &quot; | grep -v -E &quot;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&quot; | awk '{print $1}' | sort | uniq &gt; /tmp/all-ips-june-23.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ sudo zcat --force /var/log/nginx/access.log.27.gz /var/log/nginx/access.log.28.gz | grep -E <span style="color:#e6db74">&#34; (200|499) &#34;</span> | grep -v -E <span style="color:#e6db74">&#34;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq &gt; /tmp/all-ips-june-23.txt
$ ./ilri/resolve-addresses-geoip2.py -i /tmp/all-ips-june-23.txt -o /tmp/out.csv
$ csvcut -c 2,4 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 15
$ csvcut -c 2,4 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n <span style="color:#ae81ff">15</span>
265 GOOGLE,15169
277 Silverstar Invest Limited,35624
280 FACEBOOK,32934
@ -616,17 +616,17 @@ $ csvcut -c 2,4 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 15
874 Ethiopian Telecommunication Corporation,24757
912 UGB Hosting OU,206485
1607 Global Layer B.V.,49453
</code></pre><ul>
</code></pre></div><ul>
<li>Again it was over 5,000 IPs:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvgrep -c asn -r '^(49453|46844|206485|62282|36352|35913|35624)$' /tmp/out.csv | csvcut -c ip | sed 1d | sort | uniq | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(49453|46844|206485|62282|36352|35913|35624)$&#39;</span> /tmp/out.csv | csvcut -c ip | sed 1d | sort | uniq | wc -l
5228
</code></pre><ul>
</code></pre></div><ul>
<li>Interestingly, these seem to be five thousand <em>different</em> IP addresses from those in last weekend&rsquo;s attack, as there are over 10,000 unique ones if I combine them!</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat /tmp/ips-june23.txt /tmp/ips-jul16.txt | sort | uniq | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat /tmp/ips-june23.txt /tmp/ips-jul16.txt | sort | uniq | wc -l
10458
</code></pre><ul>
</code></pre></div><ul>
<li>I purged all the (26,000) hits from these new IP addresses from Solr as well</li>
<li>Looking back at my notes for the 2019-05 attack I see that I had already identified most of these network providers (!)&hellip;
<ul>
@ -636,30 +636,30 @@ $ csvcut -c 2,4 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 15
</li>
<li>Adding QuadraNet brings the total networks seen during these two attacks to 262, and the number of unique IPs to 10,900:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 /var/log/nginx/access.log.3 /var/log/nginx/access.log.4 /var/log/nginx/access.log.5 /var/log/nginx/access.log.27.gz /var/log/nginx/access.log.28.gz | grep -E &quot; (200|499) &quot; | grep -v -E &quot;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&quot; | awk '{print $1}' | sort | uniq &gt; /tmp/ddos-ips.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 /var/log/nginx/access.log.3 /var/log/nginx/access.log.4 /var/log/nginx/access.log.5 /var/log/nginx/access.log.27.gz /var/log/nginx/access.log.28.gz | grep -E <span style="color:#e6db74">&#34; (200|499) &#34;</span> | grep -v -E <span style="color:#e6db74">&#34;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq &gt; /tmp/ddos-ips.txt
# wc -l /tmp/ddos-ips.txt
54002 /tmp/ddos-ips.txt
$ ./ilri/resolve-addresses-geoip2.py -i /tmp/ddos-ips.txt -o /tmp/ddos-ips.csv
$ csvgrep -c asn -r '^(49453|46844|206485|62282|36352|35913|35624|8100)$' /tmp/ddos-ips.csv | csvcut -c ip | sed 1d | sort | uniq &gt; /tmp/ddos-ips-to-purge.txt
$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(49453|46844|206485|62282|36352|35913|35624|8100)$&#39;</span> /tmp/ddos-ips.csv | csvcut -c ip | sed 1d | sort | uniq &gt; /tmp/ddos-ips-to-purge.txt
$ wc -l /tmp/ddos-ips-to-purge.txt
10900 /tmp/ddos-ips-to-purge.txt
$ csvgrep -c asn -r '^(49453|46844|206485|62282|36352|35913|35624|8100)$' /tmp/ddos-ips.csv | csvcut -c network | sed 1d | sort | uniq &gt; /tmp/ddos-networks-to-block.txt
$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(49453|46844|206485|62282|36352|35913|35624|8100)$&#39;</span> /tmp/ddos-ips.csv | csvcut -c network | sed 1d | sort | uniq &gt; /tmp/ddos-networks-to-block.txt
$ wc -l /tmp/ddos-networks-to-block.txt
262 /tmp/ddos-networks-to-block.txt
</code></pre><ul>
</code></pre></div><ul>
<li>The new total number of networks to block, including the network prefixes for these ASNs downloaded from asn.ipinfo.app, is 4,007:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ wget https://asn.ipinfo.app/api/text/nginx/AS49453 \
https://asn.ipinfo.app/api/text/nginx/AS46844 \
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ wget https://asn.ipinfo.app/api/text/nginx/AS49453 <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span>https://asn.ipinfo.app/api/text/nginx/AS46844 \
https://asn.ipinfo.app/api/text/nginx/AS206485 \
https://asn.ipinfo.app/api/text/nginx/AS62282 \
https://asn.ipinfo.app/api/text/nginx/AS36352 \
https://asn.ipinfo.app/api/text/nginx/AS35913 \
https://asn.ipinfo.app/api/text/nginx/AS35624 \
https://asn.ipinfo.app/api/text/nginx/AS8100
$ cat AS* /tmp/ddos-networks-to-block.txt | sed -e '/^$/d' -e '/^#/d' -e '/^{/d' -e 's/deny //' -e 's/;//' | sort | uniq | wc -l
$ cat AS* /tmp/ddos-networks-to-block.txt | sed -e <span style="color:#e6db74">&#39;/^$/d&#39;</span> -e <span style="color:#e6db74">&#39;/^#/d&#39;</span> -e <span style="color:#e6db74">&#39;/^{/d&#39;</span> -e <span style="color:#e6db74">&#39;s/deny //&#39;</span> -e <span style="color:#e6db74">&#39;s/;//&#39;</span> | sort | uniq | wc -l
4007
</code></pre><ul>
</code></pre></div><ul>
<li>I re-applied these networks to nginx on CGSpace (linode18) and DSpace Test (linode26), and purged 14,000 more Solr statistics hits from these IPs</li>
</ul>
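<p>The <code>sed</code> cleanup used above to count networks (dropping empty lines, comments, and lines starting with <code>{</code>, then stripping <code>deny </code> and <code>;</code>) could be sketched in Python with the standard <code>ipaddress</code> module. This is a rough sketch under stated assumptions: the function names are hypothetical, and the <code>collapse</code> step, which merges adjacent or overlapping prefixes, is an extra idea that the original commands do not perform.</p>

```python
import ipaddress


def networks_from_deny_lines(lines):
    """Parse nginx-style 'deny <network>;' lines or bare networks into a set.

    Hypothetical helper mirroring the sed expressions in the notes:
    skips empty lines, '#' comments, and lines starting with '{'.
    Raises ValueError if a remaining line is not a valid network.
    """
    networks = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("{"):
            continue
        line = line.replace("deny ", "", 1).replace(";", "", 1).strip()
        networks.add(ipaddress.ip_network(line, strict=False))
    return networks


def collapse(networks):
    """Merge adjacent or overlapping networks (not done in the original notes).

    IPv4 and IPv6 are collapsed separately because
    ipaddress.collapse_addresses() rejects mixed versions.
    """
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    return list(ipaddress.collapse_addresses(v4)) + list(
        ipaddress.collapse_addresses(v6)
    )


if __name__ == "__main__":
    # Made-up input combining a downloaded list and bare GeoIP networks.
    sample = [
        "# example ASN list",
        "",
        "deny 45.80.217.0/24;",
        "45.80.217.0/24",
        "deny 45.80.216.0/24;",
        '{"error": "example"}',
    ]
    unique = networks_from_deny_lines(sample)
    print(len(unique), [str(n) for n in collapse(unique)])
```

<p>Collapsing would shrink the rule count whenever the lists contain neighbouring prefixes, at the cost of no longer matching the upstream lists line for line.</p>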
<h2 id="2021-07-22">2021-07-22</h2>