mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-11-08
@ -30,7 +30,7 @@ Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVO
localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
COPY 20994
"/>
<meta name="generator" content="Hugo 0.88.1" />
<meta name="generator" content="Hugo 0.89.2" />
@ -120,17 +120,17 @@ COPY 20994
<ul>
<li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
COPY 20994
</code></pre><h2 id="2021-07-04">2021-07-04</h2>
</code></pre></div><h2 id="2021-07-04">2021-07-04</h2>
<ul>
<li>Update all Docker containers on the AReS server (linode20) and rebuild OpenRXV:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cd OpenRXV
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cd OpenRXV
$ docker-compose -f docker/docker-compose.yml down
$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | xargs -L1 docker pull
$ docker-compose -f docker/docker-compose.yml build
</code></pre><ul>
</code></pre></div><ul>
<li>Then run all system updates and reboot the server</li>
<li>After the server came back up I cloned the <code>openrxv-items-final</code> index to <code>openrxv-items-temp</code> and started the plugins
<ul>
@ -172,7 +172,7 @@ $ docker-compose -f docker/docker-compose.yml build
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/check-spider-hits.sh -f /tmp/spiders -p
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-hits.sh -f /tmp/spiders -p
Purging 95 hits from Drupal in statistics
Purging 38 hits from DTS Agent in statistics
Purging 601 hits from Microsoft Office Existence Discovery in statistics
@ -183,16 +183,16 @@ Purging 144 hits from FlipboardProxy in statistics
Purging 37 hits from LinkWalker in statistics
Purging 1 hits from [Ll]ink.?[Cc]heck.? in statistics
Purging 427 hits from WordPress in statistics
Total number of bot hits purged: 15030
</code></pre><ul>
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 15030
</code></pre></div><ul>
<li>Meet with the CGIAR–AGROVOC task group to discuss how we want to do the workflow for submitting new terms to AGROVOC</li>
<li>I extracted another list of all subjects to check against AGROVOC:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">\COPY (SELECT DISTINCT(LOWER(text_value)) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-06-all-subjects.csv WITH CSV HEADER;
$ csvcut -c 1 /tmp/2021-07-06-all-subjects.csv | sed 1d > /tmp/2021-07-06-all-subjects.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">\COPY (SELECT DISTINCT(LOWER(text_value)) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-06-all-subjects.csv WITH CSV HEADER;
$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-07-06-all-subjects.csv | sed 1d > /tmp/2021-07-06-all-subjects.txt
$ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-06-agrovoc-results-all-subjects.csv -d
</code></pre><ul>
</code></pre></div><ul>
<li>Test <a href="https://github.com/DSpace/DSpace/pull/3162">Hrafn Malmquist’s proposed DBCP2 changes</a> for DSpace 6.4 (DS-4574)
<ul>
<li>His changes reminded me that we can perhaps switch back to using this pooling instead of Tomcat 7’s JDBC pooling via JNDI</li>
@ -205,7 +205,7 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-0
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># for num in {10..26}; do echo "2021-06-$num"; zcat /var/log/nginx/access.log.*.gz /var/log/nginx/library-access.log.*.gz | grep "$num/Jun/2021" | awk '{print $1}' | sort | uniq | wc -l; done
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># <span style="color:#66d9ef">for</span> num in <span style="color:#f92672">{</span>10..26<span style="color:#f92672">}</span>; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">"2021-06-</span>$num<span style="color:#e6db74">"</span>; zcat /var/log/nginx/access.log.*.gz /var/log/nginx/library-access.log.*.gz | grep <span style="color:#e6db74">"</span>$num<span style="color:#e6db74">/Jun/2021"</span> | awk <span style="color:#e6db74">'{print $1}'</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span>
2021-06-10
10693
2021-06-11
@ -240,10 +240,10 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-0
9439
2021-06-26
7930
</code></pre><ul>
</code></pre></div><ul>
<li>Similarly, the number of connections to the REST API was around the average for the recent weeks before:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># for num in {10..26}; do echo "2021-06-$num"; zcat /var/log/nginx/rest.*.gz | grep "$num/Jun/2021" | awk '{print $1}' | sort | uniq | wc -l; done
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># <span style="color:#66d9ef">for</span> num in <span style="color:#f92672">{</span>10..26<span style="color:#f92672">}</span>; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">"2021-06-</span>$num<span style="color:#e6db74">"</span>; zcat /var/log/nginx/rest.*.gz | grep <span style="color:#e6db74">"</span>$num<span style="color:#e6db74">/Jun/2021"</span> | awk <span style="color:#e6db74">'{print $1}'</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span>
2021-06-10
1183
2021-06-11
@ -278,11 +278,11 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-0
969
2021-06-26
904
</code></pre><ul>
</code></pre></div><ul>
<li>According to goaccess, the traffic spike started at 2AM (remember that the first “Pool empty” error in dspace.log was at 4:01AM):</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># zcat /var/log/nginx/access.log.1[45].gz /var/log/nginx/library-access.log.1[45].gz | grep -E '23/Jun/2021' | goaccess --log-format=COMBINED -
</code></pre><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># zcat /var/log/nginx/access.log.1<span style="color:#f92672">[</span>45<span style="color:#f92672">]</span>.gz /var/log/nginx/library-access.log.1<span style="color:#f92672">[</span>45<span style="color:#f92672">]</span>.gz | grep -E <span style="color:#e6db74">'23/Jun/2021'</span> | goaccess --log-format<span style="color:#f92672">=</span>COMBINED -
</code></pre></div><ul>
<li>Moayad sent a fix for the add missing items plugins issue (<a href="https://github.com/ilri/OpenRXV/pull/107">#107</a>)
<ul>
<li>It works MUCH faster because it correctly identifies the missing handles in each repository</li>
@ -311,19 +311,19 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-0
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2302
postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2564
postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2530
</code></pre><ul>
</code></pre></div><ul>
<li>The locks are held by XMLUI, not REST API or OAI:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi)' | sort | uniq -c | sort -n
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi)' | sort | uniq -c | sort -n
57 dspaceApi
2671 dspaceWeb
</code></pre><ul>
</code></pre></div><ul>
<li>I ran all updates on the server (linode18) and restarted it, then DSpace came back up</li>
<li>I sent a message to Atmire, as I never heard from them last week when we blocked access to the REST API for two days for them to investigate the server issues</li>
<li>Clone the <code>openrxv-items-temp</code> index on AReS and re-run all the plugins, but most of the “dspace_add_missing_items” tasks failed so I will just run a full re-harvest</li>
@ -338,7 +338,7 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># grepcidr 91.243.191.0/24 /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -n
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># grepcidr 91.243.191.0/24 /var/log/nginx/access.log | awk <span style="color:#e6db74">'{print $1}'</span> | sort | uniq -c | sort -n
32 91.243.191.124
33 91.243.191.129
33 91.243.191.200
@ -362,7 +362,7 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
45 91.243.191.151
46 91.243.191.103
56 91.243.191.172
</code></pre><ul>
</code></pre></div><ul>
<li>I found a few people complaining about these Russian attacks too:
<ul>
<li><a href="https://community.cloudflare.com/t/russian-ddos-completley-unmitigated-by-cloudflare/284578">https://community.cloudflare.com/t/russian-ddos-completley-unmitigated-by-cloudflare/284578</a></li>
@ -392,13 +392,13 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./asn -n 45.80.217.235
╭──────────────────────────────╮
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./asn -n 45.80.217.235
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>╭──────────────────────────────╮
│ ASN lookup for 45.80.217.235 │
╰──────────────────────────────╯
45.80.217.235 ┌PTR -
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span> 45.80.217.235 ┌PTR -
├ASN 46844 (ST-BGP, US)
├ORG Sharktech
├NET 45.80.217.0/24 (TrafficTransitSolutionNet)
@ -407,7 +407,7 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
├TYP Proxy host Hosting/DC
├GEO Los Angeles, California (US)
└REP ✓ NONE
</code></pre><ul>
</code></pre></div><ul>
<li>Slowly slowly I manually built up a list of the IPs, ISP names, and network blocks, for example:</li>
</ul>
<pre tabindex="0"><code class="language-csv" data-lang="csv">IP, Organization, Website, Network
@ -496,17 +496,17 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># grep -v -E "(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)" /var/log/nginx/access.log | awk '{print $1}' | sort | uniq > /tmp/ips-sorted.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># grep -v -E <span style="color:#e6db74">"(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)"</span> /var/log/nginx/access.log | awk <span style="color:#e6db74">'{print $1}'</span> | sort | uniq > /tmp/ips-sorted.txt
# wc -l /tmp/ips-sorted.txt
10776 /tmp/ips-sorted.txt
</code></pre><ul>
</code></pre></div><ul>
<li>Then resolve them all:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./ilri/resolve-addresses-geoip2.py -i /tmp/ips-sorted.txt -o /tmp/out.csv
</code></pre><ul>
<li>Then get the top 10 organizations and top ten ASNs:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvcut -c 2 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c <span style="color:#ae81ff">2</span> /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n <span style="color:#ae81ff">10</span>
213 AMAZON-AES
218 ASN-QUADRANET-GLOBAL
246 Silverstar Invest Limited
@ -517,7 +517,7 @@ postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activi
814 UGB Hosting OU
1010 ST-BGP
1757 Global Layer B.V.
$ csvcut -c 3 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
$ csvcut -c <span style="color:#ae81ff">3</span> /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n <span style="color:#ae81ff">10</span>
213 14618
218 8100
246 35624
@ -528,10 +528,10 @@ $ csvcut -c 3 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
814 206485
1010 46844
1757 49453
</code></pre><ul>
</code></pre></div><ul>
<li>I will download blocklists for all these except Ethiopian Telecom, Quadranet, and Amazon, though I’m concerned about Global Layer because it’s a huge ASN that seems to have legit hosts too…?</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ wget https://asn.ipinfo.app/api/text/nginx/AS49453
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ wget https://asn.ipinfo.app/api/text/nginx/AS49453
$ wget https://asn.ipinfo.app/api/text/nginx/AS46844
$ wget https://asn.ipinfo.app/api/text/nginx/AS206485
$ wget https://asn.ipinfo.app/api/text/nginx/AS62282
@ -540,12 +540,12 @@ $ wget https://asn.ipinfo.app/api/text/nginx/AS35624
$ cat AS* | sort | uniq > /tmp/abusive-networks.txt
$ wc -l /tmp/abusive-networks.txt
2276 /tmp/abusive-networks.txt
</code></pre><ul>
</code></pre></div><ul>
<li>Combining with my existing rules and filtering uniques:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat roles/dspace/templates/nginx/abusive-networks.conf.j2 /tmp/abusive-networks.txt | grep deny | sort | uniq | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat roles/dspace/templates/nginx/abusive-networks.conf.j2 /tmp/abusive-networks.txt | grep deny | sort | uniq | wc -l
2298
</code></pre><ul>
</code></pre></div><ul>
<li><a href="https://scamalytics.com/ip/isp/2021-06">According to Scamalytics all these are high risk ISPs</a> (as recently as 2021-06) so I will just keep blocking them</li>
<li>I deployed the block list on CGSpace (linode18) and the load is down to 1.0 but I see there are still some DDoS IPs getting through… sigh</li>
<li>The next thing I need to do is purge all the IPs from Solr using grepcidr…</li>
@ -558,12 +558,12 @@ $ wc -l /tmp/abusive-networks.txt
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ sudo zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 | grep -E " (200|499) " | awk '{print $1}' | sort | uniq > /tmp/all-ips.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ sudo zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 | grep -E <span style="color:#e6db74">" (200|499) "</span> | awk <span style="color:#e6db74">'{print $1}'</span> | sort | uniq > /tmp/all-ips.txt
$ ./ilri/resolve-addresses-geoip2.py -i /tmp/all-ips.txt -o /tmp/all-ips-out.csv
$ csvgrep -c asn -r '^(206485|35624|36352|46844|49453|62282)$' /tmp/all-ips-out.csv | csvcut -c ip | sed 1d | sort | uniq > /tmp/all-ips-to-block.txt
$ csvgrep -c asn -r <span style="color:#e6db74">'^(206485|35624|36352|46844|49453|62282)$'</span> /tmp/all-ips-out.csv | csvcut -c ip | sed 1d | sort | uniq > /tmp/all-ips-to-block.txt
$ wc -l /tmp/all-ips-to-block.txt
5095 /tmp/all-ips-to-block.txt
</code></pre><ul>
</code></pre></div><ul>
<li>Then I added them to the normal ipset we are already using with firewalld
<ul>
<li>I will check again in a few hours and ban more</li>
@ -571,10 +571,10 @@ $ wc -l /tmp/all-ips-to-block.txt
</li>
<li>I decided to extract the networks from the GeoIP database with <code>resolve-addresses-geoip2.py</code> so I can block them more efficiently than using the 5,000 IPs in an ipset:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvgrep -c asn -r '^(206485|35624|36352|46844|49453|62282)$' /tmp/all-ips-out.csv | csvcut -c network | sed 1d | sort | uniq > /tmp/all-networks-to-block.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvgrep -c asn -r <span style="color:#e6db74">'^(206485|35624|36352|46844|49453|62282)$'</span> /tmp/all-ips-out.csv | csvcut -c network | sed 1d | sort | uniq > /tmp/all-networks-to-block.txt
$ grep deny roles/dspace/templates/nginx/abusive-networks.conf.j2 | sort | uniq | wc -l
2354
</code></pre><ul>
</code></pre></div><ul>
<li>Combined with the previous networks this brings about 200 more for a total of 2,354 networks
<ul>
<li>I think I need to re-work the ipset stuff in my common Ansible role so that I can add such abusive networks as an iptables ipset / nftables set, and have a cron job to update them daily (from <a href="https://www.spamhaus.org/drop/">Spamhaus’s DROP and EDROP lists</a>, for example)</li>
@ -582,25 +582,25 @@ $ grep deny roles/dspace/templates/nginx/abusive-networks.conf.j2 | sort | uniq
</li>
<li>Then I got a list of all the 5,095 IPs from above and used <code>check-spider-ip-hits.sh</code> to purge them from Solr:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ilri/check-spider-ip-hits.sh -f /tmp/all-ips-to-block.txt -p
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ilri/check-spider-ip-hits.sh -f /tmp/all-ips-to-block.txt -p
...
Total number of bot hits purged: 197116
</code></pre><ul>
</code></pre></div><ul>
<li>I started a harvest on AReS and it finished in a few hours now that the load on CGSpace is back to a normal level</li>
</ul>
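<ul>
<li>For reference, a minimal sketch of the kind of CIDR matching that <code>grepcidr</code> does in the notes above, using only Python's standard <code>ipaddress</code> module. The function name <code>ips_in_networks</code> and the sample values are illustrative assumptions; this is not how <code>check-spider-ip-hits.sh</code> or <code>grepcidr</code> are implemented.</li>
</ul>

```python
import ipaddress


def ips_in_networks(ips, cidrs):
    """Return the IPs (as given) that fall inside any of the CIDR networks.

    Hypothetical helper for illustration; mirrors what
    `grepcidr <network> <file>` is used for in these notes.
    """
    # strict=False tolerates host bits being set, e.g. "91.243.191.5/24"
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in cidrs]
    matched = []
    for ip in ips:
        address = ipaddress.ip_address(ip)
        # An IPv4 address is never "in" an IPv6 network, and vice versa
        if any(address in network for network in networks):
            matched.append(ip)
    return matched


if __name__ == "__main__":
    # Sample values taken from the log excerpts above
    sample_ips = ["91.243.191.124", "45.80.217.235", "8.8.8.8"]
    sample_networks = ["91.243.191.0/24", "45.80.217.0/24"]
    for ip in ips_in_networks(sample_ips, sample_networks):
        print(ip)
```

A linear scan over the networks is fine for a few thousand prefixes; for much larger lists a sorted or trie-based lookup would probably be a better fit.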
<h2 id="2021-07-20">2021-07-20</h2>
<ul>
<li>Looking again at the IPs making connections to CGSpace over the last few days from these seven ASNs, it’s much higher than I noticed yesterday:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvgrep -c asn -r '^(49453|46844|206485|62282|36352|35913|35624)$' /tmp/out.csv | csvcut -c ip | sed 1d | sort | uniq | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvgrep -c asn -r <span style="color:#e6db74">'^(49453|46844|206485|62282|36352|35913|35624)$'</span> /tmp/out.csv | csvcut -c ip | sed 1d | sort | uniq | wc -l
5643
</code></pre><ul>
</code></pre></div><ul>
<li>I purged 27,000 more hits from the Solr stats using this new list of IPs with my <code>check-spider-ip-hits.sh</code> script</li>
<li>Surprise surprise, I checked the nginx logs from 2021-06-23 when we last had issues with thousands of XMLUI sessions and PostgreSQL connections and I see IPs from the same ASNs!</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ sudo zcat --force /var/log/nginx/access.log.27.gz /var/log/nginx/access.log.28.gz | grep -E " (200|499) " | grep -v -E "(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)" | awk '{print $1}' | sort | uniq > /tmp/all-ips-june-23.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ sudo zcat --force /var/log/nginx/access.log.27.gz /var/log/nginx/access.log.28.gz | grep -E <span style="color:#e6db74">" (200|499) "</span> | grep -v -E <span style="color:#e6db74">"(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)"</span> | awk <span style="color:#e6db74">'{print $1}'</span> | sort | uniq > /tmp/all-ips-june-23.txt
$ ./ilri/resolve-addresses-geoip2.py -i /tmp/all-ips-june-23.txt -o /tmp/out.csv
$ csvcut -c 2,4 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 15
$ csvcut -c 2,4 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n <span style="color:#ae81ff">15</span>
265 GOOGLE,15169
277 Silverstar Invest Limited,35624
280 FACEBOOK,32934
@ -616,17 +616,17 @@ $ csvcut -c 2,4 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 15
874 Ethiopian Telecommunication Corporation,24757
912 UGB Hosting OU,206485
1607 Global Layer B.V.,49453
</code></pre><ul>
</code></pre></div><ul>
<li>Again it was over 5,000 IPs:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ csvgrep -c asn -r '^(49453|46844|206485|62282|36352|35913|35624)$' /tmp/out.csv | csvcut -c ip | sed 1d | sort | uniq | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvgrep -c asn -r <span style="color:#e6db74">'^(49453|46844|206485|62282|36352|35913|35624)$'</span> /tmp/out.csv | csvcut -c ip | sed 1d | sort | uniq | wc -l
5228
</code></pre><ul>
</code></pre></div><ul>
<li>Interestingly, it seems these are five thousand <em>different</em> IP addresses than the attack from last weekend, as there are over 10,000 unique ones if I combine them!</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat /tmp/ips-june23.txt /tmp/ips-jul16.txt | sort | uniq | wc -l
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat /tmp/ips-june23.txt /tmp/ips-jul16.txt | sort | uniq | wc -l
10458
</code></pre><ul>
</code></pre></div><ul>
<li>I purged all the (26,000) hits from these new IP addresses from Solr as well</li>
<li>Looking back at my notes for the 2019-05 attack I see that I had already identified most of these network providers (!)…
<ul>
@ -636,30 +636,30 @@ $ csvcut -c 2,4 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 15
</li>
<li>Adding QuadraNet brings the total networks seen during these two attacks to 262, and the number of unique IPs to 10900:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 /var/log/nginx/access.log.3 /var/log/nginx/access.log.4 /var/log/nginx/access.log.5 /var/log/nginx/access.log.27.gz /var/log/nginx/access.log.28.gz | grep -E " (200|499) " | grep -v -E "(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)" | awk '{print $1}' | sort | uniq > /tmp/ddos-ips.txt
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 /var/log/nginx/access.log.3 /var/log/nginx/access.log.4 /var/log/nginx/access.log.5 /var/log/nginx/access.log.27.gz /var/log/nginx/access.log.28.gz | grep -E <span style="color:#e6db74">" (200|499) "</span> | grep -v -E <span style="color:#e6db74">"(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)"</span> | awk <span style="color:#e6db74">'{print $1}'</span> | sort | uniq > /tmp/ddos-ips.txt
# wc -l /tmp/ddos-ips.txt
54002 /tmp/ddos-ips.txt
$ ./ilri/resolve-addresses-geoip2.py -i /tmp/ddos-ips.txt -o /tmp/ddos-ips.csv
$ csvgrep -c asn -r '^(49453|46844|206485|62282|36352|35913|35624|8100)$' /tmp/ddos-ips.csv | csvcut -c ip | sed 1d | sort | uniq > /tmp/ddos-ips-to-purge.txt
$ csvgrep -c asn -r <span style="color:#e6db74">'^(49453|46844|206485|62282|36352|35913|35624|8100)$'</span> /tmp/ddos-ips.csv | csvcut -c ip | sed 1d | sort | uniq > /tmp/ddos-ips-to-purge.txt
$ wc -l /tmp/ddos-ips-to-purge.txt
10900 /tmp/ddos-ips-to-purge.txt
$ csvgrep -c asn -r '^(49453|46844|206485|62282|36352|35913|35624|8100)$' /tmp/ddos-ips.csv | csvcut -c network | sed 1d | sort | uniq > /tmp/ddos-networks-to-block.txt
$ csvgrep -c asn -r <span style="color:#e6db74">'^(49453|46844|206485|62282|36352|35913|35624|8100)$'</span> /tmp/ddos-ips.csv | csvcut -c network | sed 1d | sort | uniq > /tmp/ddos-networks-to-block.txt
$ wc -l /tmp/ddos-networks-to-block.txt
262 /tmp/ddos-networks-to-block.txt
</code></pre><ul>
</code></pre></div><ul>
<li>The new total number of networks to block, including the network prefixes for these ASNs downloaded from asn.ipinfo.app, is 4,007:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ wget https://asn.ipinfo.app/api/text/nginx/AS49453 \
https://asn.ipinfo.app/api/text/nginx/AS46844 \
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ wget https://asn.ipinfo.app/api/text/nginx/AS49453 <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span>https://asn.ipinfo.app/api/text/nginx/AS46844 \
https://asn.ipinfo.app/api/text/nginx/AS206485 \
https://asn.ipinfo.app/api/text/nginx/AS62282 \
https://asn.ipinfo.app/api/text/nginx/AS36352 \
https://asn.ipinfo.app/api/text/nginx/AS35913 \
https://asn.ipinfo.app/api/text/nginx/AS35624 \
https://asn.ipinfo.app/api/text/nginx/AS8100
$ cat AS* /tmp/ddos-networks-to-block.txt | sed -e '/^$/d' -e '/^#/d' -e '/^{/d' -e 's/deny //' -e 's/;//' | sort | uniq | wc -l
$ cat AS* /tmp/ddos-networks-to-block.txt | sed -e <span style="color:#e6db74">'/^$/d'</span> -e <span style="color:#e6db74">'/^#/d'</span> -e <span style="color:#e6db74">'/^{/d'</span> -e <span style="color:#e6db74">'s/deny //'</span> -e <span style="color:#e6db74">'s/;//'</span> | sort | uniq | wc -l
4007
</code></pre><ul>
</code></pre></div><ul>
<li>I re-applied these networks to nginx on CGSpace (linode18) and DSpace Test (linode26), and purged 14,000 more Solr statistics hits from these IPs</li>
</ul>
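<ul>
<li>A rough Python equivalent of the <code>sed … | sort | uniq</code> step above, sketched under the assumption that the downloaded files contain nginx-style <code>deny 192.0.2.0/24;</code> lines mixed with comments and blank lines. The helper names <code>parse_deny_lines</code> and <code>collapse_networks</code> are hypothetical. Unlike <code>sort | uniq</code>, it also merges adjacent or overlapping prefixes, so the resulting count may be lower than the 4,007 reported above.</li>
</ul>

```python
import ipaddress


def parse_deny_lines(lines):
    """Extract networks from 'deny <cidr>;' lines or bare CIDR lines.

    Skips blank lines, '#' comments, and lines starting with '{',
    the same lines the sed expression in the notes deletes.
    """
    networks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("{"):
            continue
        if line.startswith("deny "):
            line = line[len("deny "):]
        line = line.rstrip(";").strip()
        # strict=False accepts prefixes with host bits set
        networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def collapse_networks(networks):
    """Deduplicate and merge networks, handling IPv4 and IPv6 separately.

    ipaddress.collapse_addresses() raises TypeError on mixed versions,
    so each address family is collapsed on its own.
    """
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    return list(ipaddress.collapse_addresses(v4)) + list(
        ipaddress.collapse_addresses(v6)
    )


if __name__ == "__main__":
    # Documentation-range prefixes, not real blocklist entries
    sample = [
        "# example blocklist",
        "deny 192.0.2.0/25;",
        "deny 192.0.2.128/25;",
        "198.51.100.0/24",
    ]
    for network in collapse_networks(parse_deny_lines(sample)):
        print(f"deny {network};")
```

Merging prefixes keeps the nginx include shorter, at the cost of no longer being a line-for-line copy of the downloaded lists.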
<h2 id="2021-07-22">2021-07-22</h2>