<!DOCTYPE html>
|
||
<html lang="en" >
|
||
|
||
<head>
|
||
<meta charset="utf-8">
|
||
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
|
||
|
||
|
||
<meta property="og:title" content="July, 2021" />
|
||
<meta property="og:description" content="2021-07-01
|
||
|
||
Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:
|
||
|
||
localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
|
||
COPY 20994
|
||
" />
|
||
<meta property="og:type" content="article" />
|
||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-06/" />
|
||
<meta property="article:published_time" content="2021-06-01T08:53:07+03:00" />
|
||
<meta property="article:modified_time" content="2021-07-11T15:59:16+03:00" />
|
||
|
||
|
||
|
||
<meta name="twitter:card" content="summary"/>
|
||
<meta name="twitter:title" content="July, 2021"/>
|
||
<meta name="twitter:description" content="2021-07-01
|
||
|
||
Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:
|
||
|
||
localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
|
||
COPY 20994
|
||
"/>
|
||
<meta name="generator" content="Hugo 0.85.0" />
|
||
|
||
|
||
|
||
<script type="application/ld+json">
|
||
{
|
||
"@context": "http://schema.org",
|
||
"@type": "BlogPosting",
|
||
"headline": "July, 2021",
|
||
"url": "https://alanorth.github.io/cgspace-notes/2021-06/",
|
||
"wordCount": "2439",
|
||
"datePublished": "2021-06-01T08:53:07+03:00",
|
||
"dateModified": "2021-07-11T15:59:16+03:00",
|
||
"author": {
|
||
"@type": "Person",
|
||
"name": "Alan Orth"
|
||
},
|
||
"keywords": "Notes"
|
||
}
|
||
</script>
|
||
|
||
|
||
|
||
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2021-06/">
|
||
|
||
<title>July, 2021 | CGSpace Notes</title>
|
||
|
||
|
||
<!-- combined, minified CSS -->
|
||
|
||
<link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC+AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous">
|
||
|
||
|
||
<!-- minified Font Awesome for SVG icons -->
|
||
|
||
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.ffbfea088a9a1666ec65c3a8cb4906e2a0e4f92dc70dbbf400a125ad2422123a.js" integrity="sha256-/7/qCIqaFmbsZcOoy0kG4qDk+S3HDbv0AKElrSQiEjo=" crossorigin="anonymous"></script>
|
||
|
||
<!-- RSS 2.0 feed -->
|
||
|
||
|
||
|
||
|
||
</head>
|
||
|
||
<body>
|
||
|
||
|
||
<div class="blog-masthead">
|
||
<div class="container">
|
||
<nav class="nav blog-nav">
|
||
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
|
||
</nav>
|
||
</div>
|
||
</div>
|
||
|
||
|
||
|
||
|
||
<header class="blog-header">
|
||
<div class="container">
|
||
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
|
||
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
|
||
</div>
|
||
</header>
|
||
|
||
|
||
|
||
|
||
<div class="container">
|
||
<div class="row">
|
||
<div class="col-sm-8 blog-main">
|
||
|
||
|
||
|
||
|
||
<article class="blog-post">
|
||
<header>
|
||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2021-06/">July, 2021</a></h2>
|
||
<p class="blog-post-meta">
|
||
<time datetime="2021-06-01T08:53:07+03:00">Tue Jun 01, 2021</time>
|
||
in
|
||
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
|
||
|
||
|
||
</p>
|
||
</header>
|
||
<h2 id="2021-07-01">2021-07-01</h2>
|
||
<ul>
|
||
<li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
COPY 20994
</code></pre><h2 id="2021-07-04">2021-07-04</h2>
|
||
<ul>
|
||
<li>Update all Docker containers on the AReS server (linode20) and rebuild OpenRXV:</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">$ cd OpenRXV
$ docker-compose -f docker/docker-compose.yml down
$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
$ docker-compose -f docker/docker-compose.yml build
</code></pre><ul>
|
||
<li>Then run all system updates and reboot the server</li>
|
||
<li>After the server came back up I cloned the <code>openrxv-items-final</code> index to <code>openrxv-items-temp</code> and started the plugins
|
||
<ul>
|
||
<li>This will hopefully be faster than a full re-harvest…</li>
|
||
</ul>
|
||
</li>
|
||
<li>I opened a few GitHub issues for OpenRXV bugs:
|
||
<ul>
|
||
<li><a href="https://github.com/ilri/OpenRXV/issues/103">Hide “metadata structure” section in repository setup</a></li>
|
||
<li><a href="https://github.com/ilri/OpenRXV/issues/104">Improve “start plugins” and “commit indexing” buttons</a></li>
|
||
<li><a href="https://github.com/ilri/OpenRXV/issues/105">Allow running plugins individually</a></li>
|
||
<li><a href="https://github.com/ilri/OpenRXV/issues/106">Hide the “DSpace add missing items”</a></li>
|
||
</ul>
|
||
</li>
|
||
<li>Rebuild DSpace Test (linode26) from a fresh Ubuntu 20.04 image on Linode</li>
|
||
<li>The “start plugins” run on AReS had seventy-five errors from the <code>dspace_add_missing_items</code> plugin for some reason, so I had to start a fresh indexing</li>
|
||
<li>I noticed that the WorldFish data has dozens of incorrect countries so I should talk to Salem about that because they manage it
|
||
<ul>
|
||
<li>Also I noticed that we weren’t using the Country formatter in OpenRXV for the WorldFish country field, so some values don’t get mapped properly</li>
|
||
<li>I added some value mappings to fix some issues with WorldFish data and added a few more fields to the repository harvesting config and started a fresh re-indexing</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
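<ul>
<li>The value mappings mentioned above for the WorldFish country field amount to a lookup from harvested values to canonical names. A minimal Python sketch of that idea, assuming a hypothetical <code>VALUE_MAPPINGS</code> table and <code>normalize_country</code> helper (neither is OpenRXV code, and the example values are illustrative, not actual WorldFish data):</li>
</ul>

```python
# Hypothetical value mappings: these keys and canonical names are
# illustrative only and are not taken from the AReS configuration.
VALUE_MAPPINGS = {
    "viet nam": "Vietnam",
    "tanzania, united republic of": "Tanzania",
    "lao people's democratic republic": "Laos",
}


def normalize_country(raw: str) -> str:
    """Return the mapped country name, or the cleaned input if unmapped."""
    cleaned = " ".join(raw.split())  # trim and collapse stray whitespace
    return VALUE_MAPPINGS.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    for value in ["Viet Nam", "  Kenya ", "Tanzania, United Republic of"]:
        print(f"{value!r} -> {normalize_country(value)!r}")
```

<ul>
<li>Values without a mapping pass through unchanged, so incorrect countries would still need to be fixed at the source.</li>
</ul>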
|
||
<h2 id="2021-07-05">2021-07-05</h2>
|
||
<ul>
|
||
<li>The AReS harvesting last night succeeded and I started the plugins</li>
|
||
<li>Margarita from CCAFS asked if we can create a new field for AICCRA publications
|
||
<ul>
|
||
<li>I asked her to clarify what they want</li>
|
||
<li>AICCRA is an initiative so it might be better to create a new field for that, for example <code>cg.contributor.initiative</code></li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
<h2 id="2021-07-06">2021-07-06</h2>
|
||
<ul>
|
||
<li>Atmire merged my spider user agent changes from last month so I will update the <code>example</code> list we use in DSpace and remove the new ones from my <code>ilri</code> override file
|
||
<ul>
|
||
<li>Also, I concatenated all our user agents into one file and purged all hits:</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">$ ./ilri/check-spider-hits.sh -f /tmp/spiders -p
Purging 95 hits from Drupal in statistics
Purging 38 hits from DTS Agent in statistics
Purging 601 hits from Microsoft Office Existence Discovery in statistics
Purging 51 hits from Site24x7 in statistics
Purging 62 hits from Trello in statistics
Purging 13574 hits from WhatsApp in statistics
Purging 144 hits from FlipboardProxy in statistics
Purging 37 hits from LinkWalker in statistics
Purging 1 hits from [Ll]ink.?[Cc]heck.? in statistics
Purging 427 hits from WordPress in statistics

Total number of bot hits purged: 15030
</code></pre><ul>
|
||
<li>Meet with the CGIAR–AGROVOC task group to discuss how we want to do the workflow for submitting new terms to AGROVOC</li>
|
||
<li>I extracted another list of all subjects to check against AGROVOC:</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">\COPY (SELECT DISTINCT(LOWER(text_value)) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-06-all-subjects.csv WITH CSV HEADER;
$ csvcut -c 1 /tmp/2021-07-06-all-subjects.csv | sed 1d > /tmp/2021-07-06-all-subjects.txt
$ ./ilri/agrovoc-lookup.py -i /tmp/2021-07-06-all-subjects.txt -o /tmp/2021-07-06-agrovoc-results-all-subjects.csv -d
</code></pre><ul>
|
||
<li>Test <a href="https://github.com/DSpace/DSpace/pull/3162">Hrafn Malmquist’s proposed DBCP2 changes</a> for DSpace 6.4 (DS-4574)
|
||
<ul>
|
||
<li>His changes reminded me that we can perhaps switch back to using this pooling instead of Tomcat 7’s JDBC pooling via JNDI</li>
|
||
<li>Tomcat 8 has DBCP2 built in, but we are stuck on Tomcat 7 for now</li>
|
||
</ul>
|
||
</li>
|
||
<li>Looking into the database issues we had last month on 2021-06-23
|
||
<ul>
|
||
<li>I think it might have been some kind of attack because the number of XMLUI sessions was through the roof at one point (10,000!) and the number of unique IPs accessing the server that day is much higher than any other day:</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console"># for num in {10..26}; do echo "2021-06-$num"; zcat /var/log/nginx/access.log.*.gz /var/log/nginx/library-access.log.*.gz | grep "$num/Jun/2021" | awk '{print $1}' | sort | uniq | wc -l; done
2021-06-10
10693
2021-06-11
10587
2021-06-12
7958
2021-06-13
7681
2021-06-14
12639
2021-06-15
15388
2021-06-16
12245
2021-06-17
11187
2021-06-18
9684
2021-06-19
7835
2021-06-20
7198
2021-06-21
10380
2021-06-22
10255
2021-06-23
15878
2021-06-24
9963
2021-06-25
9439
2021-06-26
7930
</code></pre><ul>
|
||
<li>Similarly, the number of unique IPs connecting to the REST API that day (1,375) was the highest in this period, though only slightly above other busy days such as 2021-06-14 (1,320):</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console"># for num in {10..26}; do echo "2021-06-$num"; zcat /var/log/nginx/rest.*.gz | grep "$num/Jun/2021" | awk '{print $1}' | sort | uniq | wc -l; done
2021-06-10
1183
2021-06-11
1074
2021-06-12
911
2021-06-13
892
2021-06-14
1320
2021-06-15
1257
2021-06-16
1208
2021-06-17
1119
2021-06-18
965
2021-06-19
985
2021-06-20
854
2021-06-21
1098
2021-06-22
1028
2021-06-23
1375
2021-06-24
1135
2021-06-25
969
2021-06-26
904
</code></pre><ul>
|
||
<li>According to goaccess, the traffic spike started at 2AM (remember that the first “Pool empty” error in dspace.log was at 4:01AM):</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console"># zcat /var/log/nginx/access.log.1[45].gz /var/log/nginx/library-access.log.1[45].gz | grep -E '23/Jun/2021' | goaccess --log-format=COMBINED -
</code></pre><ul>
|
||
<li>Moayad sent a fix for the add missing items plugins issue (<a href="https://github.com/ilri/OpenRXV/pull/107">#107</a>)
|
||
<ul>
|
||
<li>It works MUCH faster because it correctly identifies the missing handles in each repository</li>
|
||
<li>Also it adds better debug messages to the api logs</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
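<ul>
<li>Identifying the missing handles in a repository is essentially a set difference between what the repository reports and what is already indexed. A minimal Python sketch of that idea, assuming a hypothetical <code>find_missing_handles</code> helper and made-up handles (this is not the code from the pull request, which I am only describing from its behavior):</li>
</ul>

```python
def find_missing_handles(repository_handles, indexed_handles):
    """Return handles present in the repository but absent from the index."""
    # Set difference drops everything that is already indexed; sorting
    # just makes the result stable for logging and comparison.
    return sorted(set(repository_handles) - set(indexed_handles))


if __name__ == "__main__":
    # Hypothetical handles for illustration only
    in_repository = ["10568/3", "10568/1", "10568/2"]
    in_index = ["10568/2"]
    print(find_missing_handles(in_repository, in_index))
```

<ul>
<li>Only the handles in the result would need to be harvested, which is one plausible reason such a step can be much faster than re-checking every item.</li>
</ul>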
|
||
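<ul>
<li>The per-day unique IP counts used earlier in this entry to investigate 2021-06-23 can also be sketched in Python. This assumes the nginx logs use the combined log format; the <code>unique_ips_per_day</code> helper and the sample lines are hypothetical. Unlike the <code>grep</code> pipeline, it only matches the date inside the timestamp field:</li>
</ul>

```python
import re
from collections import defaultdict

# Client IP and the date part of the timestamp in a combined-format line,
# e.g. 192.0.2.1 - - [23/Jun/2021:02:00:01 +0200] "GET / HTTP/1.1" ...
LINE_RE = re.compile(r"^(\S+) \S+ \S+ \[(\d{2}/[A-Za-z]{3}/\d{4}):")


def unique_ips_per_day(lines):
    """Map each day (as written in the log) to its number of unique IPs."""
    ips_by_day = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # silently skip lines that do not parse
            ip, day = match.groups()
            ips_by_day[day].add(ip)
    return {day: len(ips) for day, ips in ips_by_day.items()}


# Hypothetical sample lines using documentation-only IP addresses
SAMPLE = [
    '192.0.2.1 - - [22/Jun/2021:23:59:59 +0200] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"',
    '192.0.2.1 - - [23/Jun/2021:02:00:01 +0200] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"',
    '192.0.2.2 - - [23/Jun/2021:02:00:02 +0200] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"',
    '192.0.2.2 - - [23/Jun/2021:02:00:03 +0200] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"',
]

if __name__ == "__main__":
    for day, count in unique_ips_per_day(SAMPLE).items():
        print(day, count)
```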
<h2 id="2021-07-08">2021-07-08</h2>
|
||
<ul>
|
||
<li>Atmire plans to debug the database connection issues on CGSpace (linode18) today so they asked me to make the REST API inaccessible for today and tomorrow
|
||
<ul>
|
||
<li>I adjusted nginx to give an HTTP 403 as well as an error message to contact me</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
<h2 id="2021-07-11">2021-07-11</h2>
|
||
<ul>
|
||
<li>Start an indexing on AReS</li>
|
||
</ul>
|
||
<h2 id="2021-07-17">2021-07-17</h2>
|
||
<ul>
|
||
<li>I’m in Cyprus mostly offline, but I noticed that CGSpace was down
|
||
<ul>
|
||
<li>I checked and there was a blank white page with HTTP 200</li>
|
||
<li>There are thousands of locks in PostgreSQL:</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2302
postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2564
postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2530
</code></pre><ul>
|
||
<li>The locks are held by XMLUI, not REST API or OAI:</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">postgres@linode18:~$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi)' | sort | uniq -c | sort -n
57 dspaceApi
2671 dspaceWeb
</code></pre><ul>
|
||
<li>I ran all updates on the server (linode18) and restarted it, then DSpace came back up</li>
|
||
<li>I sent a message to Atmire, as I never heard from them last week when we blocked access to the REST API for two days for them to investigate the server issues</li>
|
||
<li>Clone the <code>openrxv-items-temp</code> index on AReS and re-run all the plugins, but most of the “dspace_add_missing_items” tasks failed so I will just run a full re-harvest</li>
|
||
<li>The load on CGSpace is 45.00… the nginx access.log is going so fast I can’t even read it
|
||
<ul>
|
||
<li>I see lots of IPs from AS206485 that are changing their user agents (Linux, Windows, macOS…)</li>
|
||
<li>This is finegroupservers.com aka “UGB - UGB Hosting OU”</li>
|
||
<li>I will get a list of their IP blocks from <a href="https://asn.ipinfo.app/AS206485">ipinfo.app</a> and block them in nginx</li>
|
||
<li>There is another group of IPs that are owned by an ISP called “TrafficTransitSolution LLC” that does not have its own ASN unfortunately</li>
|
||
<li>“TrafficTransitSolution LLC” seems to be affiliated with AS206485 (UGB Hosting OU) anyway, but they sometimes use AS49453 (Global Layer B.V.) as well</li>
|
||
<li>I found a tool that lets you grep a file by CIDR, so I can use that to purge hits from Solr eventually:</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console"># grepcidr 91.243.191.0/24 /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -n
32 91.243.191.124
33 91.243.191.129
33 91.243.191.200
34 91.243.191.115
34 91.243.191.154
34 91.243.191.234
34 91.243.191.56
35 91.243.191.187
35 91.243.191.91
36 91.243.191.58
37 91.243.191.209
39 91.243.191.119
39 91.243.191.144
39 91.243.191.55
40 91.243.191.112
40 91.243.191.182
40 91.243.191.57
40 91.243.191.98
41 91.243.191.106
44 91.243.191.79
45 91.243.191.151
46 91.243.191.103
56 91.243.191.172
</code></pre><ul>
|
||
<li>I found a few people complaining about these Russian attacks too:
|
||
<ul>
|
||
<li><a href="https://community.cloudflare.com/t/russian-ddos-completley-unmitigated-by-cloudflare/284578">https://community.cloudflare.com/t/russian-ddos-completley-unmitigated-by-cloudflare/284578</a></li>
|
||
<li><a href="https://vklader.com/ddos-2020-may/">https://vklader.com/ddos-2020-may/</a></li>
|
||
</ul>
|
||
</li>
|
||
<li>According to AbuseIPDB.com and whois data provided by the asn tool, I see these organizations, networks, and ISPs all seem to be related:
|
||
<ul>
|
||
<li>Sharktech</li>
|
||
<li>LIR LLC / lir.am</li>
|
||
<li>TrafficTransitSolution LLC / traffictransitsolution.us</li>
|
||
<li>Fine Group Servers Solutions LLC / finegroupservers.com</li>
|
||
<li>UGB</li>
|
||
<li>Bulgakov Alexey Yurievich</li>
|
||
<li>Dmitry Vorozhtsov / fitz-isp.uk / UGB</li>
|
||
<li>Auction LLC / dauction.ru / UGB / traffictransitsolution.us</li>
|
||
<li>Alax LLC / alaxona.com / finegroupservers.com</li>
|
||
<li>Sysoev Aleksey Anatolevich / jobbuzzactiv.com / traffictransitsolution.us</li>
|
||
<li>Bulgakov Alexey Yurievich / UGB / blockchainnetworksolutions.co.uk / <a href="mailto:info@finegroupservers.com">info@finegroupservers.com</a></li>
|
||
<li>UAB Rakrejus</li>
|
||
</ul>
|
||
</li>
|
||
<li>I looked in the nginx log and copied a few IP addresses that were suspicious
|
||
<ul>
|
||
<li>First I looked them up in AbuseIPDB.com to get the ISP name and website</li>
|
||
<li>Then I looked them up with the <a href="https://github.com/nitefood/asn">asn</a> tool, for example:</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">$ ./asn -n 45.80.217.235

╭──────────────────────────────╮
│ ASN lookup for 45.80.217.235 │
╰──────────────────────────────╯

45.80.217.235 ┌PTR -
              ├ASN 46844 (ST-BGP, US)
              ├ORG Sharktech
              ├NET 45.80.217.0/24 (TrafficTransitSolutionNet)
              ├ABU info@traffictransitsolution.us
              ├ROA ✓ VALID (1 ROA found)
              ├TYP Proxy host Hosting/DC
              ├GEO Los Angeles, California (US)
              └REP ✓ NONE
</code></pre><ul>
|
||
<li>Slowly slowly I manually built up a list of the IPs, ISP names, and network blocks, for example:</li>
|
||
</ul>
|
||
<pre><code class="language-csv" data-lang="csv">IP, Organization, Website, Network
45.148.126.246, TrafficTransitSolution LLC, traffictransitsolution.us, 45.148.126.0/24 (Net-traffictransitsolution-15)
45.138.102.253, TrafficTransitSolution LLC, traffictransitsolution.us, 45.138.102.0/24 (Net-traffictransitsolution-11)
45.140.205.104, Bulgakov Alexey Yurievich, finegroupservers.com, 45.140.204.0/23 (CHINA_NETWORK)
185.68.247.63, Fine Group Servers Solutions LLC, finegroupservers.com, 185.68.247.0/24 (Net-finegroupservers-18)
213.232.123.188, Fine Group Servers Solutions LLC, finegroupservers.com, 213.232.123.0/24 (Net-finegroupservers-12)
45.80.217.235, TrafficTransitSolution LLC, traffictransitsolution.us, 45.80.217.0/24 (TrafficTransitSolutionNet)
185.81.144.202, Fine Group Servers Solutions LLC, finegroupservers.com, 185.81.144.0/24 (Net-finegroupservers-19)
109.106.22.114, TrafficTransitSolution LLC, traffictransitsolution.us, 109.106.22.0/24 (TrafficTransitSolutionNet)
185.68.247.200, Fine Group Servers Solutions LLC, finegroupservers.com, 185.68.247.0/24 (Net-finegroupservers-18)
45.80.105.252, Bulgakov Alexey Yurievich, finegroupservers.com, 45.80.104.0/23 (NET-BNSL2-1)
185.233.187.156, Dmitry Vorozhtsov, mgn-host.ru, 185.233.187.0/24 (GB-FITZISP-20181106)
185.88.100.75, TrafficTransitSolution LLC, traffictransitsolution.us, 185.88.100.0/24 (Net-traffictransitsolution-17)
194.104.8.154, TrafficTransitSolution LLC, traffictransitsolution.us, 194.104.8.0/24 (Net-traffictransitsolution-37)
185.102.112.46, Fine Group Servers Solutions LLC, finegroupservers.com, 185.102.112.0/24 (Net-finegroupservers-13)
212.193.12.64, Fine Group Servers Solutions LLC, finegroupservers.com, 212.193.12.0/24 (FINE_GROUP_SERVERS_SOLUTIONS_LLC)
91.243.191.129, Auction LLC, dauction.ru, 91.243.191.0/24 (TR-QN-20180917)
45.148.232.161, Nikolaeva Ekaterina Sergeevna, blockchainnetworksolutions.co.uk, 45.148.232.0/23 (LONDON_NETWORK)
147.78.181.191, TrafficTransitSolution LLC, traffictransitsolution.us, 147.78.181.0/24 (Net-traffictransitsolution-58)
77.83.27.90, Alax LLC, alaxona.com, 77.83.27.0/24 (FINEGROUPSERVERS-LEASE)
185.250.46.119, Dmitry Vorozhtsov, mgn-host.ru, 185.250.46.0/23 (GB-FITZISP-20181106)
94.231.219.106, LIR LLC, lir.am, 94.231.219.0/24 (CN-NET-219)
45.12.65.56, Sysoev Aleksey Anatolevich, jobbuzzactiv.com / traffictransitsolution.us, 45.12.65.0/24 (TrafficTransitSolutionNet)
45.140.206.31, Bulgakov Alexey Yurievich, blockchainnetworksolutions.co.uk / info@finegroupservers.com, 45.140.206.0/23 (FRANKFURT_NETWORK)
84.54.57.130, LIR LLC, lir.am / traffictransitsolution.us, 84.54.56.0/23 (CN-FTNET-5456)
178.20.214.235, Alaxona Internet Inc., alaxona.com / finegroupservers.com, 178.20.214.0/24 (FINEGROUPSERVERS-LEASE)
37.44.253.204, Atex LLC, atex.ru / blockchainnetworksolutions.co.uk, 37.44.252.0/23 (NL-FTN-44252)
46.161.61.242, Petersburg Internet Network Ltd., pinspb.ru / abusemail@depo40.ru, 46.161.61.0/24 (FineTransitDE)
194.87.113.141, Fine Group Servers Solutions LLC, finegroupservers.com, 194.87.113.0/24 (FINE_GROUP_SERVERS_SOLUTIONS_LLC)
109.94.223.217, LIR LLC, lir.am / traffictransitsolution.us, 109.94.223.0/24 (CN-NET-223)
94.231.217.115, LIR LLC, lir.am / traffictransitsolution.us, 94.231.217.0/24 (TR-NET-217)
146.185.202.214, Petersburg Internet Network Ltd., pinspb.ru / abusemail@depo40.ru / abuse@ripe.net, 146.185.202.0/24 (FineTransitRU)
194.58.68.110, Fine Group Servers Solutions LLC, finegroupservers.com, 194.58.68.0/24 (FINE_GROUP_SERVERS_SOLUTIONS_LLC)
94.154.131.237, TrafficTransitSolution LLC, traffictransitsolution.us, 94.154.131.0/24 (TrafficTransitSolutionNet)
193.202.8.245, Fine Group Servers Solutions LLC, finegroupservers.com, 193.202.8.0/21 (FTL5)
212.192.27.33, Fine Group Servers Solutions LLC, finegroupservers.com, 212.192.27.0/24 (FINE_GROUP_SERVERS_SOLUTIONS_LLC)
193.202.87.218, Fine Group Servers Solutions LLC, finegroupservers.com, 193.202.84.0/22 (FTEL-2)
146.185.200.52, Petersburg Internet Network Ltd., pinspb.ru / abusemail@depo40.ru / abuse@ripe.net, 146.185.200.0/24 (FineTransitRU)
194.104.11.11, TrafficTransitSolution LLC, traffictransitsolution.us, 194.104.11.0/24 (Net-traffictransitsolution-40)
185.50.250.145, ATOMOHOST LLC, atomohost.com, 185.50.250.0/24 (Silverstar_Invest_Limited)
37.9.46.68, Petersburg Internet Network Ltd., pinspb.ru / abusemail@depo40.ru / abuse@ripe.net, 37.9.44.0/22 (QUALITYNETWORK)
185.81.145.14, Fine Group Servers Solutions LLC, finegroupservers.com, 185.81.145.0/24 (Net-finegroupservers-20)
5.183.255.72, TrafficTransitSolution LLC, traffictransitsolution.us, 5.183.255.0/24 (Net-traffictransitsolution-32)
84.54.58.204, LIR LLC, lir.am / traffictransitsolution.us, 84.54.58.0/24 (GB-BTTGROUP-20181119)
109.236.55.175, Mosnet LLC, mosnetworks.ru / info@traffictransitsolution.us, 109.236.55.0/24 (CN-NET-55)
5.133.123.184, Mosnet LLC, mosnet.ru / abuse@blockchainnetworksolutions.co.uk, 5.133.123.0/24 (DE-NET5133123)
5.181.168.90, Fine Group Servers Solutions LLC, finegroupservers.com, 5.181.168.0/24 (Net-finegroupservers-5)
185.61.217.86, TrafficTransitSolution LLC, traffictransitsolution.us, 185.61.217.0/24 (Net-traffictransitsolution-46)
217.145.227.84, TrafficTransitSolution LLC, traffictransitsolution.us, 217.145.227.0/24 (Net-traffictransitsolution-64)
193.56.75.29, Auction LLC, dauction.ru / abuse@blockchainnetworksolutions.co.uk, 193.56.75.0/24 (CN-NET-75)
45.132.184.212, TrafficTransitSolution LLC, traffictransitsolution.us, 45.132.184.0/24 (Net-traffictransitsolution-5)
45.10.167.239, TrafficTransitSolution LLC, traffictransitsolution.us, 45.10.167.0/24 (Net-traffictransitsolution-28)
109.94.222.106, Express Courier LLC, expcourier.ru / info@traffictransitsolution.us, 109.94.222.0/24 (IN-NET-222)
62.76.232.218, Fine Group Servers Solutions LLC, finegroupservers.com, 62.76.232.0/24 (FINE_GROUP_SERVERS_SOLUTIONS_LLC)
147.78.183.221, TrafficTransitSolution LLC, traffictransitsolution.us, 147.78.183.0/24 (Net-traffictransitsolution-60)
94.158.22.202, Auction LLC, dauction.ru / info@traffictransitsolution.us, 94.158.22.0/24 (FR-QN-20180917)
85.202.194.33, Mosnet LLC, mosnet.ru / info@traffictransitsolution.us, 85.202.194.0/24 (DE-QN-20180917)
193.187.93.150, Fine Group Servers Solutions LLC, finegroupservers.com, 193.187.92.0/22 (FTL3)
185.250.45.149, Dmitry Vorozhtsov, mgn-host.ru / abuse@fitz-isp.uk, 185.250.44.0/23 (GB-FITZISP-20181106)
185.50.251.75, ATOMOHOST LLC, atomohost.com, 185.50.251.0/24 (Silverstar_Invest_Limited)
5.183.254.117, TrafficTransitSolution LLC, traffictransitsolution.us, 5.183.254.0/24 (Net-traffictransitsolution-31)
45.132.186.187, TrafficTransitSolution LLC, traffictransitsolution.us, 45.132.186.0/24 (Net-traffictransitsolution-7)
83.171.252.105, Teleport LLC, teleport.az / abuse@blockchainnetworksolutions.co.uk, 83.171.252.0/23 (DE-FTNET-252)
45.148.127.37, TrafficTransitSolution LLC, traffictransitsolution.us, 45.148.127.0/24 (Net-traffictransitsolution-16)
194.87.115.133, Fine Group Servers Solutions LLC, finegroupservers.com, 194.87.115.0/24 (FINE_GROUP_SERVERS_SOLUTIONS_LLC)
193.233.250.100, OOO Freenet Group, free.net / abuse@vmage.ru, 193.233.250.0/24 (TrafficTransitSolutionNet)
194.87.116.246, Fine Group Servers Solutions LLC, finegroupservers.com, 194.87.116.0/24 (FINE_GROUP_SERVERS_SOLUTIONS_LLC)
195.133.25.244, Fine Group Servers Solutions LLC, finegroupservers.com, 195.133.25.0/24 (FINE_GROUP_SERVERS_SOLUTIONS_LLC)
77.220.194.159, Fine Group Servers Solutions LLC, finegroupservers.com, 77.220.194.0/24 (Net-finegroupservers-3)
185.89.101.177, ATOMOHOST LLC, atomohost.com, 185.89.100.0/23 (QUALITYNETWORK)
193.151.191.133, Alax LLC, alaxona.com / info@finegroupservers.com, 193.151.191.0/24 (FINEGROUPSERVERS-LEASE)
5.181.170.147, Fine Group Servers Solutions LLC, finegroupservers.com, 5.181.170.0/24 (Net-finegroupservers-7)
193.233.249.167, OOO Freenet Group, free.net / abuse@vmage.ru, 193.233.249.0/24 (TrafficTransitSolutionNet)
46.161.59.90, Petersburg Internet Network Ltd., pinspb.ru / abusemail@depo40.ru, 46.161.59.0/24 (FineTransitJP)
213.108.3.74, TrafficTransitSolution LLC, traffictransitsolution.us, 213.108.3.0/24 (Net-traffictransitsolution-24)
193.233.251.238, OOO Freenet Group, free.net / abuse@vmage.ru, 193.233.251.0/24 (TrafficTransitSolutionNet)
178.20.215.224, Alaxona Internet Inc., alaxona.com / info@finegroupservers.com, 178.20.215.0/24 (FINEGROUPSERVERS-LEASE)
45.159.22.199, Server LLC, ixserv.ru / info@finegroupservers.com, 45.159.22.0/24 (FINEGROUPSERVERS-LEASE)
109.236.53.244, Mosnet LLC, mosnet.ru / info@traffictransitsolution.us, 109.236.53.0/24 (TR-NET-53)
</code></pre><ul>
|
||
<li>I found a better way to get the ASNs using my <code>resolve-addresses-geoip2.py</code> script
|
||
<ul>
|
||
<li>First, get a list of all IPs making requests to nginx today:</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console"># grep -v -E "(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)" /var/log/nginx/access.log | awk '{print $1}' | sort | uniq > /tmp/ips-sorted.txt
# wc -l /tmp/ips-sorted.txt
10776 /tmp/ips-sorted.txt
</code></pre><ul>
|
||
<li>Then resolve them all:</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">$ ./ilri/resolve-addresses-geoip2.py -i /tmp/ips-sorted.txt -o /tmp/out.csv
</code></pre><ul>
|
||
<li>Then get the top ten organizations and top ten ASNs:</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">$ csvcut -c 2 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
213 AMAZON-AES
218 ASN-QUADRANET-GLOBAL
246 Silverstar Invest Limited
347 Ethiopian Telecommunication Corporation
475 DEDIPATH-LLC
504 AS-COLOCROSSING
598 UAB Rakrejus
814 UGB Hosting OU
1010 ST-BGP
1757 Global Layer B.V.
$ csvcut -c 3 /tmp/out.csv | sed 1d | sort | uniq -c | sort -n | tail -n 10
213 14618
218 8100
246 35624
347 24757
475 35913
504 36352
598 62282
814 206485
1010 46844
1757 49453
</code></pre><ul>
|
||
<li>I will download blocklists for all these except Ethiopian Telecom, Quadranet, and Amazon, though I’m concerned about Global Layer because it’s a huge ASN that seems to have legit hosts too…?</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">$ wget https://asn.ipinfo.app/api/text/nginx/AS49453
$ wget https://asn.ipinfo.app/api/text/nginx/AS46844
$ wget https://asn.ipinfo.app/api/text/nginx/AS206485
$ wget https://asn.ipinfo.app/api/text/nginx/AS62282
$ wget https://asn.ipinfo.app/api/text/nginx/AS36352
$ wget https://asn.ipinfo.app/api/text/nginx/AS35624
$ cat AS* | sort | uniq > /tmp/abusive-networks.txt
$ wc -l /tmp/abusive-networks.txt
2276 /tmp/abusive-networks.txt
</code></pre><ul>
|
||
<li>Combining with my existing rules and filtering uniques:</li>
|
||
</ul>
|
||
<pre><code class="language-console" data-lang="console">$ cat roles/dspace/templates/nginx/abusive-networks.conf.j2 /tmp/abusive-networks.txt | grep deny | sort | uniq | wc -l
2298
</code></pre><ul>
|
||
<li><a href="https://scamalytics.com/ip/isp">According to Scamalytics all these are high risk ISPs</a> (as recently as 2021-06) so I will just keep blocking them</li>
|
||
<li>I deployed the block list on CGSpace (linode18) and the load is down to 1.0 but I see there are still some DDoS IPs getting through… sigh</li>
|
||
<li>The next thing I need to do is purge all the IPs from Solr using grepcidr…</li>
|
||
</ul>
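<ul>
<li>Selecting the IPs to purge comes down to testing each address against the blocked network ranges, which is what <code>grepcidr</code> does for log lines. A minimal Python sketch of the same membership test using the standard library <code>ipaddress</code> module, as a stand-in for <code>grepcidr</code>; the <code>addresses_in_networks</code> helper is hypothetical and the actual Solr purge is not shown:</li>
</ul>

```python
import ipaddress


def addresses_in_networks(addresses, cidrs):
    """Return the addresses that fall inside any of the given CIDR blocks."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    matched = []
    for address in addresses:
        ip = ipaddress.ip_address(address)
        # Membership is False when the IP versions differ (IPv4 vs IPv6)
        if any(ip in network for network in networks):
            matched.append(address)
    return matched


if __name__ == "__main__":
    # Two network blocks from the list above, plus one address outside them
    blocks = ["91.243.191.0/24", "45.80.217.0/24"]
    seen = ["91.243.191.124", "91.243.192.1", "45.80.217.235"]
    print(addresses_in_networks(seen, blocks))
```

<ul>
<li>The matched addresses could then be fed to whatever purge step is used for Solr, one IP at a time.</li>
</ul>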
|
||
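<ul>
<li>The tallies of top organizations and ASNs earlier in this entry can also be computed without <code>csvcut</code>. A minimal Python sketch, assuming the resolved CSV has a header row; the column names <code>ip</code>, <code>org</code> and <code>asn</code> and the sample rows are hypothetical and may not match the real output of <code>resolve-addresses-geoip2.py</code>:</li>
</ul>

```python
import csv
import io
from collections import Counter


def top_values(csv_text, column, n=10):
    """Return the n most common values in one CSV column, most frequent first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column] for row in reader)
    return counts.most_common(n)


# Hypothetical sample using documentation-only IPs and ASNs
SAMPLE_CSV = """ip,org,asn
192.0.2.1,Example Hosting,64500
192.0.2.2,Example Hosting,64500
198.51.100.7,Another Network,64501
"""

if __name__ == "__main__":
    for value, count in top_values(SAMPLE_CSV, "org"):
        print(count, value)
```

<ul>
<li>Note that <code>most_common</code> lists the largest count first, whereas the <code>sort -n | tail</code> pipeline lists it last.</li>
</ul>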
<!-- raw HTML omitted -->
|
||
|
||
|
||
|
||
|
||
|
||
</article>
|
||
|
||
|
||
|
||
</div> <!-- /.blog-main -->
|
||
|
||
<aside class="col-sm-3 ml-auto blog-sidebar">
|
||
|
||
|
||
|
||
<section class="sidebar-module">
|
||
<h4>Recent Posts</h4>
|
||
<ol class="list-unstyled">
|
||
|
||
|
||
<li><a href="/cgspace-notes/2021-06/">June, 2021</a></li>
|
||
|
||
<li><a href="/cgspace-notes/2021-06/">July, 2021</a></li>
|
||
|
||
<li><a href="/cgspace-notes/2021-05/">May, 2021</a></li>
|
||
|
||
<li><a href="/cgspace-notes/2021-04/">April, 2021</a></li>
|
||
|
||
<li><a href="/cgspace-notes/2021-03/">March, 2021</a></li>
|
||
|
||
</ol>
|
||
</section>
|
||
|
||
|
||
|
||
|
||
<section class="sidebar-module">
|
||
<h4>Links</h4>
|
||
<ol class="list-unstyled">
|
||
|
||
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
|
||
|
||
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
|
||
|
||
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
|
||
|
||
</ol>
|
||
</section>
|
||
|
||
</aside>
|
||
|
||
|
||
</div> <!-- /.row -->
|
||
</div> <!-- /.container -->
|
||
|
||
|
||
|
||
<footer class="blog-footer">
|
||
<p dir="auto">
|
||
|
||
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
|
||
|
||
</p>
|
||
<p>
|
||
<a href="#">Back to top</a>
|
||
</p>
|
||
</footer>
|
||
|
||
|
||
</body>
|
||
|
||
</html>
|