mirror of https://github.com/alanorth/cgspace-notes.git
synced 2024-11-25 08:00:18 +01:00
Add notes for 2023-01-17
parent 3f4e42fe37
commit ddb1ce8f4e
@@ -153,4 +153,136 @@ $ curl -v "https://dspace7test.ilri.org/api/core/items" -H "Authorization: Beare

- Start a harvest on AReS

## 2023-01-16

- Batch import four IFPRI items for the CGIAR Initiative on Low-Emission Food Systems
- Batch import another twenty-eight items for IFPRI across several Initiatives
  - On this one I did quite a bit of extra work to check the acknowledgements for CRPs and data/code URLs, as well as the licenses, volume/issue/extent, etc
  - I fixed some authors and an ISBN, and added extra AGROVOC keywords from the abstracts
  - Then I checked for duplicates and ran it through csv-metadata-quality to make sure the countries/regions matched and there were no duplicate metadata values
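
csv-metadata-quality handles these checks properly, but the idea behind the duplicate metadata value check is simple enough to sketch. This is a minimal Python illustration, not the tool's implementation. It assumes a DSpace-style metadata CSV where multiple values in one field are separated by `||`. The function name and the example column are hypothetical.

```python
import csv
import io


def find_duplicate_values(csv_text: str, separator: str = "||") -> list[tuple[int, str, str]]:
    """Return (row number, column, value) for values repeated within one field.

    Assumes DSpace-style multi-value fields joined by `separator`.
    Row numbers start at 1 for the first data row (the header is excluded).
    """
    duplicates = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, field in row.items():
            # Skip empty cells and csv's overflow entries (which are lists)
            if not isinstance(field, str) or not field:
                continue
            seen = set()
            for value in field.split(separator):
                value = value.strip()
                if value in seen:
                    duplicates.append((row_number, column, value))
                seen.add(value)
    return duplicates


if __name__ == "__main__":
    # Hypothetical example data, not real CGSpace metadata
    example = (
        "id,dcterms.subject\n"
        "1,maize||climate change||maize\n"
        "2,rice||wheat\n"
    )
    print(find_duplicate_values(example))
```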

## 2023-01-17

- Batch import another twenty-three items for IFPRI across several Initiatives
  - I checked the IFPRI eBrary for extra CRPs and data/code URLs in the acknowledgements, as well as licenses, volume/issue/extent, etc
  - I fixed some authors and an ISBN, and added extra AGROVOC keywords from the abstracts
  - Then I found and removed one duplicate within these items, as well as one that was already on CGSpace (!): 10568/126669
  - Then I ran it through csv-metadata-quality to make sure the countries/regions matched and there were no duplicate metadata values
- I exported the Initiatives collection to check the mappings, regions, and other metadata with csv-metadata-quality
- I also added a bunch of ORCID identifiers to my list and tagged 837 new metadata values on CGSpace
- There is a high load on CGSpace pretty regularly
  - Munin shows a marked increase in DSpace sessions over the last few weeks:

![DSpace sessions year](/cgspace-notes/2023/01/jmx_dspace_sessions-year.png)

- Is this attributable to all the PRMS harvesting?
- I also see some PostgreSQL locks starting earlier today:

![PostgreSQL locks day](/cgspace-notes/2023/01/postgres_connections_ALL-day.png)

- I'm curious to see what kinds of IPs have been connecting, so I will look at the last few weeks:

```console
# zcat --force /var/log/nginx/{rest,access,library-access,oai}.log /var/log/nginx/{rest,access,library-access,oai}.log.1 /var/log/nginx/{rest,access,library-access,oai}.log.{2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}.gz | awk '{print $1}' | sort | uniq > /tmp/2023-01-17-cgspace-ips.txt
# wc -l /tmp/2023-01-17-cgspace-ips.txt
129446 /tmp/2023-01-17-cgspace-ips.txt
```
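
For reference, the `zcat --force ... | awk '{print $1}' | sort | uniq` pipeline above can be approximated in Python. This is a minimal sketch that assumes nginx's default `combined` log format, where the client address is the first whitespace-separated field. Like `zcat --force`, it reads both plain and gzip-compressed files, although it decides by file extension rather than by content. The function name is hypothetical.

```python
import gzip
import sys


def unique_client_ips(paths: list[str]) -> list[str]:
    """Return the sorted, unique first field of every line in the given logs.

    Files ending in .gz are decompressed and everything else is read as plain
    text, which roughly mirrors `zcat --force` for nginx's rotated logs.
    """
    ips = set()
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as log:
            for line in log:
                fields = line.split(maxsplit=1)
                if fields:  # skip blank lines
                    ips.add(fields[0])
    return sorted(ips)


if __name__ == "__main__":
    # Usage: python3 unique_ips.py /var/log/nginx/access.log /var/log/nginx/access.log.2.gz ...
    print(len(unique_client_ips(sys.argv[1:])))
```

Python's `sorted()` orders strings by code point rather than by the locale-aware rules `sort` uses. That doesn't matter here, because only the set of addresses and its size are used afterwards.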

- I ran the IPs through my `resolve-addresses-geoip2.py` script to resolve their ASNs/networks, then picked out the data center ISPs by eyeballing the results (Amazon, Google, Microsoft, Apple, DigitalOcean, HostRoyale, and a dozen others):

```console
$ csvgrep -c asn -r '^(8075|714|16276|15169|23576|24940|13238|32934|14061|12876|55286|203020|204287|7922|50245|6939|16509|14618)$' \
  /tmp/2023-01-17-cgspace-ips.csv | csvcut -c network | \
  sed 1d | sort | uniq > /tmp/networks-to-block.txt
$ wc -l /tmp/networks-to-block.txt
776 /tmp/networks-to-block.txt
```
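
The `csvgrep`/`csvcut`/`sed`/`sort`/`uniq` step above can also be sketched with Python's standard `csv` module. As the command implies, this assumes the resolver's CSV output has `asn` and `network` columns. The function name is hypothetical, and this is an illustration rather than a replacement for the commands actually run.

```python
import csv
import io

# Data center ASNs copied from the csvgrep regex above
DATACENTER_ASNS = {
    "8075", "714", "16276", "15169", "23576", "24940", "13238", "32934", "14061",
    "12876", "55286", "203020", "204287", "7922", "50245", "6939", "16509", "14618",
}


def networks_to_block(csv_text: str, asns: set[str] = DATACENTER_ASNS) -> list[str]:
    """Return sorted, unique `network` values for rows whose `asn` is in `asns`.

    Exact set membership matches the anchored ^(...)$ regex: an ASN only
    matches if it is exactly one of the listed numbers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    networks = {row["network"] for row in reader if row["asn"] in asns}
    return sorted(networks)
```

Matching against an exact set avoids one regex pitfall: an unanchored pattern could also match longer ASNs that merely contain one of these numbers.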

- I added the list of networks to nginx's `bot-networks.conf` so they will all be heavily rate limited
- Looking at the Munin stats again, I see the load has been extra high since yesterday morning:

![CPU week](/cgspace-notes/2023/01/cpu-week.png)

- But still, it's suspicious that there are so many PostgreSQL locks
- Looking at the Solr statistics to check the hits from the last month (actually I skipped December because I was so busy)
  - I see 31.148.223.10 is on ALFA TELECOM s.r.o. in Russia and it made 43,000 requests this month (and 400,000 more last month!)
  - I see 18.203.245.60 is on Amazon and uses a different, weird user agent with each request
  - I see 3.249.192.212 is on Amazon and uses a different, weird user agent with each request
  - I see 34.244.160.145 is on Amazon and uses a different, weird user agent with each request
  - I see 52.213.59.101 is on Amazon and uses a different, weird user agent with each request
  - I see 91.209.8.29 is in Bulgaria on DGM EOOD and is low risk according to Scamalytics, but its user agent is all lower case and it's a data center ISP, so nope
  - I see 54.78.176.127 is on Amazon and uses a different, weird user agent with each request
  - I see 54.246.128.111 is on Amazon and uses a different, weird user agent with each request
  - I see 54.74.197.53 is on Amazon and uses a different, weird user agent with each request
  - I see 52.16.103.133 is on Amazon and uses a different, weird user agent with each request
  - I see 63.32.99.252 is on Amazon and uses a different, weird user agent with each request
  - I see 176.34.141.181 is on Amazon and uses a different, weird user agent with each request
  - I see 34.243.17.80 is on Amazon and uses a different, weird user agent with each request
  - I see 34.240.206.16 is on Amazon and uses a different, weird user agent with each request
  - I see 18.203.81.120 is on Amazon and uses a different, weird user agent with each request
  - I see 176.97.210.106 is on Tube Hosting and is rated VERY BAD, malicious, or scammy by everything I checked
  - I see 79.110.73.54 is on ALFA TELCOM / Serverel and is using a different, weird user agent with each request
  - There are too many to count... so I will purge these and then move on to user agents
- I purged hits from those IPs:

```console
$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
Purging 439185 hits from 31.148.223.10 in statistics
Purging 2151 hits from 18.203.245.60 in statistics
Purging 1990 hits from 3.249.192.212 in statistics
Purging 1975 hits from 34.244.160.145 in statistics
Purging 1969 hits from 52.213.59.101 in statistics
Purging 2540 hits from 91.209.8.29 in statistics
Purging 1624 hits from 54.78.176.127 in statistics
Purging 1236 hits from 54.74.197.53 in statistics
Purging 1327 hits from 54.246.128.111 in statistics
Purging 1108 hits from 52.16.103.133 in statistics
Purging 1045 hits from 63.32.99.252 in statistics
Purging 999 hits from 176.34.141.181 in statistics
Purging 997 hits from 34.243.17.80 in statistics
Purging 985 hits from 34.240.206.16 in statistics
Purging 862 hits from 18.203.81.120 in statistics
Purging 1654 hits from 176.97.210.106 in statistics
Purging 1628 hits from 51.81.193.200 in statistics
Purging 1020 hits from 79.110.73.54 in statistics
Purging 842 hits from 35.153.105.213 in statistics
Purging 1689 hits from 54.164.237.125 in statistics

Total number of bot hits purged: 466826
```
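
The purge script's internals aren't shown here. In general, though, removing hits for a set of IPs from a Solr statistics core comes down to a delete-by-query. The sketch below makes several assumptions: that the core stores the client address in an `ip` field, that it accepts Solr's JSON update syntax at `<core URL>/update`, and that the core URL used is a placeholder. The function names are hypothetical, and `check-spider-ip-hits.sh` may work differently.

```python
import json
import urllib.request


def build_delete_body(ips: list[str]) -> bytes:
    """Build a Solr JSON update body deleting every document whose `ip` matches.

    All addresses go into a single OR query so one request (and one commit) suffices.
    """
    terms = " OR ".join(f'"{ip}"' for ip in ips)
    return json.dumps({"delete": {"query": f"ip:({terms})"}}).encode("utf-8")


def purge_ips(core_url: str, ips: list[str]) -> int:
    """POST the delete-by-query to the core's update handler with commit=true.

    `core_url` is a placeholder such as http://localhost:8983/solr/statistics.
    Returns the HTTP status code.
    """
    request = urllib.request.Request(
        f"{core_url}/update?commit=true",
        data=build_delete_body(ips),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only print the request body; call purge_ips() deliberately, ideally against a test core first
    print(build_delete_body(["31.148.223.10", "18.203.245.60"]).decode("utf-8"))
```

If the statistics core has been split into yearly shards, presumably each shard would need the same request.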

- Looking at user agents in Solr statistics from 2022-12 and 2023-01, I see some weird ones:
  - `azure-logic-apps/1.0 (workflow e1f855704d6543f48be6205c40f4083f; version 08585300079823949478) microsoft-flow/1.0`
  - `Gov employment data scraper ([[your email]])`
  - `Microsoft.Data.Mashup (https://go.microsoft.com/fwlink/?LinkID=304225)`
  - `crownpeak`
  - `Mozilla/5.0 (compatible)`
- Also, a ton of them are lower case, which I've never seen before... it might be legitimate, but it looks super fishy to me:
  - `mozilla/5.0 (x11; ubuntu; linux x86_64; rv:84.0) gecko/20100101 firefox/86.0`
  - `mozilla/5.0 (macintosh; intel mac os x 11_3) applewebkit/537.36 (khtml, like gecko) chrome/89.0.4389.90 safari/537.36`
  - `mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/86.0.4240.75 safari/537.36`
  - `mozilla/5.0 (windows nt 10.0; win64; x64; rv:86.0) gecko/20100101 firefox/86.0`
  - `mozilla/5.0 (x11; linux x86_64) applewebkit/537.36 (khtml, like gecko) chrome/90.0.4430.93 safari/537.36`
  - `mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/92.0.4515.159 safari/537.36`
  - `mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/88.0.4324.104 safari/537.36`
  - `mozilla/5.0 (x11; linux x86_64) applewebkit/537.36 (khtml, like gecko) chrome/86.0.4240.75 safari/537.36`
- I purged some of those:

```console
$ ./ilri/check-spider-hits.sh -f /tmp/agents.txt -p
Purging 1658 hits from azure-logic-apps\/1.0 in statistics
Purging 948 hits from Gov employment data scraper in statistics
Purging 786 hits from Microsoft\.Data\.Mashup in statistics
Purging 303 hits from crownpeak in statistics
Purging 332 hits from Mozilla\/5.0 (compatible) in statistics

Total number of bot hits purged: 4027
```
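
The all-lower-case user agents listed above are easy to flag mechanically, because mainstream browsers send mixed-case strings that begin with `Mozilla/5.0`. Here is a minimal sketch of that heuristic. The function name is hypothetical, and a match is only a hint for manual review, not proof that a client is a bot.

```python
def looks_lowercased(user_agent: str) -> bool:
    """Flag browser-like user agents that have been entirely lower-cased.

    Genuine browser user agents start with "Mozilla/5.0" (capital M), so a
    "mozilla/" prefix with no upper-case letters anywhere is suspicious.
    """
    return user_agent.startswith("mozilla/") and user_agent == user_agent.lower()


if __name__ == "__main__":
    agents = [
        "mozilla/5.0 (windows nt 10.0; win64; x64; rv:86.0) gecko/20100101 firefox/86.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0",
        "crownpeak",
    ]
    for agent in agents:
        print(looks_lowercased(agent), agent)
```

Non-browser agents such as `crownpeak` are deliberately not flagged by this check. They need their own patterns, like the ones in the purge above.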

- Then I ran all system updates on the server and rebooted it
  - Hopefully this clears the locks and the nginx mitigation helps with the load from non-human hosts in large data centers
  - I need to re-work how I'm doing this whitelisting and blacklisting... it's way too complicated now
- Export the entire CGSpace to check Initiative mappings, and add nineteen...
- Start a harvest on AReS

<!-- vim: set sw=2 ts=2: -->
|
@@ -19,7 +19,7 @@ I see we have some new ones that aren’t in our list if I combine with this
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-01/" />
<meta property="article:published_time" content="2023-01-01T08:44:36+03:00" />
<meta property="article:modified_time" content="2023-01-12T23:11:42+03:00" />
<meta property="article:modified_time" content="2023-01-15T08:10:16+03:00" />
@@ -44,9 +44,9 @@ I see we have some new ones that aren’t in our list if I combine with this
"@type": "BlogPosting",
"headline": "January, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-01/",
"wordCount": "1103",
"wordCount": "2250",
"datePublished": "2023-01-01T08:44:36+03:00",
"dateModified": "2023-01-12T23:11:42+03:00",
"dateModified": "2023-01-15T08:10:16+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -307,6 +307,151 @@ I see we have some new ones that aren’t in our list if I combine with this
<ul>
<li>Start a harvest on AReS</li>
</ul>
<h2 id="2023-01-16">2023-01-16</h2>
<ul>
<li>Batch import four IFPRI items for the CGIAR Initiative on Low-Emission Food Systems</li>
<li>Batch import another twenty-eight items for IFPRI across several Initiatives
<ul>
<li>On this one I did quite a bit of extra work to check the acknowledgements for CRPs and data/code URLs, as well as the licenses, volume/issue/extent, etc</li>
<li>I fixed some authors and an ISBN, and added extra AGROVOC keywords from the abstracts</li>
<li>Then I checked for duplicates and ran it through csv-metadata-quality to make sure the countries/regions matched and there were no duplicate metadata values</li>
</ul>
</li>
</ul>
<h2 id="2023-01-17">2023-01-17</h2>
<ul>
<li>Batch import another twenty-three items for IFPRI across several Initiatives
<ul>
<li>I checked the IFPRI eBrary for extra CRPs and data/code URLs in the acknowledgements, as well as licenses, volume/issue/extent, etc</li>
<li>I fixed some authors and an ISBN, and added extra AGROVOC keywords from the abstracts</li>
<li>Then I found and removed one duplicate within these items, as well as one that was already on CGSpace (!): 10568/126669</li>
<li>Then I ran it through csv-metadata-quality to make sure the countries/regions matched and there were no duplicate metadata values</li>
</ul>
</li>
<li>I exported the Initiatives collection to check the mappings, regions, and other metadata with csv-metadata-quality</li>
<li>I also added a bunch of ORCID identifiers to my list and tagged 837 new metadata values on CGSpace</li>
<li>There is a high load on CGSpace pretty regularly
<ul>
<li>Munin shows a marked increase in DSpace sessions over the last few weeks:</li>
</ul>
</li>
</ul>
<p><img src="/cgspace-notes/2023/01/jmx_dspace_sessions-year.png" alt="DSpace sessions year"></p>
<ul>
<li>Is this attributable to all the PRMS harvesting?</li>
<li>I also see some PostgreSQL locks starting earlier today:</li>
</ul>
<p><img src="/cgspace-notes/2023/01/postgres_connections_ALL-day.png" alt="PostgreSQL locks day"></p>
<ul>
<li>I’m curious to see what kinds of IPs have been connecting, so I will look at the last few weeks:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># zcat --force /var/log/nginx/<span style="color:#f92672">{</span>rest,access,library-access,oai<span style="color:#f92672">}</span>.log /var/log/nginx/<span style="color:#f92672">{</span>rest,access,library-access,oai<span style="color:#f92672">}</span>.log.1 /var/log/nginx/<span style="color:#f92672">{</span>rest,access,library-access,oai<span style="color:#f92672">}</span>.log.<span style="color:#f92672">{</span>2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25<span style="color:#f92672">}</span>.gz | awk <span style="color:#e6db74">'{print $1}'</span> | sort | uniq > /tmp/2023-01-17-cgspace-ips.txt
</span></span><span style="display:flex;"><span># wc -l /tmp/2023-01-17-cgspace-ips.txt
</span></span><span style="display:flex;"><span>129446 /tmp/2023-01-17-cgspace-ips.txt
</span></span></code></pre></div><ul>
<li>I ran the IPs through my <code>resolve-addresses-geoip2.py</code> script to resolve their ASNs/networks, then picked out the data center ISPs by eyeballing the results (Amazon, Google, Microsoft, Apple, DigitalOcean, HostRoyale, and a dozen others):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvgrep -c asn -r <span style="color:#e6db74">'^(8075|714|16276|15169|23576|24940|13238|32934|14061|12876|55286|203020|204287|7922|50245|6939|16509|14618)$'</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> /tmp/2023-01-17-cgspace-ips.csv | csvcut -c network | \
</span></span><span style="display:flex;"><span> sed 1d | sort | uniq > /tmp/networks-to-block.txt
</span></span><span style="display:flex;"><span>$ wc -l /tmp/networks-to-block.txt
</span></span><span style="display:flex;"><span>776 /tmp/networks-to-block.txt
</span></span></code></pre></div><ul>
<li>I added the list of networks to nginx’s <code>bot-networks.conf</code> so they will all be heavily rate limited</li>
<li>Looking at the Munin stats again, I see the load has been extra high since yesterday morning:</li>
</ul>
<p><img src="/cgspace-notes/2023/01/cpu-week.png" alt="CPU week"></p>
<ul>
<li>But still, it’s suspicious that there are so many PostgreSQL locks</li>
<li>Looking at the Solr statistics to check the hits from the last month (actually I skipped December because I was so busy)
<ul>
<li>I see 31.148.223.10 is on ALFA TELECOM s.r.o. in Russia and it made 43,000 requests this month (and 400,000 more last month!)</li>
<li>I see 18.203.245.60 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 3.249.192.212 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 34.244.160.145 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 52.213.59.101 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 91.209.8.29 is in Bulgaria on DGM EOOD and is low risk according to Scamalytics, but its user agent is all lower case and it’s a data center ISP, so nope</li>
<li>I see 54.78.176.127 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 54.246.128.111 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 54.74.197.53 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 52.16.103.133 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 63.32.99.252 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 176.34.141.181 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 34.243.17.80 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 34.240.206.16 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 18.203.81.120 is on Amazon and uses a different, weird user agent with each request</li>
<li>I see 176.97.210.106 is on Tube Hosting and is rated VERY BAD, malicious, or scammy by everything I checked</li>
<li>I see 79.110.73.54 is on ALFA TELCOM / Serverel and is using a different, weird user agent with each request</li>
<li>There are too many to count… so I will purge these and then move on to user agents</li>
</ul>
</li>
<li>I purged hits from those IPs:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
</span></span><span style="display:flex;"><span>Purging 439185 hits from 31.148.223.10 in statistics
</span></span><span style="display:flex;"><span>Purging 2151 hits from 18.203.245.60 in statistics
</span></span><span style="display:flex;"><span>Purging 1990 hits from 3.249.192.212 in statistics
</span></span><span style="display:flex;"><span>Purging 1975 hits from 34.244.160.145 in statistics
</span></span><span style="display:flex;"><span>Purging 1969 hits from 52.213.59.101 in statistics
</span></span><span style="display:flex;"><span>Purging 2540 hits from 91.209.8.29 in statistics
</span></span><span style="display:flex;"><span>Purging 1624 hits from 54.78.176.127 in statistics
</span></span><span style="display:flex;"><span>Purging 1236 hits from 54.74.197.53 in statistics
</span></span><span style="display:flex;"><span>Purging 1327 hits from 54.246.128.111 in statistics
</span></span><span style="display:flex;"><span>Purging 1108 hits from 52.16.103.133 in statistics
</span></span><span style="display:flex;"><span>Purging 1045 hits from 63.32.99.252 in statistics
</span></span><span style="display:flex;"><span>Purging 999 hits from 176.34.141.181 in statistics
</span></span><span style="display:flex;"><span>Purging 997 hits from 34.243.17.80 in statistics
</span></span><span style="display:flex;"><span>Purging 985 hits from 34.240.206.16 in statistics
</span></span><span style="display:flex;"><span>Purging 862 hits from 18.203.81.120 in statistics
</span></span><span style="display:flex;"><span>Purging 1654 hits from 176.97.210.106 in statistics
</span></span><span style="display:flex;"><span>Purging 1628 hits from 51.81.193.200 in statistics
</span></span><span style="display:flex;"><span>Purging 1020 hits from 79.110.73.54 in statistics
</span></span><span style="display:flex;"><span>Purging 842 hits from 35.153.105.213 in statistics
</span></span><span style="display:flex;"><span>Purging 1689 hits from 54.164.237.125 in statistics
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 466826
</span></span></code></pre></div><ul>
<li>Looking at user agents in Solr statistics from 2022-12 and 2023-01, I see some weird ones:
<ul>
<li><code>azure-logic-apps/1.0 (workflow e1f855704d6543f48be6205c40f4083f; version 08585300079823949478) microsoft-flow/1.0</code></li>
<li><code>Gov employment data scraper ([[your email]])</code></li>
<li><code>Microsoft.Data.Mashup (https://go.microsoft.com/fwlink/?LinkID=304225)</code></li>
<li><code>crownpeak</code></li>
<li><code>Mozilla/5.0 (compatible)</code></li>
</ul>
</li>
<li>Also, a ton of them are lower case, which I’ve never seen before… it might be legitimate, but it looks super fishy to me:
<ul>
<li><code>mozilla/5.0 (x11; ubuntu; linux x86_64; rv:84.0) gecko/20100101 firefox/86.0</code></li>
<li><code>mozilla/5.0 (macintosh; intel mac os x 11_3) applewebkit/537.36 (khtml, like gecko) chrome/89.0.4389.90 safari/537.36</code></li>
<li><code>mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/86.0.4240.75 safari/537.36</code></li>
<li><code>mozilla/5.0 (windows nt 10.0; win64; x64; rv:86.0) gecko/20100101 firefox/86.0</code></li>
<li><code>mozilla/5.0 (x11; linux x86_64) applewebkit/537.36 (khtml, like gecko) chrome/90.0.4430.93 safari/537.36</code></li>
<li><code>mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/92.0.4515.159 safari/537.36</code></li>
<li><code>mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/88.0.4324.104 safari/537.36</code></li>
<li><code>mozilla/5.0 (x11; linux x86_64) applewebkit/537.36 (khtml, like gecko) chrome/86.0.4240.75 safari/537.36</code></li>
</ul>
</li>
<li>I purged some of those:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-hits.sh -f /tmp/agents.txt -p
</span></span><span style="display:flex;"><span>Purging 1658 hits from azure-logic-apps\/1.0 in statistics
</span></span><span style="display:flex;"><span>Purging 948 hits from Gov employment data scraper in statistics
</span></span><span style="display:flex;"><span>Purging 786 hits from Microsoft\.Data\.Mashup in statistics
</span></span><span style="display:flex;"><span>Purging 303 hits from crownpeak in statistics
</span></span><span style="display:flex;"><span>Purging 332 hits from Mozilla\/5.0 (compatible) in statistics
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 4027
</span></span></code></pre></div><ul>
<li>Then I ran all system updates on the server and rebooted it
<ul>
<li>Hopefully this clears the locks and the nginx mitigation helps with the load from non-human hosts in large data centers</li>
<li>I need to re-work how I’m doing this whitelisting and blacklisting… it’s way too complicated now</li>
</ul>
</li>
<li>Export the entire CGSpace to check Initiative mappings, and add nineteen…</li>
<li>Start a harvest on AReS</li>
</ul>
<!-- raw HTML omitted -->
BIN docs/2023/01/cpu-week.png (normal file)
Binary file not shown. After size: 17 KiB
BIN docs/2023/01/jmx_dspace_sessions-year.png (normal file)
Binary file not shown. After size: 11 KiB
BIN docs/2023/01/postgres_connections_ALL-day.png (normal file)
Binary file not shown. After size: 18 KiB
@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
|
||||
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
|
||||
|
||||
|
||||
|
||||
|
@ -10,7 +10,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
|
||||
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
|
||||
|
||||
|
||||
|
||||
|
@ -10,7 +10,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
|
||||
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
|
||||
|
||||
|
||||
|
||||
|
@ -10,7 +10,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
|
||||
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
|
||||
|
||||
|
||||
|
||||
|
@ -10,7 +10,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
|
||||
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
|
||||
|
||||
|
||||
|
||||
|
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-12T23:11:42+03:00" />
<meta property="og:updated_time" content="2023-01-15T08:10:16+03:00" />
@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2023-01-12T23:11:42+03:00</lastmod>
<lastmod>2023-01-15T08:10:16+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2023-01-12T23:11:42+03:00</lastmod>
<lastmod>2023-01-15T08:10:16+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-01/</loc>
<lastmod>2023-01-12T23:11:42+03:00</lastmod>
<lastmod>2023-01-15T08:10:16+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2023-01-12T23:11:42+03:00</lastmod>
<lastmod>2023-01-15T08:10:16+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2023-01-12T23:11:42+03:00</lastmod>
<lastmod>2023-01-15T08:10:16+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-12/</loc>
<lastmod>2023-01-01T10:12:13+02:00</lastmod>
BIN static/2023/01/cpu-week.png (Normal file)
Binary file not shown. After: 17 KiB
BIN static/2023/01/jmx_dspace_sessions-year.png (Normal file)
Binary file not shown. After: 11 KiB
BIN static/2023/01/postgres_connections_ALL-day.png (Normal file)
Binary file not shown. After: 18 KiB