diff --git a/content/posts/2021-10.md b/content/posts/2021-10.md
index f9109890f..7773fa6c4 100644
--- a/content/posts/2021-10.md
+++ b/content/posts/2021-10.md
@@ -87,7 +87,7 @@ $ csvgrep -c asn -m 14618 /tmp/mozilla-4.0-ips.csv | csvcut -c ip | sed 1d | tee
 290382 GET /handle/10568/83389
 ```
-- Before I purge all those I will ask someone Samuel Stacey from the System office to hopefully get an insight...
+- Before I purge all those I will ask someone Samuel Stacey from the System Office to hopefully get an insight...
 - Meeting with Michael Victor, Peter, Jane, and Abenet about the future of repositories in the One CGIAR
 - Meeting with Michelle from Altmetric about their new CSV upload system
   - I sent her some examples of Handles that have DOIs, but no linked score (yet) to see if an association will be created when she uploads them
@@ -107,4 +107,17 @@ $ ./ilri/agrovoc-lookup.py -i /tmp/agrovoc-sorted.txt -o /tmp/agrovoc-matches.cs
 $ csvgrep -c 'number of matches' -m '0' /tmp/agrovoc-matches.csv | csvcut -c 1 > /tmp/invalid-agrovoc.csv
 ```
+## 2021-10-05
+
+- Sam put me in touch with Dodi from the System Office web team and he confirmed that the Amazon requests are not theirs
+  - I added `Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)` to the list of bad bots in nginx
+  - I purged all the Amazon IPs using this user agent, as well as the few other IPs I identified yesterday
+
+```console
+$ ./ilri/check-spider-ip-hits.sh -f /tmp/robot-ips.txt -p
+...
+
+Total number of bot hits purged: 465119
+```
+
diff --git a/docs/2021-10/index.html b/docs/2021-10/index.html
index 5e6507691..259c15b42 100644
--- a/docs/2021-10/index.html
+++ b/docs/2021-10/index.html
@@ -25,7 +25,7 @@ So we have 1879/7100 (26.46%) matching already
-
+
@@ -56,9 +56,9 @@ So we have 1879/7100 (26.46%) matching already
   "@type": "BlogPosting",
   "headline": "October, 2021",
   "url": "https://alanorth.github.io/cgspace-notes/2021-10/",
-  "wordCount": "697",
+  "wordCount": "771",
   "datePublished": "2021-10-01T11:14:07+03:00",
-  "dateModified": "2021-10-01T11:14:07+03:00",
+  "dateModified": "2021-10-04T19:40:13+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -216,7 +216,7 @@ $ csvgrep -c asn -m 14618 /tmp/mozilla-4.0-ips.csv | csvcut -c ip | sed 1d | tee
 1607 GET /handle/10568/103816
 290382 GET /handle/10568/83389