diff --git a/content/posts/2021-06.md b/content/posts/2021-06.md
index e58adeaf3..c5394dcee 100644
--- a/content/posts/2021-06.md
+++ b/content/posts/2021-06.md
@@ -68,4 +68,21 @@ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhos
 - Dump and re-create indexes on AReS (as above) so I can do a harvest
+## 2021-06-16
+
+- Looking at the Solr statistics on CGSpace for last month I see many requests from hosts using seemingly normal Windows browser user agents, but using the MSN bot's DNS
+  - For example, user agent `Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0; Trident/5.0)` with DNS `msnbot-131-253-25-91.search.msn.com.`
+  - I queried Solr for all hits using the MSN bot DNS (`dns:*msnbot* AND dns:*.msn.com.`) and found 457,706
+  - I extracted their IPs using Solr's CSV format and ran them through my `resolve-addresses.py` script and found that they all belong to MICROSOFT-CORP-MSN-AS-BLOCK (AS8075)
+  - Note that [Microsoft's docs say that reverse lookups on Bingbot IPs will always have "search.msn.com"](https://www.bing.com/webmasters/help/how-to-verify-bingbot-3905dc26) so it is safe to purge these as non-human traffic
+  - I purged the hits with `ilri/check-spider-ip-hits.sh` (though I had to do it in 3 batches because I forgot to increase the `facet.limit` so I was only getting them 100 at a time)
+- Moayad sent a pull request a few days ago to re-work the harvesting on OpenRXV
+  - It will hopefully also fix the duplicate and missing items issues
+  - I had a Skype with him to discuss
+  - I got it running on podman-compose, but I had to fix the storage permissions on the Elasticsearch volume after the first time it tries (and fails) to run:
+
+```console
+$ podman unshare chown 1000:1000 /home/aorth/.local/share/containers/storage/volumes/docker_esData_7/_data
+```
+
diff --git a/docs/2021-06/index.html b/docs/2021-06/index.html
index ab249f437..ca6d5e00d 100644
--- a/docs/2021-06/index.html
+++ b/docs/2021-06/index.html
@@ -20,7 +20,7 @@ I simply started it and AReS was running again:
-
+
@@ -46,9 +46,9 @@ I simply started it and AReS was running again:
   "@type": "BlogPosting",
   "headline": "June, 2021",
   "url": "https://alanorth.github.io/cgspace-notes/2021-06/",
-  "wordCount": "415",
+  "wordCount": "627",
   "datePublished": "2021-06-01T10:51:07+03:00",
-  "dateModified": "2021-06-10T21:41:44+03:00",
+  "dateModified": "2021-06-14T15:09:07+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -189,7 +189,27 @@ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhos
-
+

2021-06-16

+ +
$ podman unshare chown 1000:1000 /home/aorth/.local/share/containers/storage/volumes/docker_esData_7/_data
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 78ebc57d1..d4560cb1b 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 092f4d8df..ab6f3e8d5 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 5402488a8..b6788b861 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index bcdb8899a..b52bb83cc 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 188f306d7..ed239c8dd 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index deba2eabb..4bf88c868 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index b899868f9..f5b38e687 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 97c46af0c..bab75bbba 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index f14f5ca21..4474fa79e 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 992440147..a6feb840e 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index ed6d468b7..71f167c96 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 412ad3f61..b7df5cd3c 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 5d104035c..52333d1a1 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index b156059db..fc09cb62e 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 962ac5ea5..a7c3a35ea 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 6bc425f54..abc00dc6b 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index ec5ea6555..ab332783d 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 6c0b36a53..de276e530 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 1234ba3a9..0ea551a45 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 45727658b..62093e967 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 34d64ad17..2fe1d161f 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index 48c684d90..7d8824a5c 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index d9f7fcbde..7609d5e8c 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml">
 https://alanorth.github.io/cgspace-notes/categories/
- 2021-06-10T21:41:44+03:00
+ 2021-06-14T15:09:07+03:00
 https://alanorth.github.io/cgspace-notes/
- 2021-06-10T21:41:44+03:00
+ 2021-06-14T15:09:07+03:00
 https://alanorth.github.io/cgspace-notes/2021-06/
- 2021-06-10T21:41:44+03:00
+ 2021-06-14T15:09:07+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2021-06-10T21:41:44+03:00
+ 2021-06-14T15:09:07+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2021-06-10T21:41:44+03:00
+ 2021-06-14T15:09:07+03:00
 https://alanorth.github.io/cgspace-notes/2021-05/
 2021-05-30T22:09:06+03:00
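
Note on the 2021-06-16 entry added in this diff: the decision to purge the hits rests on reverse lookups of Bingbot IPs ending in "search.msn.com". A minimal Python sketch of that kind of check follows. It is not the `resolve-addresses.py` or `ilri/check-spider-ip-hits.sh` script mentioned in the notes; the function names `is_msn_search_hostname` and `verify_bingbot_ip` are hypothetical, and the forward-confirmation step (resolving the hostname back to the IP) is an assumption added here as a common safeguard, not something the notes describe doing.

```python
import socket

# Suffix that reverse lookups on Bingbot IPs are expected to carry.
MSN_SUFFIX = ".search.msn.com"


def is_msn_search_hostname(hostname: str) -> bool:
    """Return True if a reverse DNS name ends in search.msn.com.

    A trailing dot (as stored in the Solr `dns` field) is tolerated.
    """
    name = hostname.strip().rstrip(".").lower()
    return name.endswith(MSN_SUFFIX)


def verify_bingbot_ip(ip, reverse_lookup=None, forward_lookup=None) -> bool:
    """Check an IP with a reverse lookup followed by a forward lookup.

    The resolver callables are injectable so the logic can be exercised
    without network access; by default the standard library is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]

    try:
        hostname = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False

    if not is_msn_search_hostname(hostname):
        return False

    try:
        addresses = forward_lookup(hostname.rstrip("."))
    except OSError:
        return False

    # Forward confirmation: the claimed hostname must resolve back to the IP.
    return ip in addresses
```

With the default resolvers, `verify_bingbot_ip(ip)` performs live DNS queries, so results depend on the network and on current DNS records. For bulk checks over IPs exported from Solr, passing cached resolver callables would avoid repeated lookups.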