mirror of https://github.com/alanorth/cgspace-notes.git synced 2025-01-27 05:49:12 +01:00

Add notes for 2022-05-04

This commit is contained in:
2022-05-04 11:09:45 +03:00
parent cf8f13d09c
commit b0ba32c97c
121 changed files with 1590 additions and 1026 deletions

@ -392,4 +392,11 @@ Total number of bot hits purged: 343
- 54.162.92.93
- 54.226.171.89
## 2022-04-28
- Had a meeting with FAO and the team from SEAFDEC, who run many repositories that are integrated with AGROVOC
- Elvi from SEAFDEC has modified the [DSpace-CRIS 6.x VIAF lookup plugin to query AGROVOC](https://github.com/eulereadgbe/DSpace/blob/sair-6.3/dspace-api/src/main/java/org/dspace/content/authority/AgrovocAuthority.java)
- Also, they are doing a nice integration similar to the WorldFish / MELSpace repositories where they store the AGROVOC URIs in DSpace and show the terms with an icon in the UI
- See: https://repository.seafdec.org.ph/handle/10862/6320
<!-- vim: set sw=2 ts=2: -->

content/posts/2022-05.md Normal file

@ -0,0 +1,44 @@
---
title: "May, 2022"
date: 2022-05-04T09:13:39+03:00
author: "Alan Orth"
categories: ["Notes"]
---
## 2022-05-04
- I found a few more IPs making requests using the shady Chrome 44 user agent in the last few days so I will add them to the block list too:
- 18.207.136.176
- 185.189.36.248
- 50.118.223.78
- 52.70.76.123
- 3.236.10.11
- Looking at the Solr statistics for 2022-04
- 52.191.137.59 is Microsoft, but they are using a normal user agent and making tens of thousands of requests
- 64.39.98.62 is owned by Qualys, and all their requests are probing for paths like `/etc/passwd`
- 185.192.69.15 is in the Netherlands and is using a normal user agent, but is making excessive automated HTTP requests to paths forbidden in robots.txt
- 157.55.39.159 is owned by Microsoft and identifies as bingbot so I don't know why its requests were logged in Solr
- 52.233.67.176 is owned by Microsoft and uses a normal user agent, but is making excessive automated HTTP requests
- 157.55.39.144 is owned by Microsoft and uses a normal user agent, but is making excessive automated HTTP requests
- 207.46.13.177 is owned by Microsoft and identifies as bingbot so I don't know why its requests were logged in Solr
- If I query Solr for `time:2022-04* AND dns:*msnbot* AND dns:*.msn.com.` I see a handful of IPs that made 41,000 requests
- I purged 93,974 hits from these IPs using my `check-spider-ip-hits.sh` script
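
As a rough illustration of the kind of query described above, here is a minimal Python sketch that builds a Solr facet URL to count hits per IP for a given query. It is not the `check-spider-ip-hits.sh` script. The base URL, the `statistics` core name, and the `ip` facet field are assumptions about the deployment and should be adjusted to match the real Solr instance. The function only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Assumption: location of the Solr statistics core; change for the real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/statistics/select"


def build_ip_facet_url(query, limit=10, base_url=SOLR_SELECT_URL):
    """Build a Solr select URL that facets matching hits by IP address.

    Assumption: the statistics documents store the client address in a
    field named "ip".
    """
    params = {
        "q": query,
        "rows": 0,             # only the facet counts are needed, not the documents
        "facet": "true",
        "facet.field": "ip",
        "facet.limit": limit,  # number of IPs to return
        "facet.mincount": 1,   # skip IPs with no matching hits
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_ip_facet_url("time:2022-04* AND dns:*msnbot* AND dns:*.msn.com."))
```

Setting `rows` to 0 keeps the response small, since only the per-IP counts matter when deciding which addresses to purge.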
<!--more-->
- Now looking at the Solr statistics by user agent I see:
- `SomeRandomText`
- `RestSharp/106.11.7.0`
- `MetaInspector/5.7.0 (+https://github.com/jaimeiniesta/metainspector)`
- `wp_is_mobile`
- `Mozilla/5.0 (compatible; um-LN/1.0; mailto: techinfo@ubermetrics-technologies.com; Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1`
- `insomnia/2022.2.1`
- `ZoteroTranslationServer`
- `omgili/0.5 +http://omgili.com`
- `curb`
- `Sprout Social (Link Attachment)`
- I purged 2,900 hits from these user agents from Solr using my `check-spider-hits.sh` script
- I made a [pull request to COUNTER-Robots](https://github.com/atmire/COUNTER-Robots/pull/54) for some of these agents
- In the meantime I will add them to our local overrides in DSpace
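
To illustrate how a list of agent patterns can be used to classify hits, here is a minimal Python sketch that checks a user agent string against regular expressions, case-insensitively. The `LOCAL_AGENT_PATTERNS` list and the `is_bot` helper are hypothetical names made up for this example. The patterns are derived from some of the agents listed above and are not the contents of the COUNTER-Robots list or of the DSpace overrides file.

```python
import re

# Hypothetical local overrides: one regex per agent, based on the list above.
LOCAL_AGENT_PATTERNS = [
    r"RestSharp",
    r"MetaInspector",
    r"insomnia",
    r"ZoteroTranslationServer",
    r"omgili",
    r"^curb$",          # anchored so it does not match longer strings containing "curb"
    r"Sprout Social",
]


def is_bot(user_agent, patterns=LOCAL_AGENT_PATTERNS):
    """Return True if the user agent matches any pattern, ignoring case."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in patterns)


if __name__ == "__main__":
    for agent in ["RestSharp/106.11.7.0", "curb", "Mozilla/5.0 (X11; Linux x86_64)"]:
        print(agent, is_bot(agent))
```

Short agent names such as `curb` are anchored here because an unanchored pattern would also match unrelated agents that merely contain the same letters.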
<!-- vim: set sw=2 ts=2: -->