diff --git a/content/posts/2022-02.md b/content/posts/2022-02.md
index 4f6d2a392..e2753eb5d 100644
--- a/content/posts/2022-02.md
+++ b/content/posts/2022-02.md
@@ -63,4 +63,57 @@ sys 2m51.573s
- That is at Cornell... hmmmm who could that be?!
- Oh, the OpenArchives initiative is at Cornell... maybe this is an automated periodic check?
+## 2022-02-02
+
+- Looking at the top user agents and IP addresses in CGSpace's Solr statistics for 2022-01
+ - 64.39.98.40 made 26,000 requests, owned by Qualys so it's some kind of security scanning
+ - 45.134.26.171 made 8,000 requests, is owned by some Russian company, and makes requests with SQL injection attempts in the user agent like this hmmmmm:
+
+```console
+45.134.26.171 - - [12/Jan/2022:06:25:27 +0100] "GET /bitstream/handle/10568/81964/varietal-2faea58f.pdf?sequence=1 HTTP/1.1" 200 1157807 "https://cgspace.cgiar.org:443/bitstream/handle/10568/81964/varietal-2faea58f.pdf" "Opera/9.64 (Windows NT 6.1; U; MRA 5.5 (build 02842); ru) Presto/2.1.1)) AND 4734=CTXSYS.DRITHSX.SN(4734,(CHR(113)||CHR(120)||CHR(120)||CHR(112)||CHR(113)||(SELECT (CASE WHEN (4734=4734) THEN 1 ELSE 0 END) FROM DUAL)||CHR(113)||CHR(120)||CHR(113)||CHR(122)||CHR(113))) AND ((3917=3917"
+```
+
+- 3.225.28.105 made 3,000 requests mostly for one CIAT collection on the REST API and it is owned by Amazon
+ - The user agent is sometimes a normal user one, and sometimes `Apache-HttpClient/4.3.4 (java 1.5)`
+- 217.182.21.193 made 2,400 requests and is on OVH
+- I purged these hits
+
+```console
+$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
+Purging 26817 hits from 64.39.98.40 in statistics
+Purging 9446 hits from 45.134.26.171 in statistics
+Purging 6490 hits from 3.225.28.105 in statistics
+Purging 11949 hits from 217.182.21.193 in statistics
+
+Total number of bot hits purged: 54702
+```
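
As a rough illustration of how requests like the one from 45.134.26.171 could be spotted automatically, here is a minimal Python sketch that parses a combined-format access log line and flags user agents containing SQL injection markers. The names `LOG_RE`, `SQLI_RE`, and `flag_line` are hypothetical, the marker list is an assumption and far from exhaustive, and none of this is part of `check-spider-ip-hits.sh`.

```python
import re

# Combined log format: ip, identity, user, [time], "request", status, size,
# "referer", "user agent". The user agent is matched greedily up to the last quote.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>.*)"$'
)

# Illustrative (assumed, non-exhaustive) markers of SQL injection probes
SQLI_RE = re.compile(
    r"\b(?:SELECT|UNION|FROM\s+DUAL)\b|CHR\(\d+\)|CTXSYS\.",
    re.IGNORECASE,
)


def flag_line(line):
    """Return the client IP if the user agent looks like an SQL injection probe, else None."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None  # not a combined-format line
    if SQLI_RE.search(match.group("agent")):
        return match.group("ip")
    return None


# The log line quoted earlier in these notes
SAMPLE = (
    '45.134.26.171 - - [12/Jan/2022:06:25:27 +0100] '
    '"GET /bitstream/handle/10568/81964/varietal-2faea58f.pdf?sequence=1 HTTP/1.1" '
    '200 1157807 "https://cgspace.cgiar.org:443/bitstream/handle/10568/81964/varietal-2faea58f.pdf" '
    '"Opera/9.64 (Windows NT 6.1; U; MRA 5.5 (build 02842); ru) Presto/2.1.1)) '
    'AND 4734=CTXSYS.DRITHSX.SN(4734,(CHR(113)||CHR(120)||CHR(120)||CHR(112)||CHR(113)||'
    '(SELECT (CASE WHEN (4734=4734) THEN 1 ELSE 0 END) FROM DUAL)||'
    'CHR(113)||CHR(120)||CHR(113)||CHR(122)||CHR(113))) AND ((3917=3917"'
)

if __name__ == "__main__":
    print(flag_line(SAMPLE))
```

Collecting the flagged IPs into a file would give a candidate list to review by hand before passing it to the purge script.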
+
+- Export donors and affiliations from CGSpace database:
+
+```console
+localhost/dspace63= ☘ \COPY (SELECT DISTINCT text_value as "cg.contributor.donor", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-02-02-donors.csv WITH CSV HEADER;
+COPY 1036
+localhost/dspace63= ☘ \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-02-02-affiliations.csv WITH CSV HEADER;
+COPY 7901
+```
+
+- Then check matches against the latest ROR dump:
+
+```console
+$ csvcut -c cg.contributor.donor /tmp/2022-02-02-donors.csv | sed '1d' > /tmp/2022-02-02-donors.txt
+$ ./ilri/ror-lookup.py -i /tmp/2022-02-02-donors.txt -r 2021-09-23-ror-data.json -o /tmp/donor-ror-matches.csv
+...
+```
+
+- I see we have 258/1036 (24.9%) of our donors matching ROR (as of the 2021-09-23 ROR dump)
+- I see we have 1986/7901 (25.1%) of our affiliations matching ROR (as of the 2021-09-23 ROR dump)
+- Update the PostgreSQL JDBC driver to 42.3.2 in the Ansible Infrastructure playbooks and deploy on DSpace Test
+- Mishell from CIP sent me a copy of a security scan their ICT had done on CGSpace using QualysGuard
+ - The report was very long and generic, highlighting low-severity things like being able to post crap to search forms and have it appear on the results page
+ - They also say we're using old versions of jQuery and Bootstrap, etc. (fair enough), but there are no exploits per se
+ - At least now I know why all those Qualys IPs are scanning us all the time!!!
+- Mishell also said she's having issues logging into CGSpace
+ - According to the logs her account is failing on LDAP authentication
+ - I checked CGSpace's LDAP credentials using ldapsearch and was able to connect so it's gotta be something with her account
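
The ROR match percentages noted above can be reproduced with a few lines of Python. This is only a sketch: `match_rate` is a hypothetical helper that does a case-insensitive exact comparison, which is an assumption and not necessarily how `ror-lookup.py` matches names, and the organization names in the usage lines are made up for illustration.

```python
def match_rate(names, ror_names):
    """Count how many names appear in ror_names, ignoring case and surrounding whitespace.

    Returns (matched, total, percent). Exact matching is an assumption here;
    the real lookup script may also consider aliases or acronyms.
    """
    known = {name.strip().casefold() for name in ror_names}
    cleaned = [name.strip() for name in names if name.strip()]
    matched = sum(1 for name in cleaned if name.casefold() in known)
    percent = 100 * matched / len(cleaned) if cleaned else 0.0
    return matched, len(cleaned), percent


def as_percent(matched, total):
    """Format a match count as a percentage with one decimal place."""
    return f"{100 * matched / total:.1f}%"


if __name__ == "__main__":
    # Hypothetical names, for illustration only
    print(match_rate(["Example Fund", " example fund ", "Unknown Org"], ["Example Fund"]))
    # Counts taken from the notes above
    print(as_percent(258, 1036))   # donors
    print(as_percent(1986, 7901))  # affiliations
```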
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index c68e244d2..133069e28 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2022-02-01T17:54:45+03:00
+ 2022-02-02T09:11:43+03:00
https://alanorth.github.io/cgspace-notes/
- 2022-02-01T17:54:45+03:00
+ 2022-02-02T09:11:43+03:00
https://alanorth.github.io/cgspace-notes/2022-02/
- 2022-02-01T17:54:45+03:00
+ 2022-02-02T09:11:43+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-02-01T17:54:45+03:00
+ 2022-02-02T09:11:43+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2022-02-01T17:54:45+03:00
+ 2022-02-02T09:11:43+03:00
https://alanorth.github.io/cgspace-notes/2022-01/
2022-01-31T09:00:59+03:00