diff --git a/content/posts/2022-02.md b/content/posts/2022-02.md
index 4f6d2a392..e2753eb5d 100644
--- a/content/posts/2022-02.md
+++ b/content/posts/2022-02.md
@@ -63,4 +63,57 @@ sys 2m51.573s
 - That is at Cornell... hmmmm who could that be?!
 - Oh, the OpenArchives initiative is at Cornell... maybe this is an automated periodic check?
+## 2022-02-02
+
+- Looking at the top user agents and IP addresses in CGSpace's Solr statistics for 2022-01
+  - 64.39.98.40 made 26,000 requests, owned by Qualys so it's some kind of security scanning
+  - 45.134.26.171 made 8,000 requests and it's owned by some Russian company and makes requests like this hmmmmm:
+
+```console
+45.134.26.171 - - [12/Jan/2022:06:25:27 +0100] "GET /bitstream/handle/10568/81964/varietal-2faea58f.pdf?sequence=1 HTTP/1.1" 200 1157807 "https://cgspace.cgiar.org:443/bitstream/handle/10568/81964/varietal-2faea58f.pdf" "Opera/9.64 (Windows NT 6.1; U; MRA 5.5 (build 02842); ru) Presto/2.1.1)) AND 4734=CTXSYS.DRITHSX.SN(4734,(CHR(113)||CHR(120)||CHR(120)||CHR(112)||CHR(113)||(SELECT (CASE WHEN (4734=4734) THEN 1 ELSE 0 END) FROM DUAL)||CHR(113)||CHR(120)||CHR(113)||CHR(122)||CHR(113))) AND ((3917=3917"
+```
+
+- 3.225.28.105 made 3,000 requests mostly for one CIAT collection on the REST API and it is owned by Amazon
+  - The user agent is sometimes a normal user one, and sometimes `Apache-HttpClient/4.3.4 (java 1.5)`
+- 217.182.21.193 made 2,400 requests and is on OVH
+- I purged these hits
+
+```console
+$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
+Purging 26817 hits from 64.39.98.40 in statistics
+Purging 9446 hits from 45.134.26.171 in statistics
+Purging 6490 hits from 3.225.28.105 in statistics
+Purging 11949 hits from 217.182.21.193 in statistics
+
+Total number of bot hits purged: 54702
+```
+
+- Export donors and affiliations from CGSpace database:
+
+```console
+localhost/dspace63= ☘ \COPY (SELECT DISTINCT text_value as "cg.contributor.donor", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-02-02-donors.csv WITH CSV HEADER;
+COPY 1036
+localhost/dspace63= ☘ \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-02-02-affiliations.csv WITH CSV HEADER;
+COPY 7901
+```
+
+- Then check matches against the latest ROR dump:
+
+```console
+$ csvcut -c cg.contributor.donor /tmp/2022-02-02-donors.csv | sed '1d' > /tmp/2022-02-02-donors.txt
+$ ./ilri/ror-lookup.py -i /tmp/2022-02-02-donors.txt -r 2021-09-23-ror-data.json -o /tmp/donor-ror-matches.csv
+...
+```
+
+- I see we have 258/1036 (24.9%) of our donors matching ROR (as of the 2021-09-23 ROR dump)
+- I see we have 1986/7901 (25.1%) of our affiliations matching ROR (as of the 2021-09-23 ROR dump)
+- Update the PostgreSQL JDBC driver to 42.3.2 in the Ansible Infrastructure playbooks and deploy on DSpace Test
+- Mishell from CIP sent me a copy of a security scan their ICT had done on CGSpace using QualysGuard
+  - The report was very long and generic, highlighting low-severity things like being able to post crap to search forms and have it appear on the results page
+  - Also they say we're using old jQuery and bootstrap, etc (fair enough) but there are no exploits per se
+  - At least now I know why all those Qualys IPs are scanning us all the time!!!
+- Mishell also said she's having issues logging into CGSpace
+  - According to the logs her account is failing on LDAP authentication
+  - I checked CGSpace's LDAP credentials using ldapsearch and was able to connect so it's gotta be something with her account
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index c68e244d2..133069e28 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
 xmlns:xhtml="http://www.w3.org/1999/xhtml">
 https://alanorth.github.io/cgspace-notes/categories/
-2022-02-01T17:54:45+03:00
+2022-02-02T09:11:43+03:00
 https://alanorth.github.io/cgspace-notes/
-2022-02-01T17:54:45+03:00
+2022-02-02T09:11:43+03:00
 https://alanorth.github.io/cgspace-notes/2022-02/
-2022-02-01T17:54:45+03:00
+2022-02-02T09:11:43+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
-2022-02-01T17:54:45+03:00
+2022-02-02T09:11:43+03:00
 https://alanorth.github.io/cgspace-notes/posts/
-2022-02-01T17:54:45+03:00
+2022-02-02T09:11:43+03:00
 https://alanorth.github.io/cgspace-notes/2022-01/
 2022-01-31T09:00:59+03:00