Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-12-19 11:42:19 +01:00)

Add notes for 2022-02-02

Commit df9927603f (parent b6951579f6)
@ -63,4 +63,57 @@ sys 2m51.573s
- That is at Cornell... hmmmm, who could that be?!
- Oh, the Open Archives Initiative is at Cornell... maybe this is an automated periodic check?

## 2022-02-02

- Looking at the top user agents and IP addresses in CGSpace's Solr statistics for 2022-01:
- 64.39.98.40 made 26,000 requests and is owned by Qualys, so it's some kind of security scanning
- 45.134.26.171 made 8,000 requests, is owned by some Russian company, and makes requests like this (note the SQL injection payload in the user agent) hmmmmm:

```console
45.134.26.171 - - [12/Jan/2022:06:25:27 +0100] "GET /bitstream/handle/10568/81964/varietal-2faea58f.pdf?sequence=1 HTTP/1.1" 200 1157807 "https://cgspace.cgiar.org:443/bitstream/handle/10568/81964/varietal-2faea58f.pdf" "Opera/9.64 (Windows NT 6.1; U; MRA 5.5 (build 02842); ru) Presto/2.1.1)) AND 4734=CTXSYS.DRITHSX.SN(4734,(CHR(113)||CHR(120)||CHR(120)||CHR(112)||CHR(113)||(SELECT (CASE WHEN (4734=4734) THEN 1 ELSE 0 END) FROM DUAL)||CHR(113)||CHR(120)||CHR(113)||CHR(122)||CHR(113))) AND ((3917=3917"
```
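Here is a minimal sketch of how requests like this could be picked out of an nginx access log. It parses each line in the combined log format and flags user agents that contain SQL fragments. The regex, the marker list, and the function names are my own (hypothetical), not something CGSpace actually runs. The marker check is crude, so expect false positives and negatives.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# nginx/Apache "combined" log format:
# IP - user [time] "request" status bytes "referer" "user agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, non-exhaustive SQL fragments that rarely appear in real user agents
SQLI_MARKERS = ("select ", " from dual", "case when", "chr(", "ctxsys.")


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of a combined-format line, or None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def looks_like_sqli(agent: str) -> bool:
    """Crude check: does the lowercased user agent contain any SQL fragment?"""
    lowered = agent.lower()
    return any(marker in lowered for marker in SQLI_MARKERS)


def flag_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (ip, user agent) for each parsable line whose agent looks like SQL injection."""
    for raw in lines:
        fields = parse_line(raw)
        if fields and looks_like_sqli(fields["agent"]):
            yield fields["ip"], fields["agent"]
```

For example, `flag_lines(open(path))` on an access log yields an `(ip, user_agent)` pair for each suspicious line.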
- 3.225.28.105 made 3,000 requests, mostly for one CIAT collection on the REST API, and is owned by Amazon
- The user agent is sometimes a normal user one, and sometimes `Apache-HttpClient/4.3.4 (java 1.5)`
- 217.182.21.193 made 2,400 requests and is on OVH
- I purged these hits:

```console
$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
Purging 26817 hits from 64.39.98.40 in statistics
Purging 9446 hits from 45.134.26.171 in statistics
Purging 6490 hits from 3.225.28.105 in statistics
Purging 11949 hits from 217.182.21.193 in statistics

Total number of bot hits purged: 54702
```
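I'm not reproducing `check-spider-ip-hits.sh` here. The two Solr operations behind this kind of work can be sketched, though: a facet query for the top values of a field (IPs or user agents) in a month, and a JSON delete-by-query that purges one IP. This sketch rests on several assumptions. The Solr URL is hypothetical. The DSpace statistics documents are assumed to have `time`, `ip`, and `userAgent` fields. The code only builds the URLs and request bodies and doesn't send anything.

```python
import json
from urllib.parse import urlencode

# Hypothetical location of the DSpace statistics core; adjust for the real Solr
SOLR_STATISTICS = "http://localhost:8983/solr/statistics"

# Solr's update handler; commit=true makes the deletion visible immediately
PURGE_URL = f"{SOLR_STATISTICS}/update?commit=true"


def top_values_url(field: str, start: str, end: str, limit: int = 10) -> str:
    """Facet query URL for the most frequent values of `field` between two ISO 8601 times.

    `field` would be e.g. "ip" or "userAgent" (assumed field names).
    """
    params = {
        "q": f"time:[{start} TO {end}]",
        "rows": 0,  # only the facet counts are needed, not the documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.mincount": 1,
        "wt": "json",
    }
    return f"{SOLR_STATISTICS}/select?{urlencode(params)}"


def purge_ip_body(ip: str) -> str:
    """JSON delete-by-query command removing every statistics hit from one IP."""
    # Quoting the value means IPv6 colons need no escaping in the query syntax
    return json.dumps({"delete": {"query": f'ip:"{ip}"'}})
```

Each `purge_ip_body()` would be POSTed to `PURGE_URL` with `Content-Type: application/json`. As a sanity check, the per-IP counts in the output above do add up: 26817 + 9446 + 6490 + 11949 = 54702.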
- Export donors and affiliations from the CGSpace database:

```console
localhost/dspace63= ☘ \COPY (SELECT DISTINCT text_value as "cg.contributor.donor", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-02-02-donors.csv WITH CSV HEADER;
COPY 1036
localhost/dspace63= ☘ \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-02-02-affiliations.csv WITH CSV HEADER;
COPY 7901
```
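The queries above hard-code CGSpace's field IDs: 248 for `cg.contributor.donor` and 211 for `cg.contributor.affiliation`. A more portable variant could resolve the field by name through DSpace's `metadataschemaregistry` and `metadatafieldregistry` tables. The sketch below does that, and it also drops the `DISTINCT`, which is redundant next to `GROUP BY`. To keep it runnable, it uses Python's built-in `sqlite3` on a tiny, made-up subset of the schema. Only the columns the query touches are included, and the rows are placeholders. The query itself is plain SQL, so it should also work in PostgreSQL with that driver's placeholder style, but I haven't run it against CGSpace.

```python
import sqlite3
from typing import List, Tuple

# Value counts for one qualified metadata field on items, most frequent first.
# The field is looked up by name instead of by a hard-coded metadata_field_id.
QUERY = """
SELECT mv.text_value, COUNT(*) AS count
FROM metadatavalue mv
JOIN metadatafieldregistry mf ON mf.metadata_field_id = mv.metadata_field_id
JOIN metadataschemaregistry ms ON ms.metadata_schema_id = mf.metadata_schema_id
WHERE ms.short_id = ? AND mf.element = ? AND mf.qualifier = ?
  AND mv.dspace_object_id IN (SELECT uuid FROM item)
GROUP BY mv.text_value
ORDER BY count DESC, mv.text_value
"""


def count_field_values(conn: sqlite3.Connection, field: str) -> List[Tuple[str, int]]:
    """Count values of a field named like "cg.contributor.donor" (qualified fields only)."""
    schema, element, qualifier = field.split(".")
    return conn.execute(QUERY, (schema, element, qualifier)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a made-up subset of the DSpace tables and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (uuid TEXT PRIMARY KEY);
        CREATE TABLE metadataschemaregistry (metadata_schema_id INTEGER PRIMARY KEY, short_id TEXT);
        CREATE TABLE metadatafieldregistry (
            metadata_field_id INTEGER PRIMARY KEY, metadata_schema_id INTEGER,
            element TEXT, qualifier TEXT);
        CREATE TABLE metadatavalue (
            metadata_field_id INTEGER, text_value TEXT, dspace_object_id TEXT);
        INSERT INTO item VALUES ('item-1'), ('item-2'), ('item-3');
        INSERT INTO metadataschemaregistry VALUES (1, 'cg');
        INSERT INTO metadatafieldregistry VALUES (248, 1, 'contributor', 'donor');
        -- The last value belongs to a non-item object (e.g. a collection), so it is excluded
        INSERT INTO metadatavalue VALUES
            (248, 'Donor A', 'item-1'), (248, 'Donor B', 'item-2'),
            (248, 'Donor A', 'item-3'), (248, 'Donor C', 'collection-1');
        """
    )
    return conn
```

To produce the same CSV as `\COPY ... WITH CSV HEADER`, the returned rows could be written with `csv.writer(...).writerows()` after a header row.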
- Then check for matches against the latest ROR dump:

```console
$ csvcut -c cg.contributor.donor /tmp/2022-02-02-donors.csv | sed '1d' > /tmp/2022-02-02-donors.txt
$ ./ilri/ror-lookup.py -i /tmp/2022-02-02-donors.txt -r 2021-09-23-ror-data.json -o /tmp/donor-ror-matches.csv
...
```
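This is not `ror-lookup.py`, just a minimal sketch of the basic idea of matching names against a ROR data dump. It assumes the dump is a JSON array of v1-style records with `id`, `name`, `aliases`, and `labels` (a list of objects with a `label` key). Matching here is exact after case-folding and whitespace normalization, which may differ from what the real script does. It also shows the match-rate arithmetic behind the percentages.

```python
import csv
import json
from typing import Dict, Iterable, List, Optional


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivial differences don't block a match."""
    return " ".join(name.casefold().split())


def build_ror_index(records: Iterable[dict]) -> Dict[str, str]:
    """Map each normalized name, alias, and label to its ROR ID.

    Assumes v1-style records with "id", "name", "aliases" and "labels" keys.
    Acronyms are deliberately skipped because they collide across organizations.
    """
    index: Dict[str, str] = {}
    for org in records:
        names = [org["name"], *org.get("aliases", [])]
        names += [label["label"] for label in org.get("labels", [])]
        for name in names:
            index.setdefault(normalize(name), org["id"])  # first record wins
    return index


def match_names(names: Iterable[str], index: Dict[str, str]) -> List[Optional[str]]:
    """Return the matching ROR ID, or None, for each name in order (exact match only)."""
    return [index.get(normalize(name)) for name in names]


def match_rate(matches: List[Optional[str]]) -> float:
    """Percentage of names that found a ROR ID."""
    return 100 * sum(m is not None for m in matches) / len(matches)


def read_column(path: str, column: str) -> List[str]:
    """Roughly `csvcut -c COLUMN path | sed '1d'`, but values come back unquoted."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f)]


def load_ror_index(path: str) -> Dict[str, str]:
    """Load a ROR data dump (assumed to be a JSON array of records) and index it."""
    with open(path, encoding="utf-8") as f:
        return build_ror_index(json.load(f))
```

To use it, get the names with `read_column('/tmp/2022-02-02-donors.csv', 'cg.contributor.donor')` and build the index with `load_ror_index('2021-09-23-ror-data.json')`, then call `match_rate(match_names(names, index))`. For example, 258 matches out of 1036 names is 100 × 258 / 1036 ≈ 24.9%. Exact matching like this is strict, so it could report a different rate than `ror-lookup.py`.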
- I see we have 258/1036 (24.9%) of our donors matching ROR (as of the 2021-09-23 ROR dump)
- I see we have 1986/7901 (25.1%) of our affiliations matching ROR (as of the 2021-09-23 ROR dump)
- Update the PostgreSQL JDBC driver to 42.3.2 in the Ansible Infrastructure playbooks and deploy it on DSpace Test
- Mishell from CIP sent me a copy of a security scan their ICT had done on CGSpace using QualysGuard
- The report was very long and generic, highlighting low-severity things like being able to post crap to search forms and have it appear on the results page
- They also say we're using old jQuery and Bootstrap, etc. (fair enough), but there are no exploits per se
- At least now I know why all those Qualys IPs are scanning us all the time!!!
- Mishell also said she's having issues logging into CGSpace
- According to the logs, her account is failing LDAP authentication
- I checked CGSpace's LDAP credentials using `ldapsearch` and was able to connect, so it's gotta be something with her account

<!-- vim: set sw=2 ts=2: -->

@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2022-02-01T17:54:45+03:00</lastmod>
<lastmod>2022-02-02T09:11:43+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2022-02-01T17:54:45+03:00</lastmod>
<lastmod>2022-02-02T09:11:43+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-02/</loc>
<lastmod>2022-02-01T17:54:45+03:00</lastmod>
<lastmod>2022-02-02T09:11:43+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2022-02-01T17:54:45+03:00</lastmod>
<lastmod>2022-02-02T09:11:43+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2022-02-01T17:54:45+03:00</lastmod>
<lastmod>2022-02-02T09:11:43+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-01/</loc>
<lastmod>2022-01-31T09:00:59+03:00</lastmod>