Purging 9446 hits from 45.134.26.171 in statistics
Purging 6490 hits from 3.225.28.105 in statistics
Purging 11949 hits from 217.182.21.193 in statistics
Total number of bot hits purged: 54702
```
- Export donors and affiliations from CGSpace database:
```console
localhost/dspace63= ☘ \COPY (SELECT DISTINCT text_value as "cg.contributor.donor", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-02-02-donors.csv WITH CSV HEADER;
COPY 1036
localhost/dspace63= ☘ \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-02-02-affiliations.csv WITH CSV HEADER;
COPY 7901
```
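- A minimal Python sketch of reading an export like the ones above and measuring how many of its values appear in a reference list of organization names (the function names, the sample rows, and the case-insensitive exact comparison are all assumptions for illustration, not the tooling actually used here):

```python
import csv
import io


def load_names(csv_text, column):
    """Return the values of one column from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def match_rate(names, known_names):
    """Count exact, case-insensitive matches of names against known_names.

    Returns (matched, total, percent rounded to one decimal place).
    """
    known = {name.strip().lower() for name in known_names}
    matched = sum(1 for name in names if name.strip().lower() in known)
    percent = round(100 * matched / len(names), 1) if names else 0.0
    return matched, len(names), percent


# Hypothetical rows standing in for the exported CSV and for a reference list
sample_csv = (
    "cg.contributor.donor,count\n"
    "Example Foundation,12\n"
    "Sample Agency,7\n"
    "Other Fund,3\n"
    "Unknown Trust,1\n"
)
reference = ["Example Foundation", "sample agency"]

names = load_names(sample_csv, "cg.contributor.donor")
print(match_rate(names, reference))
```

- Exact string comparison will miss aliases, acronyms, and spelling variants, so a number from a sketch like this is only a lower bound on the real overlap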
- Then check matches against the latest ROR dump:
```console
$ csvcut -c cg.contributor.donor /tmp/2022-02-02-donors.csv | sed '1d' > /tmp/2022-02-02-donors.txt
```
- I see we have 258/1036 (24.9%) of our donors matching ROR (as of the 2021-09-23 ROR dump)
- I see we have 1986/7901 (25.1%) of our affiliations matching ROR (as of the 2021-09-23 ROR dump)
- Update the PostgreSQL JDBC driver to 42.3.2 in the Ansible Infrastructure playbooks and deploy on DSpace Test
- Mishell from CIP sent me a copy of a security scan their ICT had done on CGSpace using QualysGuard
  - The report was very long and generic, highlighting low-severity things like being able to post crap to search forms and have it appear on the results page
  - Also they say we're using old jQuery and bootstrap, etc (fair enough) but there are no exploits per se
  - At least now I know why all those Qualys IPs are scanning us all the time!!!
- Mishell also said she's having issues logging into CGSpace
  - According to the logs her account is failing on LDAP authentication
  - I checked CGSpace's LDAP credentials using ldapsearch and was able to connect so it's gotta be something with her account
- I synchronized DSpace Test with a fresh snapshot of CGSpace
- I noticed a bunch of thumbnails missing for items submitted in the last week on CGSpace so I ran the `dspace filter-media` script manually and eventually it crashed:
```console
Full Filter Name: org.dspace.app.mediafilter.PoiWordFilter
org.dspace.app.mediafilter.PoiWordFilter
File: Agreement_on_the_Estab_of_ILRI.doc.txt
FILTERED: bitstream 31db7d05-5369-4309-adeb-3b888c80b73d (item: 10568/67391) and created 'Agreement_on_the_Estab_of_ILRI.doc.txt'
```
- Meeting with the repositories working group to discuss issues moving forward in the One CGIAR
## 2022-02-07
- Gaia sent me her feedback on the duplicates for the TAC and ICW items for CGSpace a few days ago
- I used the IDs marked "delete" in her spreadsheet to create a custom text facet with this GREL in OpenRefine:
```console
or(
isNotNull(value.match('1')),
isNotNull(value.match('4')),
isNotNull(value.match('5')),
isNotNull(value.match('6')),
isNotNull(value.match('8')),
...
isNotNull(value.match('178')),
isNotNull(value.match('186')),
isNotNull(value.match('188')),
isNotNull(value.match('189')),
isNotNull(value.match('197'))
)
```
- Then I flagged all of these (seventy-five items)...
- I decided to flag the deletes instead of starring the keeps because there are some items in the original file that were not marked as duplicates, so we have to keep those too
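- The facet logic can be sketched in Python, assuming GREL's `match` tests the whole cell value against the pattern, so a cell is selected only when it equals one of the listed IDs (the names below are hypothetical, and only the IDs visible in the expression above are included; the elided ones are left out):

```python
import re

# Only the IDs visible in the GREL expression; the elided entries are not filled in
DELETE_IDS = ["1", "4", "5", "6", "8", "178", "186", "188", "189", "197"]


def is_flagged(value, patterns=DELETE_IDS):
    """Return True if value matches any pattern in its entirety."""
    return any(re.fullmatch(pattern, value) is not None for pattern in patterns)


def is_flagged_set(value, ids=frozenset(DELETE_IDS)):
    """Plain set membership, equivalent here because the patterns are literal digits."""
    return value in ids


for cell in ["1", "18", "178", "2"]:
    print(cell, is_flagged(cell), is_flagged_set(cell))
```

- Under that whole-value assumption an ID like 18 is not caught by the patterns for 1 or 8, which is what makes a long list of single-ID matches safe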
- I generated the next batch of 200 items, from IDs 201 to 400, checked them for duplicates, and then added the PDF file names to the CSV for reference: