- A few days ago Gaia sent me her notes on the fourth batch of TAC/ICW documents (items 701–980 in the spreadsheet)
- I created a filter in LibreOffice and selected the IDs for items with the action "delete", then I created a custom text facet in OpenRefine with this GREL:
```
or(
isNotNull(value.match('707')),
isNotNull(value.match('709')),
isNotNull(value.match('710')),
isNotNull(value.match('711')),
isNotNull(value.match('713')),
isNotNull(value.match('717')),
isNotNull(value.match('718')),
...
isNotNull(value.match('821'))
)
```
- Then I flagged all matching records, exported a CSV to use with SAFBuilder, and imported them on DSpace Test:
```
Due to abuse we no longer permit requests without a user agent. Please specify a descriptive user agent, for example containing the word 'bot', if you are accessing the site programmatically. For more information see here: https://dspacetest.cgiar.org/page/about.
```
- I note that the nginx log shows `-` for a request with an empty user agent, which is indistinguishable from a request that sends a literal `-` as its user agent; for example, these were successful:
- Maria from ABC asked about a reporting discrepancy on AReS
- I think it's because the last harvest was over the weekend, and she was expecting to see items submitted this week
- Paola from ABC said they are decommissioning the server where many of their library PDFs are hosted
- She asked if we can download them and upload them directly to CGSpace
- I re-created my local Artifactory container
- I am doing a walkthrough of DSpace 7.3-SNAPSHOT to see how things are lately
- One thing I realized is that OAI is no longer a standalone web application, it is part of the `server` app now: http://localhost:8080/server/oai/request?verb=Identify
- Then I was able to migrate to DSpace 7 with `dspace database migrate ignored` as the [DSpace upgrade notes say](https://wiki.lyrasis.org/display/DSDOC7x/Upgrading+DSpace)
- I see that the [flash of unstyled content bug](https://github.com/DSpace/dspace-angular/issues/1357) still exists on dspace-angular... ouch!
- Create another test account for Rafael from Bioversity-CIAT to submit some items to DSpace Test:
```console
$ dspace user -a -m tip-submit@cgiar.org -g CIAT -s Submit -p 'fuuuuuuuu'
```
- I added the account to the Alliance Admins group, which should allow him to submit to any Alliance collection
- According to my notes from [2020-10]({{< relref "2020-10.md" >}}) the account must be in the admin group in order to submit via the REST API
- Abenet and I noticed 1,735 items in CTA's community that have the title "delete"
- We asked Peter and he said we should delete them
- I exported the CTA community metadata and used OpenRefine to filter all items with the "delete" title, then used the "expunge" bulkedit action to remove them
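- The same filter can be sketched outside OpenRefine. This is a minimal Python sketch, not what I actually ran: `build_expunge_rows` is a hypothetical helper, the sample rows are made up, and the title column name `dc.title[en_US]` is an assumption about the export. It keeps only the rows whose title is exactly "delete" and writes an `id,collection,action` CSV for the bulkedit "expunge" action:

```python
import csv
import io


def build_expunge_rows(reader, title_field="dc.title[en_US]"):
    """Yield id/collection/action rows for items whose title is exactly "delete".

    title_field is an assumption; check the header of the real export.
    """
    for row in reader:
        title = (row.get(title_field) or "").strip()
        if title == "delete":
            yield {
                "id": row["id"],
                "collection": row["collection"],
                "action": "expunge",
            }


# Made-up rows standing in for the real CTA community metadata export
sample_export = io.StringIO(
    "id,collection,dc.title[en_US]\n"
    "aaaa-1111,10568/1,delete\n"
    "bbbb-2222,10568/1,A real report\n"
    "cccc-3333,10568/2,deleted files report\n"
)

rows = list(build_expunge_rows(csv.DictReader(sample_export)))

# Write the bulkedit CSV to an in-memory buffer so the sketch has no side effects
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["id", "collection", "action"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```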
- I realized I forgot to clean up the old Let's Encrypt certbot stuff after upgrading CGSpace (linode18) to Ubuntu 20.04 a few weeks ago
- I also removed the pre-Ubuntu 20.04 Let's Encrypt stuff from the Ansible infrastructure playbooks
- Gaia sent me her notes on the final review of duplicates of all TAC/ICW documents
- I created a filter in LibreOffice and selected the IDs for items with the action "delete", then I created a custom text facet in OpenRefine with this GREL:
```
or(
isNotNull(value.match('33')),
isNotNull(value.match('179')),
isNotNull(value.match('452')),
isNotNull(value.match('489')),
isNotNull(value.match('541')),
isNotNull(value.match('568')),
isNotNull(value.match('646')),
isNotNull(value.match('889'))
)
```
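- Typing one clause per ID gets tedious for longer lists. A minimal Python sketch, assuming the IDs are available as a plain list (`build_grel_or` is a hypothetical helper, not part of OpenRefine), that generates the same expression:

```python
def build_grel_or(ids):
    """Return a GREL or(...) expression with one value.match() clause per ID."""
    clauses = [f"isNotNull(value.match('{item_id}'))" for item_id in ids]
    return "or(\n" + ",\n".join(clauses) + "\n)"


# IDs copied from the expression above
ids = [33, 179, 452, 489, 541, 568, 646, 889]
grel = build_grel_or(ids)
print(grel)
```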
- Then I flagged all matching records, exported a CSV to use with SAFBuilder, and imported the 692 items on CGSpace, and generated the thumbnails:
- Leroy from CIAT said that the CIAT Library server has security issues, so it was limited to internal traffic
- I extracted a list of URLs from CGSpace to send him:
```console
localhost/dspacetest= ☘ \COPY (SELECT DISTINCT(text_value) FROM metadatavalue WHERE metadata_field_id=219 AND text_value ~ 'https?://ciat-library') to /tmp/2022-03-31-ciat-library-urls.csv WITH CSV HEADER;
COPY 4552
```
- I did some checks and cleanups in OpenRefine because there are some values with "#page" etc
- Once I sorted them there were only ~2,700, which means there are going to be almost two thousand items with duplicate PDFs
- I suggested that we might want to handle those cases specially and extract the chapters or whatever page range since they are probably books
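- The "#page" cleanup can also be sketched in Python. This is a minimal sketch rather than the OpenRefine steps I used: `unique_pdf_urls` is a hypothetical helper and the URLs are made-up examples on a placeholder host. It drops the fragment from each value and counts the distinct PDFs that remain:

```python
from urllib.parse import urldefrag


def unique_pdf_urls(urls):
    """Strip fragments such as "#page=12" and return the sorted unique URLs."""
    return sorted({urldefrag(url.strip()).url for url in urls if url.strip()})


# Made-up values standing in for the rows exported from metadatavalue
sample = [
    "http://ciat-library.example.org/books/a.pdf#page=10",
    "http://ciat-library.example.org/books/a.pdf#page=42",
    "https://ciat-library.example.org/articles/b.pdf",
]
unique = unique_pdf_urls(sample)
print(len(sample), "values ->", len(unique), "distinct PDFs")
```

Values that collapse to the same URL after this step are the candidates for chapter or page-range extraction.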