Add notes for 2022-12-08

2022-12-08 18:59:57 +02:00
parent 4200ae4189
commit 1bafe6ce71
29 changed files with 107 additions and 34 deletions

@ -88,5 +88,40 @@ $ csvgrep -c matched -m true /tmp/cgspace-matches.csv | wc -l
- This means I've added a few thousand UN M.49 regions to the `cg.coverage.subregion` field in the last few days
- I had to extract them from CGSpace and delete them using `delete-metadata-values.py`
- My [DSpace 7.x pull request to tell ImageMagick about the PDF CropBox](https://github.com/DSpace/DSpace/pull/8550) was merged
- Start a harvest on AReS
## 2022-12-08
- While on the plane I decided to fix some ORCID identifiers, as I had seen some poorly formatted ones
- I couldn't remember the XPath syntax, so this was a bit of a hack:
```console
$ xmllint --xpath '//node/isComposedBy/node()' dspace/config/controlled-vocabularies/cg-creator-identifier.xml | grep -oE 'label=".*"' | sed -e 's/label="//' -e 's/"$//' > /tmp/orcid-names.txt
$ ./ilri/update-orcids.py -i /tmp/orcid-names.txt -db dspace -u dspace -p 'fuuu' -m 247
```
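- For comparison, a minimal Python sketch of the same label extraction without the `grep` and `sed` post-processing. It assumes the vocabulary keeps its entries as `<node label="...">` children of `<isComposedBy>`, which is what the XPath above implies. The sample XML, the names in it, and the `extract_labels` helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for cg-creator-identifier.xml (structure assumed, names invented)
SAMPLE_XML = """<node id="cg-creator-identifier" label="">
  <isComposedBy>
    <node id="Example Author One: 0000-0001-2345-6789" label="Example Author One: 0000-0001-2345-6789"/>
    <node id="Example Author Two: 0000-0002-3456-789X" label="Example Author Two: 0000-0002-3456-789X"/>
  </isComposedBy>
</node>"""


def extract_labels(xml_text):
    """Return the label attribute of every <node> directly under an <isComposedBy>."""
    root = ET.fromstring(xml_text)
    labels = []
    for node in root.findall(".//isComposedBy/node"):
        label = node.get("label")
        if label:  # skip nodes with a missing or empty label
            labels.append(label)
    return labels


for label in extract_labels(SAMPLE_XML):
    print(label)
```

- Reading the attribute directly avoids the greedy `label=".*"` match, which would grab too much if a line ever had another attribute after `label`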
- After that there were still some poorly formatted ones that my script didn't fix, so perhaps these are new ones not in our list
- I dumped them and combined them with the existing ones to resolve later:
```console
localhost/dspace= ☘ \COPY (SELECT dspace_object_id,text_value FROM metadatavalue WHERE metadata_field_id=247 AND text_value LIKE '%http%') to /tmp/orcid-formatting.txt;
COPY 36
```
- I think there are really just some new ones...
```console
$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-identifier.xml /tmp/orcid-formatting.txt| grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2022-12-08-orcids.txt
$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-identifier.xml | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u | wc -l
1907
$ wc -l /tmp/2022-12-08-orcids.txt
1939 /tmp/2022-12-08-orcids.txt
```
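- The counts above suggest 32 identifiers (1939 - 1907) that are in the database dump but not in the vocabulary. A minimal Python sketch of the same comparison, reusing the regular expression from the `grep` calls. The sample strings and the `unique_orcids` and `new_orcids` helpers are hypothetical, not part of any existing script:

```python
import re

# Same pattern as the grep -oE calls above
ORCID_RE = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def unique_orcids(text):
    """Sorted, de-duplicated identifiers found in text (like `grep -oE ... | sort -u`)."""
    return sorted(set(ORCID_RE.findall(text)))


def new_orcids(vocabulary_text, database_text):
    """Identifiers that appear in the database dump but not in the vocabulary."""
    known = set(ORCID_RE.findall(vocabulary_text))
    found = set(ORCID_RE.findall(database_text))
    return sorted(found - known)


# Invented sample data standing in for the XML file and /tmp/orcid-formatting.txt
vocabulary = '<node label="Example Author One: 0000-0001-2345-6789"/>'
database = (
    "3f2a9c1e-aaaa-bbbb-cccc-0123456789ab\thttps://orcid.org/0000-0001-2345-6789\n"
    "7d4e8b2f-dddd-eeee-ffff-ba9876543210\thttp://orcid.org/0000-0002-3456-789X\n"
)

print(unique_orcids(vocabulary + database))
print(new_orcids(vocabulary, database))
```

- Listing the difference directly shows which identifiers still need names, rather than only how many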
- Then I applied these updates on CGSpace
- Maria mentioned that she was getting a lot more items in her daily subscription emails
- I had a hunch it was related to me updating the `last_modified` timestamp after updating a bunch of countries, regions, etc. in items
- Then today I noticed this option in `dspace.cfg`: `eperson.subscription.onlynew`
- By default DSpace sends notifications for modified items too! I've disabled it now...
<!-- vim: set sw=2 ts=2: -->