Add notes for 2018-03-25

2018-03-25 22:46:48 +03:00
parent e95f2c2f49
commit c070fda9b3
3 changed files with 107 additions and 8 deletions

@@ -445,3 +445,48 @@ isNotNull(value.match(/.*\ufffd.*/))
- More work on the Ubuntu 18.04 readiness stuff for the [Ansible playbooks](https://github.com/ilri/rmg-ansible-public)
- The playbook now uses the system's Ruby and Node.js so I don't have to manually install RVM and NVM after
## 2018-03-25
- Looking at Peter's author corrections and trying to work out a way to find errors in OpenRefine easily
- I can find all names that contain at least one acceptable character using a GREL expression like:
```
isNotNull(value.match(/.*[a-zA-ZáÁéèïíñØøöóúü].*/))
```
- But it's probably better to just say which characters I know for sure are not valid (like parentheses, pipe, or weird Unicode characters):
```
or(
  isNotNull(value.match(/.*[(|)].*/)),
  isNotNull(value.match(/.*\uFFFD.*/)),
  isNotNull(value.match(/.*\u00A0.*/)),
  isNotNull(value.match(/.*\u200A.*/))
)
```
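- For checking the same thing outside OpenRefine, here is a minimal Python sketch of the invalid-character test above. The function name `has_invalid_chars` is hypothetical and not part of any existing script. Python's `re.search` scans the whole string, so the leading and trailing `.*` are not needed, and the four patterns collapse into one character class:

```python
import re

# Characters the notes treat as invalid in author names:
# parentheses, pipe, U+FFFD (replacement character),
# U+00A0 (no-break space) and U+200A (hair space).
INVALID_CHARS = re.compile("[(|)\uFFFD\u00A0\u200A]")


def has_invalid_chars(name: str) -> bool:
    """Return True if the name contains any character from the list above."""
    return INVALID_CHARS.search(name) is not None


if __name__ == "__main__":
    for candidate in ["Doe, Jane", "Doe, Jane (ed.)", "Doe,\u00a0Jane"]:
        print(repr(candidate), has_invalid_chars(candidate))
```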
- And here's one combined GREL expression to check for items marked as to delete or check so I can flag them and export them to a separate CSV (though perhaps it's time to add delete support to my `fix-metadata-values.py` script):
```
or(
  isNotNull(value.match(/.*delete.*/i)),
  isNotNull(value.match(/.*remove.*/i)),
  isNotNull(value.match(/.*check.*/i))
)
```
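- A rough Python sketch of the same flag-and-export step, in case it is ever done outside OpenRefine. The helper names `split_flagged` and `write_csv`, the sample rows, and the assumption that the delete/check notes live in a column called `correct` are all illustrative, not taken from the real CSVs:

```python
import csv
import io
import re

# Same keywords as the GREL facet, matched case-insensitively
REVIEW_WORDS = re.compile(r"delete|remove|check", re.IGNORECASE)


def split_flagged(rows, column):
    """Return (flagged, clean) lists of rows based on the text in `column`."""
    flagged, clean = [], []
    for row in rows:
        target = flagged if REVIEW_WORDS.search(row[column] or "") else clean
        target.append(row)
    return flagged, clean


def write_csv(rows, fieldnames, handle):
    """Write rows (dicts) to an open file-like object as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample data standing in for an exported corrections file
    sample = io.StringIO(
        "dc.contributor.author,correct\n"
        '"Doe, J.","Doe, Jane"\n'
        '"Xyz","DELETE"\n'
        '"Roe, R","check spelling"\n'
    )
    reader = csv.DictReader(sample)
    flagged, clean = split_flagged(list(reader), "correct")
    out = io.StringIO()
    write_csv(flagged, reader.fieldnames, out)
    print(out.getvalue())
```

- Like the GREL version, this is a plain substring match, so a legitimate value that happens to contain "check" or "remove" would also be flagged and needs a manual look.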
- So I guess the routine in OpenRefine is:
  - Transform: trim leading/trailing whitespace
  - Transform: collapse consecutive whitespace
  - Custom text facet for items to delete/check
  - Custom text facet for illegal characters
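- The two whitespace transforms in that routine could be approximated in Python as below. This is only a sketch: `normalize_whitespace` is a hypothetical helper, and it deliberately touches ASCII whitespace only, so that characters like U+00A0 and U+200A stay in place and can still be caught by the illegal-characters facet:

```python
import re

# ASCII whitespace only; this matches what `\s` means under re.ASCII
ASCII_WS = " \t\n\r\f\v"


def normalize_whitespace(value: str) -> str:
    """Collapse runs of ASCII whitespace to one space, then trim both ends."""
    collapsed = re.sub(r"\s+", " ", value, flags=re.ASCII)
    return collapsed.strip(ASCII_WS)


if __name__ == "__main__":
    print(repr(normalize_whitespace("  Doe,   Jane \t")))
```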
- Test the corrections and deletions locally, then run them on CGSpace:
```
$ ./fix-metadata-values.py -i /tmp/Correct-2928-Authors-2018-03-21.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
$ ./delete-metadata-values.py -i /tmp/Delete-8-Authors-2018-03-21.csv -f dc.contributor.author -m 3 -db dspacetest -u dspace -p 'fuuu'
```
- Afterwards I started a full Discovery reindexing