mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2018-03-25
@ -445,3 +445,48 @@ isNotNull(value.match(/.*\ufffd.*/))

- More work on the Ubuntu 18.04 readiness stuff for the [Ansible playbooks](https://github.com/ilri/rmg-ansible-public)
- The playbook now uses the system's Ruby and Node.js so I don't have to manually install RVM and NVM after

## 2018-03-25

- Looking at Peter's author corrections and trying to work out a way to find errors in OpenRefine easily
- I can find all names that have acceptable characters using a GREL expression like:

```
isNotNull(value.match(/.*[a-zA-ZáÁéèïíñØøöóúü].*/))
```

- But it's probably better to just say which characters I know for sure are not valid (like parentheses, pipe, or weird Unicode characters):

```
or(
  isNotNull(value.match(/.*[(|)].*/)),
  isNotNull(value.match(/.*\uFFFD.*/)),
  isNotNull(value.match(/.*\u00A0.*/)),
  isNotNull(value.match(/.*\u200A.*/))
)
```
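- For comparison, the same two checks can be sketched in Python outside OpenRefine. This is only an illustration under assumptions: the function names are hypothetical, and it assumes GREL's `match` has to match the whole value (which is why the expressions above wrap the character class in `.*`), so a plain `search` is the equivalent here. Note that the first check is true for any name containing at least one listed character, which is why the negative check is the more useful of the two:

```python
import re

# Same character class as the first GREL expression above
ACCEPTABLE = re.compile("[a-zA-ZáÁéèïíñØøöóúü]")

# Characters known to be invalid: parentheses, pipe, U+FFFD (replacement
# character), U+00A0 (no-break space), and U+200A (hair space)
INVALID = re.compile("[(|)\ufffd\u00a0\u200a]")


def has_acceptable_character(name: str) -> bool:
    """True if the name contains at least one acceptable character."""
    return ACCEPTABLE.search(name) is not None


def has_invalid_character(name: str) -> bool:
    """True if the name contains any character known to be invalid."""
    return INVALID.search(name) is not None
```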

- And here's one combined GREL expression to check for items marked as to delete or check so I can flag them and export them to a separate CSV (though perhaps it's time to add delete support to my `fix-metadata-values.py` script):

```
or(
  isNotNull(value.match(/.*delete.*/i)),
  isNotNull(value.match(/.*remove.*/i)),
  isNotNull(value.match(/.*check.*/i))
)
```
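- A minimal Python sketch of the same flagging idea, assuming the values have already been read into a list of strings. The `split_flagged` helper is hypothetical and not part of any of my scripts; writing the flagged values out to a separate CSV is left out:

```python
import re

# Case-insensitive, like the /i flag in the GREL expression above
FLAG_WORDS = re.compile(r"delete|remove|check", re.IGNORECASE)


def split_flagged(values):
    """Split values into (flagged, clean) lists, preserving input order."""
    flagged, clean = [], []
    for value in values:
        if FLAG_WORDS.search(value):
            flagged.append(value)
        else:
            clean.append(value)
    return flagged, clean
```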

- So I guess the routine in OpenRefine is:
  - Transform: trim leading/trailing whitespace
  - Transform: collapse consecutive whitespace
  - Custom text facet for items to delete/check
  - Custom text facet for illegal characters
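- The two whitespace transforms in that routine could be sketched in Python roughly like this. The function name is hypothetical, and limiting it to ASCII whitespace is my own assumption, chosen so that characters like U+00A0 survive the cleanup and still show up in the illegal character facet rather than being silently collapsed:

```python
import re

# ASCII whitespace only: space, tab, newline, carriage return, form feed,
# vertical tab. Unicode spaces such as U+00A0 are deliberately left alone.
ASCII_WHITESPACE = " \t\n\r\f\v"


def normalize_whitespace(value: str) -> str:
    """Trim leading/trailing whitespace, then collapse consecutive whitespace."""
    trimmed = value.strip(ASCII_WHITESPACE)
    return re.sub(r"\s+", " ", trimmed, flags=re.ASCII)
```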

- Test the corrections and deletions locally, then run them on CGSpace:

```
$ ./fix-metadata-values.py -i /tmp/Correct-2928-Authors-2018-03-21.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
$ ./delete-metadata-values.py -i /tmp/Delete-8-Authors-2018-03-21.csv -f dc.contributor.author -m 3 -db dspacetest -u dspace -p 'fuuu'
```

- Afterwards I started a full Discovery reindexing