Update notes for 2019-02-05

This commit is contained in:
2019-02-05 08:59:54 +02:00
parent f40f503304
commit c053b90504
4 changed files with 77 additions and 14 deletions

View File

@ -160,4 +160,34 @@ COPY 321
- At this rate I think I just need to stop paying attention to these alerts—DSpace gets thrashed when people use the APIs properly and there's nothing we can do to improve REST API performance!
- Perhaps I just need to keep increasing the Linode alert threshold (currently 300%) for this host?
## 2019-02-05
- Peter sent me corrections and deletions for the CTA subjects and as usual, there were encoding errors with some accentsÁ in his file
- In other news, it seems that the GREL syntax regarding booleans changed in OpenRefine recently, so I need to update some expressions like the one I use to detect encoding errors to use `toString()`:
```
or(
isNotNull(value.match(/.*\uFFFD.*/)),
isNotNull(value.match(/.*\u00A0.*/)),
isNotNull(value.match(/.*\u200A.*/)),
isNotNull(value.match(/.*\u2019.*/)),
isNotNull(value.match(/.*\u00b4.*/)),
isNotNull(value.match(/.*\u007e.*/))
).toString()
```
- Testing the corrections for sixty-five items and sixteen deletions using my [fix-metadata-values.py](https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897) and [delete-metadata-values.py](https://gist.github.com/alanorth/bd7d58c947f686401a2b1fadc78736be) scripts:
```
$ ./fix-metadata-values.py -i 2019-02-04-Correct-65-CTA-Subjects.csv -f cg.subject.cta -t CORRECT -m 124 -db dspace -u dspace -p 'fuu' -d
$ ./delete-metadata-values.py -i 2019-02-04-Delete-16-CTA-Subjects.csv -f cg.subject.cta -m 124 -db dspace -u dspace -p 'fuu' -d
```
- I applied them on DSpace Test and CGSpace and started a full Discovery re-index:
```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
```
<!-- vim: set sw=2 ts=2: -->