mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-08 14:16:00 +02:00

Add check for "suspicious" characters

Standalone accent characters such as ˆ often indicate issues with
encoding or copy/paste in languages with accents like French and
Spanish. For example: foreˆt should be forêt.

It is not possible to fix these issues automatically, but this will
print a warning so you can notify the owner of the data.
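
As a rough illustration of the kind of check described above, here is a minimal Python sketch. The function name, the character list, and the warning format are assumptions made for this example and are not taken from the commit itself; the "||" separator for multi-value fields is inferred from the test data.

```python
# Assumed list of standalone accent characters: acute, modifier circumflex,
# small tilde, and grave. The list used by the actual commit may differ.
SUSPICIOUS_CHARACTERS = ("\u00b4", "\u02c6", "\u02dc", "\u0060")


def find_suspicious_characters(field, field_name):
    """Return one warning per value that contains a standalone accent.

    Hypothetical helper: it only reports problems and never modifies
    the data, since the intended character cannot be guessed reliably.
    """
    warnings = []
    if not field:
        return warnings

    # Multi-value fields are assumed to be separated by "||".
    for value in field.split("||"):
        if any(character in value for character in SUSPICIOUS_CHARACTERS):
            warnings.append(f"Suspicious character ({field_name}): {value}")

    return warnings


if __name__ == "__main__":
    for warning in find_suspicious_characters(
        "Suspicious Character||fore\u02c6t", "dc.contributor.author"
    ):
        print(warning)
```

A correctly encoded value such as forêt uses a single precomposed letter, so it passes this check without a warning.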
2019-07-29 17:08:49 +03:00
parent 8047a57cc5
commit fa4fa3491b
5 changed files with 39 additions and 0 deletions


@@ -6,3 +6,4 @@ Test,2019-06-150,,
 "Doe, J.",2019-06-15||2019-01-10,,
 Someone,,0378-5955|0378-5955,
 Unnecessary Unicode,2019-07-29,,
+Suspicious Character||foreˆt,2019-07-29,,

1 dc.contributor.author birthdate dc.identifier.issn dc.identifier.isbn
6 Doe, J. 2019-06-15||2019-01-10
7 Someone 0378-5955|0378-5955
8 Unnecessary Unicode​ 2019-07-29
9 Suspicious Character||foreˆt 2019-07-29