mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-08 06:06:00 +02:00

Add check for "suspicious" characters

These standalone characters often indicate issues with encoding or
copy/paste in languages with accents like French and Spanish. For
example: foreˆt should be forêt.
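
To see why such a value looks almost right but is not, it can help to list the non-ASCII characters in each spelling. The sketch below is illustrative only: it assumes the stray accent in the example is U+02C6 (written as an escape so it survives copy/paste), and `describe` is a hypothetical helper, not part of this commit.

```python
import unicodedata

# Assumption: the standalone accent in the example is U+02C6.
broken = 'fore\u02c6t'   # "e" followed by a separate accent character
fixed = 'for\u00eat'     # a single precomposed "e with circumflex"


def describe(text):
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch)) for ch in text if ord(ch) > 127]


print(describe(broken))
print(describe(fixed))
```

The broken spelling carries the accent as its own character and is one character longer than the correct one, which is what a check for standalone accents can key on.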

It is not possible to fix these issues automatically, but this will
print a warning so you can notify the owner of the data.
2019-07-29 17:08:49 +03:00
parent 8047a57cc5
commit fa4fa3491b
5 changed files with 39 additions and 0 deletions

@@ -105,3 +105,14 @@ def test_check_valid_date():
    result = check.date(value)

    assert result == value


def test_check_suspicious_characters(capsys):
    '''Test checking for suspicious characters.'''

    value = 'foreˆt'

    check.suspicious_characters(value)

    captured = capsys.readouterr()
    assert captured.out == f'Suspicious character: {value}\n'
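
For orientation, a check that would satisfy this test could look roughly like the sketch below. It is not the implementation from this commit, which is not shown in this hunk. The set of characters in `SUSPICIOUS` and the choice to return the field unchanged are assumptions; only the warning format is taken from the test above.

```python
import re

# Hypothetical set of standalone accents (acute, circumflex, small tilde).
# The list used by the real check is not visible in this diff.
SUSPICIOUS = re.compile('[\u00b4\u02c6\u02dc]')


def suspicious_characters(field):
    """Print a warning if the field contains a standalone accent character.

    The value is never modified, because the correct spelling cannot be
    inferred automatically. Returning the field is an assumption made here.
    """
    if isinstance(field, str) and SUSPICIOUS.search(field):
        print(f'Suspicious character: {field}')
    return field
```

Printing to standard output instead of raising is what lets the test read the warning back through pytest's `capsys` fixture.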