Mirror of https://github.com/ilri/csv-metadata-quality.git (synced 2025-05-08 06:06:00 +02:00)
Add check for "suspicious" characters
These standalone characters often indicate issues with encoding or copy/paste in languages with accents like French and Spanish. For example: foreˆt should be forêt. It is not possible to fix these issues automatically, but this will print a warning so you can notify the owner of the data.
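To make the behaviour concrete, here is a minimal sketch of what such a check might look like. It is not the implementation from this commit, which is not shown here. The `SUSPICIOUS` set, the return value, and the assumption that the character in the example is U+02C6 (MODIFIER LETTER CIRCUMFLEX ACCENT) are illustrative guesses. Only the function name and the warning format are taken from the test in the diff. The last lines illustrate why an automatic fix is not straightforward: Unicode NFC normalization leaves a standalone circumflex untouched, because it is not a combining mark.

```python
import unicodedata

# Hypothetical set of standalone accent-like characters; the commit message
# does not list which characters the real check looks for.
SUSPICIOUS = {
    '\u00b4',  # acute accent
    '\u02c6',  # modifier letter circumflex accent
    '\u0060',  # grave accent
    '\u007e',  # tilde
}


def suspicious_characters(field):
    """Print a warning if field contains a standalone accent character."""
    if any(character in SUSPICIOUS for character in field):
        print(f'Suspicious character: {field}')

    # Returning the value unchanged is an assumption; the check only warns.
    return field


value = 'fore\u02c6t'
suspicious_characters(value)

# NFC normalization does not repair the value: U+02C6 is a spacing character,
# so it never merges with a neighbouring letter. Only a true combining
# circumflex (U+0302) following the "e" composes into "ê".
unchanged = unicodedata.normalize('NFC', value)
composed = unicodedata.normalize('NFC', 'fore\u0302t')
```

Because the standalone accent could belong to the letter before or after it, or be a legitimate character in its own right, a warning that a person can act on is the safer design than a silent rewrite.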
@@ -105,3 +105,14 @@ def test_check_valid_date():
     result = check.date(value)

     assert result == value
+
+
+def test_check_suspicious_characters(capsys):
+    '''Test checking for suspicious characters.'''
+
+    value = 'foreˆt'
+
+    check.suspicious_characters(value)
+
+    captured = capsys.readouterr()
+    assert captured.out == f'Suspicious character: {value}\n'