mirror of
https://github.com/ilri/csv-metadata-quality.git
synced 2025-05-09 14:46:00 +02:00
Add check for "suspicious" characters
These standalone characters often indicate issues with encoding or copy/paste in languages with accents like French and Spanish. For example: foreˆt should be forêt. It is not possible to fix these issues automatically, but this will print a warning so you can notify the owner of the data.
This commit is contained in:
@ -31,6 +31,9 @@ def main(argv):
|
||||
# Check: invalid multi-value separator
|
||||
df[column] = df[column].apply(check.separators)
|
||||
|
||||
# Check: suspicious characters
|
||||
df[column] = df[column].apply(check.suspicious_characters)
|
||||
|
||||
# Fix: invalid multi-value separator
|
||||
if args.unsafe_fixes:
|
||||
df[column] = df[column].apply(fix.separators)
|
||||
|
Reference in New Issue
Block a user