mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-09 14:46:00 +02:00

Add support for fixing "unnecessary" Unicode

These are things like non-breaking spaces, "replacement" characters,
etc. that add nothing to the metadata and often cause errors during
parsing or when displayed in a UI.
2019-07-29 16:38:10 +03:00
parent ae66382046
commit 8047a57cc5
4 changed files with 54 additions and 0 deletions


@ -25,6 +25,9 @@ def main(argv):
# Fix: whitespace
df[column] = df[column].apply(fix.whitespace)
# Fix: unnecessary Unicode
df[column] = df[column].apply(fix.unnecessary_unicode)
# Check: invalid multi-value separator
df[column] = df[column].apply(check.separators)
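The diff calls `fix.unnecessary_unicode` on each value, but that function's body is not shown in this hunk. Below is a minimal sketch of what such a per-value fix could look like. It assumes the "unnecessary" characters are U+00A0 (no-break space), U+200B (zero-width space) and U+FFFD (replacement character). The mapping `UNNECESSARY_UNICODE` and the choice to turn U+00A0 into an ordinary space are illustrative assumptions, not the project's actual implementation.

```python
import re

# Hypothetical set of "unnecessary" code points and what to replace each with.
# Assumption for illustration only; the real fix.unnecessary_unicode may differ.
UNNECESSARY_UNICODE = {
    "\u00a0": " ",  # NO-BREAK SPACE -> ordinary space
    "\u200b": "",   # ZERO WIDTH SPACE -> removed
    "\ufffd": "",   # REPLACEMENT CHARACTER -> removed
}

# One character class matching any of the code points above.
_PATTERN = re.compile("[" + "".join(UNNECESSARY_UNICODE) + "]")


def unnecessary_unicode(field):
    """Replace or remove unnecessary Unicode characters in one metadata value.

    Non-string values (e.g. None, or NaN for a missing pandas cell) are
    returned unchanged so the function is safe to use with Series.apply().
    """
    if not isinstance(field, str):
        return field
    # Look up the replacement for whichever character matched.
    return _PATTERN.sub(lambda m: UNNECESSARY_UNICODE[m.group(0)], field)
```

Used the same way as in the diff, e.g. `df[column] = df[column].apply(unnecessary_unicode)`, a value such as `"Maize\u00a0production\u200b"` becomes `"Maize production"`. Because the diff runs `fix.whitespace` before this step, a real implementation would need to decide whether swapping U+00A0 for a plain space can leave doubled spaces behind.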