mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-10 15:16:01 +02:00

Expand check/fix for multi-value separators

I just came across some metadata with unnecessary multi-value
separators at the end of a field, which results in a blank value
when the field is split.

For example: "Kenya||Tanzania||"
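
To illustrate the problem, here is a minimal Python sketch. It is not the project's actual check or fix: the separator "||" is taken from the example above, and the helper names are hypothetical.

```python
SEPARATOR = "||"  # assumed multi-value separator, as in the example above


def split_values(field):
    """Split a multi-value field on the separator."""
    return field.split(SEPARATOR)


def has_trailing_separator(field):
    """Hypothetical check: does the field end with a separator?"""
    return field.endswith(SEPARATOR)


def remove_trailing_separators(field):
    """Hypothetical fix: strip any separators left at the end of the field."""
    while field.endswith(SEPARATOR):
        field = field[: -len(SEPARATOR)]
    return field


field = "Kenya||Tanzania||"
print(split_values(field))                # ['Kenya', 'Tanzania', '']
print(has_trailing_separator(field))      # True
print(remove_trailing_separators(field))  # Kenya||Tanzania
```

The trailing separator is what produces the empty third value when the field is split, so removing it leaves only the two real values.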
2021-01-03 15:30:03 +02:00
parent c26ad83534
commit 0dc66c5c4e
4 changed files with 30 additions and 5 deletions


@@ -103,13 +103,13 @@ def run(argv):
 # Fix: unnecessary Unicode
 df[column] = df[column].apply(fix.unnecessary_unicode)
-# Check: invalid multi-value separator
+# Check: invalid and unnecessary multi-value separators
 df[column] = df[column].apply(check.separators, field_name=column)
 # Check: suspicious characters
 df[column] = df[column].apply(check.suspicious_characters, field_name=column)
-# Fix: invalid multi-value separator
+# Fix: invalid and unnecessary multi-value separators
 if args.unsafe_fixes:
     df[column] = df[column].apply(fix.separators, field_name=column)
     # Run whitespace fix again after fixing invalid separators