mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-16 17:43:54 +02:00

Add support for validating languages

Validates the language field against ISO 639-1 (two-letter) or ISO 639-3
(three-letter) codes, depending on the length of the value. A value of any
other length is reported as invalid.

Does not currently have any support for generic values like "Other".
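The length-based rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `ISO_639_1` and `ISO_639_3` are tiny hypothetical samples standing in for the complete ISO 639 tables, and the names `is_valid_language` and `language` are illustrative. The wrapper returns the field unchanged because the diff assigns the result of `check.language` back into the column.

```python
# Sketch of length-based language-code validation. ISO_639_1 and ISO_639_3
# are small hypothetical samples; a real check needs the full ISO 639 lists.
ISO_639_1 = {"en", "fr", "es", "sw"}        # two-letter codes (sample only)
ISO_639_3 = {"eng", "fra", "spa", "swa"}    # three-letter codes (sample only)


def is_valid_language(value: str) -> bool:
    """Choose the code list by the value's length."""
    if len(value) == 2:
        return value in ISO_639_1
    if len(value) == 3:
        return value in ISO_639_3
    # Any other length, including generic values like "Other", is invalid.
    return False


def language(field):
    """Print a warning for an invalid code and return field unchanged, so the
    function can be used as df[column] = df[column].apply(language)."""
    # Skip missing values (pandas stores them as float NaN, not str).
    if not isinstance(field, str):
        return field
    if not is_valid_language(field):
        print(f"Invalid language: {field}")
    return field


language("en")     # silent
language("Other")  # prints: Invalid language: Other
```

Because "Other" is five characters long, the sketch reports it as invalid, which matches the note that generic values are not yet supported.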
2019-07-29 18:59:42 +03:00
parent 1978fa7b48
commit a36454a3ac
5 changed files with 117 additions and 15 deletions

@@ -43,6 +43,11 @@ def main(argv):
         # Fix: duplicate metadata values
         df[column] = df[column].apply(fix.duplicates)
 
+        # Check: invalid language
+        match = re.match(r'^.*?language.*$', column)
+        if match is not None:
+            df[column] = df[column].apply(check.language)
+
         # Check: invalid ISSN
         match = re.match(r'^.*?issn.*$', column)
         if match is not None:
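The diff selects which columns to check by matching the column name against `r'^.*?language.*$'`. The snippet below uses made-up column names (hypothetical, for illustration only) to show which names that pattern accepts. For single-line names it behaves like the substring test `"language" in column`, and it is case-sensitive, so a column named "Language" is not checked.

```python
import re

# Hypothetical column names, for illustration only.
columns = ["dc.title", "dc.language.iso", "dc.identifier.issn", "Language"]

# Same pattern as in the diff: any column name containing "language".
LANGUAGE_COLUMN = re.compile(r'^.*?language.*$')

language_columns = [c for c in columns if LANGUAGE_COLUMN.match(c) is not None]
print(language_columns)  # ['dc.language.iso']
```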