1
0
mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-17 19:47:03 +01:00
Commit Graph

3 Commits

Author SHA1 Message Date
330a7b7b9c
Don't unnecessarily rewrite DataFrames for checks
By using df[column] = df[column].apply(check...) we were re-writing
the DataFrame every time we returned from a check. We don't actuall
y need to return a value at all, as the point of checks is to print
a warning to the screen. In Python a "return" statement without a v
ariable returns None.

I haven't measured the impact of this, but I assume it will mean we
are faster and use less memory.
2021-03-16 16:04:19 +02:00
a7fc5a246c
Colorize output
Some checks failed
continuous-integration/drone/push Build is failing
Messages will be colorized:

- Red for errors
- Yellow for warnings or information
- Green for fixes
2021-02-21 13:01:25 +02:00
8435ee242d
Experimental language detection using langid
Works decenty well assuming the title, abstract, and citation fields
are an accurate representation of the language as identified by the
language field. Handles ISO 639-1 (alpha 2) and ISO 639-3 (alpha 3)
values seamlessly.

This includes updated pipenv environment, test data, pytest tests
for both correct and incorrect ISO 639-1 and ISO 639-3 languages,
and a new command line option "-e".
2019-09-26 13:46:32 +03:00