mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-16 19:17:03 +01:00
Commit Graph

5 Commits

Author SHA1 Message Date
fa4fa3491b
Add check for "suspicious" characters
These standalone characters often indicate issues with encoding or
copy/paste in languages with accents like French and Spanish. For
example: foreˆt should be forêt.

It is not possible to fix these issues automatically, but this will
print a warning so you can notify the owner of the data.
2019-07-29 17:08:49 +03:00
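
The check described in this commit could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names and the exact set of characters are assumptions, chosen to cover common standalone (spacing) accents such as the one in "foreˆt".

```python
import re
from typing import List

# Assumed set of standalone accent characters: acute (U+00B4),
# modifier circumflex (U+02C6), small tilde (U+02DC), diaeresis (U+00A8).
# The set used by the real check may differ.
SUSPICIOUS_PATTERN = re.compile("[\u00b4\u02c6\u02dc\u00a8]")


def find_suspicious_characters(value: str) -> List[str]:
    """Return every standalone accent character found in value."""
    return SUSPICIOUS_PATTERN.findall(value)


def check_suspicious_characters(value: str) -> str:
    """Print a warning if value contains suspicious characters.

    The value is returned unchanged, because these issues cannot be
    fixed automatically.
    """
    found = find_suspicious_characters(value)
    if found:
        print(f"Suspicious character(s) {found!r} in: {value}")
    return value
```

A correctly encoded value such as "forêt" uses a single precomposed character, so it does not match the pattern and produces no warning.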
87b1997051
Fix whitespace errors found by flake8 2019-07-28 17:47:28 +03:00
ce8f140c66
tests: Remove unused pytest import 2019-07-28 17:42:54 +03:00
196bb434fa
Add date validation
I'm only concerned with validating issue dates here. In DSpace they
are almost always YYYY, YYYY-MM, or YYYY-MM-DD (though in theory
they could be any valid ISO8601 format).

This also checks for missing dates and for metadata that specifies
multiple dates, like "1990||1991". Multiple dates are valid, but they
have no practical value in our system.
2019-07-28 16:11:36 +03:00
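
One way to express the date rules from this commit is shown below. It is a sketch under stated assumptions: the function name, the returned labels, and the use of datetime.strptime are illustrative choices, not taken from the repository.

```python
from datetime import datetime

# The three formats named in the commit message.
DATE_FORMATS = ("%Y", "%Y-%m", "%Y-%m-%d")


def classify_issue_date(field: str) -> str:
    """Classify an issue date as missing, multiple, valid, or invalid.

    Hypothetical helper: the labels are for illustration only.
    """
    if not field:
        return "missing"

    # DSpace-style multi-value separator, as in "1990||1991".
    if "||" in field:
        return "multiple"

    for date_format in DATE_FORMATS:
        try:
            datetime.strptime(field, date_format)
            return "valid"
        except ValueError:
            # Try the next format.
            continue

    return "invalid"
```

For example, classify_issue_date("1990-07") yields "valid", while a truncated year such as "199-01" matches none of the formats and yields "invalid".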
a849615b41
Add tests for check functions
Relies on capturing stdout.

See: https://docs.pytest.org/en/5.0.1/capture.html
2019-07-27 02:10:13 +03:00
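
A test that relies on capturing stdout might look like the sketch below. The check function and its message are hypothetical stand-ins; only the capsys fixture and its readouterr() method come from pytest itself.

```python
def check_not_empty(value: str) -> str:
    """Hypothetical check: print a warning when a value is empty."""
    if not value:
        print("Missing value")
    return value


def test_check_not_empty(capsys):
    """The warning is printed, so the test reads it from captured stdout."""
    check_not_empty("")

    # capsys is injected by pytest; readouterr() returns (out, err).
    captured = capsys.readouterr()
    assert captured.out == "Missing value\n"
```

Because the check functions report problems by printing rather than raising or returning a status, captured output is the only thing a test can assert on.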