1
0
mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-16 19:17:03 +01:00
Commit Graph

8 Commits

Author SHA1 Message Date
456b8a2f26
Update tests 2019-08-01 23:59:11 +03:00
1f65a28307
Add support for validating subjects against AGROVOC
Checks values in the dc.subject or dcterms.subject field against the
AGROVOC REST API hosted by FAO. Code borrowed from agrovoc-lookup.py.

See: http://agrovoc.uniroma2.it/agrovoc/agrovoc/en/
See: https://github.com/ilri/DSpace/blob/5_x-prod/agrovoc-lookup.py
2019-07-30 00:30:31 +03:00
a36454a3ac
Add support for validating languages
Will validate against ISO 639-2 or ISO 639-3 depending on how long
the language field is. Otherwise will return that the language is
invalid.

Does not currently have any support for generic values like "Other".
2019-07-29 18:59:42 +03:00
fa4fa3491b
Add check for "suspicious" characters
These standalone characters often indicate issues with encoding or
copy/paste in languages with accents like French and Spanish. For
example: foreˆt should be forêt.

It is not possible to fix these issues automatically, but this will
print a warning so you can notify the owner of the data.
2019-07-29 17:08:49 +03:00
87b1997051
Fix whitespace errors found by flake8 2019-07-28 17:47:28 +03:00
ce8f140c66
tests: Remove unused pytest import 2019-07-28 17:42:54 +03:00
196bb434fa
Add date validation
I'm only concerned with validating issue dates here. In DSpace they
are generally always YYYY, YYY-MM, or YYYY-MM-DD (though in theory
they could be any valid ISO8601 format).

This also checks for cases where the date is missing and where the
metadata has specified multiple dates like "1990||1991", as this is
valid, but there is no practical value for it in our system.
2019-07-28 16:11:36 +03:00
a849615b41
Add tests for check functions
Relies on capturing stdout.

See: https://docs.pytest.org/en/5.0.1/capture.html
2019-07-27 02:10:13 +03:00