mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-16 11:07:03 +01:00
Commit Graph

7 Commits

Author SHA1 Message Date
a36454a3ac
Add support for validating languages
Will validate against ISO 639-2 or ISO 639-3 depending on how long
the language field is. Otherwise will return that the language is
invalid.

Does not currently have any support for generic values like "Other".
2019-07-29 18:59:42 +03:00
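The length-based check described in this commit could be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `is_valid_language` and the two tiny lookup sets are hypothetical stand-ins for full ISO 639 code tables, and it assumes that two-character values are checked against a two-letter table and three-character values against a three-letter table.

```python
# Hypothetical, deliberately tiny samples; a real check would load full ISO 639 tables.
TWO_LETTER_CODES = {"en", "fr", "es"}
THREE_LETTER_CODES = {"eng", "fra", "spa"}


def is_valid_language(value: str) -> bool:
    """Return True if value looks like a known language code."""
    code = value.strip().lower()
    # Pick the lookup table based on the length of the field.
    if len(code) == 2:
        return code in TWO_LETTER_CODES
    if len(code) == 3:
        return code in THREE_LETTER_CODES
    # Any other length (including generic values like "Other") is invalid.
    return False
```

Under these assumptions, a generic value such as "Other" is reported as invalid, which matches the limitation noted in the commit message.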
8509006165
data/test.csv: Use more descriptive tests
To make it obvious what each item is testing.
2019-07-29 17:38:46 +03:00
fa4fa3491b
Add check for "suspicious" characters
These standalone characters often indicate issues with encoding or
copy/paste in languages with accents like French and Spanish. For
example: foreˆt should be forêt.

It is not possible to fix these issues automatically, but this will
print a warning so you can notify the owner of the data.
2019-07-29 17:08:49 +03:00
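A check of this kind might look like the sketch below. The function name `find_suspicious_characters` and the exact set of characters are assumptions made for illustration; the commit message only gives the standalone circumflex as an example.

```python
import re

# Assumed set of standalone accent characters: acute accent, modifier
# circumflex, small tilde, and diaeresis. The real list may differ.
SUSPICIOUS = re.compile("[\u00b4\u02c6\u02dc\u00a8]")


def find_suspicious_characters(value: str) -> list:
    """Return every suspicious standalone character found in value."""
    return SUSPICIOUS.findall(value)


def warn_if_suspicious(value: str) -> None:
    """Print a warning instead of fixing, since the fix is not automatic."""
    found = find_suspicious_characters(value)
    if found:
        print(f"Suspicious character(s) {found} in: {value}")
```

The sketch only reports matches and leaves the value unchanged, in line with the commit's note that these issues cannot be fixed automatically.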
8047a57cc5
Add support for fixing "unnecessary" Unicode
These are things like non-breaking spaces, "replacement" characters,
etc. that add nothing to the metadata and often cause errors during
parsing or displaying in a UI.
2019-07-29 16:38:10 +03:00
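As a rough illustration of this kind of cleanup, the sketch below handles three characters. The function name `fix_unnecessary_unicode` is hypothetical, and both the character list and the choice to turn non-breaking spaces into regular spaces (rather than delete them) are assumptions.

```python
import re

# Assumed characters to drop outright: zero-width space and the
# Unicode "replacement" character.
REMOVE = re.compile("[\u200b\ufffd]")


def fix_unnecessary_unicode(value: str) -> str:
    """Normalize non-breaking spaces and strip characters that add nothing."""
    # Assumption: a non-breaking space should become a regular space.
    value = value.replace("\u00a0", " ")
    return REMOVE.sub("", value)
```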
40e77db713
Add "unsafe fixes" runtime option
In this case it fixes occurrences of invalid multi-value separators.
DSpace uses "||" to separate multiple values in one field, but our
editors sometimes give us files with mistakes like "|". We can fix
these to be correct multi-value separators if we are sure that the
metadata is not actually using "|" for some legitimate purpose.
2019-07-28 22:53:39 +03:00
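The separator fix described above could be sketched like this. The function name `fix_invalid_separators` and the `unsafe_fixes` parameter are hypothetical; the sketch assumes that any lone "|" not adjacent to another "|" is a mistake, which is exactly why it is gated behind an opt-in flag.

```python
import re

# A single pipe that is neither preceded nor followed by another pipe.
LONE_PIPE = re.compile(r"(?<!\|)\|(?!\|)")


def fix_invalid_separators(value: str, unsafe_fixes: bool = False) -> str:
    """Replace lone "|" separators with "||" when unsafe fixes are enabled."""
    if not unsafe_fixes:
        # Leave the value alone: "|" may be used for a legitimate purpose.
        return value
    return LONE_PIPE.sub("||", value)
```

For example, with the flag enabled, "Kenya|Uganda" becomes "Kenya||Uganda", while a value that already uses "||" is left unchanged.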
5771764ad2
data/test.csv: Add some new records to test dates
Test invalid, missing, and multiple dates.
2019-07-28 16:23:55 +03:00
f2060adadf
Move tests.csv to data directory
2019-07-27 00:02:47 +03:00