1
0
mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-16 11:07:03 +01:00
Commit Graph

5 Commits

Author SHA1 Message Date
fa4fa3491b
Add check for "suspicious" characters
These standalone characters often indicate issues with encoding or
copy/paste in languages with accents like French and Spanish. For
example: foreˆt should be forêt.

It is not possible to fix these issues automatically, but this will
print a warning so you can notify the owner of the data.
2019-07-29 17:08:49 +03:00
8047a57cc5
Add support for fixing "unnecessary" Unicode
These are things like non-breaking spaces, "replacement" characters,
etc that add nothing to the metadata and often cause errors during
parsing or displaying in a UI.
2019-07-29 16:38:10 +03:00
40e77db713
Add "unsafe fixes" runtime option
In this case it fixes occurences of invalid multi-value separators.
DSpace uses "||" to separate multiple values in one field, but our
editors sometimes give us files with mistakes like "|". We can fix
these to be correct multi-value separators if we are sure that the
metadata is not actually using "|" for some legitimate purpose.
2019-07-28 22:53:39 +03:00
5771764ad2
data/test.csv: Add some new records to test dates
Test invalid, missing, and multiple dates.
2019-07-28 16:23:55 +03:00
f2060adadf
Move tests.csv to data directory 2019-07-27 00:02:47 +03:00