1
0
mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-16 19:17:03 +01:00
Commit Graph

7 Commits

Author SHA1 Message Date
40d5f7d81b
Add support for removing newlines
This was tricky because of the nature of newlines. In actuality we
are removing Unix line feeds here (U+000A) because Windows carriage
returns are actually already removed by the string stripping in the
whitespace fix.

Creating the test case in Vim was difficult because I couldn't fig-
ure out how to manually enter a line feed character. In the end I
used a search and replace on a known pattern like "ALAN", replacing
it with \r. Neither entering the Unicode code point (U+000A) direc-
tly or typing an "Enter" character after ^V worked. Grrr.
2019-07-30 20:05:12 +03:00
1e444cf040
Add fix for duplicate metadata values 2019-07-29 18:05:03 +03:00
8047a57cc5
Add support for fixing "unnecessary" Unicode
These are things like non-breaking spaces, "replacement" characters,
etc that add nothing to the metadata and often cause errors during
parsing or displaying in a UI.
2019-07-29 16:38:10 +03:00
40e77db713
Add "unsafe fixes" runtime option
In this case it fixes occurences of invalid multi-value separators.
DSpace uses "||" to separate multiple values in one field, but our
editors sometimes give us files with mistakes like "|". We can fix
these to be correct multi-value separators if we are sure that the
metadata is not actually using "|" for some legitimate purpose.
2019-07-28 22:53:39 +03:00
87b1997051
Fix whitespace errors found by flake8 2019-07-28 17:47:28 +03:00
ce8f140c66
tests: Remove unused pytest import 2019-07-28 17:42:54 +03:00
41a30f1b07
Add initial tests
For now only test fixes because they return changed data. I'm not
sure how to test the checks, because they don't return data and I
can't modify them to return boolean values without breaking the app.
2019-07-27 00:36:40 +03:00