mirror of
https://github.com/ilri/csv-metadata-quality.git
synced 2024-11-25 15:18:19 +01:00
Alan Orth
49e3543878
This will check all strings for un-normalized Unicode characters. Normalization is done using NFC. This includes tests and updated sample data (data/test.csv). See: https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html
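As a rough illustration of the NFC check described in the commit message, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `is_nfc` and `normalize_nfc` are hypothetical and are not taken from the project's code; the sample strings mirror the "Composéd" and "Decomposéd" rows of the test data.

```python
import unicodedata


def is_nfc(value: str) -> bool:
    """Return True if the string is already in NFC form (hypothetical helper)."""
    return unicodedata.normalize("NFC", value) == value


def normalize_nfc(value: str) -> str:
    """Return the NFC-normalized form of the string (hypothetical helper)."""
    return unicodedata.normalize("NFC", value)


# "é" as one precomposed code point (U+00E9)
composed = "Compos\u00e9d"
# "e" followed by a combining acute accent (U+0065 + U+0301)
decomposed = "Compose\u0301d"

for title in (composed, decomposed):
    if not is_nfc(title):
        title = normalize_nfc(title)
    print(title, len(title))
```

Both strings render the same way, but only the decomposed one changes under normalization, which is why a check like this needs both a composed and a decomposed sample.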
31 lines
1.4 KiB
Plaintext
dc.title,dc.date.issued,dc.identifier.issn,dc.identifier.isbn,dc.language.iso,dc.subject,cg.coverage.country,filename
Leading space,2019-07-29,,,,,,
Trailing space ,2019-07-29,,,,,,
Excessive space,2019-07-29,,,,,,
Miscellaenous ||whitespace | issues ,2019-07-29,,,,,,
Duplicate||Duplicate,2019-07-29,,,,,,
Invalid ISSN,2019-07-29,2321-2302,,,,,
Invalid ISBN,2019-07-29,,978-0-306-40615-6,,,,
Multiple valid ISSNs,2019-07-29,0378-5955||0024-9319,,,,,
Multiple valid ISBNs,2019-07-29,,99921-58-10-7||978-0-306-40615-7,,,,
Invalid date,2019-07-260,,,,,,
Multiple dates,2019-07-26||2019-01-10,,,,,,
Invalid multi-value separator,2019-07-29,0378-5955|0024-9319,,,,,
Unnecessary Unicode,2019-07-29,,,,,,
Suspicious character||foreˆt,2019-07-29,,,,,,
Invalid ISO 639-1 (alpha 2) language,2019-07-29,,,jp,,,
Invalid ISO 639-3 (alpha 3) language,2019-07-29,,,chi,,,
Invalid language,2019-07-29,,,Span,,,
Invalid AGROVOC subject,2019-07-29,,,,FOREST,,
Newline (LF),2019-07-30,,,,"TANZA
NIA",,
Missing date,,,,,,,
Invalid country,2019-08-01,,,,,KENYAA,
Uncommon filename extension,2019-08-10,,,,,,file.pdf.lck
Unneccesary unicode (U+002D + U+00AD),2019-08-10,,978-92-9043-823-6,,,,
"Missing space,after comma",2019-08-27,,,,,,
Incorrect ISO 639-1 language,2019-09-26,,,es,,,
Incorrect ISO 639-3 language,2019-09-26,,,spa,,,
Composéd Unicode,2020-01-14,,,,,,
Decomposéd Unicode,2020-01-14,,,,,,