mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-08 06:06:00 +02:00

Add support for validating languages

Validates each value against ISO 639-2 or ISO 639-3, depending on the
length of the language field. Values of any other length are reported
as invalid.

Does not currently have any support for generic values like "Other".
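
The length-based dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `check_language` and the two tiny code tables are hypothetical stand-ins for a complete ISO 639 registry, and the two- and three-character thresholds are inferred from the test rows added to the CSV fixture ("jp", "chi", "Span"). The message labels follow the fixture's own naming; note that two-letter codes are formally defined by ISO 639-1.

```python
# Hypothetical, deliberately tiny code tables. A real check would consult a
# complete ISO 639 registry instead of these hand-picked samples.
TWO_LETTER_CODES = {"en", "es", "fr", "ja", "zh"}
THREE_LETTER_CODES = {"eng", "spa", "fra", "jpn", "zho"}


def check_language(field):
    """Return one message per invalid value in a language field.

    Multiple values are assumed to be separated by "||", as in the other
    columns of the test fixture.
    """
    problems = []
    if not field:
        # An empty field has nothing to validate.
        return problems

    for value in field.split("||"):
        if len(value) == 2:
            # Label follows the fixture's naming for two-letter codes.
            if value not in TWO_LETTER_CODES:
                problems.append(f"Invalid ISO 639-2 language: {value}")
        elif len(value) == 3:
            if value not in THREE_LETTER_CODES:
                problems.append(f"Invalid ISO 639-3 language: {value}")
        else:
            # Any other length cannot be matched against either table.
            problems.append(f"Invalid language: {value}")

    return problems


if __name__ == "__main__":
    for sample in ["jp", "chi", "Span", "en||zho"]:
        print(repr(sample), check_language(sample))
```

With these sample tables, "jp" and "chi" are rejected because neither appears in its table, while "Span" is rejected on length alone. A generic value such as "Other" would also be rejected, which matches the limitation noted above.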
2019-07-29 18:59:42 +03:00
parent 1978fa7b48
commit a36454a3ac
5 changed files with 117 additions and 15 deletions


@@ -1,15 +1,18 @@
dc.contributor.author,birthdate,dc.identifier.issn,dc.identifier.isbn
Leading space,2019-07-29,,
Trailing space ,2019-07-29,,
Excessive space,2019-07-29,,
Miscellaenous ||whitespace | issues ,2019-07-29,,
Duplicate||Duplicate,2019-07-29,,
Invalid ISSN,2019-07-29,2321-2302,
Invalid ISBN,2019-07-29,,978-0-306-40615-6
Multiple valid ISSNs,2019-07-29,0378-5955||0024-9319,
Multiple valid ISBNs,2019-07-29,,99921-58-10-7||978-0-306-40615-7
Invalid date,2019-07-260,,
Multiple dates,2019-07-26||2019-01-10,,
Invalid multi-value separator,,0378-5955|0024-9319,
Unnecessary Unicode,2019-07-29,,
Suspicious character||foreˆt,2019-07-29,,
dc.contributor.author,birthdate,dc.identifier.issn,dc.identifier.isbn,dc.language.iso
Leading space,2019-07-29,,,
Trailing space ,2019-07-29,,,
Excessive space,2019-07-29,,,
Miscellaenous ||whitespace | issues ,2019-07-29,,,
Duplicate||Duplicate,2019-07-29,,,
Invalid ISSN,2019-07-29,2321-2302,,
Invalid ISBN,2019-07-29,,978-0-306-40615-6,
Multiple valid ISSNs,2019-07-29,0378-5955||0024-9319,,
Multiple valid ISBNs,2019-07-29,,99921-58-10-7||978-0-306-40615-7,
Invalid date,2019-07-260,,,
Multiple dates,2019-07-26||2019-01-10,,,
Invalid multi-value separator,,0378-5955|0024-9319,,
Unnecessary Unicode,2019-07-29,,,
Suspicious character||foreˆt,2019-07-29,,,
Invalid ISO 639-2 language,2019-07-29,,,jp
Invalid ISO 639-3 language,2019-07-29,,,chi
Invalid language,2019-07-29,,,Span
