mirror of
https://github.com/ilri/csv-metadata-quality.git
synced 2024-11-16 02:57:04 +01:00
Alan Orth
8435ee242d
Works decently well assuming the title, abstract, and citation fields are an accurate representation of the language as identified by the language field. Handles ISO 639-1 (alpha 2) and ISO 639-3 (alpha 3) values seamlessly. This includes an updated pipenv environment, test data, pytest tests for both correct and incorrect ISO 639-1 and ISO 639-3 languages, and a new command line option "-e".
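To illustrate what accepting both ISO 639-1 and ISO 639-3 values can look like, here is a minimal sketch, not the project's actual code. The function name `normalize_language` and the tiny `ALPHA2_TO_ALPHA3` table are hypothetical and cover only a handful of languages; a real check would presumably consult a complete ISO 639 registry and pair the result with a language detector run on the title, abstract, and citation.

```python
from typing import Optional

# Hypothetical, deliberately tiny lookup table for illustration only.
# Keys are ISO 639-1 (alpha 2) codes, values are ISO 639-3 (alpha 3) codes.
ALPHA2_TO_ALPHA3 = {
    "en": "eng",
    "es": "spa",
    "fr": "fra",
    "ja": "jpn",
    "zh": "zho",
}


def normalize_language(code: str) -> Optional[str]:
    """Return the alpha 3 code for a declared language, or None if unknown.

    Two-letter values are looked up as ISO 639-1 and three-letter values
    as ISO 639-3, so "es" and "spa" normalize to the same result.
    """
    if len(code) == 2:
        return ALPHA2_TO_ALPHA3.get(code)
    if len(code) == 3 and code in ALPHA2_TO_ALPHA3.values():
        return code
    return None


def language_matches(declared: str, detected_alpha3: str) -> bool:
    """Compare a declared language field with a detected alpha 3 code.

    The detected code is assumed to come from some external language
    detector; that step is outside this sketch.
    """
    normalized = normalize_language(declared)
    return normalized is not None and normalized == detected_alpha3
```

With this table, `es` and `spa` both normalize to `spa`, while the fixture values `jp`, `chi`, and `Span` are rejected: `jp` is not an ISO 639-1 code (Japanese is `ja`), and `chi` is an ISO 639-2 bibliographic code rather than the ISO 639-3 code `zho`.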
29 lines
1.3 KiB
Plaintext
dc.title,birthdate,dc.identifier.issn,dc.identifier.isbn,dc.language.iso,dc.subject,cg.coverage.country,filename
Leading space,2019-07-29,,,,,,
Trailing space ,2019-07-29,,,,,,
Excessive space,2019-07-29,,,,,,
Miscellaenous ||whitespace | issues ,2019-07-29,,,,,,
Duplicate||Duplicate,2019-07-29,,,,,,
Invalid ISSN,2019-07-29,2321-2302,,,,,
Invalid ISBN,2019-07-29,,978-0-306-40615-6,,,,
Multiple valid ISSNs,2019-07-29,0378-5955||0024-9319,,,,,
Multiple valid ISBNs,2019-07-29,,99921-58-10-7||978-0-306-40615-7,,,,
Invalid date,2019-07-260,,,,,,
Multiple dates,2019-07-26||2019-01-10,,,,,,
Invalid multi-value separator,2019-07-29,0378-5955|0024-9319,,,,,
Unnecessary Unicode,2019-07-29,,,,,,
Suspicious character||foreˆt,2019-07-29,,,,,,
Invalid ISO 639-1 (alpha 2) language,2019-07-29,,,jp,,,
Invalid ISO 639-3 (alpha 3) language,2019-07-29,,,chi,,,
Invalid language,2019-07-29,,,Span,,,
Invalid AGROVOC subject,2019-07-29,,,,FOREST,,
Newline (LF),2019-07-30,,,,"TANZA
NIA",,
Missing date,,,,,,,
Invalid country,2019-08-01,,,,,KENYAA,
Uncommon filename extension,2019-08-10,,,,,,file.pdf.lck
Unneccesary unicode (U+002D + U+00AD),2019-08-10,,978-92-9043-823-6,,,,
"Missing space,after comma",2019-08-27,,,,,,
Incorrect ISO 639-1 language,2019-09-26,,,es,,,
Incorrect ISO 639-3 language,2019-09-26,,,spa,,,
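
The ISSN rows in this fixture exercise two things: the "||" multi-value separator and the ISSN check digit. The following is a minimal sketch of both, assuming the standard ISSN algorithm (weights 8 down to 2, modulus 11); the function names `issn_is_valid` and `check_issn_field` are hypothetical and this is not the tool's own implementation.

```python
import re

# An ISSN is eight characters in two groups of four; the last may be "X".
ISSN_PATTERN = re.compile(r"^\d{4}-\d{3}[\dX]$")


def issn_is_valid(issn: str) -> bool:
    """Check the format and the modulus 11 check digit of a single ISSN."""
    if not ISSN_PATTERN.match(issn):
        return False
    digits = issn.replace("-", "")
    # Weight the first seven digits 8, 7, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def check_issn_field(field: str, separator: str = "||") -> dict:
    """Split a multi-value field on the separator and validate each value."""
    return {value: issn_is_valid(value) for value in field.split(separator) if value}
```

Under these assumptions, the "Multiple valid ISSNs" value `0378-5955||0024-9319` splits into two ISSNs that both pass, `2321-2302` fails the check digit, and the "Invalid multi-value separator" value `0378-5955|0024-9319` is treated as one malformed ISSN because a single "|" is not the separator.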