87b1997051
Fix whitespace errors found by flake8
2019-07-28 17:47:28 +03:00
aadb3117eb
csv_metadata_quality/app.py: Remove unused test input files
2019-07-28 17:45:05 +03:00
e88d35ace3
csv_metadata_quality/app.py: Use regex in column match
...
Check for a column that has "issn" or "isbn" in the name rather
than by its explicit name, as the column is dc.identifier.issn now,
but will be cg.issn in the future if CG Core v2 happens.
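A minimal sketch of the name-based match described above, not the
project's actual code. The helper name is_issn_or_isbn_column is
hypothetical, and matching case-insensitively is an assumption.

```python
import re

# Hypothetical helper: match any column whose name contains "issn" or "isbn",
# so both dc.identifier.issn and a future cg.issn are picked up.
# Case-insensitive matching is an assumption, not something the commit states.
ISSN_ISBN_PATTERN = re.compile(r"issn|isbn", re.IGNORECASE)


def is_issn_or_isbn_column(column_name: str) -> bool:
    """Return True if the column name mentions issn or isbn anywhere."""
    return ISSN_ISBN_PATTERN.search(column_name) is not None


columns = ["dc.title", "dc.identifier.issn", "cg.issn", "dc.identifier.isbn"]
matching = [c for c in columns if is_issn_or_isbn_column(c)]
```

Here matching holds every column except dc.title, so a rename to
cg.issn would need no change to the check.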
2019-07-28 17:27:20 +03:00
196bb434fa
Add date validation
...
I'm only concerned with validating issue dates here. In DSpace they
are generally YYYY, YYYY-MM, or YYYY-MM-DD (though in theory they
could be any valid ISO 8601 format).
This also checks for cases where the date is missing and where the
metadata specifies multiple dates like "1990||1991", which is valid
but has no practical value in our system.
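A rough sketch of the checks described above, not the project's
actual code. The function name check_issue_date and the returned
status strings are hypothetical. It relies on datetime.strptime,
which is somewhat lenient (it accepts a non-zero-padded month, for
example), so a stricter check may be wanted in practice.

```python
from datetime import datetime

# The three date shapes named in the commit message.
DATE_FORMATS = ("%Y", "%Y-%m", "%Y-%m-%d")


def check_issue_date(field):
    """Classify an issue date value (hypothetical helper and labels)."""
    # Missing value: nothing to validate.
    if field is None or field == "":
        return "missing"

    # Multiple dates joined with the "||" multi-value separator.
    if len(field.split("||")) > 1:
        return "multiple"

    # Accept the value if any of the expected formats parses it.
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(field, fmt)
            return "valid"
        except ValueError:
            continue

    return "invalid"
```

For example, check_issue_date("1990||1991") reports the value as
"multiple" rather than trying to parse it as a single date.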
2019-07-28 16:11:36 +03:00
e2bb2d4df9
Main function should be "main()"
2019-07-27 23:09:16 +03:00
c47c064a13
Make output less debuggy
2019-07-27 09:21:13 +03:00
2b41f9416b
csv_metadata_quality/fix.py: Remove extra newline
2019-07-27 01:29:22 +03:00
3cf9f9452b
csv_metadata_quality/check.py: Always return field
...
We always need to return the field back so apply doesn't set it to
null when creating the new data frame.
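To illustrate why the return matters, here is a dependency-free
sketch that uses the built-in map in place of the data frame apply.
That substitution is an assumption made to keep the example
self-contained, and both function names are hypothetical. A Python
function with no return statement yields None, which is what ends up
in the new column.

```python
def check_without_return(field):
    # Reports a problem but falls off the end, so the result is None.
    if "|||" in field:
        print(f"Invalid multi-value separator: {field}")


def check_with_return(field):
    # Same check, but the original value is always handed back.
    if "|||" in field:
        print(f"Invalid multi-value separator: {field}")
    return field


values = ["Kenya||Uganda", "Kenya|||Uganda"]

# Every value is replaced by None because nothing is returned.
lost = list(map(check_without_return, values))

# The values survive the check unchanged.
kept = list(map(check_with_return, values))
```

After this runs, lost contains only None values, while kept is equal
to the original list.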
2019-07-27 01:28:08 +03:00
18f26c343d
csv_metadata_quality/app.py: Fix path to test.csv
2019-07-27 00:25:30 +03:00
84c3b17678
csv_metadata_quality/app.py: Add comment
2019-07-26 23:49:13 +03:00
aaf3537ba4
Add check for invalid multi-value separators
2019-07-26 23:48:24 +03:00
02f9d8a736
csv_metadata_quality/check.py: Add check for missing isbn values
2019-07-26 23:45:18 +03:00
dfd961d720
Bring test.csv into project
2019-07-26 23:14:37 +03:00
e160b17fb0
Add ISSN and ISBN checks using python-stdnum
2019-07-26 23:14:10 +03:00
30a4b0005f
csv_metadata_quality/fix.py: Remove test function
2019-07-26 22:56:40 +03:00
232d28e13e
Refactor as package with subpackages
...
This makes it cleaner to introduce checks, fixes, docs, and tests
in the future. It can currently be run like this:
python -m csv_metadata_quality
CSV input and output paths are still hard coded.
See: https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6
2019-07-26 22:11:10 +03:00