|
02f9d8a736
|
csv_metadata_quality/check.py: Add check for missing isbn values
|
2019-07-26 23:45:18 +03:00 |
|
|
64e7a73417
|
README.md: Add information about checks and fixes
|
2019-07-26 23:20:16 +03:00 |
|
|
dfd961d720
|
Bring test.csv into project
|
2019-07-26 23:14:37 +03:00 |
|
|
e160b17fb0
|
Add ISSN and ISBN checks using python-stdnum
|
2019-07-26 23:14:10 +03:00 |
|
|
30a4b0005f
|
csv_metadata_quality/fix.py: Remove test function
|
2019-07-26 22:56:40 +03:00 |
|
|
b657c51fd2
|
Add initial README.md with intro, license, and todo
|
2019-07-26 22:18:38 +03:00 |
|
|
5c6453b397
|
Add GPLv3 license
|
2019-07-26 22:16:16 +03:00 |
|
|
232d28e13e
|
Refactor as package with subpackages
This makes it cleaner for introducing checks, fixes, tests, and docs
in the future. It can currently be run like this:
python -m csv_metadata_quality
CSV input and output paths are still hard-coded.
See: https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6
|
2019-07-26 22:11:10 +03:00 |
|
|
ef5b8f7244
|
fix.py: Massive improvements
Use Python's str.strip() instead of kludgy regular expressions and
use split/join to handle multi-value fields more cleanly.
|
2019-07-26 19:31:55 +03:00 |
|
|
801870e0ba
|
Add fix.py
Initial working version of metadata cleaning script that fixes leading
and trailing whitespace (even in DSpace multi-value fields).
|
2019-07-26 19:08:28 +03:00 |
|
|
21b78b9519
|
Initial commit
Pipenv environment with Pandas.
|
2019-07-26 17:54:13 +03:00 |
|
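
The two fix.py commits above describe stripping leading and trailing whitespace with `str.strip()` and handling multi-value fields with split/join. The log does not include the code itself, so the following is only a minimal sketch of that idea. The function name `strip_whitespace` and the `||` separator (the usual DSpace multi-value delimiter) are assumptions for illustration, not taken from the repository.

```python
def strip_whitespace(value, separator="||"):
    """Strip leading/trailing whitespace from each value in a field.

    `separator` is assumed to be the DSpace multi-value delimiter;
    non-string cells (e.g. missing values) are returned unchanged.
    """
    if not isinstance(value, str):
        return value

    # Split the field into individual values, clean each one,
    # then join them back with the same separator.
    parts = [part.strip() for part in value.split(separator)]
    return separator.join(parts)


print(strip_whitespace("  Kenya || Uganda  "))  # prints "Kenya||Uganda"
```

With Pandas, a function like this could be applied to a column using `Series.apply`, which would keep the cleaning logic independent of how the CSV is read or written.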