
615 Commits

Author SHA1 Message Date
Alan Orth df1087b26f
README.md: Improve introduction, checks, and todo 2019-07-26 23:50:41 +03:00
Alan Orth 84c3b17678
csv_metadata_quality/app.py: Add comment 2019-07-26 23:49:13 +03:00
Alan Orth 844b968098
tests/test.csv: Add invalid multi-value field separator 2019-07-26 23:48:45 +03:00
Alan Orth aaf3537ba4
Add check for invalid multi-value separators 2019-07-26 23:48:24 +03:00
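A check like the one this commit adds could look roughly like the sketch below. It assumes that "||" is the valid multi-value separator (DSpace's CSV convention) and treats any lone "|" as an invalid separator. The function name `has_invalid_separator` is hypothetical and is not taken from the project's code.

```python
import re

# Assumption: "||" is the only valid multi-value separator, so a "|" that is
# neither preceded nor followed by another "|" is flagged as invalid.
LONE_PIPE = re.compile(r"(?<!\|)\|(?!\|)")


def has_invalid_separator(field: str) -> bool:
    """Return True if the field contains a lone "|" instead of "||" (hypothetical helper)."""
    return LONE_PIPE.search(field) is not None


for value in ["Kenya||Ethiopia", "Kenya|Ethiopia", "Kenya"]:
    if has_invalid_separator(value):
        print(f"Invalid multi-value separator: {value}")
```

Only "Kenya|Ethiopia" is reported. Each pipe in "Kenya||Ethiopia" has a neighbouring pipe, so the lookarounds reject it.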
Alan Orth 02f9d8a736
csv_metadata_quality/check.py: Add check for missing isbn values 2019-07-26 23:45:18 +03:00
Alan Orth 64e7a73417
README.md: Add information about checks and fixes 2019-07-26 23:20:16 +03:00
Alan Orth dfd961d720
Bring test.csv into project 2019-07-26 23:14:37 +03:00
Alan Orth e160b17fb0
Add ISSN and ISBN checks using python-stdnum 2019-07-26 23:14:10 +03:00
Alan Orth 30a4b0005f
csv_metadata_quality/fix.py: Remove test function 2019-07-26 22:56:40 +03:00
Alan Orth b657c51fd2
Add initial README.md with intro, license, and todo 2019-07-26 22:18:38 +03:00
Alan Orth 5c6453b397
Add GPLv3 license 2019-07-26 22:16:16 +03:00
Alan Orth 232d28e13e
Refactor as package with subpackages
This makes it cleaner for introducing checks, fixes, tests, and docs
in the future. Currently it can be run like this:

  python -m csv_metadata_quality

CSV input and output paths are still hard coded.

See: https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6
2019-07-26 22:11:10 +03:00
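The `python -m csv_metadata_quality` invocation works because Python runs a package's `__main__.py` when the package is passed to `-m`. The sketch below builds a minimal, hypothetical version of that layout in a temporary directory and runs it. Only the package name and `app.py` come from this log. The `run()` function and the file contents are assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical minimal layout; app.run() is an illustrative assumption.
FILES = {
    "csv_metadata_quality/__init__.py": "",
    "csv_metadata_quality/app.py": (
        "def run():\n"
        "    print('csv_metadata_quality: running')\n"
    ),
    # `python -m <package>` executes the package's __main__.py module.
    "csv_metadata_quality/__main__.py": (
        "from csv_metadata_quality import app\n"
        "\n"
        "app.run()\n"
    ),
}


def run_package() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        for name, body in FILES.items():
            path = Path(tmp, name)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(body)
        # With -m, the current directory is put on sys.path, so the package is importable.
        result = subprocess.run(
            [sys.executable, "-m", "csv_metadata_quality"],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
    return result.stdout


print(run_package())
```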
Alan Orth ef5b8f7244
fix.py: Massive improvements
Use Python's str.strip() instead of kludgy regular expressions and
use split/join to handle multi-value fields more cleanly.
2019-07-26 19:31:55 +03:00
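The approach described in this commit (`str.strip()` plus split/join) can be sketched as follows. The sketch assumes "||" separates values in DSpace multi-value fields. `fix_whitespace` is a hypothetical name and is not necessarily the function in `fix.py`.

```python
# Assumption: "||" separates values in DSpace multi-value fields.
SEPARATOR = "||"


def fix_whitespace(field: str) -> str:
    """Strip leading/trailing whitespace from each value, then rejoin them."""
    values = [value.strip() for value in field.split(SEPARATOR)]
    return SEPARATOR.join(values)


print(fix_whitespace("  Kenya || Ethiopia  "))  # → Kenya||Ethiopia
```

A field without a separator splits into a single value, so the same function also handles ordinary single-value fields.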
Alan Orth 801870e0ba
Add fix.py
Initial working version of metadata cleaning script that fixes leading
and trailing whitespace (even in DSpace multi-value fields).
2019-07-26 19:08:28 +03:00
Alan Orth 21b78b9519
Initial commit
Pipenv environment with Pandas.
2019-07-26 17:54:13 +03:00