csv-metadata-quality

mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-09-07 20:31:45 +02:00

Author	SHA1	Message	Date
Alan Orth	02f9d8a736	csv_metadata_quality/check.py: Add check for missing ISBN values	2019-07-26 23:45:18 +03:00
Alan Orth	64e7a73417	README.md: Add information about checks and fixes	2019-07-26 23:20:16 +03:00
Alan Orth	dfd961d720	Bring test.csv into project	2019-07-26 23:14:37 +03:00
Alan Orth	e160b17fb0	Add ISSN and ISBN checks using python-stdnum	2019-07-26 23:14:10 +03:00
Alan Orth	30a4b0005f	csv_metadata_quality/fix.py: Remove test function	2019-07-26 22:56:40 +03:00
Alan Orth	b657c51fd2	Add initial README.md with intro, license, and todo	2019-07-26 22:18:38 +03:00
Alan Orth	5c6453b397	Add GPLv3 license	2019-07-26 22:16:16 +03:00
Alan Orth	232d28e13e	Refactor as package with subpackages. This makes it cleaner to introduce checks, fixes, docs, and tests in the future. It can currently be run with python -m csv_metadata_quality; the CSV input and output paths are still hard coded. See: https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6	2019-07-26 22:11:10 +03:00
Alan Orth	ef5b8f7244	fix.py: Massive improvements. Use Python's str.strip() instead of kludgy regular expressions, and use split/join to handle multi-value fields more cleanly.	2019-07-26 19:31:55 +03:00
Alan Orth	801870e0ba	Add fix.py. Initial working version of a metadata cleaning script that fixes leading and trailing whitespace (even in DSpace multi-value fields).	2019-07-26 19:08:28 +03:00
Alan Orth	21b78b9519	Initial commit. Pipenv environment with Pandas.	2019-07-26 17:54:13 +03:00
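Commits 801870e0ba and ef5b8f7244 describe fixing leading and trailing whitespace with str.strip() and handling multi-value fields with split/join. The sketch below illustrates that approach; it is not the code in fix.py. It assumes DSpace's "||" multi-value separator, and the function name fix_whitespace is hypothetical.

```python
def fix_whitespace(field, separator="||"):
    """Strip leading and trailing whitespace from every value in a field.

    Assumes DSpace-style multi-value fields joined with "||"; a field with
    no separator is treated as a single value. Non-string input (for
    example a pandas NaN for an empty cell) is returned unchanged.
    """
    if not isinstance(field, str):
        return field

    # Split into individual values, strip each one, then rejoin.
    values = [value.strip() for value in field.split(separator)]
    return separator.join(values)


print(fix_whitespace("  Alan Orth|| ILRI "))  # Alan Orth||ILRI
```

With pandas, a function like this could be applied column by column, e.g. df[column] = df[column].apply(fix_whitespace). Splitting before stripping matters: stripping the whole cell alone would leave the spaces around each "||" untouched.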
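Commit e160b17fb0 delegates ISSN and ISBN validation to python-stdnum, whose stdnum.issn.is_valid() and stdnum.isbn.is_valid() functions do the real work (the latter also accepts ISBN-10). As a dependency-free illustration of what such a check verifies, the sketch below swaps in hand-written versions of the standard check-digit algorithms: mod 11 with weights 8 to 2 for ISSN, and alternating weights of 1 and 3 for ISBN-13 only. In the spirit of 02f9d8a736 it skips missing values instead of flagging them. The function names and the "||" separator are assumptions, not the project's API.

```python
def issn_is_valid(value):
    """Validate an ISSN such as "0378-5955" using its mod 11 check digit."""
    compact = value.replace("-", "").strip().upper()
    if len(compact) != 8 or not compact[:7].isdecimal():
        return False
    # Weights 8, 7, ..., 2 apply to the first seven digits.
    total = sum(int(d) * w for d, w in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    # A check value of 10 is written as "X".
    return compact[7] == ("X" if check == 10 else str(check))


def isbn13_is_valid(value):
    """Validate an ISBN-13 using alternating weights of 1 and 3."""
    compact = value.replace("-", "").replace(" ", "").strip()
    if len(compact) != 13 or not compact.isdecimal():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(compact))
    return total % 10 == 0


def check_field(field, validator, separator="||"):
    """Return the values in a (possibly multi-value) field that fail validation.

    Missing values (None, empty strings, or pandas NaN, which is a float)
    are skipped rather than reported as invalid.
    """
    if not isinstance(field, str) or not field.strip():
        return []
    return [v.strip() for v in field.split(separator) if not validator(v.strip())]


print(check_field("0378-5955||0378-5954", issn_is_valid))  # ['0378-5954']
```

In practice, calling python-stdnum is preferable to maintaining checksum code by hand; the point here is only to show what "valid" means for these identifiers and why missing cells need separate handling.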