csv-metadata-quality

mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-08-31 09:02:37 +02:00

Author	SHA1	Message	Date
Alan Orth	646146d59f	Add Excel version of test file	2019-07-28 17:07:33 +03:00
Alan Orth	c2f28194eb	Add xlrd to pipenv for Pandas read_excel support	2019-07-28 17:05:17 +03:00
Alan Orth	5771764ad2	data/test.csv: Add some new records to test dates Test invalid, missing, and multiple dates.	2019-07-28 16:23:55 +03:00
Alan Orth	196bb434fa	Add date validation I'm only concerned with validating issue dates here. In DSpace they are generally always YYYY, YYYY-MM, or YYYY-MM-DD (though in theory they could be any valid ISO8601 format). This also checks for cases where the date is missing and where the metadata has specified multiple dates like "1990||1991", as this is valid, but there is no practical value for it in our system.	2019-07-28 16:11:36 +03:00
Alan Orth	73b4061c7b	Add ipython to pipenv dev packages	2019-07-28 10:06:41 +03:00
Alan Orth	e2bb2d4df9	Main function should be "main()"	2019-07-27 23:09:16 +03:00
Alan Orth	7e16968bf2	README.md: Add todos	2019-07-27 19:24:35 +03:00
Alan Orth	c47c064a13	Make output less debuggy	2019-07-27 09:21:13 +03:00
Alan Orth	a849615b41	Add tests for check functions Relies on capturing stdout. See: https://docs.pytest.org/en/5.0.1/capture.html	2019-07-27 02:10:13 +03:00
Alan Orth	2b41f9416b	csv_metadata_quality/fix.py: Remove extra newline	2019-07-27 01:29:22 +03:00
Alan Orth	3cf9f9452b	csv_metadata_quality/check.py: Always return field We always need to return the field back so apply doesn't set it to null when creating the new data frame.	2019-07-27 01:28:08 +03:00
Alan Orth	1d861f263b	.build.yml: Fix setup script I wasn't changing into the project directory so the pipenv virtual environment was not getting created in the correct place.	2019-07-27 00:41:57 +03:00
Alan Orth	33121f8a01	.build.yml: Add tests	2019-07-27 00:38:34 +03:00
Alan Orth	41a30f1b07	Add initial tests For now only test fixes because they return changed data. I'm not sure how to test the checks, because they don't return data and I can't modify them to return boolean values without breaking the app.	2019-07-27 00:36:40 +03:00
Alan Orth	103e630f6e	Add requirements-dev.txt Generated with: $ pipenv lock -r -d > requirements-dev.txt	2019-07-27 00:33:52 +03:00
Alan Orth	99f00fcb85	Add pytest to pipenv dev environment	2019-07-27 00:32:53 +03:00
Alan Orth	18f26c343d	csv_metadata_quality/app.py: Fix path to test.csv	2019-07-27 00:25:30 +03:00
Alan Orth	f2060adadf	Move tests.csv to data directory	2019-07-27 00:02:47 +03:00
Alan Orth	0a751c1f25	README.md: Add SourceHut build badge	2019-07-26 23:59:31 +03:00
Alan Orth	2eb48d8ed0	Add SourceHut build file For now it only attempts to install the Python requirements using pipenv. Later it will run tests with pytest.	2019-07-26 23:56:16 +03:00
Alan Orth	7fb7f7e03c	Add requirements.txt Generated using pipenv: $ pipenv lock -r > requirements.txt	2019-07-26 23:54:07 +03:00
Alan Orth	df1087b26f	README.md: Improve introduction, checks, and todo	2019-07-26 23:50:41 +03:00
Alan Orth	84c3b17678	csv_metadata_quality/app.py: Add comment	2019-07-26 23:49:13 +03:00
Alan Orth	844b968098	tests/test.csv: Add invalid multi-value field separator	2019-07-26 23:48:45 +03:00
Alan Orth	aaf3537ba4	Add check for invalid multi-value separators	2019-07-26 23:48:24 +03:00
Alan Orth	02f9d8a736	csv_metadata_quality/check.py: Add check for missing isbn values	2019-07-26 23:45:18 +03:00
Alan Orth	64e7a73417	README.md: Add information about checks and fixes	2019-07-26 23:20:16 +03:00
Alan Orth	dfd961d720	Bring test.csv into project	2019-07-26 23:14:37 +03:00
Alan Orth	e160b17fb0	Add ISSN and ISBN checks using python-stdnum	2019-07-26 23:14:10 +03:00
Alan Orth	30a4b0005f	csv_metadata_quality/fix.py: Remove test function	2019-07-26 22:56:40 +03:00
Alan Orth	b657c51fd2	Add initial README.md with intro, license, and todo	2019-07-26 22:18:38 +03:00
Alan Orth	5c6453b397	Add GPLv3 license	2019-07-26 22:16:16 +03:00
Alan Orth	232d28e13e	Refactor as package with subpackages This makes it cleaner for introducing checks, fixes, tests, docs, and tests in the future. Currently can be run like this: python -m csv_metadata_quality CSV input and output paths are still hard coded. See: https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6	2019-07-26 22:11:10 +03:00
Alan Orth	ef5b8f7244	fix.py: Massive improvements Use Python's str.strip() instead of kludgy regular expressions and use split/join to handle multi-value fields more cleanly.	2019-07-26 19:31:55 +03:00
Alan Orth	801870e0ba	Add fix.py Initial working version of metadata cleaning script that fixes leading and trailing whitespace (even in DSpace multi-value fields).	2019-07-26 19:08:28 +03:00
Alan Orth	21b78b9519	Initial commit Pipenv environment with Pandas.	2019-07-26 17:54:13 +03:00
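
The date validation described in commit 196bb434fa can be illustrated with a minimal sketch. This is not the code from that commit: the function name `classify_issue_date`, the returned labels, and the use of the standard library instead of Pandas are all assumptions made for illustration. It accepts only YYYY, YYYY-MM, and YYYY-MM-DD, and it flags missing values and multi-value fields joined with "||".

```python
import re
from datetime import date

# Hypothetical pattern for the three issue date shapes named in the commit
# message: YYYY, YYYY-MM, and YYYY-MM-DD.
ISSUE_DATE_RE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def classify_issue_date(value):
    """Return "missing", "multiple", "invalid", or "valid" for one field.

    Hypothetical helper for illustration only; the real project may report
    problems differently (for example by printing them).
    """
    # None, NaN (the only value that is not equal to itself), and blank
    # strings all count as a missing date.
    if value is None or value != value or str(value).strip() == "":
        return "missing"

    text = str(value).strip()

    # "||" is the multi-value separator shown in the commit message.
    if "||" in text:
        return "multiple"

    match = ISSUE_DATE_RE.fullmatch(text)
    if match is None:
        return "invalid"

    year, month, day = match.groups()
    try:
        # Fill in 1 for an absent month or day, then let datetime.date
        # reject impossible values such as month 13 or February 30.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return "invalid"

    return "valid"


if __name__ == "__main__":
    for sample in ["2019", "2019-07", "2019-07-28", "", "1990||1991", "2019-13"]:
        print(repr(sample), "->", classify_issue_date(sample))
```

Returning a label rather than printing keeps the sketch easy to test. A check that runs through a data frame `apply` would also need to hand the field back unchanged, as commit 3cf9f9452b notes.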


286 Commits