mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-15 17:23:01 +02:00
Commit Graph

23 Commits

Author SHA1 Message Date
41a30f1b07 Add initial tests
For now only test fixes because they return changed data. I'm not
sure how to test the checks, because they don't return data and I
can't modify them to return boolean values without breaking the app.
2019-07-27 00:36:40 +03:00
103e630f6e Add requirements-dev.txt
Generated with:

  $ pipenv lock -r -d > requirements-dev.txt
2019-07-27 00:33:52 +03:00
99f00fcb85 Add pytest to pipenv dev environment 2019-07-27 00:32:53 +03:00
18f26c343d csv_metadata_quality/app.py: Fix path to test.csv 2019-07-27 00:25:30 +03:00
f2060adadf Move test.csv to data directory 2019-07-27 00:02:47 +03:00
0a751c1f25 README.md: Add SourceHut build badge 2019-07-26 23:59:31 +03:00
2eb48d8ed0 Add SourceHut build file
For now it only attempts to install the Python requirements using
pipenv. Later it will run tests with pytest.
2019-07-26 23:56:16 +03:00
7fb7f7e03c Add requirements.txt
Generated using pipenv:

  $ pipenv lock -r > requirements.txt
2019-07-26 23:54:07 +03:00
df1087b26f README.md: Improve introduction, checks, and todo 2019-07-26 23:50:41 +03:00
84c3b17678 csv_metadata_quality/app.py: Add comment 2019-07-26 23:49:13 +03:00
844b968098 tests/test.csv: Add invalid multi-value field separator 2019-07-26 23:48:45 +03:00
aaf3537ba4 Add check for invalid multi-value separators 2019-07-26 23:48:24 +03:00
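A check like the one named in aaf3537ba4 might be sketched as follows. The function name `has_invalid_separator` is hypothetical, and the sketch assumes that `||` is the valid multi-value separator and that a lone `|` is the invalid form; the commit message itself does not spell out either detail.

```python
import re

# Assumption: "||" separates values, so a single "|" with no "|" on
# either side of it is treated as an invalid separator.
LONE_PIPE = re.compile(r"(?<!\|)\|(?!\|)")


def has_invalid_separator(field):
    """Return True if the field contains a lone "|" (hypothetical helper)."""
    return LONE_PIPE.search(field) is not None
```

Returning a boolean here is a choice made for the sketch only, so that the result is easy to inspect.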
02f9d8a736 csv_metadata_quality/check.py: Add check for missing isbn values 2019-07-26 23:45:18 +03:00
64e7a73417 README.md: Add information about checks and fixes 2019-07-26 23:20:16 +03:00
dfd961d720 Bring test.csv into project 2019-07-26 23:14:37 +03:00
e160b17fb0 Add ISSN and ISBN checks using python-stdnum 2019-07-26 23:14:10 +03:00
30a4b0005f csv_metadata_quality/fix.py: Remove test function 2019-07-26 22:56:40 +03:00
b657c51fd2 Add initial README.md with intro, license, and todo 2019-07-26 22:18:38 +03:00
5c6453b397 Add GPLv3 license 2019-07-26 22:16:16 +03:00
232d28e13e Refactor as package with subpackages
This makes it cleaner for introducing checks, fixes, tests, and
docs in the future. Currently can be run like this:

  python -m csv_metadata_quality

CSV input and output paths are still hard coded.

See: https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6
2019-07-26 22:11:10 +03:00
ef5b8f7244 fix.py: Massive improvements
Use Python's str.strip() instead of kludgy regular expressions and
use split/join to handle multi-value fields more cleanly.
2019-07-26 19:31:55 +03:00
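The approach described in ef5b8f7244 (str.strip() plus split/join for multi-value fields) could look roughly like the sketch below. The function name `fix_whitespace` and the `||` separator are assumptions made for illustration and are not taken from the commit.

```python
def fix_whitespace(field, separator="||"):
    """Strip leading and trailing whitespace from every value in a field.

    Hypothetical helper: the separator default is an assumption.
    """
    # Split on the separator, strip each value, then join them back.
    values = [value.strip() for value in field.split(separator)]
    return separator.join(values)


print(fix_whitespace(" Kenya || Ethiopia "))  # prints "Kenya||Ethiopia"
```

A field with no separator passes through the same path, since splitting it yields a single value that is stripped and returned.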
801870e0ba Add fix.py
Initial working version of metadata cleaning script that fixes
leading and trailing whitespace (even in DSpace multi-value fields).
2019-07-26 19:08:28 +03:00
21b78b9519 Initial commit
Pipenv environment with Pandas.
2019-07-26 17:54:13 +03:00