c47c064a13
Make output less debuggy
2019-07-27 09:21:13 +03:00
a849615b41
Add tests for check functions
...
Relies on capturing stdout.
See: https://docs.pytest.org/en/5.0.1/capture.html
2019-07-27 02:10:13 +03:00
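A minimal sketch of the stdout-capturing approach mentioned in a849615b41, assuming a hypothetical check named check_whitespace (not one of the project's real checks) that prints a warning and returns its input. The capsys argument is pytest's built-in capture fixture, so the test function is meant to be run by pytest rather than called directly.

```python
def check_whitespace(field):
    """Hypothetical check: print a warning, but always return the field."""
    if field != field.strip():
        print(f"Suspicious whitespace: {field!r}")

    return field


def test_check_whitespace(capsys):
    # pytest injects capsys, which records everything written to stdout.
    result = check_whitespace(" foo")

    captured = capsys.readouterr()

    # The check reports the problem on stdout...
    assert captured.out == "Suspicious whitespace: ' foo'\n"
    # ...and hands the value back unchanged.
    assert result == " foo"
```

Asserting on the captured output lets a check be tested without changing its return value.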
2b41f9416b
csv_metadata_quality/fix.py: Remove extra newline
2019-07-27 01:29:22 +03:00
3cf9f9452b
csv_metadata_quality/check.py: Always return field
...
We always need to return the field back so apply doesn't set it to
null when creating the new data frame.
2019-07-27 01:28:08 +03:00
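The behaviour described in 3cf9f9452b can be sketched as follows. The two check functions and the sample titles are hypothetical, invented for illustration; only the use of pandas apply comes from the commit message. A function with no return statement returns None, so apply fills the resulting column with missing values.

```python
import pandas as pd


def check_without_return(field):
    """Hypothetical check that reports a problem and returns nothing."""
    if field != field.strip():
        print(f"Whitespace problem: {field!r}")


def check_with_return(field):
    """Same hypothetical check, but the field is always returned."""
    if field != field.strip():
        print(f"Whitespace problem: {field!r}")

    return field


titles = pd.Series(["Maize yields ", "Soil health"])

# Every value is replaced by None because the function returns nothing.
lost = titles.apply(check_without_return)

# The original values survive because the function returns them.
kept = titles.apply(check_with_return)
```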
1d861f263b
.build.yml: Fix setup script
...
I wasn't changing into the project directory, so the pipenv virtual
environment was not being created in the correct place.
2019-07-27 00:41:57 +03:00
33121f8a01
.build.yml: Add tests
2019-07-27 00:38:34 +03:00
41a30f1b07
Add initial tests
...
For now only test fixes because they return changed data. I'm not
sure how to test the checks, because they don't return data and I
can't modify them to return boolean values without breaking the app.
2019-07-27 00:36:40 +03:00
103e630f6e
Add requirements-dev.txt
...
Generated with:
$ pipenv lock -r -d > requirements-dev.txt
2019-07-27 00:33:52 +03:00
99f00fcb85
Add pytest to pipenv dev environment
2019-07-27 00:32:53 +03:00
18f26c343d
csv_metadata_quality/app.py: Fix path to test.csv
2019-07-27 00:25:30 +03:00
f2060adadf
Move tests.csv to data directory
2019-07-27 00:02:47 +03:00
0a751c1f25
README.md: Add SourceHut build badge
2019-07-26 23:59:31 +03:00
2eb48d8ed0
Add SourceHut build file
...
For now it only attempts to install the Python requirements using
pipenv. Later it will run tests with pytest.
2019-07-26 23:56:16 +03:00
7fb7f7e03c
Add requirements.txt
...
Generated using pipenv:
$ pipenv lock -r > requirements.txt
2019-07-26 23:54:07 +03:00
df1087b26f
README.md: Improve introduction, checks, and todo
2019-07-26 23:50:41 +03:00
84c3b17678
csv_metadata_quality/app.py: Add comment
2019-07-26 23:49:13 +03:00
844b968098
tests/test.csv: Add invalid multi-value field separator
2019-07-26 23:48:45 +03:00
aaf3537ba4
Add check for invalid multi-value separators
2019-07-26 23:48:24 +03:00
02f9d8a736
csv_metadata_quality/check.py: Add check for missing isbn values
2019-07-26 23:45:18 +03:00
64e7a73417
README.md: Add information about checks and fixes
2019-07-26 23:20:16 +03:00
dfd961d720
Bring test.csv into project
2019-07-26 23:14:37 +03:00
e160b17fb0
Add ISSN and ISBN checks using python-stdnum
2019-07-26 23:14:10 +03:00
30a4b0005f
csv_metadata_quality/fix.py: Remove test function
2019-07-26 22:56:40 +03:00
b657c51fd2
Add initial README.md with intro, license, and todo
2019-07-26 22:18:38 +03:00
5c6453b397
Add GPLv3 license
2019-07-26 22:16:16 +03:00
232d28e13e
Refactor as package with subpackages
...
This makes it cleaner to introduce checks, fixes, tests, and docs
in the future. It can currently be run like this:
python -m csv_metadata_quality
CSV input and output paths are still hard coded.
See: https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6
2019-07-26 22:11:10 +03:00
ef5b8f7244
fix.py: Massive improvements
...
Use Python's str.strip() instead of kludgy regular expressions and
use split/join to handle multi-value fields more cleanly.
2019-07-26 19:31:55 +03:00
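The split/join approach from ef5b8f7244 might look roughly like the sketch below. The function name strip_whitespace and the "||" separator are assumptions made for illustration (DSpace CSV exports conventionally use "||" between multiple values); the actual code in fix.py may differ.

```python
def strip_whitespace(field, separator="||"):
    """Strip leading and trailing whitespace from each value in a field.

    The separator default is an assumption, not taken from the commit.
    """
    # Split the field into its individual values, strip each one,
    # then join them back together with the same separator.
    values = [value.strip() for value in field.split(separator)]

    return separator.join(values)


cleaned = strip_whitespace(" Kenya || Uganda ")
```

A field with a single value passes through the same code path, since splitting on a separator that is absent yields a one-element list.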
801870e0ba
Add fix.py
...
Initial working version of metadata cleaning script that fixes
leading and trailing whitespace (even in DSpace multi-value fields).
2019-07-26 19:08:28 +03:00
21b78b9519
Initial commit
...
Pipenv environment with Pandas.
2019-07-26 17:54:13 +03:00