mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-07 13:46:00 +02:00
Commit Graph

477 Commits

Author SHA1 Message Date
2b41f9416b csv_metadata_quality/fix.py: Remove extra newline 2019-07-27 01:29:22 +03:00
3cf9f9452b csv_metadata_quality/check.py: Always return field
We always need to return the field back so apply doesn't set it to
null when creating the new data frame.
2019-07-27 01:28:08 +03:00
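
The reasoning in this commit message can be illustrated with a minimal sketch in plain Python. The function names are hypothetical, and a list comprehension stands in for applying a function over a data frame column; this is not the project's actual code.

```python
def check_without_return(field):
    """Hypothetical check that warns but forgets to return the field."""
    if field is not None and field != field.strip():
        print(f"Warning: surrounding whitespace in {field!r}")
    # No return statement, so Python implicitly returns None.


def check_with_return(field):
    """Same hypothetical check, but the field is always handed back."""
    if field is not None and field != field.strip():
        print(f"Warning: surrounding whitespace in {field!r}")
    return field


column = ["Title one", " Title two ", None]

# Assumption: a list comprehension stands in for applying over a column.
lost = [check_without_return(value) for value in column]
kept = [check_with_return(value) for value in column]
```

With the first version every value in the rebuilt column becomes None; with the second the original values survive the check untouched.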
1d861f263b .build.yml: Fix setup script
I wasn't changing into the project directory so the pipenv virtual
environment was not getting created in the correct place.
2019-07-27 00:41:57 +03:00
33121f8a01 .build.yml: Add tests 2019-07-27 00:38:34 +03:00
41a30f1b07 Add initial tests
For now only test fixes because they return changed data. I'm not
sure how to test the checks, because they don't return data and I
can't modify them to return boolean values without breaking the app.
2019-07-27 00:36:40 +03:00
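
Testing a fix is straightforward because it returns changed data that can be compared with an expected value. The sketch below shows the idea in pytest style, assuming a hypothetical `fix_whitespace` function; the project's real function names and signatures may differ.

```python
def fix_whitespace(field):
    """Hypothetical fix: remove leading and trailing whitespace."""
    return field.strip()


def test_fix_leading_whitespace():
    # The fix returns data, so the result can be asserted directly.
    assert fix_whitespace(" Alan") == "Alan"


def test_fix_trailing_whitespace():
    assert fix_whitespace("Alan ") == "Alan"
```

A check that only prints a warning has no return value to compare against, which is the difficulty the commit message describes.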
103e630f6e Add requirements-dev.txt
Generated with:

  $ pipenv lock -r -d > requirements-dev.txt
2019-07-27 00:33:52 +03:00
99f00fcb85 Add pytest to pipenv dev environment 2019-07-27 00:32:53 +03:00
18f26c343d csv_metadata_quality/app.py: Fix path to test.csv 2019-07-27 00:25:30 +03:00
f2060adadf Move tests.csv to data directory 2019-07-27 00:02:47 +03:00
0a751c1f25 README.md: Add SourceHut build badge 2019-07-26 23:59:31 +03:00
2eb48d8ed0 Add SourceHut build file
For now it only attempts to install the Python requirements using
pipenv. Later it will run tests with pytest.
2019-07-26 23:56:16 +03:00
7fb7f7e03c Add requirements.txt
Generated using pipenv:

  $ pipenv lock -r > requirements.txt
2019-07-26 23:54:07 +03:00
df1087b26f README.md: Improve introduction, checks, and todo 2019-07-26 23:50:41 +03:00
84c3b17678 csv_metadata_quality/app.py: Add comment 2019-07-26 23:49:13 +03:00
844b968098 tests/test.csv: Add invalid multi-value field separator 2019-07-26 23:48:45 +03:00
aaf3537ba4 Add check for invalid multi-value separators 2019-07-26 23:48:24 +03:00
02f9d8a736 csv_metadata_quality/check.py: Add check for missing isbn values 2019-07-26 23:45:18 +03:00
64e7a73417 README.md: Add information about checks and fixes 2019-07-26 23:20:16 +03:00
dfd961d720 Bring test.csv into project 2019-07-26 23:14:37 +03:00
e160b17fb0 Add ISSN and ISBN checks using python-stdnum 2019-07-26 23:14:10 +03:00
30a4b0005f csv_metadata_quality/fix.py: Remove test function 2019-07-26 22:56:40 +03:00
b657c51fd2 Add initial README.md with intro, license, and todo 2019-07-26 22:18:38 +03:00
5c6453b397 Add GPLv3 license 2019-07-26 22:16:16 +03:00
232d28e13e Refactor as package with subpackages
This makes it cleaner for introducing checks, fixes, tests, and
docs in the future. Currently can be run like this:

  python -m csv_metadata_quality

CSV input and output paths are still hard coded.

See: https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6
2019-07-26 22:11:10 +03:00
ef5b8f7244 fix.py: Massive improvements
Use Python's str.strip() instead of kludgy regular expressions and
use split/join to handle multi-value fields more cleanly.
2019-07-26 19:31:55 +03:00
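
The approach described in this commit message can be sketched roughly as follows. The function name is hypothetical and the `||` separator is an assumption (it is DSpace's usual default for multi-value fields); the actual fix.py may be structured differently.

```python
def strip_whitespace(field, separator="||"):
    """Hypothetical sketch: strip whitespace from each value in a field.

    Assumes multiple values are joined with "||".
    """
    if field is None:
        return None

    # Split into individual values, strip each one, then rejoin.
    values = [value.strip() for value in field.split(separator)]
    return separator.join(values)


cleaned = strip_whitespace(" Kenya || Ethiopia ||Uganda ")
```

A field without the separator is treated as a single value, so the same function covers both single-value and multi-value fields.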
801870e0ba Add fix.py
Initial working version of metadata cleaning script that fixes
leading and trailing whitespace (even in DSpace multi-value fields).
2019-07-26 19:08:28 +03:00
21b78b9519 Initial commit
Pipenv environment with Pandas.
2019-07-26 17:54:13 +03:00