mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-26 07:38:18 +01:00
Commit Graph

617 Commits

SHA1 Message Date
2eb48d8ed0
Add SourceHut build file
For now it only attempts to install the Python requirements using
pipenv. Later it will run tests with pytest.
2019-07-26 23:56:16 +03:00
7fb7f7e03c
Add requirements.txt
Generated using pipenv:

  $ pipenv lock -r > requirements.txt
2019-07-26 23:54:07 +03:00
df1087b26f
README.md: Improve introduction, checks, and todo 2019-07-26 23:50:41 +03:00
84c3b17678
csv_metadata_quality/app.py: Add comment 2019-07-26 23:49:13 +03:00
844b968098
tests/test.csv: Add invalid multi-value field separator 2019-07-26 23:48:45 +03:00
aaf3537ba4
Add check for invalid multi-value separators 2019-07-26 23:48:24 +03:00
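A check like this can be sketched in a few lines. The sketch assumes, as in DSpace CSV exports, that `||` is the multi-value separator and that a lone `|` is the mistake being flagged. The function name `has_invalid_separator` is hypothetical and is not the project's actual API.

```python
import re

# Assumption: "||" is the valid DSpace multi-value separator, so any "|"
# that is not part of a "||" pair is treated as an invalid separator.
_LONE_PIPE = re.compile(r"(?<!\|)\|(?!\|)")


def has_invalid_separator(value: str) -> bool:
    """Return True if value contains a single "|" where "||" was likely meant."""
    return bool(_LONE_PIPE.search(value))


if __name__ == "__main__":
    for value in ["Kenya||Uganda", "Kenya|Uganda", "Kenya"]:
        print(value, has_invalid_separator(value))
```

The regular expression uses a lookbehind and a lookahead so that neither pipe of a correct `||` pair matches. Only isolated pipes are reported.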
02f9d8a736
csv_metadata_quality/check.py: Add check for missing ISBN values 2019-07-26 23:45:18 +03:00
64e7a73417
README.md: Add information about checks and fixes 2019-07-26 23:20:16 +03:00
dfd961d720
Bring test.csv into project 2019-07-26 23:14:37 +03:00
e160b17fb0
Add ISSN and ISBN checks using python-stdnum 2019-07-26 23:14:10 +03:00
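python-stdnum exposes validators such as `stdnum.issn.is_valid()` and `stdnum.isbn.is_valid()`, and its `isbn` module handles both ISBN-10 and ISBN-13. The dependency-free sketch below is not python-stdnum's code. It reimplements only the ISSN check-digit rule and the ISBN-13 checksum to show what such a check verifies. The function names are hypothetical.

```python
def _compact(value: str) -> str:
    """Drop the hyphens and spaces commonly used when printing identifiers."""
    return value.replace("-", "").replace(" ", "").upper()


def issn_is_valid(value: str) -> bool:
    """ISSN: 7 digits plus a check character (0-9 or X) from a mod-11 sum."""
    issn = _compact(value)
    if len(issn) != 8 or not issn[:7].isdecimal() or issn[7] not in "0123456789X":
        return False
    # Weights 8, 7, ..., 2 apply to the first seven digits.
    total = sum(int(d) * w for d, w in zip(issn[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return issn[7] == ("X" if check == 10 else str(check))


def isbn13_is_valid(value: str) -> bool:
    """ISBN-13: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10."""
    isbn = _compact(value)
    if len(isbn) != 13 or not isbn.isdecimal():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn))
    return total % 10 == 0


print(issn_is_valid("0378-5955"))  # True
print(isbn13_is_valid("978-0-306-40615-7"))  # True
```

In the project itself, delegating to python-stdnum avoids maintaining these rules by hand. It also covers ISBN-10, which this sketch does not.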
30a4b0005f
csv_metadata_quality/fix.py: Remove test function 2019-07-26 22:56:40 +03:00
b657c51fd2
Add initial README.md with intro, license, and todo 2019-07-26 22:18:38 +03:00
5c6453b397
Add GPLv3 license 2019-07-26 22:16:16 +03:00
232d28e13e
Refactor as package with subpackages
This makes it cleaner to introduce checks, fixes, tests, and docs
in the future. Currently it can be run like this:

  python -m csv_metadata_quality

CSV input and output paths are still hard coded.

See: https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6
2019-07-26 22:11:10 +03:00
ef5b8f7244
fix.py: Massive improvements
Use Python's str.strip() instead of kludgy regular expressions and
use split/join to handle multi-value fields more cleanly.
2019-07-26 19:31:55 +03:00
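The strip-then-split/join approach described in this commit can be sketched as follows. The sketch assumes `||` is the DSpace multi-value separator. The name `fix_whitespace` is hypothetical and is not taken from the project's `fix.py`.

```python
# Assumption: "||" is the DSpace multi-value separator.
SEPARATOR = "||"


def fix_whitespace(value: str) -> str:
    """Strip leading/trailing whitespace from each value in a (multi-value) field.

    Empty values are kept as-is, so only whitespace is changed.
    """
    values = value.split(SEPARATOR)
    # Strip each value individually so " a || b " becomes "a||b".
    return SEPARATOR.join(v.strip() for v in values)


print(fix_whitespace("  Kenya || Uganda "))  # Kenya||Uganda
```

Splitting on the separator first is what lets `str.strip()` reach the whitespace around inner values, which a single `strip()` on the whole field would miss.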
801870e0ba
Add fix.py
Initial working version of a metadata cleaning script that fixes
leading and trailing whitespace (even in DSpace multi-value fields).
2019-07-26 19:08:28 +03:00
21b78b9519
Initial commit
Pipenv environment with Pandas.
2019-07-26 17:54:13 +03:00