mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-25 07:10:17 +01:00

Compare commits

...

2 Commits

Author SHA1 Message Date
d3880a9dfa Remove Python 3.6 support 2021-01-03 15:51:53 +02:00
    Pandas 1.2.0 apparently requires Python 3.7.1+.
    continuous-integration/drone/push: Build is passing (all checks were successful)
7edb8b19d7 tests/test_check.py: Reformat with black 2021-01-03 15:50:21 +02:00
4 changed files with 5 additions and 18 deletions

@@ -46,20 +46,4 @@ steps:
   - python setup.py install
   - csv-metadata-quality -i data/test.csv -o /tmp/test.csv -e -u --agrovoc-fields dc.subject,cg.coverage.country
----
-kind: pipeline
-type: docker
-name: python36
-steps:
-- name: test
-  image: python:3.6-slim
-  commands:
-  - id
-  - python -V
-  - pip install -r requirements-dev.txt
-  - pytest
-  - python setup.py install
-  - csv-metadata-quality -i data/test.csv -o /tmp/test.csv -e -u --agrovoc-fields dc.subject,cg.coverage.country
 # vim: ts=2 sw=2 et

@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## Unreleased
 ### Changed
 - Reformat with black
+- Requires Python 3.7+ for pandas 1.2.0
 ### Updated
 - Run `poetry update`

@@ -1,7 +1,7 @@
 # CSV Metadata Quality ![GitHub Actions](https://github.com/ilri/csv-metadata-quality/workflows/Build%20and%20Test/badge.svg) [![builds.sr.ht status](https://builds.sr.ht/~alanorth/csv-metadata-quality.svg)](https://builds.sr.ht/~alanorth/csv-metadata-quality?)
 A simple, but opinionated metadata quality checker and fixer designed to work with CSVs in the DSpace ecosystem (though it could theoretically work on any CSV that uses Dublin Core fields as columns). The implementation is essentially a pipeline of checks and fixes that begins with splitting multi-value fields on the standard DSpace "||" separator, trimming leading/trailing whitespace, and then proceeding to more specialized cases like ISSNs, ISBNs, languages, etc.
-Requires Python 3.6 or greater (3.8 recommended). CSV and Excel support comes from the [Pandas](https://pandas.pydata.org/) library, though your mileage may vary with Excel because this is much less tested.
+Requires Python 3.7 or greater (3.8 recommended). CSV and Excel support comes from the [Pandas](https://pandas.pydata.org/) library, though your mileage may vary with Excel because this is much less tested.
 ## Functionality
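
The README paragraph in this hunk describes the first pipeline steps: splitting multi-value fields on the DSpace "||" separator and trimming leading and trailing whitespace. A minimal sketch of those two steps, not the project's actual implementation (the helper names `split_and_trim` and `rejoin` are hypothetical):

```python
from typing import List

# The standard DSpace multi-value separator named in the README.
SEPARATOR = "||"


def split_and_trim(field: str, separator: str = SEPARATOR) -> List[str]:
    """Split a multi-value field and strip whitespace around each value."""
    return [value.strip() for value in field.split(separator)]


def rejoin(values: List[str], separator: str = SEPARATOR) -> str:
    """Join cleaned values back into a single multi-value field."""
    return separator.join(values)


if __name__ == "__main__":
    cleaned = split_and_trim(" Kenya || Uganda ")
    print(cleaned)
    print(rejoin(cleaned))
```

Later, more specialized checks (ISSNs, ISBNs, languages) would then operate on the individual trimmed values rather than on the raw cell.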

@@ -69,7 +69,9 @@ def test_check_unnecessary_separators(capsys):
     check.separators(field, field_name)
     captured = capsys.readouterr()
-    assert captured.out == f"Unnecessary multi-value separator ({field_name}): {field}\n"
+    assert (
+        captured.out == f"Unnecessary multi-value separator ({field_name}): {field}\n"
+    )
 def test_check_valid_separators():
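
The reformatted test asserts on a message printed by `check.separators(field, field_name)`. The diff does not show that function, so the following is only a sketch of a check that would produce the same message, assuming an "unnecessary" separator is one that leaves an empty value after splitting on "||"; the real function may behave differently.

```python
def separators(field, field_name, separator="||"):
    """Warn when splitting the field on the separator yields an empty value."""
    # Assumption: a leading, trailing, or doubled separator leaves an
    # empty string in the split result, which is what gets reported.
    for value in field.split(separator):
        if value == "":
            print(f"Unnecessary multi-value separator ({field_name}): {field}")
            break

    return field


if __name__ == "__main__":
    separators("Alan||", "dc.contributor.author")      # prints a warning
    separators("Alan||Orth", "dc.contributor.author")  # prints nothing
```

Printing rather than raising keeps the check non-fatal, which is why the test in the hunk above inspects captured standard output.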