mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-25 15:18:19 +01:00

Compare commits

No commits in common. "d3880a9dfa6b605c64f4d84b69140493ca56cf98" and "a6709c7f826b59713978a5749440c562864cad86" have entirely different histories.

4 changed files with 18 additions and 5 deletions

@@ -46,4 +46,20 @@ steps:
   - python setup.py install
   - csv-metadata-quality -i data/test.csv -o /tmp/test.csv -e -u --agrovoc-fields dc.subject,cg.coverage.country
+---
+kind: pipeline
+type: docker
+name: python36
+steps:
+- name: test
+  image: python:3.6-slim
+  commands:
+  - id
+  - python -V
+  - pip install -r requirements-dev.txt
+  - pytest
+  - python setup.py install
+  - csv-metadata-quality -i data/test.csv -o /tmp/test.csv -e -u --agrovoc-fields dc.subject,cg.coverage.country
 # vim: ts=2 sw=2 et

@@ -7,7 +7,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## Unreleased
 ### Changed
 - Reformat with black
-- Requires Python 3.7+ for pandas 1.2.0
 ### Updated
 - Run `poetry update`

@@ -1,7 +1,7 @@
 # CSV Metadata Quality ![GitHub Actions](https://github.com/ilri/csv-metadata-quality/workflows/Build%20and%20Test/badge.svg) [![builds.sr.ht status](https://builds.sr.ht/~alanorth/csv-metadata-quality.svg)](https://builds.sr.ht/~alanorth/csv-metadata-quality?)
 A simple, but opinionated metadata quality checker and fixer designed to work with CSVs in the DSpace ecosystem (though it could theoretically work on any CSV that uses Dublin Core fields as columns). The implementation is essentially a pipeline of checks and fixes that begins with splitting multi-value fields on the standard DSpace "||" separator, trimming leading/trailing whitespace, and then proceeding to more specialized cases like ISSNs, ISBNs, languages, etc.
-Requires Python 3.7 or greater (3.8 recommended). CSV and Excel support comes from the [Pandas](https://pandas.pydata.org/) library, though your mileage may vary with Excel because this is much less tested.
+Requires Python 3.6 or greater (3.8 recommended). CSV and Excel support comes from the [Pandas](https://pandas.pydata.org/) library, though your mileage may vary with Excel because this is much less tested.
 ## Functionality

@@ -69,9 +69,7 @@ def test_check_unnecessary_separators(capsys):
     check.separators(field, field_name)
     captured = capsys.readouterr()
-    assert (
-        captured.out == f"Unnecessary multi-value separator ({field_name}): {field}\n"
-    )
+    assert captured.out == f"Unnecessary multi-value separator ({field_name}): {field}\n"
 def test_check_valid_separators():