mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-22 05:45:02 +01:00

Compare commits


5 Commits

SHA1        Message                                              Date
2ec52f1b73  README.md: Update description                        2021-01-26 15:43:41 +02:00
            All checks were successful (continuous-integration/drone/push: build is passing)
aa1abf15a7  README.md: Adjust title                              2021-01-26 15:35:21 +02:00
cbf94490f2  Version 0.4.3                                        2021-01-26 15:22:40 +02:00
f3d0d5ef07  setup.py: Remove Python 3.6                          2021-01-26 15:22:08 +02:00
            I actually removed Python 3.6 support a few weeks ago after updating
            to Pandas 1.2.0, but forgot to update this.
4b7b99c94c  CHANGELOG.md: Add note about multi-value separators  2021-01-26 15:20:22 +02:00
5 changed files with 8 additions and 7 deletions

CHANGELOG.md

@ -4,13 +4,15 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## Unreleased
## [0.4.3] - 2021-01-26
### Changed
- Reformat with black
- Requires Python 3.7+ for pandas 1.2.0
### Updated
- Run `poetry update`
- Expand check/fix for multi-value separators to include metadata with invalid
separators at the end, for example "Kenya||Tanzania||"
## [0.4.2] - 2020-07-06
### Changed
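
The CHANGELOG entry above extends the multi-value separator check/fix to values that end with a stray separator, such as "Kenya||Tanzania||". A minimal sketch of such a check/fix pair in Python follows. It assumes that any run of `|` characters is a (possibly malformed) separator and that empty values can be discarded; `fix_separators` and `has_invalid_separators` are hypothetical names, not csv-metadata-quality's own functions.

```python
import re

# Assumption for this sketch: any run of one or more "|" characters is a
# separator, so "|" and "|||" are treated as malformed versions of "||".
_PIPES = re.compile(r"\|+")


def fix_separators(value: str) -> str:
    """Rejoin a multi-value field using exactly "||" between values.

    Empty values created by leading, trailing, or repeated separators are
    dropped, so "Kenya||Tanzania||" becomes "Kenya||Tanzania".
    """
    parts = [part for part in _PIPES.split(value) if part != ""]
    return "||".join(parts)


def has_invalid_separators(value: str) -> bool:
    """Check step: report whether the fix would change the value."""
    return fix_separators(value) != value


for raw in ["Kenya||Tanzania||", "Kenya|Tanzania", "Kenya||Tanzania"]:
    print(f"{raw!r}: invalid={has_invalid_separators(raw)} fixed={fix_separators(raw)!r}")
```

Treating every `|` as a separator is a simplification: a field that legitimately contains a single pipe would be split, so a real implementation has to decide that case explicitly.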

README.md

@ -1,5 +1,5 @@
-# CSV Metadata Quality ![GitHub Actions](https://github.com/ilri/csv-metadata-quality/workflows/Build%20and%20Test/badge.svg) [![Build Status](https://ci.mjanja.ch/api/badges/alanorth/csv-metadata-quality/status.svg)](https://ci.mjanja.ch/alanorth/csv-metadata-quality)
-A simple, but opinionated metadata quality checker and fixer designed to work with CSVs in the DSpace ecosystem (though it could theoretically work on any CSV that uses Dublin Core fields as columns). The implementation is essentially a pipeline of checks and fixes that begins with splitting multi-value fields on the standard DSpace "||" separator, trimming leading/trailing whitespace, and then proceeding to more specialized cases like ISSNs, ISBNs, languages, etc.
+# DSpace CSV Metadata Quality Checker ![GitHub Actions](https://github.com/ilri/csv-metadata-quality/workflows/Build%20and%20Test/badge.svg) [![Build Status](https://ci.mjanja.ch/api/badges/alanorth/csv-metadata-quality/status.svg)](https://ci.mjanja.ch/alanorth/csv-metadata-quality)
+A simple, but opinionated metadata quality checker and fixer designed to work with CSVs in the DSpace ecosystem (though it could theoretically work on any CSV that uses Dublin Core fields as columns). The implementation is essentially a pipeline of checks and fixes that begins with splitting multi-value fields on the standard DSpace "||" separator, trimming leading/trailing whitespace, and then proceeding to more specialized cases like ISSNs, ISBNs, languages, unnecessary Unicode, AGROVOC terms, etc.
Requires Python 3.7 or greater (3.8 recommended). CSV and Excel support comes from the [Pandas](https://pandas.pydata.org/) library, though your mileage may vary with Excel because this is much less tested.
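
The README describes the tool as a pipeline whose first steps split multi-value fields on the DSpace "||" separator and trim leading/trailing whitespace. The following is a minimal standard-library sketch of only those first steps, assuming a well-formed CSV with a header row of Dublin Core field names. `fix_whitespace` and `run_pipeline` are hypothetical names; the project itself reads CSVs with Pandas and chains many more checks.

```python
from __future__ import annotations  # allows list[...] annotations on Python 3.7

import csv
import io

SEPARATOR = "||"  # standard DSpace multi-value separator


def fix_whitespace(value: str) -> str:
    """Split a field on the DSpace separator, trim each value, and rejoin."""
    return SEPARATOR.join(part.strip() for part in value.split(SEPARATOR))


def run_pipeline(csv_text: str) -> list[dict[str, str]]:
    """Apply the whitespace fix to every cell of every row.

    Later pipeline stages (ISSNs, ISBNs, languages, ...) are not sketched here.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {column: fix_whitespace(cell) for column, cell in row.items()}
        for row in reader
    ]


example = 'dc.title,dc.coverage.country\n" A title ",Kenya || Tanzania\n'
print(run_pipeline(example))
# [{'dc.title': 'A title', 'dc.coverage.country': 'Kenya||Tanzania'}]
```

Splitting before trimming matters: stripping the whole cell would leave the spaces around "||" in "Kenya || Tanzania" untouched.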

csv_metadata_quality/version.py

@ -1 +1 @@
-VERSION = "0.4.2"
+VERSION = "0.4.3"

pyproject.toml

@ -1,6 +1,6 @@
[tool.poetry]
name = "csv-metadata-quality"
-version = "0.4.2"
+version = "0.4.3"
description="A simple, but opinionated CSV quality checking and fixing pipeline for CSVs in the DSpace ecosystem."
authors = ["Alan Orth <alan.orth@gmail.com>"]
license="GPL-3.0-only"

setup.py

@ -14,7 +14,7 @@ install_requires = [
setuptools.setup(
name="csv-metadata-quality",
-version="0.4.2",
+version="0.4.3",
author="Alan Orth",
author_email="aorth@mjanja.ch",
description="A simple, but opinionated CSV quality checking and fixing pipeline for CSVs in the DSpace ecosystem.",
@ -23,7 +23,6 @@ setuptools.setup(
long_description_content_type="text/markdown",
url="https://github.com/alanorth/csv-metadata-quality",
classifiers=[
-"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
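
This release bumps the same version string in three files (the version module, pyproject.toml and setup.py), and the Python 3.6 commit message shows how easily one file can be missed. A small consistency check could guard against that. The sketch below is hypothetical, not part of csv-metadata-quality, and assumes each file assigns the version as a double-quoted string at the start of a line, in the forms shown in this diff.

```python
import re
from typing import Mapping

# Matches the three assignment forms seen in this release diff:
#   VERSION = "0.4.3"   (version module)
#   version = "0.4.3"   (pyproject.toml)
#   version="0.4.3",    (setup.py keyword argument)
_VERSION_RE = re.compile(r'^\s*(?:VERSION|version)\s*=\s*"([^"]+)"', re.MULTILINE)


def find_version(text: str) -> str:
    """Return the first version string assigned in a file's text."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError("no version assignment found")
    return match.group(1)


def versions_agree(sources: Mapping[str, str]) -> bool:
    """True when every file (name -> contents) declares the same version."""
    return len({find_version(text) for text in sources.values()}) == 1


sources = {
    "version.py": 'VERSION = "0.4.3"\n',
    "pyproject.toml": '[tool.poetry]\nname = "csv-metadata-quality"\nversion = "0.4.3"\n',
    "setup.py": 'setuptools.setup(\n    name="csv-metadata-quality",\n    version="0.4.3",\n)\n',
}
print(versions_agree(sources))  # True
```

In practice the dictionary would be filled by reading the real files before tagging a release; a single regex keeps the sketch short but would need adjusting if a file used single quotes.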