mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-07-23 06:21:45 +02:00

23 Commits

Author SHA1 Message Date
fdccdf7318 Version 0.6.1
Some checks failed
continuous-integration/drone/push Build is failing
2023-02-23 13:46:56 +03:00
ff2c986eec setup.py: minimum python 3.9 2023-02-23 11:47:40 +03:00
547574866e Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --with dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==
2023-02-23 11:46:24 +03:00
8aa7b93d87 poetry.lock: run poetry update 2023-02-23 11:45:53 +03:00
53fdb50906 csv_metadata_quality/check.py: run black
Some checks failed
continuous-integration/drone/push Build is failing
2023-02-18 22:10:04 +03:00
3e0e9a7f8b poetry.lock: run poetry update 2023-02-18 22:09:33 +03:00
03d824b78e pyproject.toml: update some dependencies 2023-02-18 22:09:05 +03:00
8bc4cd419c Strip filename descriptions before checking
Some checks failed
continuous-integration/drone/push Build is failing
When checking for uncommon file extensions in the filename field,
we should strip descriptions that are meant for SAFBuilder, for
example: Annual_Report_2020.pdf__description:Report. Otherwise these
values are flagged as false positives that spam the output with warnings.
2023-02-13 11:00:57 +03:00
bde38e9ed4 CHANGELOG.md: add notes about abstracts 2023-02-13 10:39:03 +03:00
8db1e36a6d csv_metadata_quality/app.py: skip abstract in separator check
Also skip the abstract in the separator check: a "|" rarely appears
there, and when one does it is more likely to be intentional.
2023-02-13 10:37:33 +03:00
fbb625be5c Ignore common non-SPDX licenses
This is meant to catch licenses that are supposed to be SPDX but
aren't, not licenses that *aren't* supposed to be SPDX. We have so
many free-text license descriptions like "Copyrighted" and "Other"
that I'm sick of seeing warnings for them!
2023-02-07 17:01:56 +03:00
084b970798 CHANGELOG.md: add note about abstract field 2023-02-07 16:52:34 +03:00
171b35b015 Add data/abstract-check.csv
A test file with several whitespace and newline scenarios in the
abstract. I am currently disabling whitespace/newline fixes in the
abstract because they are too aggressive.
2023-02-07 16:50:47 +03:00
545bb8cd0c csv_metadata_quality/app.py: disable whitespace on abstracts
It's too aggressive on abstracts. If people paste in text from a
PDF there are often newlines, and most of the time this is what
they want.
2023-02-07 16:48:40 +03:00
d5afbad788 Update requirements
Some checks failed
continuous-integration/drone/push Build is failing
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --with dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==
2023-01-24 14:18:19 +03:00
d40c9ed97a poetry.lock: run poetry update 2023-01-24 14:17:44 +03:00
c4a2ee8563 CHANGELOG.md: add note about fix.separators() 2023-01-24 14:16:23 +03:00
3596381d03 csv_metadata_quality/app.py: separators fix
Don't run the invalid separators fix on title fields because some
items use "|" in the title to indicate something like a subtitle.

For example:

    Progress Review and Work Planning Meeting | Day 1
2023-01-24 14:13:55 +03:00
5abd32a41f CHANGELOG.md: run poetry update 2022-12-20 15:09:58 +02:00
0ed0fabe21 tests/test_check.py: remove local variables
This was raised by ruff.

> F841 Local variable `result` is assigned to but never used

We don't actually need the output of the function since these tests
capture the stdout.
2022-12-20 15:09:20 +02:00
d5cfec65bd tests/test_check.py: fix logic in assert
This was raised by ruff.

> E711 Comparison to `None` should be `cond is None`
2022-12-20 15:07:41 +02:00
66893753ba Move isort config to pyproject.toml
See: https://pycqa.github.io/isort/docs/configuration/black_compatibility.html
2022-12-20 15:03:10 +02:00
57be05ebb6 poetry.lock: run poetry update 2022-12-20 14:59:35 +02:00
12 changed files with 579 additions and 436 deletions

CHANGELOG.md

@@ -4,7 +4,7 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## Unreleased
## [0.6.1] - 2023-02-23
### Fixed
- Missing region check should ignore subregion field, if it exists
@@ -12,6 +12,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Use SPDX license data from SPDX themselves instead of spdx-license-list
because it is deprecated and outdated
- Require Python 3.9+
- Don't run `fix.separators()` on title or abstract fields
- Don't run whitespace or newline fixes on abstract fields
- Ignore some common non-SPDX licenses
- Ignore `__description` suffix in filenames meant for SAFBuilder when checking
for uncommon file extensions
### Updated
- Python dependencies
## [0.6.0] - 2022-09-02
### Changed

csv_metadata_quality/app.py

@@ -90,12 +90,14 @@ def run(argv):
continue
# Fix: whitespace
df[column] = df[column].apply(fix.whitespace, field_name=column)
# Fix: newlines
if args.unsafe_fixes:
df[column] = df[column].apply(fix.newlines, field_name=column)
match = re.match(r"^.*?abstract.*$", column)
if match is None:
# Fix: whitespace
df[column] = df[column].apply(fix.whitespace, field_name=column)
# Fix: newlines
df[column] = df[column].apply(fix.newlines, field_name=column)
# Fix: missing space after comma. Only run on author and citation
# fields for now, as this problem is mostly an issue in names.
@@ -121,10 +123,14 @@ def run(argv):
# Fix: unnecessary Unicode
df[column] = df[column].apply(fix.unnecessary_unicode)
# Fix: invalid and unnecessary multi-value separators
df[column] = df[column].apply(fix.separators, field_name=column)
# Run whitespace fix again after fixing invalid separators
df[column] = df[column].apply(fix.whitespace, field_name=column)
# Fix: invalid and unnecessary multi-value separators. Skip the title
# and abstract fields because "|" is used to indicate something like
# a subtitle.
match = re.match(r"^.*?(abstract|title).*$", column)
if match is None:
df[column] = df[column].apply(fix.separators, field_name=column)
# Run whitespace fix again after fixing invalid separators
df[column] = df[column].apply(fix.whitespace, field_name=column)
# Fix: duplicate metadata values
df[column] = df[column].apply(fix.duplicates, field_name=column)
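The app.py hunks gate fixes on a regular expression over the column name: the whitespace and newline fixes skip any column containing "abstract", and the separator fix skips title and abstract columns. The following is a stdlib-only sketch of that gating, assuming a plain dict row instead of a pandas DataFrame; `whitespace_fix`, `separators_fix`, and `clean_row` are hypothetical stand-ins, not the project's `fix` module.

```python
import re


# Hypothetical stand-ins for fix.whitespace() and fix.separators(); the real
# functions in csv_metadata_quality are more thorough than this.
def whitespace_fix(value: str) -> str:
    # Collapse runs of spaces/tabs and trim both ends; inner newlines are kept.
    return re.sub(r"[ \t]+", " ", value).strip()


def separators_fix(value: str) -> str:
    # Treat a lone "|" as a mistyped "||" multi-value separator.
    return re.sub(r"(?<!\|)\|(?!\|)", "||", value)


def clean_row(row: dict[str, str]) -> dict[str, str]:
    cleaned = {}
    for column, value in row.items():
        # Skip the whitespace fix on abstract columns, as in the hunk above.
        if re.match(r"^.*?abstract.*$", column) is None:
            value = whitespace_fix(value)
        # Skip the separator fix on title and abstract columns, where "|"
        # may be intentional.
        if re.match(r"^.*?(abstract|title).*$", column) is None:
            value = separators_fix(value)
        cleaned[column] = value
    return cleaned


row = {
    "dc.title": "Progress Review and Work Planning Meeting | Day 1",
    "dcterms.abstract": "This\nis an  abstract",
    "dcterms.subject": "  MAIZE|WHEAT ",
}
cleaned = clean_row(row)
print(cleaned)
```

Here the title keeps its "|", the abstract keeps its newline and double space, and only the subject is normalized to `MAIZE||WHEAT`.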

csv_metadata_quality/check.py

@@ -33,7 +33,6 @@ def issn(field):
# Try to split multi-value field on "||" separator
for value in field.split("||"):
if not stdnum_issn.is_valid(value):
print(f"{Fore.RED}Invalid ISSN: {Fore.RESET}{value}")
@@ -56,7 +55,6 @@ def isbn(field):
# Try to split multi-value field on "||" separator
for value in field.split("||"):
if not stdnum_isbn.is_valid(value):
print(f"{Fore.RED}Invalid ISBN: {Fore.RESET}{value}")
@@ -173,7 +171,6 @@ def language(field):
# Try to split multi-value field on "||" separator
for value in field.split("||"):
# After splitting, check if language value is 2 or 3 characters so we
# can check it against ISO 639-1 or ISO 639-3 accordingly.
if len(value) == 2:
@@ -286,6 +283,11 @@ def filename_extension(field):
# Iterate over all values
for value in values:
# Strip filename descriptions that are meant for SAF Bundler, for
# example: Annual_Report_2020.pdf__description:Report
if "__description" in value:
value = value.split("__")[0]
# Assume filename extension does not match
filename_extension_match = False
@@ -312,8 +314,19 @@ def spdx_license_identifier(field):
Prints the value if it is invalid.
"""
# List of common non-SPDX licenses to ignore
# See: https://ilri.github.io/cgspace-submission-guidelines/dcterms-license/dcterms-license.txt
ignore_licenses = {
"All rights reserved; no re-use allowed",
"All rights reserved; self-archive copy only",
"Copyrighted; Non-commercial educational use only",
"Copyrighted; Non-commercial use only",
"Copyrighted; all rights reserved",
"Other",
}
# Skip fields with missing values
if pd.isna(field):
if pd.isna(field) or field in ignore_licenses:
return
spdx_licenses = load_spdx_licenses()
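A stdlib-only sketch of the two check.py changes: stripping a SAFBuilder `__description` suffix before comparing extensions, and skipping known free-text licenses before SPDX validation. `COMMON_EXTENSIONS`, `uncommon_filenames`, and `needs_spdx_check` are hypothetical; only the ignore list is copied from the hunk, and `None` stands in for pandas' missing values.

```python
import os

# Hypothetical set of "common" extensions; the real check keeps its own list.
COMMON_EXTENSIONS = {".doc", ".docx", ".jpg", ".pdf", ".png", ".xls", ".xlsx"}

# Free-text licenses copied from the spdx_license_identifier() hunk above.
IGNORE_LICENSES = {
    "All rights reserved; no re-use allowed",
    "All rights reserved; self-archive copy only",
    "Copyrighted; Non-commercial educational use only",
    "Copyrighted; Non-commercial use only",
    "Copyrighted; all rights reserved",
    "Other",
}


def uncommon_filenames(field: str) -> list[str]:
    """Return the values in a "||"-separated field with uncommon extensions."""
    uncommon = []
    for value in field.split("||"):
        # Drop a SAFBuilder-style "__description:..." suffix before checking.
        value = value.split("__description")[0]
        extension = os.path.splitext(value)[1].lower()
        if extension not in COMMON_EXTENSIONS:
            uncommon.append(value)
    return uncommon


def needs_spdx_check(field) -> bool:
    """Skip missing values and known free-text licenses."""
    return field is not None and field not in IGNORE_LICENSES


print(uncommon_filenames("Annual_Report_2020.pdf__description:Report||data.7z"))
print(needs_spdx_check("Other"), needs_spdx_check("CC-BY-4.0"))
```

This sketch splits on the full `__description` marker rather than on `__` as the hunk does, so a value such as `my__notes.pdf__description:Notes` keeps its `.pdf` extension instead of being cut down to `my`.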


@@ -1,3 +1,3 @@
# SPDX-License-Identifier: GPL-3.0-only
VERSION = "0.6.0"
VERSION = "0.6.1"

data/abstract-check.csv Normal file

@@ -0,0 +1,17 @@
id,dc.title,dcterms.abstract
1,Normal item,This is an abstract
2,Leading whitespace, This is an abstract
3,Trailing whitespace,This is an abstract
4,Consecutive whitespace,This is an abstract
5,Newline,"This
is an abstract"
6,Newline with leading whitespace," This
is an abstract"
7,Newline with trailing whitespace,"This
is an abstract "
8,Newline with consecutive whitespace,"This
is an abstract"
9,Multiple newlines,"This
is
an
abstract"
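The fixture relies on standard CSV quoting: a double-quoted field may contain newlines, so each multi-line abstract is still a single record. A quick check with Python's `csv` module, using three rows copied from the file (the `SAMPLE` name is illustrative):

```python
import csv
import io

# Three rows copied from data/abstract-check.csv above.
SAMPLE = (
    "id,dc.title,dcterms.abstract\n"
    "1,Normal item,This is an abstract\n"
    '5,Newline,"This\nis an abstract"\n'
    '9,Multiple newlines,"This\nis\nan\nabstract"\n'
)

# Quoted fields may span lines; the reader joins them into one value.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["id"], repr(row["dcterms.abstract"]))
```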

poetry.lock generated

File diff suppressed because it is too large.

pyproject.toml

@@ -1,6 +1,6 @@
[tool.poetry]
name = "csv-metadata-quality"
version = "0.6.0"
version = "0.6.1"
description="A simple, but opinionated CSV quality checking and fixing pipeline for CSVs in the DSpace ecosystem."
authors = ["Alan Orth <alan.orth@gmail.com>"]
license="GPL-3.0-only"
@@ -12,27 +12,31 @@ csv-metadata-quality = 'csv_metadata_quality.__main__:main'
[tool.poetry.dependencies]
python = "^3.9"
pandas = "^1.5.1"
python-stdnum = "^1.17"
requests = "^2.28.1"
requests-cache = "^0.9.7"
pandas = "^1.5.2"
python-stdnum = "^1.18"
requests = "^2.28.2"
requests-cache = "^0.9.8"
langid = "^1.1.6"
colorama = "^0.4.5"
colorama = "^0.4.6"
ftfy = "^6.1.1"
country-converter = {git = "https://github.com/alanorth/country_converter.git", rev = "myanmar-region"}
pycountry = {git = "https://github.com/alanorth/pycountry", rev = "iso-codes-4.12.0"}
[tool.poetry.dev-dependencies]
pytest = "^7.2.0"
flake8 = "^5.0.4"
pytest = "^7.2.1"
flake8 = "^6.0.0"
pytest-clarity = "^1.0.1"
black = "^22.10.0"
isort = "^5.10.1"
csvkit = "^1.0.7"
black = "^23.1.0"
isort = "^5.12.0"
csvkit = "^1.1.0"
[tool.poetry.group.dev.dependencies]
ipython = "^8.7.0"
ipython = "^8.10.0"
[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
[tool.isort]
profile = "black"
line_length=88
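Poetry's caret constraints allow any update that does not change the leftmost non-zero version component, which is why `python = "^3.9"` shows up as `python_version >= "3.9" and python_version < "4.0"` in the exported requirements. A minimal sketch of that expansion, with a hypothetical `caret_bounds` helper that ignores pre-releases and wildcards:

```python
def caret_bounds(spec: str) -> tuple[str, str]:
    """Expand a Poetry-style caret constraint into (lower, upper) bounds."""
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    # Bump the leftmost non-zero component; if every written component is
    # zero, bump the last one written (so ^0.0 -> <0.1.0 and ^0 -> <1.0.0).
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + [parts[idx] + 1]

    def pad(components: list[int]) -> str:
        # Fill missing minor/patch components with zeros.
        return ".".join(str(c) for c in components + [0] * (3 - len(components)))

    return pad(parts), pad(upper)


for spec in ["^3.9", "^1.5.2", "^0.9.8", "^0.0.3"]:
    lower, upper = caret_bounds(spec)
    print(f"{spec}: >={lower},<{upper}")
```

So `requests-cache = "^0.9.8"` admits 0.9.x releases but not 0.10.0, while `pandas = "^1.5.2"` admits anything below 2.0.0.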

requirements-dev.txt

@@ -1,80 +1,80 @@
agate-dbf==0.2.2 ; python_version >= "3.9" and python_version < "4.0"
agate-excel==0.2.5 ; python_version >= "3.9" and python_version < "4.0"
agate-sql==0.5.8 ; python_version >= "3.9" and python_version < "4.0"
agate==1.6.3 ; python_version >= "3.9" and python_version < "4.0"
agate-sql==0.5.9 ; python_version >= "3.9" and python_version < "4.0"
agate==1.7.1 ; python_version >= "3.9" and python_version < "4.0"
appdirs==1.4.4 ; python_version >= "3.9" and python_version < "4.0"
appnope==0.1.3 ; python_version >= "3.9" and python_version < "4.0" and sys_platform == "darwin"
asttokens==2.2.1 ; python_version >= "3.9" and python_version < "4.0"
attrs==22.1.0 ; python_version >= "3.9" and python_version < "4.0"
attrs==22.2.0 ; python_version >= "3.9" and python_version < "4.0"
babel==2.11.0 ; python_version >= "3.9" and python_version < "4.0"
backcall==0.2.0 ; python_version >= "3.9" and python_version < "4.0"
black==22.12.0 ; python_version >= "3.9" and python_version < "4.0"
black==23.1.0 ; python_version >= "3.9" and python_version < "4.0"
cattrs==22.2.0 ; python_version >= "3.9" and python_version < "4.0"
certifi==2022.12.7 ; python_version >= "3.9" and python_version < "4"
charset-normalizer==2.1.1 ; python_version >= "3.9" and python_version < "4"
charset-normalizer==3.0.1 ; python_version >= "3.9" and python_version < "4"
click==8.1.3 ; python_version >= "3.9" and python_version < "4.0"
colorama==0.4.6 ; python_version >= "3.9" and python_version < "4.0"
commonmark==0.9.1 ; python_version >= "3.9" and python_version < "4.0"
country-converter @ git+https://github.com/alanorth/country_converter.git@myanmar-region ; python_version >= "3.9" and python_version < "4.0"
csvkit==1.0.7 ; python_version >= "3.9" and python_version < "4.0"
csvkit==1.1.1 ; python_version >= "3.9" and python_version < "4.0"
dbfread==2.0.7 ; python_version >= "3.9" and python_version < "4.0"
decorator==5.1.1 ; python_version >= "3.9" and python_version < "4.0"
et-xmlfile==1.1.0 ; python_version >= "3.9" and python_version < "4.0"
exceptiongroup==1.0.4 ; python_version >= "3.9" and python_version < "3.11"
exceptiongroup==1.1.0 ; python_version >= "3.9" and python_version < "3.11"
executing==1.2.0 ; python_version >= "3.9" and python_version < "4.0"
flake8==5.0.4 ; python_version >= "3.9" and python_version < "4.0"
flake8==6.0.0 ; python_version >= "3.9" and python_version < "4.0"
ftfy==6.1.1 ; python_version >= "3.9" and python_version < "4"
future==0.18.2 ; python_version >= "3.9" and python_version < "4.0"
greenlet==2.0.1 ; python_version >= "3.9" and (platform_machine == "aarch64" or platform_machine == "ppc64le" or platform_machine == "x86_64" or platform_machine == "amd64" or platform_machine == "AMD64" or platform_machine == "win32" or platform_machine == "WIN32") and python_version < "4.0"
greenlet==2.0.2 ; python_version >= "3.9" and (platform_machine == "aarch64" or platform_machine == "ppc64le" or platform_machine == "x86_64" or platform_machine == "amd64" or platform_machine == "AMD64" or platform_machine == "win32" or platform_machine == "WIN32") and python_version < "4.0"
idna==3.4 ; python_version >= "3.9" and python_version < "4"
iniconfig==1.1.1 ; python_version >= "3.9" and python_version < "4.0"
ipython==8.7.0 ; python_version >= "3.9" and python_version < "4.0"
iniconfig==2.0.0 ; python_version >= "3.9" and python_version < "4.0"
ipython==8.10.0 ; python_version >= "3.9" and python_version < "4.0"
isodate==0.6.1 ; python_version >= "3.9" and python_version < "4.0"
isort==5.11.1 ; python_version >= "3.9" and python_version < "4.0"
isort==5.12.0 ; python_version >= "3.9" and python_version < "4.0"
jedi==0.18.2 ; python_version >= "3.9" and python_version < "4.0"
langid==1.1.6 ; python_version >= "3.9" and python_version < "4.0"
leather==0.3.4 ; python_version >= "3.9" and python_version < "4.0"
markdown-it-py==2.2.0 ; python_version >= "3.9" and python_version < "4.0"
matplotlib-inline==0.1.6 ; python_version >= "3.9" and python_version < "4.0"
mccabe==0.7.0 ; python_version >= "3.9" and python_version < "4.0"
mypy-extensions==0.4.3 ; python_version >= "3.9" and python_version < "4.0"
numpy==1.23.5 ; python_version < "4.0" and python_version >= "3.9"
mdurl==0.1.2 ; python_version >= "3.9" and python_version < "4.0"
mypy-extensions==1.0.0 ; python_version >= "3.9" and python_version < "4.0"
numpy==1.24.2 ; python_version < "4.0" and python_version >= "3.9"
olefile==0.46 ; python_version >= "3.9" and python_version < "4.0"
openpyxl==3.0.10 ; python_version >= "3.9" and python_version < "4.0"
packaging==22.0 ; python_version >= "3.9" and python_version < "4.0"
pandas==1.5.2 ; python_version >= "3.9" and python_version < "4.0"
parsedatetime==2.4 ; python_version >= "3.9" and python_version < "4.0"
openpyxl==3.1.1 ; python_version >= "3.9" and python_version < "4.0"
packaging==23.0 ; python_version >= "3.9" and python_version < "4.0"
pandas==1.5.3 ; python_version >= "3.9" and python_version < "4.0"
parsedatetime==2.6 ; python_version >= "3.9" and python_version < "4.0"
parso==0.8.3 ; python_version >= "3.9" and python_version < "4.0"
pathspec==0.10.3 ; python_version >= "3.9" and python_version < "4.0"
pathspec==0.11.0 ; python_version >= "3.9" and python_version < "4.0"
pexpect==4.8.0 ; python_version >= "3.9" and python_version < "4.0" and sys_platform != "win32"
pickleshare==0.7.5 ; python_version >= "3.9" and python_version < "4.0"
platformdirs==2.6.0 ; python_version >= "3.9" and python_version < "4.0"
platformdirs==3.0.0 ; python_version >= "3.9" and python_version < "4.0"
pluggy==1.0.0 ; python_version >= "3.9" and python_version < "4.0"
pprintpp==0.4.0 ; python_version >= "3.9" and python_version < "4.0"
prompt-toolkit==3.0.36 ; python_version >= "3.9" and python_version < "4.0"
prompt-toolkit==3.0.37 ; python_version >= "3.9" and python_version < "4.0"
ptyprocess==0.7.0 ; python_version >= "3.9" and python_version < "4.0" and sys_platform != "win32"
pure-eval==0.2.2 ; python_version >= "3.9" and python_version < "4.0"
pycodestyle==2.9.1 ; python_version >= "3.9" and python_version < "4.0"
pycodestyle==2.10.0 ; python_version >= "3.9" and python_version < "4.0"
pycountry @ git+https://github.com/alanorth/pycountry@iso-codes-4.12.0 ; python_version >= "3.9" and python_version < "4.0"
pyflakes==2.5.0 ; python_version >= "3.9" and python_version < "4.0"
pygments==2.13.0 ; python_version >= "3.9" and python_version < "4.0"
pyflakes==3.0.1 ; python_version >= "3.9" and python_version < "4.0"
pygments==2.14.0 ; python_version >= "3.9" and python_version < "4.0"
pytest-clarity==1.0.1 ; python_version >= "3.9" and python_version < "4.0"
pytest==7.2.0 ; python_version >= "3.9" and python_version < "4.0"
pytest==7.2.1 ; python_version >= "3.9" and python_version < "4.0"
python-dateutil==2.8.2 ; python_version >= "3.9" and python_version < "4.0"
python-slugify==7.0.0 ; python_version >= "3.9" and python_version < "4.0"
python-slugify==8.0.0 ; python_version >= "3.9" and python_version < "4.0"
python-stdnum==1.18 ; python_version >= "3.9" and python_version < "4.0"
pytimeparse==1.1.8 ; python_version >= "3.9" and python_version < "4.0"
pytz==2022.6 ; python_version >= "3.9" and python_version < "4.0"
requests-cache==0.9.7 ; python_version >= "3.9" and python_version < "4.0"
requests==2.28.1 ; python_version >= "3.9" and python_version < "4"
rich==12.6.0 ; python_version >= "3.9" and python_version < "4.0"
pytz==2022.7.1 ; python_version >= "3.9" and python_version < "4.0"
requests-cache==0.9.8 ; python_version >= "3.9" and python_version < "4.0"
requests==2.28.2 ; python_version >= "3.9" and python_version < "4"
rich==13.3.1 ; python_version >= "3.9" and python_version < "4.0"
six==1.16.0 ; python_version >= "3.9" and python_version < "4.0"
sqlalchemy==1.4.45 ; python_version >= "3.9" and python_version < "4.0"
sqlalchemy==1.4.46 ; python_version >= "3.9" and python_version < "4.0"
stack-data==0.6.2 ; python_version >= "3.9" and python_version < "4.0"
text-unidecode==1.3 ; python_version >= "3.9" and python_version < "4.0"
tomli==2.0.1 ; python_version >= "3.9" and python_full_version < "3.11.0a7"
traitlets==5.7.1 ; python_version >= "3.9" and python_version < "4.0"
typing-extensions==4.4.0 ; python_version >= "3.9" and python_version < "3.10"
tomli==2.0.1 ; python_version >= "3.9" and python_version < "3.11"
traitlets==5.9.0 ; python_version >= "3.9" and python_version < "4.0"
typing-extensions==4.5.0 ; python_version >= "3.9" and python_version < "3.10"
url-normalize==1.4.3 ; python_version >= "3.9" and python_version < "4.0"
urllib3==1.26.13 ; python_version >= "3.9" and python_version < "4"
wcwidth==0.2.5 ; python_version >= "3.9" and python_version < "4"
urllib3==1.26.14 ; python_version >= "3.9" and python_version < "4"
wcwidth==0.2.6 ; python_version >= "3.9" and python_version < "4"
xlrd==2.0.1 ; python_version >= "3.9" and python_version < "4.0"

requirements.txt

@@ -1,23 +1,23 @@
appdirs==1.4.4 ; python_version >= "3.9" and python_version < "4.0"
attrs==22.1.0 ; python_version >= "3.9" and python_version < "4.0"
attrs==22.2.0 ; python_version >= "3.9" and python_version < "4.0"
cattrs==22.2.0 ; python_version >= "3.9" and python_version < "4.0"
certifi==2022.12.7 ; python_version >= "3.9" and python_version < "4"
charset-normalizer==2.1.1 ; python_version >= "3.9" and python_version < "4"
charset-normalizer==3.0.1 ; python_version >= "3.9" and python_version < "4"
colorama==0.4.6 ; python_version >= "3.9" and python_version < "4.0"
country-converter @ git+https://github.com/alanorth/country_converter.git@myanmar-region ; python_version >= "3.9" and python_version < "4.0"
exceptiongroup==1.0.4 ; python_version >= "3.9" and python_version < "3.11"
exceptiongroup==1.1.0 ; python_version >= "3.9" and python_version < "3.11"
ftfy==6.1.1 ; python_version >= "3.9" and python_version < "4"
idna==3.4 ; python_version >= "3.9" and python_version < "4"
langid==1.1.6 ; python_version >= "3.9" and python_version < "4.0"
numpy==1.23.5 ; python_version < "4.0" and python_version >= "3.9"
pandas==1.5.2 ; python_version >= "3.9" and python_version < "4.0"
numpy==1.24.2 ; python_version < "4.0" and python_version >= "3.9"
pandas==1.5.3 ; python_version >= "3.9" and python_version < "4.0"
pycountry @ git+https://github.com/alanorth/pycountry@iso-codes-4.12.0 ; python_version >= "3.9" and python_version < "4.0"
python-dateutil==2.8.2 ; python_version >= "3.9" and python_version < "4.0"
python-stdnum==1.18 ; python_version >= "3.9" and python_version < "4.0"
pytz==2022.6 ; python_version >= "3.9" and python_version < "4.0"
requests-cache==0.9.7 ; python_version >= "3.9" and python_version < "4.0"
requests==2.28.1 ; python_version >= "3.9" and python_version < "4"
pytz==2022.7.1 ; python_version >= "3.9" and python_version < "4.0"
requests-cache==0.9.8 ; python_version >= "3.9" and python_version < "4.0"
requests==2.28.2 ; python_version >= "3.9" and python_version < "4"
six==1.16.0 ; python_version >= "3.9" and python_version < "4.0"
url-normalize==1.4.3 ; python_version >= "3.9" and python_version < "4.0"
urllib3==1.26.13 ; python_version >= "3.9" and python_version < "4"
wcwidth==0.2.5 ; python_version >= "3.9" and python_version < "4"
urllib3==1.26.14 ; python_version >= "3.9" and python_version < "4"
wcwidth==0.2.6 ; python_version >= "3.9" and python_version < "4"
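The "Update requirements" commits use `--without-hashes` to avoid pip's "In --require-hashes mode, all requirements must have their versions pinned with ==" error. One plausible culprit, not confirmed by the commit messages, is the direct git references such as `country-converter @ git+...`, which carry no `==` pin. A rough sketch that lists such lines; `unpinned` and its regular expression are hypothetical and do not handle extras such as `pkg[extra]==1.0`:

```python
import re

# Matches "name==version" once the environment marker has been removed.
PINNED = re.compile(r"[A-Za-z0-9._-]+==[^\s;]+")


def unpinned(lines: list[str]) -> list[str]:
    """Return requirement lines that are not pinned with ==."""
    bad = []
    for line in lines:
        # Drop the environment marker after ";" and surrounding whitespace.
        requirement = line.split(";", 1)[0].strip()
        if not requirement or requirement.startswith("#"):
            continue
        if PINNED.fullmatch(requirement) is None:
            bad.append(requirement)
    return bad


# Three lines copied from requirements.txt above.
REQUIREMENTS = [
    'colorama==0.4.6 ; python_version >= "3.9" and python_version < "4.0"',
    "country-converter @ git+https://github.com/alanorth/country_converter.git"
    '@myanmar-region ; python_version >= "3.9" and python_version < "4.0"',
    'pandas==1.5.3 ; python_version >= "3.9" and python_version < "4.0"',
]
print(unpinned(REQUIREMENTS))
```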


@@ -1,6 +0,0 @@
[isort]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
line_length=88

setup.py

@@ -14,7 +14,7 @@ install_requires = [
setuptools.setup(
name="csv-metadata-quality",
version="0.6.0",
version="0.6.1",
author="Alan Orth",
author_email="aorth@mjanja.ch",
description="A simple, but opinionated CSV quality checking and fixing pipeline for CSVs in the DSpace ecosystem.",
@@ -23,7 +23,6 @@ setuptools.setup(
long_description_content_type="text/markdown",
url="https://github.com/alanorth/csv-metadata-quality",
classifiers=[
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)",

tests/test_check.py

@@ -25,7 +25,7 @@ def test_check_valid_issn():
result = check.issn(value)
assert result == None
assert result is None
def test_check_invalid_isbn(capsys):
@@ -46,7 +46,7 @@ def test_check_valid_isbn():
result = check.isbn(value)
assert result == None
assert result is None
def test_check_missing_date(capsys):
@@ -102,7 +102,7 @@ def test_check_valid_date():
result = check.date(value, field_name)
assert result == None
assert result is None
def test_check_suspicious_characters(capsys):
@@ -128,7 +128,7 @@ def test_check_valid_iso639_1_language():
result = check.language(value)
assert result == None
assert result is None
def test_check_valid_iso639_3_language():
@@ -138,7 +138,7 @@ def test_check_valid_iso639_3_language():
result = check.language(value)
assert result == None
assert result is None
def test_check_invalid_iso639_1_language(capsys):
@@ -249,7 +249,7 @@ def test_check_common_filename_extension():
result = check.filename_extension(value)
assert result == None
assert result is None
def test_check_incorrect_iso_639_1_language(capsys):
@@ -305,7 +305,7 @@ def test_check_correct_iso_639_1_language():
result = experimental.correct_language(series, exclude)
assert result == None
assert result is None
def test_check_correct_iso_639_3_language():
@@ -321,7 +321,7 @@ def test_check_correct_iso_639_3_language():
result = experimental.correct_language(series, exclude)
assert result == None
assert result is None
def test_check_valid_spdx_license_identifier():
@@ -331,7 +331,7 @@ def test_check_valid_spdx_license_identifier():
result = check.spdx_license_identifier(license)
assert result == None
assert result is None
def test_check_invalid_spdx_license_identifier(capsys):
@@ -339,7 +339,7 @@ def test_check_invalid_spdx_license_identifier(capsys):
license = "CC-BY-SA"
result = check.spdx_license_identifier(license)
check.spdx_license_identifier(license)
captured = capsys.readouterr()
assert (
@@ -362,7 +362,7 @@ def test_check_duplicate_item(capsys):
}
df = pd.DataFrame(data=d)
result = check.duplicate_items(df)
check.duplicate_items(df)
captured = capsys.readouterr()
assert (
@@ -379,7 +379,7 @@ def test_check_no_mojibake():
result = check.mojibake(field, field_name)
assert result == None
assert result is None
def test_check_mojibake(capsys):
@@ -388,7 +388,7 @@ def test_check_mojibake(capsys):
field = "CIAT Publicaçao"
field_name = "dcterms.isPartOf"
result = check.mojibake(field, field_name)
check.mojibake(field, field_name)
captured = capsys.readouterr()
assert (
@@ -411,7 +411,7 @@ def test_check_doi_field():
result = check.citation_doi(series, exclude)
assert result == None
assert result is None
def test_check_doi_only_in_citation(capsys):
@@ -448,7 +448,7 @@ def test_title_in_citation():
result = check.title_in_citation(series, exclude)
assert result == None
assert result is None
def test_title_not_in_citation(capsys):
@@ -485,7 +485,7 @@ def test_country_matches_region():
result = check.countries_match_regions(series, exclude)
assert result == None
assert result is None
def test_country_not_matching_region(capsys):
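The two lint fixes in tests/test_check.py rest on small Python rules. Ruff's E711 prefers `is None` because `==` dispatches to `__eq__`, which any class can override, while `is` is an identity check. F841 fires when a result is assigned but never read, which is the case when a test only inspects captured stdout. A self-contained sketch, with hypothetical `AlwaysEqual` and `report` names, using `contextlib.redirect_stdout` as a stdlib stand-in for pytest's `capsys` fixture:

```python
import contextlib
import io


class AlwaysEqual:
    """Hypothetical object whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()
# __eq__ is consulted by ==, so this prints True.
print(value == None)  # noqa: E711
# Identity cannot be overridden, so this prints False.
print(value is None)


def report(field):
    # Hypothetical check that prints instead of returning a value.
    print(f"Invalid: {field}")


# As in the F841 fix: call the function for its side effect only and inspect
# the captured stdout instead of assigning an unused result.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    report("CC-BY-SA")
print(buffer.getvalue().strip())
```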