1
0
mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-27 16:18:19 +01:00

Compare commits

...

3 Commits

Author SHA1 Message Date
20a2cce34b
CHANGELOG.md: add fixes
Some checks failed
continuous-integration/drone/push Build is failing
2023-03-10 16:17:20 +03:00
d661ffe439
Check comma space on bibliographicCitation too
The regex was only matching `dc.identifier.citation`, but we need
to match `dcterms.bibliographicCitation` too.
2023-03-10 16:13:16 +03:00
45a310387a
Don't fix multi-value separators on citations 2023-03-10 16:12:30 +03:00
2 changed files with 9 additions and 2 deletions

View File

@ -4,6 +4,13 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## Unreleased
### Fixed
- Fixed regex so we don't run the invalid multi-value separator fix on
`dcterms.bibliographicCitation` fields
- Fixed regex so we run the comma space fix on `dcterms.bibliographicCitation`
fields
## [0.6.1] - 2023-02-23
### Fixed
- Missing region check should ignore subregion field, if it exists

View File

@ -102,7 +102,7 @@ def run(argv):
# Fix: missing space after comma. Only run on author and citation
# fields for now, as this problem is mostly an issue in names.
if args.unsafe_fixes:
match = re.match(r"^.*?(author|citation).*$", column)
match = re.match(r"^.*?(author|[Cc]itation).*$", column)
if match is not None:
df[column] = df[column].apply(fix.comma_space, field_name=column)
@ -126,7 +126,7 @@ def run(argv):
# Fix: invalid and unnecessary multi-value separators. Skip the title
# and abstract fields because "|" is used to indicate something like
# a subtitle.
match = re.match(r"^.*?(abstract|title).*$", column)
match = re.match(r"^.*?(abstract|[Cc]itation|title).*$", column)
if match is None:
df[column] = df[column].apply(fix.separators, field_name=column)
# Run whitespace fix again after fixing invalid separators