1
0
mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-25 23:28:18 +01:00

Compare commits

...

3 Commits

Author SHA1 Message Date
27b2d81ca8
CHANGELOG.md: Add note about dcterms.issued
All checks were successful
continuous-integration/drone/push Build is passing
2021-02-28 15:14:39 +02:00
91ebd0f606
README.md: Update TODOs
A few of these date things have been addressed.
2021-02-28 15:13:36 +02:00
dd2cfae047
csv_metadata_quality/app.py: Match dcterms.issued for dates
We used to only check fields that had "date" in their name because
we were using DSpace's default dc.date.* fields. Now we are using
dcterms.issued so I will add that one as well.
2021-02-28 15:11:06 +02:00
3 changed files with 6 additions and 3 deletions

View File

@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## Unreleased changes
### Added
- Check dates in dcterms.issued field as well, not just fields that have the
word "date" in them
## [0.4.4] - 2021-02-21
### Added
- Accept dates formatted in ISO 8601 extended with combined date and time, for

View File

@ -109,8 +109,6 @@ This currently uses the [Python langid](https://github.com/saffsd/langid.py) lib
- Add an option to drop invalid AGROVOC subjects?
- Add tests for application invocation, ie `tests/test_app.py`?
- Validate ISSNs or journal titles against CrossRef API?
- Better ISO 8601 date parsing (currently only supports simple dates, perhaps we need to use dateutil.parser.parseiso())
- Fix lazy date check (assumes field name has "date" but could be dcterms.issued etc!)
## License
This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).

View File

@ -142,7 +142,7 @@ def run(argv):
df[column] = df[column].apply(check.isbn)
# Check: invalid date
match = re.match(r"^.*?date.*$", column)
match = re.match(r"^.*?(date|dcterms\.issued).*$", column)
if match is not None:
df[column] = df[column].apply(check.date, field_name=column)