# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased

### Added

- Ability to check for, and fix, "mojibake" characters using [ftfy](https://github.com/LuminosoInsight/python-ftfy)

### Updated

- Python dependencies

## [0.4.7] - 2021-03-17

### Changed

- Fixing invalid multi-value separators like `|` and `|||` is no longer
  classified as "unsafe" as I have yet to see a case where this was intentional
- Not user visible, but checks now only print a warning to the screen instead
  of returning a value and re-writing the DataFrame, which should be faster and
  use less memory

### Added

- Configurable directory for AGROVOC requests cache (to allow running the web
  version from Google App Engine where we can only write to /tmp)
- Ability to check for duplicate items in the data set (uses a combination of
  the title, type, and date issued to determine uniqueness)

### Removed

- Checks for invalid and unnecessary multi-value separators because now I fix
  them whenever I see them, so there is no need to have checks for them

### Updated

- Run `poetry update` to update project dependencies

## [0.4.6] - 2021-03-11

### Added

- Validation of dcterms.license field against SPDX license identifiers

### Changed

- Use DCTERMS fields where possible in `data/test.csv`

### Updated

- Run `poetry update` to update project dependencies

### Fixed

- Output for all fixes should be green, because it is good

## [0.4.5] - 2021-03-04

### Added

- Check dates in dcterms.issued field as well, not just fields that have the
  word "date" in them

### Updated

- Run `poetry update` to update project dependencies

## [0.4.4] - 2021-02-21

### Added

- Accept dates formatted in ISO 8601 extended with combined date and time, for
  example: 2020-08-31T11:04:56Z
- Colorized output: red for errors, yellow for warnings and information, green
  for changes

### Updated

- Run `poetry update` to update project dependencies

## [0.4.3] - 2021-01-26

### Changed

- Reformat with black
- Requires Python 3.7+ for pandas 1.2.0

### Updated

- Run `poetry update`
- Expand check/fix for multi-value separators to include metadata with invalid
  separators at the end, for example "Kenya||Tanzania||"

## [0.4.2] - 2020-07-06

### Changed

- Add field name to the output for more fixes and checks to help identify where
  the error is
- Minor optimizations to AGROVOC subject lookup
- Use Poetry instead of Pipenv

### Updated

- Update Python dependencies to latest versions

## [0.4.1] - 2020-01-15

### Changed

- Reduce minimum Python version to 3.6 by working around `is_normalized()`,
  which is only available in Python >= 3.8

## [0.4.0] - 2020-01-15

### Added

- Unicode normalization (enable with `--unsafe-fixes`, see README.md)

### Updated

- Update Python dependencies to latest versions, including numpy 1.18.1, pandas
  1.0.0rc0, flake8 3.7.9, pytest 5.3.2, and black 19.10b0
- Regenerate requirements.txt and requirements-dev.txt

### Changed

- Use Python 3.8.0 for pipenv
- Use Ubuntu 18.04 "Bionic" for TravisCI builds
- Test Python 3.8 in TravisCI builds

## [0.3.1] - 2019-10-01

### Changed

- Replace non-breaking spaces (U+00A0) with space instead of removing them
- Harmonize language of script output when fixing various issues

## [0.3.0] - 2019-09-26

### Updated

- Update Python dependencies to latest versions, including numpy 1.17.2, pandas
  0.25.1, pytest 5.1.3, and requests-cache 0.5.2

### Added

- csvkit to dev requirements (csvcut etc. are useful during development)
- Experimental language validation using the Python `langid` library (enable with `-e`, see README.md)

### Changed

- Re-formatted code with black and isort

## [0.2.2] - 2019-08-27

### Changed

- Output of date checks to include column names (helps debugging in case there are multiple date fields)

### Added

- Ability to exclude certain fields using `--exclude-fields`
- Fix for missing space after a comma, for example "Orth,Alan S."

### Improved

- AGROVOC lookup code

## [0.2.1] - 2019-08-11

### Added

- Check for uncommon filename extensions
- Replacement of unnecessary Unicode characters like soft hyphens (U+00AD)

## [0.2.0] - 2019-08-09

### Added

- Handle Ctrl-C interrupt gracefully
- Make output in suspicious character check more user friendly
- Add pytest-clarity to dev packages for more user friendly pytest output

## [0.1.0] - 2019-08-01

### Changed

- AGROVOC validation is now turned off by default

### Added

- Ability to enable AGROVOC validation on a field-by-field basis using the `--agrovoc-fields` option
- Option to print the version (`--version` or `-V`)