mirror of
https://github.com/ilri/csv-metadata-quality.git
synced 2025-05-09 22:56:01 +02:00
Add check for "suspicious" characters
These standalone characters often indicate issues with encoding or copy/paste in languages with accents like French and Spanish. For example: foreˆt should be forêt. It is not possible to fix these issues automatically, but this will print a warning so you can notify the owner of the data.
This commit is contained in:
@ -11,6 +11,7 @@ Requires Python 3.6 or greater. CSV and Excel support comes from the [Pandas](ht
|
||||
- Fix leading, trailing, and excessive whitespace ✓
|
||||
- Fix invalid multi-value separators ("|") using `--unsafe-fixes` ✓
|
||||
- Remove unnecessary Unicode like [non-breaking spaces](https://en.wikipedia.org/wiki/Non-breaking_space), [replacement characters](https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character), etc ✓
|
||||
- Check for "suspicious" characters that could indicate encoding or copy/paste issues, for example "foreˆt" should be "forêt" ✓
|
||||
|
||||
## Installation
|
||||
The easiest way to install CSV Metadata Quality is with [pipenv](https://github.com/pypa/pipenv):
|
||||
|
Reference in New Issue
Block a user