mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-16 11:07:03 +01:00
Commit Graph

16 Commits

Author SHA1 Message Date
898bb412c3
Add checks and unsafe fixes for mojibake
This detects whether text has likely been encoded in one encoding
and decoded in another, perhaps multiple times. This often results
in display of "mojibake" characters.

For example, a file encoded in UTF-8 is opened as CP-1252 (Windows
Latin codepage) in Microsoft Excel, and saved again as UTF-8. You
will see strings like this in the resulting file:

    - CIAT PublicaÃ§ao
    - CIAT PublicaciÃ³n

The correct version of these in UTF-8 would be:

    - CIAT Publicaçao
    - CIAT Publicación

I use a code snippet from Martijn Pieters on StackOverflow to detect
whether a string is "weird" as determined by the excellent "fixes
text for you" (ftfy) Python library, then check if a weird string
encodes as CP-1252 or not. If so, I can try to fix it.

See: https://stackoverflow.com/questions/29071995/identify-garbage-unicode-string-using-python
2021-03-19 10:22:21 +02:00
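The round trip described in the commit above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: it skips the ftfy-based "weirdness" check entirely, and the function names `make_mojibake` and `try_fix_mojibake` are hypothetical.

```python
def make_mojibake(text: str) -> str:
    """Simulate UTF-8 bytes being misread as CP-1252 (one round only)."""
    return text.encode("utf-8").decode("cp1252")


def try_fix_mojibake(text: str) -> str:
    """Try to undo one round of UTF-8-read-as-CP-1252.

    If the string cannot be encoded as CP-1252, or the resulting bytes
    are not valid UTF-8, the input is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    for value in ["CIAT Publicaçao", "CIAT Publicación"]:
        broken = make_mojibake(value)
        print(f"{value!r} -> {broken!r} -> {try_fix_mojibake(broken)!r}")
```

The fix is "unsafe" in the sense the commit title suggests: a string that merely happens to encode as CP-1252 and decode as UTF-8 would also be rewritten, which is why a separate check for weird-looking text comes first in the approach described above.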
f816e17fe7
Version 0.4.7
2021-03-17 10:00:34 +02:00
fa84cfa440
Bump version to 0.4.6-dev 2021-03-11 22:44:36 +02:00
6cc1401f88
pyproject.toml: Minimum Python is technically 3.7.1
See: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.2.0.html
2021-03-11 13:41:58 +02:00
1554cfd5c9
Version 0.4.6 2021-03-11 12:14:54 +02:00
6e4b0e5c1b
Add validation of SPDX license identifiers
Currently this only checks the dcterms.license field and the result
will only be a warning.
2021-03-11 10:33:16 +02:00
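A check like the one described in the commit above might look roughly as follows. This is a sketch under stated assumptions: the identifier set is a tiny hand-picked subset of the SPDX License List rather than the full list, and the function name `check_spdx_license` and the warning text are hypothetical, not taken from the project.

```python
from typing import Optional

# Assumption: a small illustrative subset. The real SPDX License List
# contains hundreds of identifiers and would normally be loaded from a
# maintained data source rather than hard-coded.
KNOWN_SPDX_IDS = {
    "Apache-2.0",
    "CC-BY-4.0",
    "CC-BY-NC-4.0",
    "CC0-1.0",
    "GPL-3.0-only",
    "MIT",
}


def check_spdx_license(value: str) -> Optional[str]:
    """Return a warning message for a non-SPDX license value, else None."""
    identifier = value.strip()
    if identifier and identifier not in KNOWN_SPDX_IDS:
        return f"Warning: non-SPDX license identifier (dcterms.license): {identifier}"
    return None


if __name__ == "__main__":
    for license_value in ["CC-BY-4.0", "Creative Commons Attribution"]:
        print(license_value, "->", check_spdx_license(license_value))
```

Returning a message instead of raising keeps the result a warning, in line with the commit's note that a bad value is only warned about.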
b16fa9121f
pyproject.toml: Add csv-metadata-quality as a script
For some reason I stopped having csv-metadata-quality available in
my poetry environment after install. It seems I need to add it as a
poetry tool script? I had already done this in setup.py years ago,
which works for regular python setup.py installs, but hadn't needed
to do it in poetry for a year or more that I've been using it, until
now.
2021-03-08 09:50:05 +02:00
202bda862a
Bump version to 0.4.5
2021-03-04 21:38:10 +02:00
d76e72532a
Move unreleased changes to v0.4.4
2021-02-21 13:25:22 +02:00
7fb8acb866
Add colorama for colored output
Red for errors, yellow for warnings or information, and green for
fixes.
2021-02-21 13:00:31 +02:00
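The color scheme in the commit above (red for errors, yellow for warnings or information, green for fixes) can be illustrated without any dependency. This sketch uses raw ANSI escape codes in place of colorama, so it does not show the project's actual calls; the `colorize` helper and the message kinds are hypothetical names.

```python
# Standard ANSI SGR escape sequences for foreground colors.
RED = "\x1b[31m"
YELLOW = "\x1b[33m"
GREEN = "\x1b[32m"
RESET = "\x1b[0m"

# Assumption: three message kinds matching the scheme in the commit message.
COLORS = {
    "error": RED,
    "warning": YELLOW,
    "fix": GREEN,
}


def colorize(kind: str, message: str) -> str:
    """Wrap a message in the color for its kind; unknown kinds stay plain."""
    color = COLORS.get(kind)
    if color is None:
        return message
    return f"{color}{message}{RESET}"


if __name__ == "__main__":
    print(colorize("error", "Invalid ISSN: 1234-567"))
    print(colorize("warning", "Suspicious character in title"))
    print(colorize("fix", "Removed excessive whitespace"))
```

Raw escape codes are not interpreted by every Windows console, which is one reason to reach for a library such as colorama instead.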
cbf94490f2
Version 0.4.3 2021-01-26 15:22:40 +02:00
f4914c414f
Only install ipython on Python 3.7+ 2020-10-06 17:48:16 +03:00
f13c360084
Update poetry package dependencies 2020-10-06 17:20:16 +03:00
cb07d357d4
Version 0.4.2 2020-07-06 14:04:34 +03:00
aa9e23b46c
pyproject.toml: Update license specifier
We need to use valid SPDX license identifiers.
2020-06-09 14:22:53 +03:00
0c44b967b6
Add poetry project file and lock
I want to try to use poetry instead of pipenv because pipenv takes
forever to do dependency resolution sometimes. Also, I have had a
few issues with Python modules like black that don't have releases
other than pre-releases, and even including the project itself in
the dependencies (pip install -e . ...?). My initial experience is
that poetry handles this better.
2020-05-31 17:33:40 +03:00