mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-10-31 19:43:00 +01:00
Commit Graph

39 Commits

Author SHA1 Message Date
7097136b7e
Use my fork of country_converter again
There is an issue with the UN M.49 region for Myanmar.
2022-11-28 17:38:45 +03:00
b02f1f65ee
pyproject.toml: use upstream country_converter
Version 0.8.0 has the country and UN M.49 region fixes.

See: https://github.com/konstantinstadler/country_converter/releases/tag/v0.8.0
2022-11-28 17:14:16 +03:00
4d5ef38dde
pyproject.toml: add ipython to dev dependencies 2022-11-28 17:11:18 +03:00
df57988e5a
Use my fork of pycountry
Until they update to iso-codes 4.12.0.

See: https://github.com/flyingcircusio/pycountry/pull/149
2022-11-08 10:21:28 +03:00
15f52f8be8
Switch to my fork of country-converter
Until a few issues are resolved regarding new countries and regions.

See: https://github.com/konstantinstadler/country_converter/pull/122
See: https://github.com/konstantinstadler/country_converter/pull/123
2022-11-08 10:04:31 +03:00
ca82820a8e
pyproject.toml: update dependencies to latest 2022-11-07 12:13:28 +03:00
58b7b6e9d8
Version 0.6.0
All checks were successful
continuous-integration/drone/push Build is passing
2022-09-02 16:35:58 +03:00
21e9948a75
pyproject.toml: manually updated all deps
Update all deps to their latest versions on pypi.org and remove the
explicit dependency on SQLAlchemy.
2022-09-02 16:30:40 +03:00
566c2b45cf
Remove Excel support
I never used this and it seems xlrd doesn't even support .xlsx
anymore anyways. If this was needed I could theoretically use openpyxl
but I'd rather just stick to CSV.
2022-09-02 16:14:24 +03:00
b0d46cd864
pyproject.toml: update black
It's no longer in beta!
2022-01-30 13:22:47 +03:00
3ee9319d84
pyproject.toml: bump flake8 2022-01-30 13:21:09 +03:00
4d5f4b5abb
pyproject.toml: update pycountry
This jumps a few major versions, from 19.x.x to 21.x.x. All tests
are passing in pytest so it's probably fine.
2022-01-30 13:15:38 +03:00
98d38801fa
pyproject.toml: update requests and requests-cache 2022-01-30 13:11:01 +03:00
e94a4539bf
pyproject.toml: bump Pandas to v1.4.0
As of Pandas v1.4.0 the minimum Python version is 3.8.

See: https://pandas.pydata.org/docs/whatsnew/v1.4.0.html
2022-01-30 13:03:56 +03:00
d9e427a80e
pyproject.toml: don't install ipython
It always complains about running in a virtual environment anyways,
and I can use the one from the OS instead.
2022-01-29 16:25:58 +03:00
8b15154285
pyproject.toml: use ftfy 6.0
Lots of improvements here! Improvements to heuristics and a new way
to configure which fixes get applied.

See: https://github.com/rspeer/python-ftfy/blob/master/CHANGELOG.md#version-60-april-2-2021
2021-12-15 21:48:56 +02:00
9905e183ea
Bump version to 0.6.0-dev 2021-12-09 23:21:30 +02:00
cc34db7ff8
Version 0.5.0
All checks were successful
continuous-integration/drone/push Build is passing
2021-12-08 15:29:46 +02:00
ccc2a73456
Add check for countries without matching regions
If we have country "Kenya" we should have region "Eastern Africa"
according to the UN M.49 geolocation scheme.
2021-12-08 15:02:20 +02:00
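The country/region check described in this commit can be sketched as follows. This is only an illustration, not the project's implementation: the `UN_M49_REGIONS` table is a tiny hand-picked subset of the UN M.49 scheme and `missing_regions` is a hypothetical helper name. A real check would presumably take the mapping from a library such as country_converter rather than hard-coding it.

```python
# Hypothetical, hand-picked subset of the UN M.49 country -> region mapping.
# A real check would load the full mapping from a maintained data source.
UN_M49_REGIONS = {
    "Kenya": "Eastern Africa",
    "Ethiopia": "Eastern Africa",
    "Peru": "South America",
    "Viet Nam": "South-eastern Asia",
}


def missing_regions(countries, regions):
    """Return the regions implied by `countries` that are absent from `regions`."""
    expected = {UN_M49_REGIONS[c] for c in countries if c in UN_M49_REGIONS}
    return sorted(expected - set(regions))


# A record listing Kenya and Peru but only the region "Eastern Africa"
# is missing the region for Peru.
print(missing_regions(["Kenya", "Peru"], ["Eastern Africa"]))  # ['South America']
```

Countries that are not in the lookup table are silently skipped here. Whether unknown countries should raise a separate warning is a design choice this sketch leaves open.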
215d61c188
pyproject.toml: limit SQLAlchemy to < 1.4.23
SQLAlchemy gets pulled in by csvkit's agate-sql dependency and there
is currently an issue with Poetry's parsing of the SQLAlchemy 1.4.23
constraints. Temporarily explicitly install a version of SQLAlchemy
that works (can remove later once Poetry fixes this). Anyways, I am
not using any SQLAlchemy features that I know of.

See: https://github.com/python-poetry/poetry/issues/4402
2021-09-06 21:01:09 +03:00
b8f4be9ebb
pyproject.toml: Update pytest-clarity and black
These seem to have much newer versions that didn't get updated in
this project due to the version pinning selector I was using with
poetry.

In the case of pytest-clarity the previous version was 0.3.1 and
the version selector was a caret (^), which will never update the
left-most (major) number. Now they seem to be on 1.x.x so it will
be OK in the future.

In the case of black, they use weird numbering so it's anyone's
guess how this will work! Luckily it's only used for linting and
formatting.
2021-07-06 15:30:41 +03:00
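To make the caret behaviour concrete, here is a minimal sketch of how a caret constraint on a full MAJOR.MINOR.PATCH version translates into a version range under Poetry's documented caret rules. The function names `caret_bounds` and `allows` are hypothetical, and the sketch ignores pre-releases and shortened versions such as `^1.2`. Note that for 0.x versions the caret holds the left-most non-zero component fixed, so `^0.3.1` stops before 0.4.0, which is why a 1.x release was never picked up.

```python
def caret_bounds(version):
    """Return (lower, upper) tuples for a caret constraint on MAJOR.MINOR.PATCH.

    The upper bound is exclusive. The left-most non-zero component is the
    one that is not allowed to change.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return (major, minor, patch), upper


def allows(constraint_version, candidate):
    """Check whether `candidate` satisfies the caret constraint ^constraint_version."""
    lower, upper = caret_bounds(constraint_version)
    candidate_tuple = tuple(int(part) for part in candidate.split("."))
    return lower <= candidate_tuple < upper


print(caret_bounds("0.3.1"))      # ((0, 3, 1), (0, 4, 0))
print(allows("0.3.1", "1.0.0"))   # False
print(allows("1.0.0", "1.4.2"))   # True
```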
4e2eab68b0
Update requests-cache
Apparently we were stuck on an older version of requests-cache due
to the fact that we were using the caret, which will never update
the left-most (major) version. Upstream requests-cache is currently
version 0.6.4, and there seem to have been some changes to the API.
2021-07-06 15:24:39 +03:00
8eddb76aab
Bump version to 0.4.8-dev
All checks were successful
continuous-integration/drone/push Build is passing
2021-03-19 11:53:56 +02:00
898bb412c3
Add checks and unsafe fixes for mojibake
This detects whether text has likely been encoded in one encoding
and decoded in another, perhaps multiple times. This often results
in display of "mojibake" characters.

For example, a file encoded in UTF-8 is opened as CP-1252 (Windows
Latin codepage) in Microsoft Excel, and saved again as UTF-8. You
will see strings like this in the resulting file:

    - CIAT PublicaÃ§ao
    - CIAT PublicaciÃ³n

The correct version of these in UTF-8 would be:

    - CIAT Publicaçao
    - CIAT Publicación

I use a code snippet from Martijn Pieters on StackOverflow to detect
whether a string is "weird" as determined by the excellent
"fixes text for you" (ftfy) Python library, then check if a weird
string encodes as CP-1252 or not. If so, I can try to fix it.

See: https://stackoverflow.com/questions/29071995/identify-garbage-unicode-string-using-python
2021-03-19 10:22:21 +02:00
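The repair step described in this commit can be sketched with the standard library alone. This is a simplified illustration and an assumption about the general shape of the fix, not the project's code: it skips ftfy's "weird string" heuristics entirely and simply tests whether a string survives a CP-1252 encode followed by a UTF-8 decode. The function name `fix_cp1252_mojibake` is hypothetical.

```python
def fix_cp1252_mojibake(text):
    """Try to undo one round of UTF-8 text having been decoded as CP-1252.

    If the string cannot be encoded as CP-1252, or the resulting bytes are
    not valid UTF-8, the text is assumed to be fine and returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(fix_cp1252_mojibake("CIAT PublicaÃ§ao"))   # CIAT Publicaçao
print(fix_cp1252_mojibake("CIAT PublicaciÃ³n"))  # CIAT Publicación
print(fix_cp1252_mojibake("CIAT Publicación"))   # unchanged
```

Without a heuristic such as ftfy's in front of it, this round trip can in principle rewrite a legitimate string that happens to decode cleanly, which fits the commit's description of the fix as unsafe.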
f816e17fe7
Version 0.4.7
All checks were successful
continuous-integration/drone/push Build is passing
2021-03-17 10:00:34 +02:00
fa84cfa440
Bump version to 0.4.6-dev 2021-03-11 22:44:36 +02:00
6cc1401f88
pyproject.toml: Minimum Python is technically 3.7.1
See: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.2.0.html
All checks were successful
continuous-integration/drone/push Build is passing
2021-03-11 13:41:58 +02:00
1554cfd5c9
Version 0.4.6 2021-03-11 12:14:54 +02:00
6e4b0e5c1b
Add validation of SPDX license identifiers
Currently this only checks the dcterms.license field and the result
will only be a warning.
2021-03-11 10:33:16 +02:00
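A warning-only license check along these lines might look like the sketch below. It is an illustration under stated assumptions: `KNOWN_SPDX_IDS` is a small hard-coded sample of valid SPDX identifiers (a real check would use the full SPDX license list), and `check_licenses` is a hypothetical helper name. Only the `dcterms.license` column name comes from the commit message.

```python
import csv
import io

# Small sample of valid SPDX identifiers, hard-coded for illustration only.
KNOWN_SPDX_IDS = {"CC-BY-4.0", "CC-BY-NC-4.0", "CC0-1.0", "MIT", "GPL-3.0-only"}


def check_licenses(rows, field="dcterms.license"):
    """Return a warning message for each non-empty value that is not an SPDX ID."""
    warnings = []
    for row in rows:
        value = (row.get(field) or "").strip()
        if value and value not in KNOWN_SPDX_IDS:
            warnings.append(f"Non-SPDX license identifier ({field}): {value}")
    return warnings


sample = io.StringIO(
    "dc.title,dcterms.license\n"
    "First item,CC-BY-4.0\n"
    "Second item,Creative Commons\n"
)
for warning in check_licenses(csv.DictReader(sample)):
    print(warning)
```

The function only collects messages and never modifies the rows, matching the commit's note that the result is a warning rather than a fix.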
b16fa9121f
pyproject.toml: Add csv-metadata-quality as a script
For some reason I stopped having csv-metadata-quality available in
my poetry environment after install. It seems I need to add it as a
poetry tool script? I had already done this in setup.py years ago,
which works for regular python setup.py installs, but hadn't needed
to do it in poetry for a year or more that I've been using it, until
now.
All checks were successful
continuous-integration/drone/push Build is passing
2021-03-08 09:50:05 +02:00
202bda862a
Bump version to 0.4.5
All checks were successful
continuous-integration/drone/push Build is passing
2021-03-04 21:38:10 +02:00
d76e72532a
Move unreleased changes to v0.4.4
All checks were successful
continuous-integration/drone/push Build is passing
2021-02-21 13:25:22 +02:00
7fb8acb866
Add colorama for colored output
Red for errors, yellow for warnings or information, and green for
fixes.
2021-02-21 13:00:31 +02:00
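The color scheme from this commit can be illustrated without any dependencies. This sketch uses raw ANSI escape codes instead of colorama itself, so it shows only the mapping from message kind to color. The `COLORS` table and the `colorize` helper are hypothetical names, and unlike colorama this does nothing to make the codes work on older Windows consoles.

```python
# ANSI foreground color codes: red for errors, yellow for warnings or
# information, green for fixes (the scheme described in the commit).
COLORS = {
    "error": "\033[31m",
    "warning": "\033[33m",
    "fix": "\033[32m",
}
RESET = "\033[0m"


def colorize(kind, message):
    """Wrap `message` in the color for `kind`, then reset the terminal style."""
    return f"{COLORS[kind]}{message}{RESET}"


print(colorize("error", "Invalid ISSN: 2321-2302"))
print(colorize("warning", "Possible mojibake in dc.title"))
print(colorize("fix", "Removed excessive whitespace"))
```

The example messages are made up for the demonstration and are not taken from the tool's actual output.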
cbf94490f2
Version 0.4.3 2021-01-26 15:22:40 +02:00
f4914c414f
Only install ipython on Python 3.7+ 2020-10-06 17:48:16 +03:00
f13c360084
Update poetry package dependencies 2020-10-06 17:20:16 +03:00
cb07d357d4
Version 0.4.2 2020-07-06 14:04:34 +03:00
aa9e23b46c
pyproject.toml: Update license specifier
We need to use valid SPDX license identifiers.
2020-06-09 14:22:53 +03:00
0c44b967b6
Add poetry project file and lock
I want to try to use poetry instead of pipenv because pipenv takes
forever to do dependency resolution sometimes. Also, I have had a
few issues with Python modules like black that don't have releases
other than pre-releases, and even including the project itself in
the dependencies (pip install -e . ...?). My initial experience is
that poetry handles this better.
2020-05-31 17:33:40 +03:00