Commit Graph

615 Commits

Author SHA1 Message Date
Alan Orth 59742e47f1
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --with dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==
2022-09-02 16:32:04 +03:00
Alan Orth 9c741b1d49
poetry.lock: sync latest deps 2022-09-02 16:31:19 +03:00
Alan Orth 21e9948a75
pyproject.toml: manually updated all deps
Update all deps to their latest versions on pypi.org and remove the
explicit dependency on SQLAlchemy.
2022-09-02 16:30:40 +03:00
Alan Orth f64435fc9d
tests/test_check.py: add missing excludes 2022-09-02 16:24:33 +03:00
Alan Orth 566c2b45cf
Remove Excel support
I never used this and it seems xlrd doesn't even support .xlsx anymore
anyways. If this was needed I could theoretically use openpyxl, but
I'd rather just stick to CSV.
2022-09-02 16:14:24 +03:00
Alan Orth 41b813be6e
CHANGELOG.md: add note about exclude logic 2022-09-02 16:03:51 +03:00
Alan Orth 040e56fc76
Improve exclude function
When a user explicitly requests that a field be excluded with -x we
skip that field in most checks. Up until now that did not include
the item-based checks using a transposed dataframe because we don't
know the metadata field names (labels) until we iterate over them.

Now the excludes are respected for item-based checks.
2022-09-02 15:59:22 +03:00
Alan Orth 1f76247353
csv_metadata_quality/app.py: rework exclude/skip
Instead of processing the excludes inside the for column loop we do
it once before and then only need to check if the current column is
in the list.
2022-09-02 10:35:04 +03:00
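The rework described in 1f76247353 can be pictured roughly as follows. This is a minimal sketch, not the code from app.py: the helper names `parse_excludes` and `columns_to_check` are hypothetical, and it assumes the `-x` value arrives as a comma-separated string.

```python
def parse_excludes(raw):
    """Split a comma-separated exclude string into a list of field names."""
    if not raw:
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]


def columns_to_check(columns, raw_excludes):
    # Build the exclude list once, before looping over the columns.
    excludes = parse_excludes(raw_excludes)

    checked = []
    for column in columns:
        # Inside the loop only a membership test is needed.
        if column in excludes:
            continue
        checked.append(column)
    return checked
```

Parsing the option once keeps the loop body down to a single `in` check per column.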
Alan Orth 2e489fc921
Add new data/test-geography.csv test file
This file has metadata to test different scenarios related to
checking and fixing missing regions.
2022-09-01 16:57:29 +03:00
Alan Orth 117c6ca85d
csv_metadata_quality/check.py: missing region fixes
Port over the recent fixes and logic improvements to regions from
fix.py.
2022-09-01 16:38:35 +03:00
Alan Orth f49214fa2e
csv_metadata_quality/fix.py: fix bug in regions
We need to make sure we're only manipulating the regions if we have
any missing. The previous code was always manipulating the existing
row, even when there were no missing regions, which resulted in new
values like "Eastern Africa||".
2022-09-01 16:15:32 +03:00
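One way a value like "Eastern Africa||" can arise is an unconditional join with an empty list of missing regions. The sketch below is illustrative only: both function names are hypothetical and neither is taken from fix.py. The "||" separator is the one shown in the commit message above.

```python
def append_regions_unguarded(existing, missing):
    # Always rewrites the value, even when nothing is missing.
    return existing + "||" + "||".join(missing)


def append_regions_guarded(existing, missing):
    # Only touch the value when there is something to add.
    if not missing:
        return existing
    return existing + "||" + "||".join(missing)
```

With an empty `missing` list the unguarded version appends a dangling separator, while the guarded version returns the row value unchanged.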
Alan Orth 7ce20726d0
csv_metadata_quality/fix.py: minor change
Print missing regions when we know they are missing, instead of
doing another check later and looping over them again.
2022-09-01 16:03:49 +03:00
Alan Orth 473be5ac2f
csv_metadata_quality/fix.py: don't add "not found" region
country_converter returns the literal "not found" string if a
country cannot be found. In that case we do not want to consider that
as a region!
2022-09-01 15:46:21 +03:00
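The filtering described in 473be5ac2f might look something like this. It is a sketch under stated assumptions: `regions_for` and `fake_lookup` are hypothetical names, and `fake_lookup` is a small stand-in for the real converter that only mimics its "not found" return value.

```python
NOT_FOUND = "not found"


def fake_lookup(country):
    # Hypothetical stand-in for the real country-to-region conversion.
    table = {"Kenya": "Eastern Africa", "Ghana": "Western Africa"}
    return table.get(country, NOT_FOUND)


def regions_for(countries, lookup):
    regions = []
    for country in countries:
        region = lookup(country)
        # Skip the sentinel string so it is never treated as a region.
        if region == NOT_FOUND:
            continue
        if region not in regions:
            regions.append(region)
    return regions
```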
Alan Orth 7c61cae417
csv_metadata_quality/fix.py: silence warning
By default country_converter prints "not found in regex" if a
country is not found. We can silence this by switching the logging
level to something above WARNING.
2022-09-01 15:44:50 +03:00
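Raising a logger's level above WARNING uses only the standard `logging` module. In this sketch the helper `silence_warnings` is hypothetical, and the logger name "country_converter" is an assumption about how the library names its logger, so check the library's own documentation before relying on it.

```python
import logging


def silence_warnings(logger_name):
    """Raise the named logger's level so WARNING messages are dropped."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.ERROR)
    return logger


# Assumption: the library logs under this name (or a child of it).
coco_logger = silence_warnings("country_converter")
```

Child loggers with no level of their own inherit the effective level from the parent, so setting it on the package-level name is usually enough.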
Alan Orth ae16289637
csv_metadata_quality/fix.py: Minor change
The country_converter documentation says we should instantiate the
CountryConverter() class once instead of calling coco.convert() in
each iteration of the loop so we don't end up loading the data file
more than once.
2022-09-01 15:40:45 +03:00
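The pattern in ae16289637 is to build the converter once and reuse it inside the loop. The sketch below uses a hypothetical `StubConverter` class in place of the real `CountryConverter()` so it runs without the library. The stub counts its own instantiations to stand in for loading the data file.

```python
class StubConverter:
    """Hypothetical stand-in for an object that is expensive to create."""

    loads = 0  # how many times the (pretend) data file was loaded

    def __init__(self):
        StubConverter.loads += 1
        self._regions = {"Kenya": "Eastern Africa", "Ghana": "Western Africa"}

    def convert(self, name):
        return self._regions.get(name, "not found")


def convert_all(countries):
    # Instantiate once, outside the loop, then reuse for every lookup.
    converter = StubConverter()
    return [converter.convert(country) for country in countries]
```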
Alan Orth fdb7900cd0
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --with dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==
2022-09-01 11:21:10 +03:00
Alan Orth 9c65569c43
poetry.lock: run poetry update 2022-09-01 08:44:12 +03:00
Alan Orth 0cf0bc97f0
csv_metadata_quality/fix.py: fix logic error again
It seems there was another logic error raised by the test in pytest.
With my real data, it was enough to check if the region column was
None, but with my test I was explicitly setting the region to "" (an
empty string). So to be really sure we should check if the string
is not None *and* if its length is greater than 0.
2022-08-03 20:51:14 +03:00
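The check described in 0cf0bc97f0 treats both `None` and the empty string as "no existing regions". This is a minimal sketch with a hypothetical helper name, `merge_regions`. It is not the code from fix.py.

```python
def merge_regions(existing, missing):
    new = "||".join(missing)
    # Concatenate only when the existing value is neither None nor "".
    if existing is not None and len(existing) > 0:
        return existing + "||" + new
    return new
```

Checking only for `None` would let an empty string through and produce a value with a leading separator.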
Alan Orth 40c3585bab
csv_metadata_quality/fix.py: fix logic error
Fix string concatenation with existing regions.
2022-08-03 18:26:08 +03:00
Alan Orth b9c44aed7d
csv_metadata_quality/fix.py: fix logic issue
Forgot to return the row as-is if we don't find any countries.
2022-08-02 10:17:30 +03:00
Alan Orth 032a1db392
README.md: Add note about missing regions
2022-07-28 16:58:01 +03:00
Alan Orth da87531779
CHANGELOG.md: Add note about adding missing regions 2022-07-28 16:54:05 +03:00
Alan Orth 689ee184f7
Add unsafe check to add missing regions 2022-07-28 16:52:43 +03:00
Alan Orth 344993370c
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2022-07-08 15:50:42 +03:00
Alan Orth 00b4dca185
poetry.lock: run poetry update 2022-07-08 15:50:03 +03:00
Alan Orth 5a87bf4317
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2022-03-21 14:37:38 +03:00
Alan Orth c706719d8b
poetry.lock: run poetry update 2022-03-21 14:37:03 +03:00
Alan Orth e7ea8ef9f0
README.md: add note about spdx-license-list
This Python module was deprecated in favor of using the SPDX license
data directly.

See: https://github.com/spdx/license-list-data
2022-01-30 13:27:20 +03:00
Alan Orth ea050376fc
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2022-01-30 13:26:37 +03:00
Alan Orth 4ba615cd41
poetry.lock: run poetry update 2022-01-30 13:26:04 +03:00
Alan Orth b0d46cd864
pyproject.toml: update black
It's no longer in beta!
2022-01-30 13:22:47 +03:00
Alan Orth 3ee9319d84
pyproject.toml: bump flake8 2022-01-30 13:21:09 +03:00
Alan Orth 4d5f4b5abb
pyproject.toml: update pycountry
Seems to be a few major versions from 19.x.x to 21.x.x. All tests
passing in pytest so it's probably fine.
2022-01-30 13:15:38 +03:00
Alan Orth 98d38801fa
pyproject.toml: update requests and requests-cache 2022-01-30 13:11:01 +03:00
Alan Orth dad7a8765c
.github/workflows/python-app.yml: use Python 3.10
That's what I use for testing locally. Note that we need to quote
the version here because otherwise GitHub Actions will interpret it
as 3.1 due to how YAML works.
2022-01-30 13:06:51 +03:00
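The quoting issue in dad7a8765c comes from an unquoted `3.10` being read as a number rather than as text. The snippet below does not use a YAML parser. It only mirrors the same effect in plain Python, where converting the text to a float discards the trailing zero.

```python
# Unquoted: the scalar is treated as a floating-point number.
unquoted = float("3.10")

# Quoted: the scalar stays a string and keeps its trailing zero.
quoted = "3.10"

print(unquoted)  # 3.1
print(quoted)    # 3.10
```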
Alan Orth d126304534
README.md: update note about Python version 2022-01-30 13:05:36 +03:00
Alan Orth 38c2584863
.drone.yml: don't test on Python 3.7 anymore
Pandas 1.4.0 has a minimum Python requirement of 3.8.

See: https://pandas.pydata.org/docs/whatsnew/v1.4.0.html
2022-01-30 13:04:52 +03:00
Alan Orth e94a4539bf
pyproject.toml: bump Pandas to v1.4.0
As of Pandas v1.4.0 the minimum Python version is 3.8.

See: https://pandas.pydata.org/docs/whatsnew/v1.4.0.html
2022-01-30 13:03:56 +03:00
Alan Orth a589d39e38
poetry.lock: run poetry lock 2022-01-29 16:26:16 +03:00
Alan Orth d9e427a80e
pyproject.toml: don't install ipython
It always complains about running in a virtual environment anyways,
and I can use the one from the OS instead.
2022-01-29 16:25:58 +03:00
Alan Orth 8ee5e2e306
setup.py: denote that Python 3.10 works
I have been using Python 3.10 for months, and already added it to
the CI builds.
2022-01-29 16:08:01 +03:00
Alan Orth 490701f244
Run more CLI tests in CI
2021-12-24 14:47:25 +02:00
Alan Orth e1b270cf83
CHANGELOG.md: add note about dropping invalid AGROVOC values
2021-12-23 12:47:42 +02:00
Alan Orth b7efe2de40
data/test.csv: update invalid AGROVOC entry
Now that we can drop invalid AGROVOC values we should have a valid
value and an invalid value here. Depending on how the checker is
invoked we will either print a warning or drop the invalid value.
2021-12-23 12:45:38 +02:00
Alan Orth c43095139a
tests/test_check.py: add tests for dropping invalid AGROVOC 2021-12-23 12:44:32 +02:00
Alan Orth a7727b8431
Add support for dropping invalid AGROVOC terms
Requires --agrovoc-fields <field.name> to do the actual validation,
and -d to drop invalid ones.
2021-12-23 12:43:55 +02:00
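The warn-or-drop behaviour from a7727b8431 could be sketched as below. Everything here is illustrative: `check_terms` and the `is_valid` callable are hypothetical, the real validation against AGROVOC is not shown, and the "||" separator for multi-value fields is assumed from the other commit messages in this log.

```python
def check_terms(value, is_valid, drop=False):
    """Return the (possibly filtered) value and a list of messages."""
    kept = []
    messages = []
    for term in value.split("||"):
        if is_valid(term):
            kept.append(term)
        elif drop:
            # Drop mode: leave the invalid term out of the result.
            messages.append(f"Dropping invalid term: {term}")
        else:
            # Default mode: warn, but keep the term as-is.
            messages.append(f"Invalid term: {term}")
            kept.append(term)
    return "||".join(kept), messages
```

Passing the validator in as a callable keeps the sketch independent of how the terms are actually looked up.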
Alan Orth 7763a021c5
csv_metadata_quality/fix.py: sort imports with isort
2021-12-15 23:15:02 +02:00
Alan Orth 3c12ef3f66
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2021-12-15 23:11:44 +02:00
Alan Orth aee2438e94
poetry.lock: run poetry update 2021-12-15 23:10:27 +02:00
Alan Orth a351ba9706
CHANGELOG.md: add notes about ftfy 2021-12-15 22:09:01 +02:00