Commit Graph

440 Commits

Author SHA1 Message Date
Alan Orth 5a87bf4317
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2022-03-21 14:37:38 +03:00
Alan Orth c706719d8b
poetry.lock: run poetry update 2022-03-21 14:37:03 +03:00
Alan Orth e7ea8ef9f0
README.md: add note about spdx-license-list
This Python module was deprecated in favor of using the SPDX license
data directly.

See: https://github.com/spdx/license-list-data
2022-01-30 13:27:20 +03:00
Alan Orth ea050376fc
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2022-01-30 13:26:37 +03:00
Alan Orth 4ba615cd41
poetry.lock: run poetry update 2022-01-30 13:26:04 +03:00
Alan Orth b0d46cd864
pyproject.toml: update black
It's no longer in beta!
2022-01-30 13:22:47 +03:00
Alan Orth 3ee9319d84
pyproject.toml: bump flake8 2022-01-30 13:21:09 +03:00
Alan Orth 4d5f4b5abb
pyproject.toml: update pycountry
This jumps a few major versions, from 19.x.x to 21.x.x. All tests
pass in pytest, so it's probably fine.
2022-01-30 13:15:38 +03:00
Alan Orth 98d38801fa
pyproject.toml: update requests and requests-cache 2022-01-30 13:11:01 +03:00
Alan Orth dad7a8765c
.github/workflows/python-app.yml: use Python 3.10
That's what I use for testing locally. Note that we need to quote
the version here because otherwise GitHub Actions will interpret it
as 3.1 due to how YAML works.
2022-01-30 13:06:51 +03:00
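Note on the quoting issue in dad7a8765c: an unquoted 3.10 in YAML is resolved as a floating-point number, and a float cannot keep the trailing zero. The sketch below only imitates that effect with plain Python float conversion; it is not a YAML parser and not how GitHub Actions itself reads the file. `as_yaml_scalar` is a hypothetical helper written for this illustration.

```python
def as_yaml_scalar(raw: str):
    """Roughly imitate how a YAML loader resolves a scalar.

    Hypothetical helper for illustration only: quoted scalars stay
    strings, unquoted scalars that look numeric become floats.
    """
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return raw[1:-1]  # quoted: keep the text exactly as written
    try:
        return float(raw)  # unquoted numeric: the trailing zero is lost
    except ValueError:
        return raw  # anything else stays a string


unquoted = as_yaml_scalar("3.10")
quoted = as_yaml_scalar('"3.10"')
print(str(unquoted))  # 3.1
print(quoted)         # 3.10
```

Quoting the version keeps it a string, so the runner sees "3.10" rather than 3.1.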
Alan Orth d126304534
README.md: update note about Python version 2022-01-30 13:05:36 +03:00
Alan Orth 38c2584863
.drone.yml: don't test on Python 3.7 anymore
Pandas 1.4.0 has a minimum Python requirement of 3.8.

See: https://pandas.pydata.org/docs/whatsnew/v1.4.0.html
2022-01-30 13:04:52 +03:00
Alan Orth e94a4539bf
pyproject.toml: bump Pandas to v1.4.0
As of Pandas v1.4.0 the minimum Python version is 3.8.

See: https://pandas.pydata.org/docs/whatsnew/v1.4.0.html
2022-01-30 13:03:56 +03:00
Alan Orth a589d39e38
poetry.lock: run poetry lock 2022-01-29 16:26:16 +03:00
Alan Orth d9e427a80e
pyproject.toml: don't install ipython
It always complains about running in a virtual environment anyways,
and I can use the one from the OS instead.
2022-01-29 16:25:58 +03:00
Alan Orth 8ee5e2e306
setup.py: denote that Python 3.10 works
I have been using Python 3.10 for months, and already added it to
the CI builds.
2022-01-29 16:08:01 +03:00
Alan Orth 490701f244
Run more CLI tests in CI
2021-12-24 14:47:25 +02:00
Alan Orth e1b270cf83
CHANGELOG.md: add note about dropping invalid AGROVOC values
2021-12-23 12:47:42 +02:00
Alan Orth b7efe2de40
data/test.csv: update invalid AGROVOC entry
Now that we can drop invalid AGROVOC values we should have a valid
value and an invalid value here. Depending on how the checker is
invoked we will either print a warning or drop the invalid value.
2021-12-23 12:45:38 +02:00
Alan Orth c43095139a
tests/test_check.py: add tests for dropping invalid AGROVOC 2021-12-23 12:44:32 +02:00
Alan Orth a7727b8431
Add support for dropping invalid AGROVOC terms
Requires --agrovoc-fields <field.name> to do the actual validation,
and -d to drop invalid ones.
2021-12-23 12:43:55 +02:00
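Note on a7727b8431: a minimal sketch of what dropping invalid terms from a multi-value field could look like. It is not the project's implementation. The function name, the "||" separator, and the small `known_terms` set are assumptions made for the example; the real check validates terms against AGROVOC rather than a hard-coded set.

```python
def drop_invalid_terms(field_value, is_valid, separator="||"):
    """Return (kept_value, dropped_terms) for one multi-value field.

    Hypothetical helper: `is_valid` is any callable that says whether
    a single term is acceptable.
    """
    kept, dropped = [], []
    for term in field_value.split(separator):
        (kept if is_valid(term) else dropped).append(term)
    return separator.join(kept), dropped


# Stand-in for a real AGROVOC lookup; this set is made up for the example.
known_terms = {"LIVESTOCK", "MAIZE"}

value, dropped = drop_invalid_terms(
    "LIVESTOCK||NOT-A-TERM", lambda term: term in known_terms
)
print(value)    # LIVESTOCK
print(dropped)  # ['NOT-A-TERM']
```

Returning the dropped terms alongside the cleaned value lets the caller either print a warning or silently drop them, depending on how the checker was invoked.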
Alan Orth 7763a021c5
csv_metadata_quality/fix.py: sort imports with isort
2021-12-15 23:15:02 +02:00
Alan Orth 3c12ef3f66
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2021-12-15 23:11:44 +02:00
Alan Orth aee2438e94
poetry.lock: run poetry update 2021-12-15 23:10:27 +02:00
Alan Orth a351ba9706
CHANGELOG.md: add notes about ftfy 2021-12-15 22:09:01 +02:00
Alan Orth e4faf114dc
csv_metadata_quality/util.py: update for ftfy 6.0
The sequence_weirdness() heuristic is deprecated. Now we should use
is_bad().

See: https://ftfy.readthedocs.io/en/v6.0/heuristic.html
See: https://github.com/rspeer/python-ftfy/blob/master/CHANGELOG.md#version-60-april-2-2021
2021-12-15 21:58:07 +02:00
Alan Orth ff49a80432
csv_metadata_quality/fix.py: configure ftfy
Don't replace smart quotes in ftfy. If our text has them we should
keep them.
2021-12-15 21:51:51 +02:00
Alan Orth 8b15154285
pyproject.toml: use ftfy 6.0
Lots of improvements here! Improvements to heuristics and a new way
to configure which fixes get applied.

See: https://github.com/rspeer/python-ftfy/blob/master/CHANGELOG.md#version-60-april-2-2021
2021-12-15 21:48:56 +02:00
Alan Orth 5854f8e865
CHANGELOG.md: add note about unnecessary Unicode 2021-12-15 13:56:31 +02:00
Alan Orth e7322efadd
csv_metadata_quality/app.py: move unnecessary Unicode fix
We actually want to do this after we try to fix mojibake with ftfy.
These "unnecessary" Unicode characters could actually help ftfy in
some cases because they often indicate that some character from
another encoding was there before (like an accent, dash, or smart
quote).
2021-12-15 13:53:25 +02:00
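Note on e7322efadd: the ordering matters because a stray character can be half of a mis-decoded byte pair. The sketch below uses a plain Latin-1 round trip as a simplified stand-in for ftfy, and assumes the no-break space (U+00A0) is one of the characters treated as "unnecessary". Both helper functions are hypothetical.

```python
NBSP = "\u00a0"  # no-break space, assumed to count as "unnecessary" Unicode


def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text wrongly decoded as Latin-1 (stand-in for ftfy)."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # this simple round trip cannot repair it


def strip_unnecessary(text: str) -> str:
    """Replace no-break spaces with normal spaces."""
    return text.replace(NBSP, " ")


# "voila" with a grave accent, encoded as UTF-8 and mis-decoded as Latin-1.
# The second byte of the accented letter turns into a no-break space.
broken = "voil\u00e0".encode("utf-8").decode("latin-1")

repair_first = strip_unnecessary(repair_mojibake(broken))
strip_first = repair_mojibake(strip_unnecessary(broken))

print(repair_first == "voil\u00e0")  # True
print(strip_first == "voil\u00e0")   # False
```

When the no-break space is replaced first, the byte sequence is no longer valid UTF-8 and the text stays broken.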
Alan Orth 95015febbd
csv_metadata_quality/fix.py: fix thin spaces
Replace thin spaces with normal spaces. Sometimes I see these get
mishandled on Windows machines and they end up as "?" or similar.
2021-12-09 23:22:53 +02:00
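Note on 95015febbd: a minimal sketch of the replacement, assuming "thin space" means U+2009 THIN SPACE. The function name is hypothetical and the project's own fix may differ.

```python
THIN_SPACE = "\u2009"  # assumption: the thin space meant here is U+2009


def fix_thin_spaces(text: str) -> str:
    """Replace thin spaces with normal ASCII spaces."""
    return text.replace(THIN_SPACE, " ")


fixed = fix_thin_spaces("10\u2009km")
print(fixed)  # 10 km
```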
Alan Orth cef6c66b30
CHANGELOG.md: start next changes 2021-12-09 23:21:58 +02:00
Alan Orth 9905e183ea
Bump version to 0.6.0-dev 2021-12-09 23:21:30 +02:00
Alan Orth cc34db7ff8
Version 0.5.0
2021-12-08 15:29:46 +02:00
Alan Orth b79e07b814
CHANGELOG.md: Add note about countries without regions 2021-12-08 15:21:45 +02:00
Alan Orth 865b950c33
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2021-12-08 15:20:22 +02:00
Alan Orth 6f269ca6b1
poetry.lock: run poetry update 2021-12-08 15:19:49 +02:00
Alan Orth 120e8cf09f
tests/test_check.py: add checks for countries without regions 2021-12-08 15:18:50 +02:00
Alan Orth a4eb79f625
data/test.csv: add data for countries without regions check 2021-12-08 15:17:55 +02:00
Alan Orth ccc2a73456
Add check for countries without matching regions
If we have country "Kenya" we should have region "Eastern Africa"
according to the UN M.49 geolocation scheme.
2021-12-08 15:02:20 +02:00
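Note on ccc2a73456: a rough sketch of the idea, not the project's code. The mapping holds only the single pair named in the commit message; a real check would need the full UN M.49 table. `COUNTRY_TO_REGION` and `missing_regions` are hypothetical names.

```python
# Illustrative excerpt only; a real check needs the full UN M.49 table.
COUNTRY_TO_REGION = {"Kenya": "Eastern Africa"}


def missing_regions(countries, regions):
    """Return regions implied by `countries` that are absent from `regions`."""
    missing = []
    for country in countries:
        expected = COUNTRY_TO_REGION.get(country)
        if expected is None:
            continue  # country not in our excerpt, nothing to compare
        if expected not in regions and expected not in missing:
            missing.append(expected)
    return missing


print(missing_regions(["Kenya"], ["Eastern Africa"]))   # []
print(missing_regions(["Kenya"], ["Western Africa"]))   # ['Eastern Africa']
```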
Alan Orth ad33195ba3
README.md: adjust intro
Makes the badges not wrap and looks better in my opinion.
2021-12-08 11:36:34 +02:00
Alan Orth 72fe38972e
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2021-12-05 16:29:37 +02:00
Alan Orth 04232d0ede
poetry.lock: run poetry update 2021-12-05 16:29:09 +02:00
Alan Orth f5fa33bbc6
CHANGELOG.md: add title in citation note 2021-12-05 16:23:39 +02:00
Alan Orth 1b978159c1
data/test.csv: Add data for title in citation test 2021-12-05 16:23:06 +02:00
Alan Orth 4d5696c4cb
csv_metadata_quality/check.py: update title in citation check
Initialize the titles and citations before the for loop so we can
access them later. This makes it easier to check if the item
actually has a citation.
2021-12-05 16:21:44 +02:00
Alan Orth e02678cd7c
tests/test_check.py: add tests for title in citation 2021-12-05 16:01:11 +02:00
Alan Orth 01b4354a14
tests/test_check.py: fix comment 2021-12-05 15:58:25 +02:00
Alan Orth 3b40a68279
Add check for title in citation
This checks if the item title exists in the citation. If it is not
present it could just be missing, or could have minor differences
in the whitespace, accents, etc.
2021-12-05 15:52:42 +02:00
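Note on 3b40a68279: the check as described can be sketched as a plain substring test. This is an assumption about its shape, not the project's code, and the title and citation strings are made up for the example.

```python
def title_in_citation(title: str, citation: str) -> bool:
    """Return True if the exact title text appears in the citation."""
    return title in citation


citation = "Doe, J. 2021. Maize yields in Kenya. Nairobi."

exact = title_in_citation("Maize yields in Kenya", citation)
extra_space = title_in_citation("Maize  yields in Kenya", citation)

print(exact)        # True
print(extra_space)  # False
```

The second call shows why a miss does not always mean the title is absent: a doubled space is enough to make an exact comparison fail.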
Alan Orth 999cc65097
csv_metadata_quality/app.py: adjust mojibake check
If unsafe fixes (-u) are enabled then we don't need to do the check
first before actually fixing them. Doing the check first creates
extra output that needs to be reviewed by the user.
2021-12-05 15:18:35 +02:00