mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-09 22:56:01 +02:00
Commit Graph

637 Commits

Author SHA1 Message Date
ea050376fc Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2022-01-30 13:26:37 +03:00
4ba615cd41 poetry.lock: run poetry update 2022-01-30 13:26:04 +03:00
b0d46cd864 pyproject.toml: update black
It's no longer in beta!
2022-01-30 13:22:47 +03:00
3ee9319d84 pyproject.toml: bump flake8 2022-01-30 13:21:09 +03:00
4d5f4b5abb pyproject.toml: update pycountry
This jumps a few major versions, from 19.x.x to 21.x.x. All tests
pass in pytest, so it's probably fine.
2022-01-30 13:15:38 +03:00
98d38801fa pyproject.toml: update requests and requests-cache 2022-01-30 13:11:01 +03:00
dad7a8765c .github/workflows/python-app.yml: use Python 3.10
That's what I use for testing locally. Note that we need to quote
the version here because otherwise GitHub Actions will interpret it
as 3.1 due to how YAML works.
2022-01-30 13:06:51 +03:00
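
To illustrate the quoting note in the commit above: a YAML parser reads a bare `3.10` as a floating-point number, and floats do not keep trailing zeros. The sketch below imitates that with Python's built-in `float()` rather than a real YAML parser; the helper name `as_unquoted_yaml_number` is hypothetical and not part of this repository.

```python
# Minimal sketch, assuming a bare numeric scalar is parsed as a float.
# Python's float() stands in for the YAML parser here.

def as_unquoted_yaml_number(text: str) -> str:
    """Mimic reading a bare numeric scalar as a float and printing it back."""
    return str(float(text))


unquoted = as_unquoted_yaml_number("3.10")  # what a bare 3.10 turns into
quoted = "3.10"  # a quoted value stays a string, so the zero survives

print(unquoted)  # 3.1
print(quoted)    # 3.10
```

This is why the workflow file needs `"3.10"` in quotes and not a bare number.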
d126304534 README.md: update note about Python version 2022-01-30 13:05:36 +03:00
38c2584863 .drone.yml: don't test on Python 3.7 anymore
Pandas 1.4.0 has a minimum Python requirement of 3.8.

See: https://pandas.pydata.org/docs/whatsnew/v1.4.0.html
2022-01-30 13:04:52 +03:00
e94a4539bf pyproject.toml: bump Pandas to v1.4.0
As of Pandas v1.4.0 the minimum Python version is 3.8.

See: https://pandas.pydata.org/docs/whatsnew/v1.4.0.html
2022-01-30 13:03:56 +03:00
a589d39e38 poetry.lock: run poetry lock 2022-01-29 16:26:16 +03:00
d9e427a80e pyproject.toml: don't install ipython
It always complains about running in a virtual environment anyways,
and I can use the one from the OS instead.
2022-01-29 16:25:58 +03:00
8ee5e2e306 setup.py: denote that Python 3.10 works
All checks were successful
continuous-integration/drone/push Build is passing
I have been using Python 3.10 for months, and already added it to
the CI builds.
2022-01-29 16:08:01 +03:00
490701f244 Run more CLI tests in CI
All checks were successful
continuous-integration/drone/push Build is passing
2021-12-24 14:47:25 +02:00
e1b270cf83 CHANGELOG.md: add note about dropping invalid AGROVOC values
All checks were successful
continuous-integration/drone/push Build is passing
2021-12-23 12:47:42 +02:00
b7efe2de40 data/test.csv: update invalid AGROVOC entry
Now that we can drop invalid AGROVOC values we should have a valid
value and an invalid value here. Depending on how the checker is
invoked we will either print a warning or drop the invalid value.
2021-12-23 12:45:38 +02:00
c43095139a tests/test_check.py: add tests for dropping invalid AGROVOC 2021-12-23 12:44:32 +02:00
a7727b8431 Add support for dropping invalid AGROVOC terms
Requires --agrovoc-fields <field.name> to do the actual validation,
and -d to drop invalid ones.
2021-12-23 12:43:55 +02:00
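
The warn-or-drop behaviour described in the commit above can be sketched roughly as follows. This is not the project's actual implementation: the function name `check_agrovoc`, the `is_valid` callback (a stand-in for the real AGROVOC lookup), the `||` multi-value separator, and the sample terms are all assumptions made for illustration.

```python
# Minimal sketch, assuming multi-value fields are joined with "||" and
# that validity is decided by a caller-supplied function (hypothetical).

def check_agrovoc(value, is_valid, drop=False):
    """Warn about invalid terms, or drop them when drop=True."""
    kept = []
    for term in value.split("||"):
        if is_valid(term):
            kept.append(term)
        elif drop:
            # Invalid and dropping is enabled: leave the term out.
            print(f"Dropping invalid AGROVOC: {term}")
        else:
            # Invalid but dropping is disabled: warn and keep the term.
            print(f"Invalid AGROVOC: {term}")
            kept.append(term)
    return "||".join(kept)


# Hypothetical stand-in for a real vocabulary lookup.
VALID_TERMS = {"LIVESTOCK", "MAIZE"}


def in_sample_vocabulary(term):
    return term in VALID_TERMS


print(check_agrovoc("LIVESTOCK||NOT-A-TERM", in_sample_vocabulary, drop=True))
```

Passing validation in as a callback keeps the sketch free of network calls; in practice the lookup would be whatever `--agrovoc-fields` triggers.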
7763a021c5 csv_metadata_quality/fix.py: sort imports with isort
All checks were successful
continuous-integration/drone/push Build is passing
2021-12-15 23:15:02 +02:00
3c12ef3f66 Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2021-12-15 23:11:44 +02:00
aee2438e94 poetry.lock: run poetry update 2021-12-15 23:10:27 +02:00
a351ba9706 CHANGELOG.md: add notes about ftfy 2021-12-15 22:09:01 +02:00
e4faf114dc csv_metadata_quality/util.py: update for ftfy 6.0
The sequence_weirdness() heuristic is deprecated. Now we should use
is_bad().

See: https://ftfy.readthedocs.io/en/v6.0/heuristic.html
See: https://github.com/rspeer/python-ftfy/blob/master/CHANGELOG.md#version-60-april-2-2021
2021-12-15 21:58:07 +02:00
ff49a80432 csv_metadata_quality/fix.py: configure ftfy
Don't replace smart quotes in ftfy. If our text has them we should
keep them.
2021-12-15 21:51:51 +02:00
8b15154285 pyproject.toml: use ftfy 6.0
Lots of improvements here! Improvements to heuristics and a new way
to configure which fixes get applied.

See: https://github.com/rspeer/python-ftfy/blob/master/CHANGELOG.md#version-60-april-2-2021
2021-12-15 21:48:56 +02:00
5854f8e865 CHANGELOG.md: add note about unnecessary Unicode 2021-12-15 13:56:31 +02:00
e7322efadd csv_metadata_quality/app.py: move unnecessary Unicode fix
We actually want to do this after we try to fix mojibake with ftfy.
These "unnecessary" Unicode characters could actually help ftfy in
some cases because they often indicate that some character
from another encoding was there before (like an accent, dash, or
smart quote).
2021-12-15 13:53:25 +02:00
95015febbd csv_metadata_quality/fix.py: fix thin spaces
All checks were successful
continuous-integration/drone/push Build is passing
Replace thin spaces with normal spaces. Sometimes I see these get
mishandled on Windows machines and they end up as "?" or so.
2021-12-09 23:22:53 +02:00
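
A minimal sketch of the thin-space fix described in the commit above, assuming "thin space" means U+2009 THIN SPACE. The function name `fix_thin_spaces` is hypothetical and the real code in `csv_metadata_quality/fix.py` may differ.

```python
# Minimal sketch, assuming the character to replace is U+2009 THIN SPACE.

THIN_SPACE = "\u2009"


def fix_thin_spaces(value: str) -> str:
    """Replace every thin space with a normal ASCII space."""
    return value.replace(THIN_SPACE, " ")


sample = "10\u2009000 hectares"
print(fix_thin_spaces(sample))  # 10 000 hectares
```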
cef6c66b30 CHANGELOG.md: start next changes 2021-12-09 23:21:58 +02:00
9905e183ea Bump version to 0.6.0-dev 2021-12-09 23:21:30 +02:00
cc34db7ff8 Version 0.5.0
All checks were successful
continuous-integration/drone/push Build is passing
v0.5.0
2021-12-08 15:29:46 +02:00
b79e07b814 CHANGELOG.md: Add note about countries without regions 2021-12-08 15:21:45 +02:00
865b950c33 Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2021-12-08 15:20:22 +02:00
6f269ca6b1 poetry.lock: run poetry update 2021-12-08 15:19:49 +02:00
120e8cf09f tests/test_check.py: add checks for countries without regions 2021-12-08 15:18:50 +02:00
a4eb79f625 data/test.csv: add data for countries without regions check 2021-12-08 15:17:55 +02:00
ccc2a73456 Add check for countries without matching regions
If we have country "Kenya" we should have region "Eastern Africa"
according to the UN M.49 geolocation scheme.
2021-12-08 15:02:20 +02:00
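
The country and region check above could be sketched like this. The lookup table is a tiny hand-written excerpt for illustration only, and the names `COUNTRY_TO_REGION` and `missing_regions` are hypothetical; the real check presumably draws on a complete UN M.49 data source.

```python
# Minimal sketch, assuming a country-to-region lookup table is available.
# Only two UN M.49 entries are listed here for illustration.

COUNTRY_TO_REGION = {
    "Kenya": "Eastern Africa",
    "Viet Nam": "South-eastern Asia",
}


def missing_regions(countries, regions):
    """Return (country, expected region) pairs whose region is absent."""
    missing = []
    for country in countries:
        expected = COUNTRY_TO_REGION.get(country)
        # Skip countries this small excerpt does not know about.
        if expected is not None and expected not in regions:
            missing.append((country, expected))
    return missing


for country, region in missing_regions(["Kenya"], []):
    print(f"Missing region {region} for country {country}")
```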
ad33195ba3 README.md: adjust intro
All checks were successful
continuous-integration/drone/push Build is passing
Makes the badges not wrap and looks better in my opinion.
2021-12-08 11:36:34 +02:00
72fe38972e Update requirements
All checks were successful
continuous-integration/drone/push Build is passing
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2021-12-05 16:29:37 +02:00
04232d0ede poetry.lock: run poetry update 2021-12-05 16:29:09 +02:00
f5fa33bbc6 CHANGELOG.md: add title in citation note 2021-12-05 16:23:39 +02:00
1b978159c1 data/text.csv: Add data for title in citation test 2021-12-05 16:23:06 +02:00
4d5696c4cb csv_metadata_quality/check.py: update title in citation check
Initialize the titles and citations before the for loop so we can
access them later. This makes it easier to check if the item
actually has a citation.
2021-12-05 16:21:44 +02:00
e02678cd7c tests/test_check.py: add tests for title in citation 2021-12-05 16:01:11 +02:00
01b4354a14 tests/test_check.py: fix comment 2021-12-05 15:58:25 +02:00
3b40a68279 Add check for title in citation
This checks if the item title exists in the citation. If it is not
present it could just be missing, or could have minor differences
in the whitespace, accents, etc.
2021-12-05 15:52:42 +02:00
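
As a rough illustration of the title-in-citation check above: a plain substring test is enough to flag the cases the commit message mentions, since differences in whitespace or accents make the match fail. The function name `title_in_citation` and the sample strings are made up for this sketch and do not come from the repository.

```python
# Minimal sketch, assuming an exact substring match is the intended test.

def title_in_citation(title: str, citation: str) -> bool:
    """Return True when the title appears verbatim in the citation."""
    return title in citation


title = "Sample title"
good = "Doe, J. 2021. Sample title. Nairobi: Example Press."
bad = "Doe, J. 2021. Sample  title. Nairobi: Example Press."  # double space

print(title_in_citation(title, good))  # True
print(title_in_citation(title, bad))   # False
```

A stricter or looser comparison (for example, normalizing whitespace first) would change which items get flagged, so the exact match here is a design assumption.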
999cc65097 csv_metadata_quality/app.py: adjust mojibake check
If unsafe fixes (-u) are enabled then we don't need to do the check
first before actually fixing them. Doing the check first creates
extra output that needs to be reviewed by the user.
2021-12-05 15:18:35 +02:00
a7c3be280d Update requirements
All checks were successful
continuous-integration/drone/push Build is passing
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
2021-11-27 12:26:21 +02:00
69f68e0a72 poetry.lock: Run poetry update 2021-11-27 12:25:40 +02:00
c941a90944 .drone.yml: Test on Python 3.10
All checks were successful
continuous-integration/drone/push Build is passing
2021-10-11 20:09:32 +03:00