csv-metadata-quality

mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-10-31 19:43:00 +01:00

Author	SHA1	Message	Date
Alan Orth	7097136b7e	Use my fork of country_converter again There is an issue with the UN M.49 region for Myanmar.	2022-11-28 17:38:45 +03:00
Alan Orth	b02f1f65ee	pyproject.toml: use upstream country_converter Version 0.8.0 has the country and UN M.49 region fixes. See: https://github.com/konstantinstadler/country_converter/releases/tag/v0.8.0	2022-11-28 17:14:16 +03:00
Alan Orth	4d5ef38dde	pyproject.toml: add ipython to dev dependencies	2022-11-28 17:11:18 +03:00
Alan Orth	df57988e5a	Use my fork of pycountry Until they update to iso-codes 4.12.0. See: https://github.com/flyingcircusio/pycountry/pull/149	2022-11-08 10:21:28 +03:00
Alan Orth	15f52f8be8	Switch to my fork of country-converter Until a few issues are resolved regarding new countries and regions. See: https://github.com/konstantinstadler/country_converter/pull/122 See: https://github.com/konstantinstadler/country_converter/pull/123	2022-11-08 10:04:31 +03:00
Alan Orth	ca82820a8e	pyproject.toml: update dependencies to latest	2022-11-07 12:13:28 +03:00
Alan Orth	58b7b6e9d8	Version 0.6.0	2022-09-02 16:35:58 +03:00
Alan Orth	21e9948a75	pyproject.toml: manually updated all deps Update all deps to their latest versions on pypi.org and remove the explicit dependency on SQLAlchemy.	2022-09-02 16:30:40 +03:00
Alan Orth	566c2b45cf	Remove Excel support I never used this and it seems xlrd doesn't even support .xlsx anymore anyways. If this was needed I could theoretically use openpyxl but I'd rather just stick to CSV.	2022-09-02 16:14:24 +03:00
Alan Orth	b0d46cd864	pyproject.toml: update black It's no longer in beta!	2022-01-30 13:22:47 +03:00
Alan Orth	3ee9319d84	pyproject.toml: bump flake8	2022-01-30 13:21:09 +03:00
Alan Orth	4d5f4b5abb	pyproject.toml: update pycountry Seems to be a few major versions from 19.x.x to 21.x.x. All tests passing in pytest so it's probably fine.	2022-01-30 13:15:38 +03:00
Alan Orth	98d38801fa	pyproject.toml: update requests and requests-cache	2022-01-30 13:11:01 +03:00
Alan Orth	e94a4539bf	pyproject.toml: bump Pandas to v1.4.0 As of Pandas v1.4.0 the minimum Python version is 3.8. See: https://pandas.pydata.org/docs/whatsnew/v1.4.0.html	2022-01-30 13:03:56 +03:00
Alan Orth	d9e427a80e	pyproject.toml: don't install ipython It always complains about running in a virtual environment anyways, and I can use the one from the OS instead.	2022-01-29 16:25:58 +03:00
Alan Orth	8b15154285	pyproject.toml: use ftfy 6.0 Lots of improvements here! Improvements to heuristics and a new way to configure which fixes get applied. See: https://github.com/rspeer/python-ftfy/blob/master/CHANGELOG.md#version-60-april-2-2021	2021-12-15 21:48:56 +02:00
Alan Orth	9905e183ea	Bump version to 0.6.0-dev	2021-12-09 23:21:30 +02:00
Alan Orth	cc34db7ff8	Version 0.5.0	2021-12-08 15:29:46 +02:00
Alan Orth	ccc2a73456	Add check for countries without matching regions If we have country "Kenya" we should have region "Eastern Africa" according to the UN M.49 geolocation scheme.	2021-12-08 15:02:20 +02:00
Alan Orth	215d61c188	pyproject.toml: limit SQLAlchemy to < 1.4.23 SQLAlchemy gets pulled in by csvkit's agate-sql dependency and there is currently an issue with Poetry's parsing of the SQLAlchemy 1.4.23 constraints. Temporarily explicitly install a version of SQLAlchemy that works (can remove later once Poetry fixes this). Anyways, I am not using any SQLAlchemy features that I know of. See: https://github.com/python-poetry/poetry/issues/4402	2021-09-06 21:01:09 +03:00
Alan Orth	b8f4be9ebb	pyproject.toml: Update pytest-clarity and black These seem to have much newer versions that didn't get updated in this project due to the version pinning selector I was using with poetry. In the case of pytest-clarity the previous version was 0.3.1 and the version selector was a caret (^), which will never update the left-most (major) number. Now they seem to be on 1.x.x so it will be OK in the future. In the case of black, they use weird numbering so it's anyone's guess how this will work! Luckily it's only used for linting and formatting.	2021-07-06 15:30:41 +03:00
Alan Orth	4e2eab68b0	Update requests-cache Apparently we were stuck on an older version of requests-cache due to the fact that we were using the caret, which will never update the left-most (major) version. Upstream requests-cache is currently version 0.6.4, and there seems to have been some changes to the API.	2021-07-06 15:24:39 +03:00
Alan Orth	8eddb76aab	Bump version to 0.4.8-dev	2021-03-19 11:53:56 +02:00
Alan Orth	898bb412c3	Add checks and unsafe fixes for mojibake This detects whether text has likely been encoded in one encoding and decoded in another, perhaps multiple times. This often results in display of "mojibake" characters. For example, a file encoded in UTF-8 is opened as CP-1252 (Windows Latin codepage) in Microsoft Excel, and saved again as UTF-8. You will see strings like this in the resulting file: - CIAT PublicaÃ§ao - CIAT PublicaciÃ³n The correct version of these in UTF-8 would be: - CIAT Publicaçao - CIAT Publicación I use a code snippet from Martijn Pieters on StackOverflow to detect whether a string is "weird" as determined by the excellent "fixes text for you" (ftfy) Python library, then check if a weird string encodes as CP-1252 or not. If so, I can try to fix it. See: https://stackoverflow.com/questions/29071995/identify-garbage-unicode-string-using-python	2021-03-19 10:22:21 +02:00
Alan Orth	f816e17fe7	Version 0.4.7	2021-03-17 10:00:34 +02:00
Alan Orth	fa84cfa440	Bump version to 0.4.6-dev	2021-03-11 22:44:36 +02:00
Alan Orth	6cc1401f88	pyproject.toml: Minimum Python is technically 3.7.1 See: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.2.0.html	2021-03-11 13:41:58 +02:00
Alan Orth	1554cfd5c9	Version 0.4.6	2021-03-11 12:14:54 +02:00
Alan Orth	6e4b0e5c1b	Add validation of SPDX license identifiers Currently this only checks the dcterms.license field and the result will only be a warning.	2021-03-11 10:33:16 +02:00
Alan Orth	b16fa9121f	pyproject.toml: Add csv-metadata-quality as a script For some reason I stopped having csv-metadata-quality available in my poetry environment after install. It seems I need to add it as a poetry tool script? I had already done this in setup.py years ago, which works for regular python setup.py installs, but hadn't needed to do it in poetry for a year or more that I've been using it, until now.	2021-03-08 09:50:05 +02:00
Alan Orth	202bda862a	Bump version to 0.4.5	2021-03-04 21:38:10 +02:00
Alan Orth	d76e72532a	Move unreleased changes to v0.4.4	2021-02-21 13:25:22 +02:00
Alan Orth	7fb8acb866	Add colorama for colored output Red for errors, yellow for warnings or information, and green for fixes.	2021-02-21 13:00:31 +02:00
Alan Orth	cbf94490f2	Version 0.4.3	2021-01-26 15:22:40 +02:00
Alan Orth	f4914c414f	Only install ipython on Python 3.7+	2020-10-06 17:48:16 +03:00
Alan Orth	f13c360084	Update poetry package dependencies	2020-10-06 17:20:16 +03:00
Alan Orth	cb07d357d4	Version 0.4.2	2020-07-06 14:04:34 +03:00
Alan Orth	aa9e23b46c	pyproject.toml: Update license specifier We need to use valid SPDX license identifiers.	2020-06-09 14:22:53 +03:00
Alan Orth	0c44b967b6	Add poetry project file and lock I want to try to use poetry instead of pipenv because pipenv takes forever to do dependency resolution sometimes. Also, I have had a few issues with Python modules like black that don't have releases other than pre-releases, and even including the project itself in the dependencies (pip install -e . ...?). My initial experience is that poetry handles this better.	2020-05-31 17:33:40 +03:00

39 Commits
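
The mojibake commit (898bb412c3) describes UTF-8 text being decoded as CP-1252 and saved again. That round trip can be reproduced and reversed with Python's standard library alone. The following is a minimal sketch, not the project's implementation: the name `fix_mojibake` is hypothetical, and it swaps in a plain encode/decode round trip for the ftfy-based "weird string" detection that the commit message mentions.

```python
def fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo one round of UTF-8-read-as-CP-1252 damage.

    If the text cannot be encoded as CP-1252, or the resulting bytes are
    not valid UTF-8, the input is assumed to be fine and returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Reproduce the damage described in the commit message.
original = "CIAT Publicación"
garbled = original.encode("utf-8").decode("cp1252")

print(garbled)                # CIAT PublicaciÃ³n
print(fix_mojibake(garbled))  # CIAT Publicación
```

Correctly encoded text such as "CIAT Publicación" passes through unchanged, because its CP-1252 bytes do not form valid UTF-8. A round trip like this can still misfire on unusual input, which is presumably why the commit labels the fix "unsafe".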