csv-metadata-quality

mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-05 05:53:02 +01:00

Author	SHA1	Message	Date
Alan Orth	8eddb76aab	Bump version to 0.4.8-dev	2021-03-19 11:53:56 +02:00
Alan Orth	898bb412c3	Add checks and unsafe fixes for mojibake This detects whether text has likely been encoded in one encoding and decoded in another, perhaps multiple times. This often results in display of "mojibake" characters. For example, a file encoded in UTF-8 is opened as CP-1252 (Windows Latin codepage) in Microsoft Excel, and saved again as UTF-8. You will see strings like this in the resulting file: "CIAT PublicaÃ§ao" and "CIAT PublicaciÃ³n". The correct version of these in UTF-8 would be: "CIAT Publicaçao" and "CIAT Publicación". I use a code snippet from Martijn Pieters on StackOverflow to detect whether a string is "weird" as determined by the excellent "fixes text for you" (ftfy) Python library, then check if a weird string encodes as CP-1252 or not. If so, I can try to fix it. See: https://stackoverflow.com/questions/29071995/identify-garbage-unicode-string-using-python	2021-03-19 10:22:21 +02:00
Alan Orth	f816e17fe7	Version 0.4.7	2021-03-17 10:00:34 +02:00
Alan Orth	fa84cfa440	Bump version to 0.4.6-dev	2021-03-11 22:44:36 +02:00
Alan Orth	6cc1401f88	pyproject.toml: Minimum Python is technically 3.7.1 See: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.2.0.html	2021-03-11 13:41:58 +02:00
Alan Orth	1554cfd5c9	Version 0.4.6	2021-03-11 12:14:54 +02:00
Alan Orth	6e4b0e5c1b	Add validation of SPDX license identifiers Currently this only checks the dcterms.license field and the result will only be a warning.	2021-03-11 10:33:16 +02:00
Alan Orth	b16fa9121f	pyproject.toml: Add csv-metadata-quality as a script For some reason I stopped having csv-metadata-quality available in my poetry environment after install. It seems I need to add it as a poetry tool script? I had already done this in setup.py years ago, which works for regular python setup.py installs, but hadn't needed to do it in poetry for a year or more that I've been using it, until now.	2021-03-08 09:50:05 +02:00
Alan Orth	202bda862a	Bump version to 0.4.5	2021-03-04 21:38:10 +02:00
Alan Orth	d76e72532a	Move unreleased changes to v0.4.4	2021-02-21 13:25:22 +02:00
Alan Orth	7fb8acb866	Add colorama for colored output Red for errors, yellow for warnings or information, and green for fixes.	2021-02-21 13:00:31 +02:00
Alan Orth	cbf94490f2	Version 0.4.3	2021-01-26 15:22:40 +02:00
Alan Orth	f4914c414f	Only install ipython on Python 3.7+	2020-10-06 17:48:16 +03:00
Alan Orth	f13c360084	Update poetry package dependencies	2020-10-06 17:20:16 +03:00
Alan Orth	cb07d357d4	Version 0.4.2	2020-07-06 14:04:34 +03:00
Alan Orth	aa9e23b46c	pyproject.toml: Update license specifier We need to use valid SPDX license identifiers.	2020-06-09 14:22:53 +03:00
Alan Orth	0c44b967b6	Add poetry project file and lock I want to try to use poetry instead of pipenv because pipenv takes forever to do dependency resolution sometimes. Also, I have had a few issues with Python modules like black that don't have releases other than pre-releases, and even including the project itself in the dependencies (pip install -e . ...?). My initial experience is that poetry handles this better.	2020-05-31 17:33:40 +03:00

17 Commits
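
The mojibake check described in commit 898bb412c3 can be illustrated with a small standard-library sketch. This is not the project's code: it leaves out the ftfy "weirdness" test mentioned in the commit message and only shows the CP-1252 round trip, and the function name fix_mojibake is hypothetical. The assumption is that a string which encodes cleanly as CP-1252 and whose bytes then decode as valid UTF-8 was probably UTF-8 text misread as CP-1252.

```python
def fix_mojibake(text: str) -> str:
    """Try to undo UTF-8 text that was mistakenly decoded as CP-1252.

    Hypothetical helper for illustration only; the real project also
    consults ftfy before deciding that a string looks "weird".
    """
    try:
        # Reverse the bad decode: recover the original bytes, then
        # interpret them as UTF-8, which is what they were meant to be.
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside CP-1252, or the bytes
        # are not valid UTF-8, so it was probably not mojibake.
        return text
    return repaired


if __name__ == "__main__":
    for value in ["CIAT PublicaÃ§ao", "CIAT PublicaciÃ³n", "CIAT Publicación"]:
        print(value, "->", fix_mojibake(value))
```

Correctly encoded strings with accented characters pass through unchanged, because their CP-1252 bytes are not valid UTF-8. The fix is still "unsafe" in the sense the commit message uses: a short string can satisfy the round trip by coincidence, which is why a separate weirdness check is worth having before rewriting data.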