mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-06-09 13:25:08 +02:00
Commit Graph

317 Commits

Author SHA1 Message Date
28b5996aa6
Output field name for more fixes and checks
This helps identify which field has the error.
2020-01-16 12:35:11 +02:00
40ba9bae6c
README.md: Adjust heading size 2020-01-15 12:26:11 +02:00
0b2d211455
Version 0.4.1 2020-01-15 12:19:42 +02:00
7f1df0b47c
Support Python 3.6 and 3.7 again 2020-01-15 12:19:17 +02:00
365ecda324
Add utility function to check normalization
Python's built-in unicodedata library includes the is_normalized()
function starting with Python 3.8. This utility function allows us
to do the same thing with earlier Python versions.

See: https://docs.python.org/3/library/unicodedata.html
2020-01-15 12:17:52 +02:00
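
A minimal sketch of such a pre-3.8 check, assuming the helper simply
compares a string against its NFC-normalized form (the function name
here is illustrative, not necessarily the project's actual one):

  import unicodedata

  def is_nfc(value):
      # Python < 3.8 has no unicodedata.is_normalized(), so compare the
      # string against its NFC-normalized form instead. This normalizes
      # the string in order to check it, which is exactly the extra work
      # the built-in function avoids on Python 3.8+.
      return unicodedata.normalize("NFC", value) == value
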
550ce7fb7e
.travis.yml: Only test Python 3.8
The Unicode normalization feature requires Python 3.8 because the
unicodedata.is_normalized() function only appears there. If I find
another way to check if a string is normalized without normalizing
it first I will drop the requirements back down to Python 3.6.

See: https://docs.python.org/3/library/unicodedata.html
2020-01-15 11:57:21 +02:00
705127fd28
Version 0.4.0 2020-01-15 11:44:56 +02:00
894e0a196d
setup.py: Change Python requirements
The `unicodedata.is_normalized()` function requires Python 3.8.

See: https://docs.python.org/3/library/unicodedata.html
2020-01-15 11:43:25 +02:00
87181bc7b8
Run black, isort, and flake8. 2020-01-15 11:41:31 +02:00
8de5d862b6
CHANGELOG.md: Add note about Unicode normalization 2020-01-15 11:40:40 +02:00
49e3543878
Add Unicode normalization
This will check all strings for un-normalized Unicode characters.
Normalization is done using NFC. This includes tests and updated
sample data (data/test.csv).

See: https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html
2020-01-15 11:37:54 +02:00
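
As a concrete illustration of how such a normalization pass could be
applied, assuming fields are handled as pandas Series (pandas is one of
the project's dependencies); the function and column names below are
hypothetical:

  import unicodedata

  import pandas as pd

  def normalize_unicode(field):
      # Normalize any string values to NFC, leaving non-string values
      # (e.g. NaN) untouched.
      return field.apply(
          lambda value: unicodedata.normalize("NFC", value)
          if isinstance(value, str)
          else value
      )

  df = pd.read_csv("data/test.csv")
  df["dc.title"] = normalize_unicode(df["dc.title"])  # hypothetical column name
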
403b253762
CHANGELOG.md: Update python library versions 2020-01-15 10:58:44 +02:00
c5fbaf407a
Update python requirements
Generated using pipenv:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2020-01-15 10:51:58 +02:00
4f81f6c83c
Pipfile.lock: Run pipenv update 2020-01-15 10:51:19 +02:00
4b9d1e060f
setup.py: Add Python 3.8 classifier 2019-12-14 12:56:11 +02:00
c8a71e3143
Pipfile.lock: Run pipenv update 2019-12-14 12:53:39 +02:00
7964d98ca5
Pipfile: Specify exact version of black
Black only releases pre-release versions, which causes issues with
pipenv. Instead of always running pipenv with "--pre" and potentially
letting in some other pre-release versions for other dependencies,
I would rather specify the latest black version explicitly.

See: https://github.com/psf/black/issues/517
See: https://github.com/microsoft/vscode-python/issues/5171
2019-12-14 12:41:28 +02:00
64ffc2f1da
.travis.yml: Install packages from requirements.txt too 2019-11-14 23:42:28 +02:00
7b1bc29a92
.travis.yml: Try using pip instead of pipenv
The Pipfile knows it was created with Python 3.8, yet we're running
with multiple Python versions on Travis. I'm curious whether it would work
better to use pip to install dependencies instead of pipenv in this
case.
2019-11-14 23:37:25 +02:00
f0110d8e74
CHANGELOG.md: Add note about requirements 2019-11-14 23:30:26 +02:00
86498deee8
Update python requirements
Generated using pipenv:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-11-14 23:28:42 +02:00
251647a15f
CHANGELOG.md: Add TravisCI changes 2019-11-14 23:24:08 +02:00
0bd28e22ec
.travis.yml: Test Python 3.8 2019-11-14 23:22:37 +02:00
63fdce7d13
.travis.yml: Use Ubuntu 18.04 "Bionic" 2019-11-14 23:22:19 +02:00
f068c0e16a
CHANGELOG.md: Use Python 3.8.0 for pipenv 2019-11-14 23:11:43 +02:00
79b8f62a85
Use Python 3.8 for pipenv
Python 3.8.0 has now entered the Arch Linux core repositories and all
tests pass with it, so it's time...
2019-11-14 23:10:20 +02:00
6c1e132531
CHANGELOG.md: Add unreleased changes 2019-11-14 09:19:19 +02:00
c0f3c866bd
Pipfile.lock: Run pipenv update
Updates the following dependencies:

- numpy 1.17.2→1.17.4
- pandas 0.25.1→0.25.3
- flake8 3.7.8→3.7.9
- pytest 5.1.3→5.2.2
- black 19.3b0→19.10b0
2019-11-14 09:17:31 +02:00
36d0474b95
CHANGELOG.md: Move unreleased changes to v0.3.1 2019-10-01 17:11:52 +03:00
efdc3a841a
Version 0.3.1 2019-10-01 17:11:13 +03:00
fd2ba6845d
CHANGELOG.md: Update unreleased notes 2019-10-01 17:10:23 +03:00
e55380b4d5
csv_metadata_quality/fix.py: Harmonize language in fix output
We should always say if we're removing or replacing something.
2019-10-01 17:09:49 +03:00
85ae16d9b7
CHANGELOG.md: Add note about non-breaking spaces 2019-10-01 16:56:37 +03:00
c42f8b4812
csv_metadata_quality/fix.py: Replace non-breaking spaces
We should be replacing non-breaking spaces (U+00A0) with normal
spaces instead of removing them.
2019-10-01 16:55:04 +03:00
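
A minimal sketch of this fix; the function name is illustrative and
the actual code in csv_metadata_quality/fix.py may differ:

  def replace_nbsp(value):
      # Replace non-breaking spaces (U+00A0) with ordinary spaces
      # rather than removing them, so words do not run together.
      return value.replace("\u00a0", " ")

  assert replace_nbsp("Nairobi,\u00a0Kenya") == "Nairobi, Kenya"
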
1c75608d54
README.md: Update introduction text
We should mention that this is not DSpace specific. Rather, it is
more accurately Dublin Core specific.
2019-09-26 14:19:13 +03:00
0b15a8ed3b
README.md: Remove TODO about lack of space after comma
This was added as an automatic global fix a few weeks ago.
2019-09-26 14:16:33 +03:00
9ca266f5f0
data/test.csv: Change birthdate column to dc.date.issued
More accurately reflects actual data we will be validating.
2019-09-26 14:15:48 +03:00
0d3f948708
CHANGELOG.md: Update comment about language validation 2019-09-26 14:14:57 +03:00
c04207fcfc
CHANGELOG.md: Fix header formatting 2019-09-26 14:13:50 +03:00
9d4eceddc7
.build.yml: Enable experimental CLI checks on SourceHut 2019-09-26 14:11:35 +03:00
e15c98cccb
Move unreleased changes to v0.3.0 2019-09-26 14:06:31 +03:00
93c4e1a993
Update python requirements
Generated using pipenv:

  $ pipenv lock -r > requirements.txt
  $ pipenv lock -r -d > requirements-dev.txt
2019-09-26 14:05:37 +03:00
9963b2bb64
Pipfile.lock: Run pipenv update 2019-09-26 14:04:50 +03:00
76291c1876
CHANGELOG.md: Add note about language validation 2019-09-26 14:03:18 +03:00
604bd5bda6
Reformat tests with black 2019-09-26 14:02:51 +03:00
e7c220039b
README.md: Add note about experimental language validation 2019-09-26 13:59:50 +03:00
d7b5e378bc
setup.py: Add langid 2019-09-26 13:49:32 +03:00
8435ee242d
Experimental language detection using langid
Works decently well assuming the title, abstract, and citation fields
are an accurate representation of the language as identified by the
language field. Handles ISO 639-1 (alpha 2) and ISO 639-3 (alpha 3)
values seamlessly.

This includes updated pipenv environment, test data, pytest tests
for both correct and incorrect ISO 639-1 and ISO 639-3 languages,
and a new command line option "-e".
2019-09-26 13:46:32 +03:00
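
Roughly, the check could work like the sketch below: detect the
language of the text with langid and resolve both the detected and the
declared codes through pycountry so that ISO 639-1 and ISO 639-3
values can be compared. This is only an illustration of the idea, not
the project's actual "-e" implementation, and the function name is
made up:

  import langid
  import pycountry

  def language_mismatch(text, declared_code):
      # langid.classify() returns a tuple of (ISO 639-1 code, score).
      detected_code, _score = langid.classify(text)
      detected = pycountry.languages.get(alpha_2=detected_code)

      # Accept either ISO 639-1 (two-letter) or ISO 639-3 (three-letter)
      # values in the metadata's language field.
      if len(declared_code) == 2:
          declared = pycountry.languages.get(alpha_2=declared_code)
      else:
          declared = pycountry.languages.get(alpha_3=declared_code)

      if detected is None or declared is None:
          return False  # unknown codes are handled by other checks

      return detected.alpha_3 != declared.alpha_3
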
7ac1c6f554
README.md: Update comment about ISO 639-3
Apparently the pycountry library is actually using ISO 639-3.

See: https://pypi.org/project/pycountry/
2019-09-26 07:51:41 +03:00
86d4623fd3
More ISO 639-1 and ISO 639-3 fixes
ISO 639-1 uses two-letter codes and ISO 639-3 uses three-letter codes.
Technically there are also ISO 639-2/T and ISO 639-2/B, which use
three-letter codes as well, but those are not supported by the
pycountry library so I won't even worry about them.

See: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
2019-09-26 07:44:39 +03:00
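
For reference, a hedged sketch of validating a single code against
ISO 639-1 or ISO 639-3 with pycountry (in recent pycountry versions
languages.get() returns None when there is no match):

  import pycountry

  def is_valid_language_code(code):
      # Two-letter values are looked up as ISO 639-1 (alpha_2),
      # three-letter values as ISO 639-3 (alpha_3).
      if len(code) == 2:
          return pycountry.languages.get(alpha_2=code) is not None
      if len(code) == 3:
          return pycountry.languages.get(alpha_3=code) is not None
      return False

  print(is_valid_language_code("en"))   # True
  print(is_valid_language_code("eng"))  # True
  print(is_valid_language_code("xx"))   # False
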