csv-metadata-quality

mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-24 14:50:17 +01:00

Author	SHA1	Message	Date
Alan Orth	63fdce7d13	.travis.yml: Use Ubuntu 18.04 "Bionic"	2019-11-14 23:22:19 +02:00
Alan Orth	f068c0e16a	CHANGELOG.md: Use Python 3.8.0 for pipenv	2019-11-14 23:11:43 +02:00
Alan Orth	79b8f62a85	Use Python 3.8 for pipenv Python 3.8.0 has now entered the Arch Linux core repositories and all tests pass with Python 3.8.0, so it's time...	2019-11-14 23:10:20 +02:00
Alan Orth	6c1e132531	CHANGELOG.md: Add unreleased changes	2019-11-14 09:19:19 +02:00
Alan Orth	c0f3c866bd	Pipfile.lock: Run pipenv update Updates the following dependencies: - numpy 1.17.2→1.17.4 - pandas 0.25.1→0.25.3 - flake8 3.7.8→3.7.9 - pytest 5.1.3→5.2.2 - black 19.3b0→19.10b0	2019-11-14 09:17:31 +02:00
Alan Orth	36d0474b95	CHANGELOG.md: Move unreleased changes to v0.3.1	2019-10-01 17:11:52 +03:00
Alan Orth	efdc3a841a	Version 0.3.1	2019-10-01 17:11:13 +03:00
Alan Orth	fd2ba6845d	CHANGELOG.md: Update unreleased notes	2019-10-01 17:10:23 +03:00
Alan Orth	e55380b4d5	csv_metadata_quality/fix.py: Harmonize language in fix output We should always say if we're removing or replacing something.	2019-10-01 17:09:49 +03:00
Alan Orth	85ae16d9b7	CHANGELOG.md: Add note about non-breaking spaces	2019-10-01 16:56:37 +03:00
Alan Orth	c42f8b4812	csv_metadata_quality/fix.py: Replace non-breaking spaces We should be replacing non-breaking spaces (U+00A0) with normal spaces instead of removing them.	2019-10-01 16:55:04 +03:00
Alan Orth	1c75608d54	README.md: Update introduction text We should mention that this is not DSpace-specific. Realistically, it is Dublin Core-specific.	2019-09-26 14:19:13 +03:00
Alan Orth	0b15a8ed3b	README.md: Remove TODO about lack of space after comma This was added as an automatic global fix a few weeks ago.	2019-09-26 14:16:33 +03:00
Alan Orth	9ca266f5f0	data/test.csv: Change birthdate column to dc.date.issued More accurately reflects actual data we will be validating.	2019-09-26 14:15:48 +03:00
Alan Orth	0d3f948708	CHANGELOG.md: Update comment about language validation	2019-09-26 14:14:57 +03:00
Alan Orth	c04207fcfc	CHANGELOG.md: Fix header formatting	2019-09-26 14:13:50 +03:00
Alan Orth	9d4eceddc7	.build.yml: Enable experimental CLI checks on SourceHut	2019-09-26 14:11:35 +03:00
Alan Orth	e15c98cccb	Move unreleased changes to v0.3.0	2019-09-26 14:06:31 +03:00
Alan Orth	93c4e1a993	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt $ pipenv lock -r -d > requirements-dev.txt	2019-09-26 14:05:37 +03:00
Alan Orth	9963b2bb64	Pipfile.lock: Run pipenv update	2019-09-26 14:04:50 +03:00
Alan Orth	76291c1876	CHANGELOG.md: Add note about language validation	2019-09-26 14:03:18 +03:00
Alan Orth	604bd5bda6	Reformat tests with black	2019-09-26 14:02:51 +03:00
Alan Orth	e7c220039b	README.md: Add note about experimental language validation	2019-09-26 13:59:50 +03:00
Alan Orth	d7b5e378bc	setup.py: Add langid	2019-09-26 13:49:32 +03:00
Alan Orth	8435ee242d	Experimental language detection using langid Works decently well, assuming the title, abstract, and citation fields accurately represent the language identified in the language field. Handles ISO 639-1 (alpha 2) and ISO 639-3 (alpha 3) values seamlessly. This includes an updated pipenv environment, test data, pytest tests for both correct and incorrect ISO 639-1 and ISO 639-3 languages, and a new command line option "-e".	2019-09-26 13:46:32 +03:00
Alan Orth	7ac1c6f554	README.md: Update comment about ISO 639-3 The pycountry library apparently uses ISO 639-3. See: https://pypi.org/project/pycountry/	2019-09-26 07:51:41 +03:00
Alan Orth	86d4623fd3	More ISO 639-1 and ISO 639-3 fixes ISO 639-1 uses two-letter codes and ISO 639-3 uses three-letter codes. Technically there are also ISO 639-2/T and ISO 639-2/B, which use three-letter codes as well, but those are not supported by the pycountry library so I won't even worry about them. See: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes	2019-09-26 07:44:39 +03:00
Alan Orth	ddbe970342	data/test.csv: Update titles of language tests ISO 639-1 is alpha 2 and ISO 639-3 is alpha 3. See: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes	2019-09-26 07:40:27 +03:00
Alan Orth	31c78ca6f3	data/test.csv: Rename contributor column to title This makes more sense as a description of each test and the titles are obviously not authors.	2019-09-26 05:50:40 +03:00
Alan Orth	154d05b5e2	CHANGELOG.md: Update notes	2019-09-24 18:55:05 +03:00
Alan Orth	186f146edb	Pipfile.lock: Run pipenv update Synchronizes state with the Pipfile and brings some new deps.	2019-09-24 18:54:49 +03:00
Alan Orth	a4cb301943	CHANGELOG.md: Add note about csvkit	2019-09-24 18:49:20 +03:00
Alan Orth	219e37526d	Pipfile: Add csvkit to dev requirements Used to inspect CSV files during testing and development.	2019-09-24 18:48:01 +03:00
Alan Orth	f304ca6a33	csv_metadata_quality/app.py: Use simpler column iteration I don't know where I got the other one...	2019-09-21 17:19:39 +03:00
Alan Orth	3d5c8bdf5d	CHANGELOG.md: Add notes about updated python packages	2019-09-11 16:45:39 +03:00
Alan Orth	480956d54d	Pipfile.lock: Run pipenv update	2019-09-11 16:45:16 +03:00
Alan Orth	d9fc09f121	Fix references to ISO 639 It turns out that ISO 639-1 uses the two-letter codes and ISO 639-2 uses the three-letter codes, aka alpha2 and alpha3. See: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes	2019-09-11 16:36:53 +03:00
Alan Orth	b5899001b7	CHANGELOG.md: Add note about black and isort	2019-08-29 01:26:11 +03:00
Alan Orth	c92977d1ca	Update requirements-dev.txt Generated with: $ pipenv lock -r -d > requirements-dev.txt	2019-08-29 01:25:14 +03:00
Alan Orth	280a99c8a8	Sort imports with isort See: https://sourcery.ai/blog/python-best-practices/	2019-08-29 01:15:04 +03:00
Alan Orth	0388145b81	Add configuration for isort See: https://sourcery.ai/blog/python-best-practices/	2019-08-29 01:14:31 +03:00
Alan Orth	d97dcd19db	Format with black	2019-08-29 01:10:39 +03:00
Alan Orth	b375f0e895	Add black and isort to pipenv dev dependencies These do a very opinionated automatic formatting and validation of code. See: https://sourcery.ai/blog/python-best-practices/	2019-08-29 01:08:38 +03:00
Alan Orth	865c61d316	Add note about updated python dependencies	2019-08-28 21:02:21 +03:00
Alan Orth	3b2ba57b75	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt $ pipenv lock -r -d > requirements-dev.txt	2019-08-28 21:01:48 +03:00
Alan Orth	2805c556a9	Pipfile.lock: Run pipenv update Brings numpy 1.17.1, pandas 0.25.1, and requests-cache 0.5.2.	2019-08-28 20:58:35 +03:00
Alan Orth	c354a3687c	Release version 0.2.2	2019-08-28 00:10:17 +03:00
Alan Orth	07f80cb37f	tests/test_fix.py: Add test for missing space after comma	2019-08-28 00:08:56 +03:00
Alan Orth	89d72540f1	data/test.csv: Add sample for missing space after comma	2019-08-28 00:08:26 +03:00
Alan Orth	81190d56bb	Add fix for missing space after commas This happens in names very often, for example in the contributor and citation fields. I will limit this to those fields for now and hide this fix behind the "unsafe fixes" option until I test it more.	2019-08-28 00:05:52 +03:00
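The two whitespace fixes described in commits c42f8b4812 (replace U+00A0 non-breaking spaces with normal spaces) and 81190d56bb (add a missing space after commas in names) can be sketched roughly as follows. This is a minimal illustration only: the function names `replace_nbsp` and `add_space_after_comma` are hypothetical and are not the project's actual API, and the sketch does not model the commit's restriction to the contributor and citation fields or the "unsafe fixes" option.

```python
import re

# Hypothetical helpers for illustration; NOT the actual csv_metadata_quality API.
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def replace_nbsp(value: str) -> str:
    """Replace every non-breaking space with a normal space (rather than removing it)."""
    return value.replace(NBSP, " ")


def add_space_after_comma(value: str) -> str:
    """Insert a space after any comma that is immediately followed by a word
    character (letter, digit, or underscore), e.g. "Orth,Alan" -> "Orth, Alan".
    Commas already followed by a space are left alone."""
    return re.sub(r",(\w)", r", \1", value)


if __name__ == "__main__":
    print(replace_nbsp("Alan\u00a0Orth"))
    print(add_space_after_comma("Orth,Alan"))
```

A regex on `\w` is a deliberately simple choice; it would also touch legitimate comma-joined values such as numbers ("1,000"), which is one reason a fix like this is reasonably kept behind an opt-in flag and limited to name-like fields.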


644 Commits