csv-metadata-quality

mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-06-14 12:42:35 +02:00

Author	SHA1	Message	Date
Alan Orth	13d5221378	csv_metadata_quality/check.py: Fix test for False	2019-08-10 23:52:53 +03:00
Alan Orth	3c7a9eb75b	CHANGELOG.md: Add check for uncommon filename extensions	2019-08-10 23:47:46 +03:00
Alan Orth	a99fbd8a51	data/test.csv: Add test case for uncommon filename extension	2019-08-10 23:46:56 +03:00
Alan Orth	e801042340	tests/test_check.py: Fix unused result We don't need to capture the function's return value here because pytest will capture stdout from the function.	2019-08-10 23:45:41 +03:00
Alan Orth	62ef2a4489	tests/test_check.py: Add tests for file extensions	2019-08-10 23:44:13 +03:00
Alan Orth	9ce7dc6716	Add check for uncommon filenames Generally we want people to upload documents in accessible formats like PDF, Word, Excel, and PowerPoint. This check warns if a file is using an uncommon extension.	2019-08-10 23:41:16 +03:00
Alan Orth	5ff584a8d7	Version 0.2.0 v0.2.0	2019-08-09 01:39:51 +03:00
Alan Orth	4cf7bc182b	Update requirements-dev.txt Generated with: $ pipenv lock -r -d > requirements-dev.txt	2019-08-09 01:34:54 +03:00
Alan Orth	7d3f5aae66	CHANGELOG.md: Add pytest-clarity	2019-08-09 01:33:34 +03:00
Alan Orth	c77c065e25	Update Pipfile.lock	2019-08-09 01:32:53 +03:00
Alan Orth	8fb40d96b1	Pipfile: Add pytest-clarity to dev packages This helps you understand the cryptic assertion error output from pytest. For some reason pytest-clarity is a pre-release package so we need to install it in pipenv with --pre.	2019-08-09 01:30:37 +03:00
Alan Orth	5f2e3ff4bd	CHANGELOG.md: Add improved suspicious character check	2019-08-09 01:28:07 +03:00
Alan Orth	d93c2aae13	tests/test_check.py: Update suspicious character check The suspicious character check was updated to include the name of the field where the metadata value with the suspicious character exists.	2019-08-09 01:26:38 +03:00
Alan Orth	62fea95087	Improve suspicious character detection Now it will print just the part of the metadata value that contains the suspicious character (up to 80 characters, so we don't make the line break on terminals that use 80 character width by default). Also, print the name of the field in which the metadata value is so that it is easier for the user to locate.	2019-08-09 01:25:40 +03:00
Alan Orth	8772bdec51	csv_metadata_quality/app.py: Explicitly exit with success	2019-08-04 09:10:37 +03:00
Alan Orth	6d4ecd75aa	csv_metadata_quality/app.py: Close files before exit	2019-08-04 09:10:19 +03:00
Alan Orth	264ce1d1df	CHANGELOG.md: Add new item for Ctrl-C handling	2019-08-03 22:18:44 +03:00
Alan Orth	f4e7fd73f5	csv_metadata_quality/app.py: Handle Ctrl-C Instead of printing an ugly two-page stack trace.	2019-08-03 21:11:57 +03:00
Alan Orth	a00d3d7ea5	README.md: Simplify installation instructions Pipenv has captured the local dependency with `-e .` so now it gets installed by the Pipfile or requirements.txt.	2019-08-02 11:02:50 +03:00
Alan Orth	f772a3be41	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt	2019-08-02 11:02:25 +03:00
Alan Orth	d1b3e9e375	pipenv install -e .	2019-08-02 10:58:21 +03:00
Alan Orth	85ae7bdc5a	Increment version to 0.1.0 v0.1.0	2019-08-02 00:12:47 +03:00
Alan Orth	980ef0d98d	Add initial changelog	2019-08-02 00:10:28 +03:00
Alan Orth	0561300ebe	Add option to print version with `--version` or `-V` I guess `-v` is more commonly used for "verbose" so I will use the short option of `-V` for version.	2019-08-02 00:09:54 +03:00
Alan Orth	456b8a2f26	Update tests	2019-08-01 23:59:11 +03:00
Alan Orth	0ed390dbd5	README.md: Update AGROVOC information Now details the new `--agrovoc-fields` option.	2019-08-01 23:54:40 +03:00
Alan Orth	bf876a046a	Rework AGROVOC validation AGROVOC validation is now disabled by default, but can be enabled on a field-by-field basis. For example, countries and regions are also present in AGROVOC. Fields with these values can be enabled using the new `--agrovoc-fields` option. I reworked the script output to show the field name when printing an invalid term so that the user knows in which field the term is.	2019-08-01 23:51:58 +03:00
Alan Orth	576b3a3638	csv_metadata_quality/__main__.py: Fix spacing Identified by flake8.	2019-08-01 23:28:16 +03:00
Alan Orth	a6eba0fc1a	.build.yml: Use pipenv run instead of pipenv shell The latter creates a shell and of course it doesn't ever exit!	2019-07-31 17:58:21 +03:00
Alan Orth	857492cd93	.build.yml: Try fix CLI test I was writing the CLI output to /dev/null because I was lazy, but actually this might break the SourceHut build.	2019-07-31 17:54:49 +03:00
Alan Orth	db42fbfea9	.build.yml: Try to run the script itself	2019-07-31 17:48:17 +03:00
Alan Orth	734ae7f011	setup.py: Add "DSpace" to description I suppose someone searching PyPi (eventually) would be happy to see that this was written specifically with DSpace in mind.	2019-07-31 17:42:27 +03:00
Alan Orth	fd3861e7cd	README.md: Update installation and usage instructions It is much easier now that I have created a proper package.	2019-07-31 17:41:18 +03:00
Alan Orth	cec1a34dfe	.gitignore: Ignore egg cache from distutils	2019-07-31 17:38:54 +03:00
Alan Orth	9100efdf50	Re-work as a proper standalone Python package Add a setup.py so that installation is easier and a standalone CLI script called csv-metadata-quality is provided. Now the user only needs to run this from a virtual environment inside the project directory: $ pip install . Eventually I could publish this on PyPi when I settle on a more appropriate package name. See: https://packaging.python.org/tutorials/packaging-projects/ See: https://chriswarrick.com/blog/2014/09/15/python-apps-the-right-way-entry_points-and-scripts/	2019-07-31 17:34:36 +03:00
Alan Orth	4c4f4a3ba2	README.md: Update todos	2019-07-31 16:33:49 +03:00
Alan Orth	22cc7bc793	README.md: Improve section on unsafe fixes	2019-07-31 16:00:05 +03:00
Alan Orth	915327539a	pytest.ini: Ignore deprecation warnings These come from third-party libraries I have no control over. See: https://docs.pytest.org/en/latest/warnings.html#deprecationwarning-and-pendingdeprecationwarning	2019-07-31 13:33:20 +03:00
Alan Orth	63ffd77723	data/test.csv: Clarify that newline is a line feed	2019-07-31 13:03:43 +03:00
Alan Orth	b9d041927e	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt $ pipenv lock -r -d > requirements-dev.txt	2019-07-30 23:06:26 +03:00
Alan Orth	020c87768e	data/test.csv: Add item with missing date	2019-07-30 21:02:51 +03:00
Alan Orth	abf74909ee	data/test.csv: Add missing date I was meaning to test for an invalid multi-value separator here!	2019-07-30 21:01:42 +03:00
Alan Orth	40d5f7d81b	Add support for removing newlines This was tricky because of the nature of newlines. In actuality we are removing Unix line feeds here (U+000A) because Windows carriage returns are actually already removed by the string stripping in the whitespace fix. Creating the test case in Vim was difficult because I couldn't figure out how to manually enter a line feed character. In the end I used a search and replace on a known pattern like "ALAN", replacing it with \r. Neither entering the Unicode code point (U+000A) directly or typing an "Enter" character after ^V worked. Grrr.	2019-07-30 20:05:12 +03:00
Alan Orth	346e66ca98	README.md: Add more information to introduction	2019-07-30 17:44:30 +03:00
Alan Orth	bad2fce124	pytest.ini: Don't print captured output It makes the summary of passes and fails more annoying to read due to the lines being long and including newlines. I am actually not sure why this fixes it, though... See: https://pytest.readthedocs.io/en/latest/capture.html	2019-07-30 16:43:31 +03:00
Alan Orth	3c798fb504	Use pycountry instead of iso-639 for languages The latter is a fork that hasn't been updated since 2016 and the original still seems to be well maintained, with recent database updates as well as tests for Python 3.7. Also, pycountry supports ISO 3166-2 (administrative zones), which we could eventually use for sub regions.	2019-07-30 16:39:26 +03:00
Alan Orth	a85b410ab9	README.md: Improve introduction and functionality	2019-07-30 16:09:15 +03:00
Alan Orth	4e3511cd55	csv_metadata_quality/check.py: Fix AGROVOC lookup We actually only need to see if there are more than zero matches because a term like "Nigeria" will match in English, Spanish, etc, whereas terms that really don't match will have zero results.	2019-07-30 14:51:44 +03:00
Alan Orth	1fd3d8bc2f	Add pytest.ini from responder This seems to produce a more informative output, though I'm not sure how to filter the annoying deprecation warnings pytest throws about things that are usually done by modules I'm using, not by me. From: https://github.com/taoufik07/responder/blob/master/pytest.ini	2019-07-30 00:45:18 +03:00
Alan Orth	b4aaffd6f1	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt	2019-07-30 00:34:32 +03:00
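
Commit 9ce7dc6716 above describes a check that warns when a file uses an uncommon extension, preferring accessible formats like PDF, Word, Excel, and PowerPoint. The following is a minimal sketch of how such a check might look, not the code from csv_metadata_quality/check.py. The function names, the exact extension list, the warning text, and the "||" multi-value separator are all assumptions made for illustration.

```python
import os

# Hypothetical allow-list: the commit message names the formats (PDF, Word,
# Excel, PowerPoint) but not the exact extensions the project accepts.
COMMON_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def uncommon_extension(filename):
    """Return True if the filename's extension is not in the allow-list."""
    _, extension = os.path.splitext(filename.strip())
    return extension.lower() not in COMMON_EXTENSIONS


def check_filenames(field, field_name="filename", separator="||"):
    """Collect one warning per value in a multi-value field.

    The "||" separator and the message wording are assumptions.
    """
    warnings = []
    for value in field.split(separator):
        if uncommon_extension(value):
            warnings.append(
                f"Filename with uncommon extension ({field_name}): {value}"
            )
    return warnings
```

Returning the warnings as a list keeps the sketch easy to assert on. A check that prints instead would be tested through pytest's captured stdout, as commit e801042340 notes.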


578 Commits