csv-metadata-quality

mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-08-17 10:15:51 +02:00

Author	SHA1	Message	Date
Alan Orth	264ce1d1df	CHANGELOG.md: Add new item for Ctrl-C handling	2019-08-03 22:18:44 +03:00
Alan Orth	f4e7fd73f5	csv_metadata_quality/app.py: Handle Ctrl-C Instead of printing an ugly two-page stack trace.	2019-08-03 21:11:57 +03:00
Alan Orth	a00d3d7ea5	README.md: Simplify installation instructions Pipenv has captured the local dependency with `-e .` so now it gets installed by the Pipfile or requirements.txt.	2019-08-02 11:02:50 +03:00
Alan Orth	f772a3be41	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt	2019-08-02 11:02:25 +03:00
Alan Orth	d1b3e9e375	pipenv install -e .	2019-08-02 10:58:21 +03:00
Alan Orth	85ae7bdc5a	Increment version to 0.1.0 v0.1.0	2019-08-02 00:12:47 +03:00
Alan Orth	980ef0d98d	Add initial changelog	2019-08-02 00:10:28 +03:00
Alan Orth	0561300ebe	Add option to print version with `--version` or `-V` I guess `-v` is more commonly used for "verbose" so I will use the short option of `-V` for version.	2019-08-02 00:09:54 +03:00
Alan Orth	456b8a2f26	Update tests	2019-08-01 23:59:11 +03:00
Alan Orth	0ed390dbd5	README.md: Update AGROVOC information Now details the new `--agrovoc-fields` option.	2019-08-01 23:54:40 +03:00
Alan Orth	bf876a046a	Rework AGROVOC validation AGROVOC validation is now disabled by default, but can be enabled on a field-by-field basis. For example, countries and regions are also present in AGROVOC. Fields with these values can be enabled using the new `--agrovoc-fields` option. I reworked the script output to show the field name when printing an invalid term so that the user knows in which field the term is.	2019-08-01 23:51:58 +03:00
Alan Orth	576b3a3638	csv_metadata_quality/__main__.py: Fix spacing Identified by flake8.	2019-08-01 23:28:16 +03:00
Alan Orth	a6eba0fc1a	.build.yml: Use pipenv run instead of pipenv shell The latter creates a shell and of course it doesn't ever exit!	2019-07-31 17:58:21 +03:00
Alan Orth	857492cd93	.build.yml: Try fix CLI test I was writing the CLI output to /dev/null because I was lazy, but actually this might break the SourceHut build.	2019-07-31 17:54:49 +03:00
Alan Orth	db42fbfea9	.build.yml: Try to run the script itself	2019-07-31 17:48:17 +03:00
Alan Orth	734ae7f011	setup.py: Add "DSpace" to description I suppose someone searching PyPI (eventually) would be happy to see that this was written specifically with DSpace in mind.	2019-07-31 17:42:27 +03:00
Alan Orth	fd3861e7cd	README.md: Update installation and usage instructions It is much easier now that I have created a proper package.	2019-07-31 17:41:18 +03:00
Alan Orth	cec1a34dfe	.gitignore: Ignore egg cache from distutils	2019-07-31 17:38:54 +03:00
Alan Orth	9100efdf50	Re-work as a proper standalone Python package Add a setup.py so that installation is easier and a standalone CLI script called csv-metadata-quality is provided. Now the user only needs to run this from a virtual environment inside the project directory: $ pip install . Eventually I could publish this on PyPI when I settle on a more appropriate package name. See: https://packaging.python.org/tutorials/packaging-projects/ See: https://chriswarrick.com/blog/2014/09/15/python-apps-the-right-way-entry_points-and-scripts/	2019-07-31 17:34:36 +03:00
Alan Orth	4c4f4a3ba2	README.md: Update todos	2019-07-31 16:33:49 +03:00
Alan Orth	22cc7bc793	README.md: Improve section on unsafe fixes	2019-07-31 16:00:05 +03:00
Alan Orth	915327539a	pytest.ini: Ignore deprecation warnings These come from third-party libraries I have no control over. See: https://docs.pytest.org/en/latest/warnings.html#deprecationwarning-and-pendingdeprecationwarning	2019-07-31 13:33:20 +03:00
Alan Orth	63ffd77723	data/test.csv: Clarify that newline is a line feed	2019-07-31 13:03:43 +03:00
Alan Orth	b9d041927e	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt $ pipenv lock -r -d > requirements-dev.txt	2019-07-30 23:06:26 +03:00
Alan Orth	020c87768e	data/test.csv: Add item with missing date	2019-07-30 21:02:51 +03:00
Alan Orth	abf74909ee	data/test.csv: Add missing date I was meaning to test for an invalid multi-value separator here!	2019-07-30 21:01:42 +03:00
Alan Orth	40d5f7d81b	Add support for removing newlines This was tricky because of the nature of newlines. In actuality we are removing Unix line feeds here (U+000A) because Windows carriage returns are actually already removed by the string stripping in the whitespace fix. Creating the test case in Vim was difficult because I couldn't figure out how to manually enter a line feed character. In the end I used a search and replace on a known pattern like "ALAN", replacing it with \r. Neither entering the Unicode code point (U+000A) directly nor typing an "Enter" character after ^V worked. Grrr.	2019-07-30 20:05:12 +03:00
Alan Orth	346e66ca98	README.md: Add more information to introduction	2019-07-30 17:44:30 +03:00
Alan Orth	bad2fce124	pytest.ini: Don't print captured output It makes the summary of passes and fails more annoying to read due to the lines being long and including newlines. I am actually not sure why this fixes it, though... See: https://pytest.readthedocs.io/en/latest/capture.html	2019-07-30 16:43:31 +03:00
Alan Orth	3c798fb504	Use pycountry instead of iso-639 for languages The latter is a fork that hasn't been updated since 2016 and the original still seems to be well maintained, with recent database updates as well as tests for Python 3.7. Also, pycountry supports ISO 3166-2 (administrative zones), which we could eventually use for sub regions.	2019-07-30 16:39:26 +03:00
Alan Orth	a85b410ab9	README.md: Improve introduction and functionality	2019-07-30 16:09:15 +03:00
Alan Orth	4e3511cd55	csv_metadata_quality/check.py: Fix AGROVOC lookup We actually only need to see if there are more than zero matches because a term like "Nigeria" will match in English, Spanish, etc, whereas terms that really don't match will have zero results.	2019-07-30 14:51:44 +03:00
Alan Orth	1fd3d8bc2f	Add pytest.ini from responder This seems to produce a more informative output, though I'm not sure how to filter the annoying deprecation warnings pytest throws about things that are usually done by modules I'm using, not by me. From: https://github.com/taoufik07/responder/blob/master/pytest.ini	2019-07-30 00:45:18 +03:00
Alan Orth	b4aaffd6f1	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt	2019-07-30 00:34:32 +03:00
Alan Orth	5ea2e856e4	data/test.csv: Add dc.subject column for AGROVOC tests	2019-07-30 00:33:31 +03:00
Alan Orth	1f65a28307	Add support for validating subjects against AGROVOC Checks values in the dc.subject or dcterms.subject field against the AGROVOC REST API hosted by FAO. Code borrowed from agrovoc-lookup.py. See: http://agrovoc.uniroma2.it/agrovoc/agrovoc/en/ See: https://github.com/ilri/DSpace/blob/5_x-prod/agrovoc-lookup.py	2019-07-30 00:30:31 +03:00
Alan Orth	bb882315f1	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt $ pipenv lock -r -d > requirements-dev.txt	2019-07-29 19:10:49 +03:00
Alan Orth	a36454a3ac	Add support for validating languages Will validate against ISO 639-2 or ISO 639-3 depending on how long the language field is. Otherwise will return that the language is invalid. Does not currently have any support for generic values like "Other".	2019-07-29 18:59:42 +03:00
Alan Orth	1978fa7b48	Add iso-639 library to validate languages	2019-07-29 18:58:50 +03:00
Alan Orth	e49b4e8f22	README.md: Try to simplify list of functionality	2019-07-29 18:25:38 +03:00
Alan Orth	0eb852a65b	README.md: Improve note about unsafe options	2019-07-29 18:14:50 +03:00
Alan Orth	8d92498bbf	Add .gitignore	2019-07-29 18:10:29 +03:00
Alan Orth	8c34c2d6e6	README.md: Add note about removing duplicate values	2019-07-29 18:09:48 +03:00
Alan Orth	1e444cf040	Add fix for duplicate metadata values	2019-07-29 18:05:03 +03:00
Alan Orth	d7888d59a8	csv_metadata_quality/check.py: Return date even if it is invalid Otherwise it is missing from the final CSV and then we can't even fix it. :)	2019-07-29 17:40:14 +03:00
Alan Orth	8509006165	data/test.csv: Use more descriptive tests To make it obvious what each item is testing.	2019-07-29 17:38:46 +03:00
Alan Orth	e33551776c	README.md: Update note about unsafe options	2019-07-29 17:25:42 +03:00
Alan Orth	7f781d7077	README.md: Finish writing usage section	2019-07-29 17:21:34 +03:00
Alan Orth	50ae4e17f2	csv_metadata_quality/fix.py: Fix indent	2019-07-29 17:14:48 +03:00
Alan Orth	fa4fa3491b	Add check for "suspicious" characters These standalone characters often indicate issues with encoding or copy/paste in languages with accents like French and Spanish. For example: foreˆt should be forêt. It is not possible to fix these issues automatically, but this will print a warning so you can notify the owner of the data.	2019-07-29 17:08:49 +03:00
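
Three of the text fixes and checks described in the log above (removing line feeds, removing duplicate values, and warning about "suspicious" characters) could look roughly like the sketch below. This is not the project's actual code: the function names, the `||` multi-value separator, and the exact set of suspicious characters are assumptions made for illustration.

```python
import re

# Hypothetical set of standalone diacritics that often signal an encoding or
# copy/paste problem: acute accent, modifier circumflex, small tilde, diaeresis.
# The real tool may check a different set.
SUSPICIOUS_PATTERN = re.compile("[\u00b4\u02c6\u02dc\u00a8]")


def remove_newlines(value: str) -> str:
    """Remove Unix line feeds (U+000A) from a metadata value."""
    return value.replace("\n", "")


def remove_duplicates(value: str, separator: str = "||") -> str:
    """Drop repeated values from a multi-value field, keeping first occurrences.

    The "||" separator is an assumption, not taken from the log.
    """
    seen = []
    for part in value.split(separator):
        if part not in seen:
            seen.append(part)
    return separator.join(seen)


def find_suspicious_characters(value: str) -> list:
    """Return any suspicious standalone characters found in the value.

    These cannot be fixed automatically, so a caller would only print a warning.
    """
    return SUSPICIOUS_PATTERN.findall(value)


if __name__ == "__main__":
    print(remove_newlines("Line one\nline two"))
    print(remove_duplicates("Kenya||Uganda||Kenya"))
    for char in find_suspicious_characters("fore\u02c6t"):
        print(f"Suspicious character: {char!r}")
```

The first two functions return a corrected value, while the third only reports what it found, matching the log's note that suspicious characters can be flagged but not repaired automatically.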

562 Commits