mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-30 01:28:18 +01:00
Commit Graph

14 Commits

Author SHA1 Message Date
8435ee242d
Experimental language detection using langid
Works decently well assuming the title, abstract, and citation fields
are an accurate representation of the language as identified by the
language field. Handles ISO 639-1 (alpha 2) and ISO 639-3 (alpha 3)
values seamlessly.

This includes updated pipenv environment, test data, pytest tests
for both correct and incorrect ISO 639-1 and ISO 639-3 languages,
and a new command line option "-e".
2019-09-26 13:46:32 +03:00
219e37526d
Pipfile: Add csvkit to dev requirements
Used to inspect CSV files during testing and development.
2019-09-24 18:48:01 +03:00
b375f0e895
Add black and isort to pipenv dev dependencies
These do a very opinionated automatic formatting and validation of
code.

See: https://sourcery.ai/blog/python-best-practices/
2019-08-29 01:08:38 +03:00
8fb40d96b1
Pipfile: Add pytest-clarity to dev packages
This helps you understand the cryptic assertion error output from
pytest. For some reason pytest-clarity is a pre-release package so
we need to install it in pipenv with --pre.
2019-08-09 01:30:37 +03:00
d1b3e9e375
pipenv install -e . 2019-08-02 10:58:21 +03:00
3c798fb504
Use pycountry instead of iso-639 for languages
The latter is a fork that hasn't been updated since 2016 and the
original still seems to be well maintained, with recent database
updates as well as tests for Python 3.7.

Also, pycountry supports ISO 3166-2 (administrative zones), which
we could eventually use for sub regions.
2019-07-30 16:39:26 +03:00
1f65a28307
Add support for validating subjects against AGROVOC
Checks values in the dc.subject or dcterms.subject field against the
AGROVOC REST API hosted by FAO. Code borrowed from agrovoc-lookup.py.

See: http://agrovoc.uniroma2.it/agrovoc/agrovoc/en/
See: https://github.com/ilri/DSpace/blob/5_x-prod/agrovoc-lookup.py
2019-07-30 00:30:31 +03:00
1978fa7b48
Add iso-639 library to validate languages 2019-07-29 18:58:50 +03:00
c6e7d6d9b5
Add flake8 to pipenv dev environment
To help check PEP8 formatting/style compliance.
2019-07-28 17:46:30 +03:00
c2f28194eb
Add xlrd to pipenv for Pandas read_excel support 2019-07-28 17:05:17 +03:00
73b4061c7b
Add ipython to pipenv dev packages 2019-07-28 10:06:41 +03:00
99f00fcb85
Add pytest to pipenv dev environment 2019-07-27 00:32:53 +03:00
e160b17fb0
Add ISSN and ISBN checks using python-stdnum 2019-07-26 23:14:10 +03:00
21b78b9519
Initial commit
Pipenv environment with Pandas.
2019-07-26 17:54:13 +03:00
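
The ISSN and ISBN checks in commit e160b17fb0 rely on python-stdnum. As a rough illustration of the check-digit arithmetic such a validator performs, here is a minimal standard-library sketch. It is not the repository's code: the function names is_valid_issn and is_valid_isbn are hypothetical, and the sketch covers only the checksum, not everything python-stdnum does.

```python
import re


def _clean(value: str) -> str:
    """Strip spaces and hyphens and upper-case a trailing 'x' check character."""
    return re.sub(r"[\s-]", "", value).upper()


def is_valid_issn(value: str) -> bool:
    """Hypothetical helper: verify the ISSN mod-11 check digit."""
    s = _clean(value)
    if not re.fullmatch(r"\d{7}[\dX]", s):
        return False
    # The first seven digits are weighted 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(s[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return s[7] == ("X" if check == 10 else str(check))


def is_valid_isbn(value: str) -> bool:
    """Hypothetical helper: verify an ISBN-10 or ISBN-13 check digit."""
    s = _clean(value)
    if re.fullmatch(r"\d{9}[\dX]", s):
        # ISBN-10: weights 10, 9, ..., 1; 'X' stands for 10; sum must be 0 mod 11.
        total = sum(
            (10 if c == "X" else int(c)) * w
            for c, w in zip(s, range(10, 0, -1))
        )
        return total % 11 == 0
    if re.fullmatch(r"\d{13}", s):
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; sum must be 0 mod 10.
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(s))
        return total % 10 == 0
    return False


if __name__ == "__main__":
    for candidate in ("0378-5955", "0378-5954"):
        print(candidate, is_valid_issn(candidate))
    for candidate in ("978-0-306-40615-7", "0-306-40615-2"):
        print(candidate, is_valid_isbn(candidate))
```

In practice a maintained library is usually the better choice, since it also handles formatting variants and edge cases that this sketch ignores.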