mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-18 10:21:54 +02:00

Experimental language detection using langid

Works decently well assuming the title, abstract, and citation fields
are an accurate representation of the language as identified by the
language field. Handles ISO 639-1 (alpha 2) and ISO 639-3 (alpha 3)
values seamlessly.
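
A rough sketch of the idea described above, not the code from this commit: normalize the declared language to ISO 639-1, run langid over the combined title, abstract, and citation text, and report a mismatch. The names `normalize_language`, `detect_with_langid`, `check_language`, and the tiny `ALPHA3_TO_ALPHA2` table are hypothetical; a real check would more likely resolve codes with pycountry. The field names `language`, `title`, `abstract`, and `citation` are also assumed.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical, abbreviated lookup table used only for illustration.
# A fuller version could resolve ISO 639-3 codes with pycountry instead.
ALPHA3_TO_ALPHA2 = {"eng": "en", "fra": "fr", "spa": "es"}


def normalize_language(value: str) -> Optional[str]:
    """Return an ISO 639-1 code for a 2- or 3-letter value, or None."""
    value = value.strip().lower()
    if len(value) == 2:
        return value
    if len(value) == 3:
        return ALPHA3_TO_ALPHA2.get(value)
    return None


def detect_with_langid(text: str) -> str:
    """Detect the language of text with langid (third-party package)."""
    import langid  # imported lazily so the helpers above work without it

    # langid.classify returns a (language, score) tuple.
    language, _score = langid.classify(text)
    return language


def check_language(
    row: Dict[str, str],
    detect: Callable[[str], str] = detect_with_langid,
) -> Optional[Tuple[str, str]]:
    """Return (declared, detected) if they differ, otherwise None."""
    declared = normalize_language(row.get("language", ""))
    if declared is None:
        return None  # nothing usable to compare against

    # Assumed field names; combine whatever text the row provides.
    fields = ("title", "abstract", "citation")
    text = " ".join(row.get(field, "") for field in fields).strip()
    if not text:
        return None

    detected = detect(text)
    return None if detected == declared else (declared, detected)
```

Passing the detector in as a parameter keeps the comparison logic testable without langid installed, which is a design choice of this sketch rather than something the commit message states.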

This includes an updated pipenv environment, test data, pytest tests
for both correct and incorrect ISO 639-1 and ISO 639-3 languages,
and a new command line option "-e".
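
For illustration only, an opt-in flag such as "-e" could be wired up with argparse roughly as follows. The long option name `--experimental-checks`, the help text, and the `build_parser` helper are assumptions, not taken from this commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with an opt-in switch for experimental checks."""
    parser = argparse.ArgumentParser(
        description="Sketch of a CLI with an experimental-checks switch."
    )
    # "-e" is named in the commit message; the long form is an assumption.
    parser.add_argument(
        "-e",
        "--experimental-checks",
        action="store_true",
        help="enable experimental checks such as language detection",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.experimental_checks:
        print("Experimental checks enabled")
```

Using `store_true` keeps the experimental check off by default, so existing invocations behave as before.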
2019-09-24 18:55:05 +03:00
parent 7ac1c6f554
commit 8435ee242d
5 changed files with 186 additions and 0 deletions


@@ -20,6 +20,7 @@ requests = "*"
requests-cache = "*"
pycountry = "*"
csv-metadata-quality = {editable = true,path = "."}
langid = "*"
[requires]
python_version = "3.7"