mirror of https://github.com/ilri/csv-metadata-quality.git synced 2025-05-09 14:46:00 +02:00

Add support for validating subjects against AGROVOC

Checks values in the dc.subject or dcterms.subject field against the
AGROVOC REST API hosted by FAO. Code borrowed from agrovoc-lookup.py.

See: http://agrovoc.uniroma2.it/agrovoc/agrovoc/en/
See: https://github.com/ilri/DSpace/blob/5_x-prod/agrovoc-lookup.py
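
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the REST search path, the `||` multi-value separator, and the helper names `is_subject_column`, `search_agrovoc`, and `invalid_subjects` are all assumptions. Only the column pattern is copied from the hunk in `main()`.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a Skosmos-style search endpoint. The commit message only links
# to the AGROVOC site, not to a specific REST path.
AGROVOC_SEARCH_URL = "http://agrovoc.uniroma2.it/agrovoc/rest/v1/agrovoc/search"


def is_subject_column(column):
    """Return True if the column name matches the pattern used in main()."""
    return re.match(r".*?dc\.subject.*$", column) is not None


def search_agrovoc(term):
    """Hypothetical helper: return the list of search results for one term."""
    url = f"{AGROVOC_SEARCH_URL}?{urlencode({'query': term})}"
    with urlopen(url, timeout=10) as response:
        return json.load(response).get("results", [])


def invalid_subjects(field, search=search_agrovoc, separator="||"):
    """Return the values in a metadata field for which the search finds nothing.

    Non-string cells (for example empty cells read as NaN) are skipped.
    The "||" separator is an assumption about how multiple values are stored.
    """
    if not isinstance(field, str):
        return []
    values = [value.strip() for value in field.split(separator)]
    return [value for value in values if value and not search(value)]
```

Passing the lookup in as a parameter keeps the check testable without network access. Note that the pattern as written only matches column names containing `dc.subject`; a column named `dcterms.subject` does not contain that substring, so it would need its own pattern.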
2019-07-30 00:30:31 +03:00
parent bb882315f1
commit 1f65a28307
7 changed files with 129 additions and 1 deletions


@@ -43,6 +43,11 @@ def main(argv):
         # Fix: duplicate metadata values
         df[column] = df[column].apply(fix.duplicates)
 
+        # Check: invalid AGROVOC subject
+        match = re.match(r'.*?dc\.subject.*$', column)
+        if match is not None:
+            df[column] = df[column].apply(check.agrovoc)
+
         # Check: invalid language
         match = re.match(r'^.*?language.*$', column)
         if match is not None: