Generally we want people to upload documents in accessible formats
like PDF, Word, Excel, and PowerPoint. This check warns if a file
is using an uncommon extension.
This helps you understand the cryptic assertion error output from
pytest. For some reason pytest-clarity is a pre-release package so
we need to install it in pipenv with --pre.
Now it will print just the part of the metadata value that contains
the suspicious character (up to 80 characters, so we don't make the
line break on terminals that use 80 character width by default).
Also, print the name of the field in which the metadata value is so
that it is easier for the user to locate.
AGROVOC validation is now disabled by default, but can be enabled
on a field-by-field basis. For example, countries and regions are
also present in AGROVOC. Fields with these values can be enabled
using the new `--agrovoc-fields` option.
I reworked the script output to show the field name when printing
an invalid term so that the user knows in which field the term is.
This was tricky because of the nature of newlines. In actuality we
are removing Unix line feeds here (U+000A) because Windows carriage
returns are actually already removed by the string stripping in the
whitespace fix.
Creating the test case in Vim was difficult because I couldn't fig-
ure out how to manually enter a line feed character. In the end I
used a search and replace on a known pattern like "ALAN", replacing
it with \r. Neither entering the Unicode code point (U+000A) direc-
tly or typing an "Enter" character after ^V worked. Grrr.
It makes the summary of passes and fails more annoying to read due
to the lines being long and including newlines. I am actually not
sure why this fixes it, though...
See: https://pytest.readthedocs.io/en/latest/capture.html
The latter is a fork that hasn't been updated since 2016 and the
original still seems to be well maintained, with recent database
updates as well as tests for Python 3.7.
Also, pycountry supports ISO 3166-2 (administrative zones), which
we could eventually use for sub regions.
We actually only need to see if there are more than zero matches
because a term like "Nigeria" will match in English, Spanish, etc,
whereas terms that *really* don't match will have zero results.
This seems to be produce a more informative output, though I'm not
sure how to filter the annoying deprecation warnings pytest throws
about things that are usually done by modules I'm using, not by me.
From: https://github.com/taoufik07/responder/blob/master/pytest.ini