By using df[column] = df[column].apply(check...) we were re-writing
the DataFrame every time we returned from a check. We don't actually
need to return a value at all, as the point of checks is to print a
warning to the screen. In Python a bare "return" statement (one
without a value) returns None.
I haven't measured the impact of this, but I assume it will mean we
are faster and use less memory.
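
A minimal sketch of the idea in plain Python, without pandas. The
check name and the sample values are hypothetical and are not the
tool's actual checks. With pandas the same pattern would be calling
df[column].apply(check) without assigning the result back.

```python
def check_trailing_whitespace(value):
    """Hypothetical check: warn about trailing whitespace, return nothing."""
    if not isinstance(value, str):
        # Bare return: nothing to check, and the caller gets None.
        return

    if value != value.rstrip():
        print(f"Warning: trailing whitespace in {value!r}")


# Hypothetical sample values standing in for one column of the CSV.
values = ["Title one", "Title two ", None]

# Call the check only for its side effect (the printed warning).
# Every call returns None, so there is nothing worth writing back.
results = [check_trailing_whitespace(value) for value in values]
```

Because each call returns None, assigning the results back to the
column would only replace the data with empty values.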
Allow overriding the directory for the requests cache. In the case
of csv-metadata-quality-web, which currently runs on Google's App
Engine, we can only write to /tmp.
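
A rough sketch of how such an override could look. The environment
variable name REQUESTS_CACHE_DIR, the default of the current
directory, and the cache file name are assumptions for illustration,
not necessarily what the tool uses.

```python
import os


def get_cache_path(environ=os.environ):
    """Return the path to use for the requests cache.

    REQUESTS_CACHE_DIR is a hypothetical variable name; when it is not
    set we fall back to the current directory.
    """
    cache_dir = environ.get("REQUESTS_CACHE_DIR", ".")

    # "requests-cache" is a placeholder file name for this sketch.
    return os.path.join(cache_dir, "requests-cache")


# On a platform where only /tmp is writable, the override would be:
tmp_cache_path = get_cache_path({"REQUESTS_CACHE_DIR": "/tmp"})
```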
This is no longer classified as "unsafe" as I have yet to see a
case where this was intentional, and it always causes issues when
you import the data in a DSpace repository.
I now use this version in my development environment. Eventually I
should add a matrix of versions to use, but I don't know the GitHub
Actions syntax well enough yet.
Generated with poetry export:
$ poetry export --without-hashes -f requirements.txt > requirements.txt
$ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt
I am trying `--without-hashes` to work around an error on pip install
when running in CI:
ERROR: In --require-hashes mode, all requirements must have
their versions pinned with ==.
PEP8 recommends keeping imports at the top of the file. Also, I had
to re-work the issn/isbn so they didn't conflict with the functions
in check.py (flake8 warned about them being redefined).
Imports sorted with isort.
See: https://www.python.org/dev/peps/pep-0008/#imports
The original Dublin Core elements set was superseded by DCTERMS in
2008 and we have started using them in our DSpace repository so I
think it's good to update them in our test data. Old DC fields are
still checked and fixed in this tool, though.
It's worth noting that currently supported DSpace versions (4, 5,
and 6) all hard-code a few fields like dc.title internally so we
can't migrate those to their DCTERMS counterparts just yet.
For some reason I stopped having csv-metadata-quality available in
my poetry environment after install. It seems I need to add it as a
poetry tool script? I had already done this in setup.py years ago,
which works for regular python setup.py installs, but I hadn't
needed to do it in poetry in the year or more that I've been using
it, until now.
Generated with poetry export:
$ poetry export --without-hashes -f requirements.txt > requirements.txt
$ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt
I am trying `--without-hashes` to work around an error on pip install
when running in CI:
ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.
We used to only check fields that had "date" in their name because
we were using DSpace's default dc.date.* fields. Now we are using
dcterms.issued so I will add that one as well.
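
A minimal sketch of the selection rule described above. The function
name and the sample column names are hypothetical, and the exact
match on dcterms.issued is an assumption about how the field is
compared.

```python
def is_date_field(column):
    """Return True if the column should go through the date checks."""
    # Old behaviour: any field with "date" in its name (dc.date.*).
    # New behaviour: also accept dcterms.issued explicitly.
    return "date" in column or column == "dcterms.issued"


# Hypothetical column names from a metadata CSV.
columns = ["dc.title", "dc.date.issued", "dcterms.issued", "dcterms.subject"]

date_columns = [column for column in columns if is_date_field(column)]
```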
Generated with poetry export:
$ poetry export --without-hashes -f requirements.txt > requirements.txt
$ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt
I am trying `--without-hashes` to work around an error on pip install
when running in CI:
ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.