mirror of https://github.com/ilri/csv-metadata-quality.git synced 2024-11-25 07:10:17 +01:00
Commit Graph

575 Commits

Author SHA1 Message Date
renovate[bot]
5c4ad0eb41
Update dependency black to v23.10.1
All checks were successful
continuous-integration/drone/push Build is passing
2023-10-23 20:03:53 +00:00
1c03999582
Merge pull request #24 from ilri/renovate/actions-checkout-4.x
Update actions/checkout action to v4
All checks were successful
continuous-integration/drone/push Build is passing
2023-10-15 23:39:45 +03:00
1f637f32cd
Rework requests-cache
We should only be running this once per invocation, not for every
row we check. This should be more efficient, but it means that we
don't cache responses when running via pytest, which is actually
probably a good thing.
2023-10-15 23:37:38 +03:00
b8241e919d
poetry.lock: run poetry update
2023-10-15 23:22:48 +03:00
b8dc19cc3f
csv_metadata_quality/check.py: enable requests-cache
This was disabled at some point. We also need to use the new delete
method instead.
2023-10-15 23:21:58 +03:00
93c9b739ac
csv_metadata_quality/check.py: use HTTPS
Use HTTPS for AGROVOC REST API.
2023-10-15 22:38:45 +03:00
4ed2786703
pyproject.toml: update pycountry
Use the latest branch in my fork that has iso-codes 4.15.0.
2023-10-15 21:53:09 +03:00
renovate[bot]
8728789183
Update actions/checkout action to v4
All checks were successful
continuous-integration/drone/push Build is passing
2023-09-04 14:26:25 +00:00
bf90464809
poetry.lock: run poetry update
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
2023-08-08 09:55:41 +02:00
1878002391
poetry.lock: run poetry update
All checks were successful
continuous-integration/drone/push Build is passing
2023-06-12 10:42:50 +03:00
d21d2621e3
csv_metadata_quality/app.py: read fields as strings
I suspect this undermines the PyArrow backend performance gains in
recent Pandas 2.0.0, but we are dealing with messy data sometimes
and we must rely on data being strings.
2023-06-12 10:42:50 +03:00
f3fb1ff7fb
Don't crash when title is missing
We shouldn't crash the country/region checker/fixer when the title
field is missing, since we only use it to show status to the user.
2023-06-12 10:42:50 +03:00
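A minimal sketch of the behaviour described in the commit above, assuming each row is available as a mapping from column name to value. The helper name `title_for_status`, the column-matching rule, and the placeholder text are hypothetical and not taken from the project.

```python
from typing import Mapping


def title_for_status(row: Mapping[str, str], placeholder: str = "<title missing>") -> str:
    """Return a title to show in status messages, or a placeholder if none exists."""
    for column, value in row.items():
        # Assumption: title columns have "title" in their name,
        # for example "dc.title" or "dcterms.title[en_US]".
        if "title" in column.lower() and value:
            return value
    # The title is only used for display, so a missing one should not be an error.
    return placeholder
```

With this shape, a checker can always print something for the user and carry on with the country and region logic, whether or not the input has a title column.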
1fa81f7558
Merge pull request #13 from ilri/renovate/ipython-8.x-lockfile
Update dependency ipython to v8.14.0
All checks were successful
continuous-integration/drone/push Build is passing
2023-06-03 17:09:21 +03:00
renovate[bot]
7409193b6b
Update dependency ipython to v8.14.0
All checks were successful
continuous-integration/drone/push Build is passing
2023-06-02 15:58:34 +00:00
a84fcf0b7b
.drone.yml: try to use poetry instead of pip
All checks were successful
continuous-integration/drone/push Build is passing
2023-05-30 11:39:08 +03:00
25ac290df4
.github: update Python actions
We don't need to use `python setup.py install` anymore. We can use
poetry directly in CI.

See: https://github.com/actions/setup-python/blob/main/docs/advanced-usage.md
Some checks failed
continuous-integration/drone/push Build is failing
2023-05-29 22:58:01 +03:00
3f52bad1e3
Remove setup.py
As far as I understand this is deprecated.
2023-05-29 22:41:37 +03:00
0208ad0ade
Merge pull request #12 from ilri/renovate/requests-cache-1.x
Update dependency requests-cache to v1
2023-05-29 22:37:23 +03:00
renovate[bot]
3632ae0fc9
Update dependency requests-cache to v1
All checks were successful
continuous-integration/drone/push Build is passing
2023-05-29 19:25:58 +00:00
17d089cc6e
poetry.lock: run poetry update
All checks were successful
continuous-integration/drone/push Build is passing
2023-05-29 22:24:22 +03:00
bc470a4343
pyproject.toml: rework pandas and pyarrow
We don't explicitly depend on PyArrow. It should come as a pandas
extra. I installed it like this:

    $ poetry add pandas=="^2.0.2[feather,performance]"

See: https://pandas.pydata.org/docs/getting_started/install.html#other-data-sources
2023-05-29 22:24:04 +03:00
be609a809d
setup.py: add Python 3.11 classifier
2023-05-29 21:32:59 +03:00
de3387ded7
Use Python 3.11 in Drone CI and GitHub Actions
2023-05-29 21:31:03 +03:00
f343e87f0c
renovate.json: fix json
2023-05-29 21:26:03 +03:00
7d3524fbd5
renovate.json: disable requirements.txt support
Poetry is used to manage dependencies. The requirements.txt files
are generated manually by exporting from Poetry.
2023-05-29 21:11:48 +03:00
c614b71a52
Merge pull request #5 from ilri/renovate/configure
Configure Renovate
2023-05-29 21:02:16 +03:00
renovate[bot]
d159a839f3
Add renovate.json
2023-05-29 17:40:33 +00:00
36e2ebe5f4
poetry.lock: run poetry update
All checks were successful
continuous-integration/drone/push Build is passing
2023-05-10 15:06:41 +03:00
33f67b7a7c
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --with dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==
All checks were successful
continuous-integration/drone/push Build is passing
2023-05-03 14:29:12 +03:00
c0e1448439
poetry.lock: run poetry update
2023-05-03 14:28:47 +03:00
5d0804a08f
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --with dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==
Some checks failed
continuous-integration/drone/push Build is failing
2023-04-22 12:44:54 -07:00
f01c9edf17
poetry.lock: run poetry update
2023-04-22 12:44:16 -07:00
8d4295b2b3
CHANGELOG.md: add note about description field
2023-04-22 12:17:44 -07:00
e2d46e9495
csv_metadata_quality/app.py: skip newline fix on description
The description field often has free-form text like the abstract and
there are too many legitimate newlines here to be correcting them
automatically.
2023-04-22 12:16:13 -07:00
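The skip described in the commit above could look roughly like the following. This is a sketch under stated assumptions, not the project's code: the function name `fix_newlines`, the rule for recognising description columns, and the choice to remove line feeds outright are all hypothetical.

```python
def fix_newlines(field_name: str, value: str) -> str:
    """Strip line feeds from a value, except in description fields."""
    # Assumption: description columns have "description" in their name,
    # for example "dcterms.description" or "dc.description.abstract".
    if "description" in field_name.lower():
        # Free-form text such as abstracts may contain legitimate newlines.
        return value
    # Assumption: for other fields the fix simply removes line feeds.
    return value.replace("\n", "")
```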
1491e1edb0
Fix path to data/licenses.json
When we install and run this from CI, this file needs to exist in
the package's folder inside site-packages. Then we can use __file__
to get the path relative to the package.

See: https://python-packaging.readthedocs.io/en/latest/non-code-files.html
All checks were successful
continuous-integration/drone/push Build is passing
2023-04-05 15:28:21 +03:00
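The `__file__` technique mentioned in the commit above can be sketched as follows. The helper name `data_file_path` and its parameters are hypothetical. The `data/licenses.json` location comes from the commit subject, and the assumption is that the `data` directory sits next to the calling module inside the installed package.

```python
import os


def data_file_path(module_file: str, name: str) -> str:
    """Build the path to a data file stored next to the given module."""
    # Resolve against the module's own directory rather than the current
    # working directory, so the lookup still works from site-packages.
    package_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(package_dir, "data", name)


# Typical call from inside a package module (hypothetical usage):
#     licenses_json = data_file_path(__file__, "licenses.json")
```

Passing `__file__` in as an argument keeps the helper easy to test. Note that the data file must also be declared as package data so that it is actually installed alongside the code.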
34142c3e6b
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --with dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==
Some checks failed
continuous-integration/drone/push Build is failing
2023-04-05 12:51:56 +03:00
0c88b96e8d
poetry.lock: run poetry update
2023-04-05 12:51:19 +03:00
2e55b4d6e3
pyproject.toml: add pyarrow explicitly
CI was failing because pyarrow is not an extra provided by pandas.
Indeed, according to the docs the named extras installing pyarrow
are actually feather and parquet, so we need to install pyarrow
explicitly.

See: https://pandas.pydata.org/pandas-docs/version/2.0/getting_started/install.html#install-dependencies
2023-04-05 12:49:40 +03:00
c90aad29f0
Use poetry dev group
This is the new syntax since Poetry 1.2.0.

See: https://python-poetry.org/docs/managing-dependencies/#installing-group-dependencies
2023-04-05 12:37:03 +03:00
6fd1e1377f
Add pyarrow extra to Python Pandas deps
2023-04-05 11:40:22 +03:00
c64b7eb1f1
CHANGELOG.md: add note about Pandas 2.0.0
2023-04-05 11:17:48 +03:00
29cbc4f3a3
Update requirements
Generated with poetry export:

    $ poetry export --without-hashes -f requirements.txt > requirements.txt
    $ poetry export --without-hashes --with dev -f requirements.txt > requirements-dev.txt

I am trying `--without-hashes` to work around an error on pip install
when running in CI:

    ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==
2023-04-05 11:17:06 +03:00
307af1acfc
poetry.lock: run poetry update
2023-04-05 11:15:55 +03:00
b5106de9df
pyproject.toml: Pandas 2.0.0
2023-04-05 11:15:40 +03:00
9eeadfc44e
poetry.lock: after adding pandas 2.0.0rc1
Some checks failed
continuous-integration/drone/push Build is failing
This is going to be an issue on the master branch if I update any
dependencies in the meantime...
2023-03-22 12:17:26 +03:00
d4aed378cf
Switch to pandas 2.0.0rc1
Seems to work fine with the new PyArrow datatypes.
2023-03-22 12:16:56 +03:00
20a2cce34b
CHANGELOG.md: add fixes
Some checks failed
continuous-integration/drone/push Build is failing
2023-03-10 16:17:20 +03:00
d661ffe439
Check comma space on bibliographicCitation too
The regex was only matching `dc.identifier.citation`, but we need
to match `dcterms.bibliographicCitation` too.
2023-03-10 16:13:16 +03:00
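To illustrate the change described in the commit above, here is a minimal sketch of a pattern that accepts both field names. The two field names come from the commit message. The pattern itself and the helper `is_citation_field` are hypothetical and are not copied from the project.

```python
import re

# Matches either citation field name anywhere in a column header, so
# language-qualified headers such as "dc.identifier.citation[en_US]" also match.
CITATION_FIELD = re.compile(r"dc\.identifier\.citation|dcterms\.bibliographicCitation")


def is_citation_field(field_name: str) -> bool:
    """Return True if the column name refers to a citation field."""
    return CITATION_FIELD.search(field_name) is not None
```

Escaping the dots matters here: an unescaped `.` matches any character, which would make the pattern looser than intended.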
45a310387a
Don't fix multi-value separators on citations
2023-03-10 16:12:30 +03:00
47b03c49ba
README.md: Update TODOs
Some checks failed
continuous-integration/drone/push Build is failing
2023-03-07 10:45:04 +03:00