poetry.lock: Run poetry update

CHANGELOG.md: Add note about poetry
Update requirements
2025-10-09 11:09:22 +02:00 · 2021-02-15 15:13:12 +02:00 · 2021-02-04 21:48:12 +02:00 · 2021-02-04 21:46:49 +02:00 · 2021-02-04 21:45:30 +02:00 · 2021-02-04 21:43:44 +02:00
22 changed files with 1664 additions and 757 deletions
--- a/.build.yml
+++ b/.build.yml
@@ -1,19 +0,0 @@
 image: archlinux
 packages:
  - python-pipenv
 sources:
  - https://git.sr.ht/~alanorth/csv-metadata-quality
 tasks:
  - setup: |
      cd csv-metadata-quality
      pipenv install --dev
  - pytest: |
      cd csv-metadata-quality
      pipenv run pytest
  - testcli: |
      cd csv-metadata-quality
      pipenv run pip install .
      pipenv run csv-metadata-quality -i data/test.csv -o /tmp/test.csv -u --agrovoc-fields dc.subject,cg.coverage.country
 environment:
  PIPENV_NOSPIN: 'True'
  PIPENV_HIDE_EMOJIS: 'True'
--- a/.drone.yml
+++ b/.drone.yml
@@ -0,0 +1,49 @@
 ---
 kind: pipeline
 type: docker
 name: python39
 steps:
 - name: test
  image: python:3.9-slim
  commands:
  - id
  - python -V
  - pip install -r requirements-dev.txt
  - pytest
  - python setup.py install
  - csv-metadata-quality -i data/test.csv -o /tmp/test.csv -e -u --agrovoc-fields dc.subject,cg.coverage.country
 ---
 kind: pipeline
 type: docker
 name: python38
 steps:
 - name: test
  image: python:3.8-slim
  commands:
  - id
  - python -V
  - pip install -r requirements-dev.txt
  - pytest
  - python setup.py install
  - csv-metadata-quality -i data/test.csv -o /tmp/test.csv -e -u --agrovoc-fields dc.subject,cg.coverage.country
 ---
 kind: pipeline
 type: docker
 name: python37
 steps:
 - name: test
  image: python:3.7-slim
  commands:
  - id
  - python -V
  - pip install -r requirements-dev.txt
  - pytest
  - python setup.py install
  - csv-metadata-quality -i data/test.csv -o /tmp/test.csv -e -u --agrovoc-fields dc.subject,cg.coverage.country
 # vim: ts=2 sw=2 et
--- a/.github/workflows/python-app.yml
+++ b/.github/workflows/python-app.yml
@@ -0,0 +1,41 @@
 # This workflow will install Python dependencies, run tests and lint with a single version of Python
 # For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
 name: Build and Test
 on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
 jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
        python-version: 3.8
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
        if [ -f requirements-dev.txt ]; then pip install -r requirements-dev.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest
    - name: Test CLI
      run: |
        python setup.py install
        csv-metadata-quality -i data/test.csv -o /tmp/test.csv -e -u --agrovoc-fields dc.subject,cg.coverage.country
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,11 +0,0 @@
 dist: xenial
 language: python
 python:
  - "3.6"
  - "3.7"
 install:
  - "pip install pipenv --upgrade-strategy=only-if-needed"
  - "pipenv install --dev"
 script: pytest
 # vim: ts=2 sw=2 et
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,14 +4,66 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## Unreleased
 ### Added
 - Accept dates formatted in ISO 8601 extended with combined date and time, for
 example: 2020-08-31T11:04:56Z
 ### Updated
 - Run `poetry update` to update project dependencies
 ## [0.4.3] - 2021-01-26
 ### Changed
 - Reformat with black
 - Requires Python 3.7+ for pandas 1.2.0
 ### Updated
 - Run `poetry update`
 - Expand check/fix for multi-value separators to include metadata with invalid
 separators at the end, for example "Kenya||Tanzania||"
 ## [0.4.2] - 2020-07-06
 ### Changed
 - Add field name to the output for more fixes and checks to help identify where
 the error is
 - Minor optimizations to AGROVOC subject lookup
 - Use Poetry instead of Pipenv
 ### Updated
 - Update python dependencies to latest versions
 ## [0.4.1] - 2020-01-15
 ### Changed
 - Reduce minimum Python version to 3.6 by working around the `is_normalized()`
 that only works in Python >= 3.8
 ## [0.4.0] - 2020-01-15
 ### Added
 - Unicode normalization (enable with `--unsafe-fixes`, see README.md)
 ### Updated
 - Update python dependencies to latest versions, including numpy 1.18.1, pandas
 1.0.0rc0, flake8 3.7.9, pytest 5.3.2, and black 19.10b0
 - Regenerate requirements.txt and requirements-dev.txt
 ### Changed
 - Use Python 3.8.0 for pipenv
 - Use Ubuntu 18.04 "Bionic" for TravisCI builds
 - Test Python 3.8 in TravisCI builds
 ## [0.3.1] - 2019-10-01
 ## Changed
 - Replace non-breaking spaces (U+00A0) with space instead of removing them
 - Harmonize language of script output when fixing various issues
 ## [0.3.0] - 2019-09-26
 ### Updated
 - Update python dependencies to latest versions, including numpy 1.17.2, pandas
 0.25.1, pytest 5.1.3, and requests-cache 0.5.2
-## Added
+### Added
 - csvkit to dev requirements (csvcut etc are useful during development)
-- Experimental language validation using `-e` (see README.md)
+- Experimental language validation using the Python `langid` library (enable with `-e`, see README.md)
 ### Changed
 - Re-formatted code with black and isort
--- a/Pipfile
+++ b/Pipfile
@@ -1,29 +0,0 @@
 [[source]]
 name = "pypi"
 url = "https://pypi.org/simple"
 verify_ssl = true
 [dev-packages]
 pytest = "*"
 ipython = "*"
 flake8 = "*"
 pytest-clarity = "*"
 black = "*"
 isort = "*"
 csvkit = "*"
 [packages]
 pandas = "*"
 python-stdnum = "*"
 xlrd = "*"
 requests = "*"
 requests-cache = "*"
 pycountry = "*"
 csv-metadata-quality = {editable = true,path = "."}
 langid = "*"
 [requires]
 python_version = "3.7"
 [pipenv]
 allow_prereleases = true
--- a/Pipfile.lock
+++ b/Pipfile.lock
@@ -1,555 +0,0 @@
 {
    "_meta": {
        "hash": {
            "sha256": "59562d8c59eb09e23b49475d6901687edbf605f5b84e283e90cc8e2de518641f"
        },
        "pipfile-spec": 6,
        "requires": {
            "python_version": "3.7"
        },
        "sources": [
            {
                "name": "pypi",
                "url": "https://pypi.org/simple",
                "verify_ssl": true
            }
        ]
    },
    "default": {
        "certifi": {
            "hashes": [
                "sha256:e4f3620cfea4f83eedc95b24abd9cd56f3c4b146dd0177e83a21b4eb49e21e50",
                "sha256:fd7c7c74727ddcf00e9acd26bba8da604ffec95bf1c2144e67aff7a8b50e6cef"
            ],
            "version": "==2019.9.11"
        },
        "chardet": {
            "hashes": [
                "sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae",
                "sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691"
            ],
            "version": "==3.0.4"
        },
        "csv-metadata-quality": {
            "editable": true,
            "path": "."
        },
        "idna": {
            "hashes": [
                "sha256:c357b3f628cf53ae2c4c05627ecc484553142ca23264e593d327bcde5e9c3407",
                "sha256:ea8b7f6188e6fa117537c3df7da9fc686d485087abf6ac197f9c46432f7e4a3c"
            ],
            "version": "==2.8"
        },
        "langid": {
            "hashes": [
                "sha256:044bcae1912dab85c33d8e98f2811b8f4ff1213e5e9a9e9510137b84da2cb293"
            ],
            "index": "pypi",
            "version": "==1.1.6"
        },
        "numpy": {
            "hashes": [
                "sha256:05dbfe72684cc14b92568de1bc1f41e5f62b00f714afc9adee42f6311738091f",
                "sha256:0d82cb7271a577529d07bbb05cb58675f2deb09772175fab96dc8de025d8ac05",
                "sha256:10132aa1fef99adc85a905d82e8497a580f83739837d7cbd234649f2e9b9dc58",
                "sha256:12322df2e21f033a60c80319c25011194cd2a21294cc66fee0908aeae2c27832",
                "sha256:16f19b3aa775dddc9814e02a46b8e6ae6a54ed8cf143962b4e53f0471dbd7b16",
                "sha256:3d0b0989dd2d066db006158de7220802899a1e5c8cf622abe2d0bd158fd01c2c",
                "sha256:438a3f0e7b681642898fd7993d38e2bf140a2d1eafaf3e89bb626db7f50db355",
                "sha256:5fd214f482ab53f2cea57414c5fb3e58895b17df6e6f5bca5be6a0bb6aea23bb",
                "sha256:73615d3edc84dd7c4aeb212fa3748fb83217e00d201875a47327f55363cef2df",
                "sha256:7bd355ad7496f4ce1d235e9814ec81ee3d28308d591c067ce92e49f745ba2c2f",
                "sha256:7d077f2976b8f3de08a0dcf5d72083f4af5411e8fddacd662aae27baa2601196",
                "sha256:a4092682778dc48093e8bda8d26ee8360153e2047826f95a3f5eae09f0ae3abf",
                "sha256:b458de8624c9f6034af492372eb2fee41a8e605f03f4732f43fc099e227858b2",
                "sha256:e70fc8ff03a961f13363c2c95ef8285e0cf6a720f8271836f852cc0fa64e97c8",
                "sha256:ee8e9d7cad5fe6dde50ede0d2e978d81eafeaa6233fb0b8719f60214cf226578",
                "sha256:f4a4f6aba148858a5a5d546a99280f71f5ee6ec8182a7d195af1a914195b21a2"
            ],
            "version": "==1.17.2"
        },
        "pandas": {
            "hashes": [
                "sha256:18d91a9199d1dfaa01ad645f7540370ba630bdcef09daaf9edf45b4b1bca0232",
                "sha256:3f26e5da310a0c0b83ea50da1fd397de2640b02b424aa69be7e0784228f656c9",
                "sha256:4182e32f4456d2c64619e97c58571fa5ca0993d1e8c2d9ca44916185e1726e15",
                "sha256:426e590e2eb0e60f765271d668a30cf38b582eaae5ec9b31229c8c3c10c5bc21",
                "sha256:5eb934a8f0dc358f0e0cdf314072286bbac74e4c124b64371395e94644d5d919",
                "sha256:717928808043d3ea55b9bcde636d4a52d2236c246f6df464163a66ff59980ad8",
                "sha256:8145f97c5ed71827a6ec98ceaef35afed1377e2d19c4078f324d209ff253ecb5",
                "sha256:8744c84c914dcc59cbbb2943b32b7664df1039d99e834e1034a3372acb89ea4d",
                "sha256:c1ac1d9590d0c9314ebf01591bd40d4c03d710bfc84a3889e5263c97d7891dee",
                "sha256:cb2e197b7b0687becb026b84d3c242482f20cbb29a9981e43604eb67576da9f6",
                "sha256:d4001b71ad2c9b84ff18b182cea22b7b6cbf624216da3ea06fb7af28d1f93165",
                "sha256:d8930772adccb2882989ab1493fa74bd87d47c8ac7417f5dd3dd834ba8c24dc9",
                "sha256:dfbb0173ee2399bc4ed3caf2d236e5c0092f948aafd0a15fbe4a0e77ee61a958",
                "sha256:eebfbba048f4fa8ac711b22c78516e16ff8117d05a580e7eeef6b0c2be554c18",
                "sha256:f1b21bc5cf3dbea53d33615d1ead892dfdae9d7052fa8898083bec88be20dcd2"
            ],
            "index": "pypi",
            "version": "==0.25.1"
        },
        "pycountry": {
            "hashes": [
                "sha256:3c57aa40adcf293d59bebaffbe60d8c39976fba78d846a018dc0c2ec9c6cb3cb"
            ],
            "index": "pypi",
            "version": "==19.8.18"
        },
        "python-dateutil": {
            "hashes": [
                "sha256:7e6584c74aeed623791615e26efd690f29817a27c73085b78e4bad02493df2fb",
                "sha256:c89805f6f4d64db21ed966fda138f8a5ed7a4fdbc1a8ee329ce1b74e3c74da9e"
            ],
            "version": "==2.8.0"
        },
        "python-stdnum": {
            "hashes": [
                "sha256:d5f0af1bee9ddd9a20b398b46ce062dbd4d41fcc9646940f2667256a44df3854",
                "sha256:f445ec32bf5246c90389204cabba465f494545371c29a83fa2d30e6c872a6763"
            ],
            "index": "pypi",
            "version": "==1.11"
        },
        "pytz": {
            "hashes": [
                "sha256:26c0b32e437e54a18161324a2fca3c4b9846b74a8dccddd843113109e1116b32",
                "sha256:c894d57500a4cd2d5c71114aaab77dbab5eabd9022308ce5ac9bb93a60a6f0c7"
            ],
            "version": "==2019.2"
        },
        "requests": {
            "hashes": [
                "sha256:11e007a8a2aa0323f5a921e9e6a2d7e4e67d9877e85773fba9ba6419025cbeb4",
                "sha256:9cf5292fcd0f598c671cfc1e0d7d1a7f13bb8085e9a590f48c010551dc6c4b31"
            ],
            "index": "pypi",
            "version": "==2.22.0"
        },
        "requests-cache": {
            "hashes": [
                "sha256:813023269686045f8e01e2289cc1e7e9ae5ab22ddd1e2849a9093ab3ab7270eb",
                "sha256:81e13559baee64677a7d73b85498a5a8f0639e204517b5d05ff378e44a57831a"
            ],
            "index": "pypi",
            "version": "==0.5.2"
        },
        "six": {
            "hashes": [
                "sha256:3350809f0555b11f552448330d0b52d5f24c91a322ea4a15ef22629740f3761c",
                "sha256:d16a0141ec1a18405cd4ce8b4613101da75da0e9a7aec5bdd4fa804d0e0eba73"
            ],
            "version": "==1.12.0"
        },
        "urllib3": {
            "hashes": [
                "sha256:3de946ffbed6e6746608990594d08faac602528ac7015ac28d33cee6a45b7398",
                "sha256:9a107b99a5393caf59c7aa3c1249c16e6879447533d0887f4336dde834c7be86"
            ],
            "version": "==1.25.6"
        },
        "xlrd": {
            "hashes": [
                "sha256:546eb36cee8db40c3eaa46c351e67ffee6eeb5fa2650b71bc4c758a29a1b29b2",
                "sha256:e551fb498759fa3a5384a94ccd4c3c02eb7c00ea424426e212ac0c57be9dfbde"
            ],
            "index": "pypi",
            "version": "==1.2.0"
        }
    },
    "develop": {
        "agate": {
            "hashes": [
                "sha256:48d6f80b35611c1ba25a642cbc5b90fcbdeeb2a54711c4a8d062ee2809334d1c",
                "sha256:c93aaa500b439d71e4a5cf088d0006d2ce2c76f1950960c8843114e5f361dfd3"
            ],
            "version": "==1.6.1"
        },
        "agate-dbf": {
            "hashes": [
                "sha256:00c93c498ec9a04cc587bf63dd7340e67e2541f0df4c9a7259d7cb3dd4ce372f"
            ],
            "version": "==0.2.1"
        },
        "agate-excel": {
            "hashes": [
                "sha256:8f255ef2c87c436b7132049e1dd86c8e08bf82d8c773aea86f3069b461a17d52"
            ],
            "version": "==0.2.3"
        },
        "agate-sql": {
            "hashes": [
                "sha256:9277490ba8b8e7c747a9ae3671f52fe486784b48d4a14e78ca197fb0e36f281b"
            ],
            "version": "==0.5.4"
        },
        "appdirs": {
            "hashes": [
                "sha256:9e5896d1372858f8dd3344faf4e5014d21849c756c8d5701f78f8a103b372d92",
                "sha256:d8b24664561d0d34ddfaec54636d502d7cea6e29c3eaf68f3df6180863e2166e"
            ],
            "version": "==1.4.3"
        },
        "atomicwrites": {
            "hashes": [
                "sha256:03472c30eb2c5d1ba9227e4c2ca66ab8287fbfbbda3888aa93dc2e28fc6811b4",
                "sha256:75a9445bac02d8d058d5e1fe689654ba5a6556a1dfd8ce6ec55a0ed79866cfa6"
            ],
            "version": "==1.3.0"
        },
        "attrs": {
            "hashes": [
                "sha256:69c0dbf2ed392de1cb5ec704444b08a5ef81680a61cb899dc08127123af36a79",
                "sha256:f0b870f674851ecbfbbbd364d6b5cbdff9dcedbc7f3f5e18a6891057f21fe399"
            ],
            "version": "==19.1.0"
        },
        "babel": {
            "hashes": [
                "sha256:af92e6106cb7c55286b25b38ad7695f8b4efb36a90ba483d7f7a6628c46158ab",
                "sha256:e86135ae101e31e2c8ec20a4e0c5220f4eed12487d5cf3f78be7e98d3a57fc28"
            ],
            "version": "==2.7.0"
        },
        "backcall": {
            "hashes": [
                "sha256:38ecd85be2c1e78f77fd91700c76e14667dc21e2713b63876c0eb901196e01e4",
                "sha256:bbbf4b1e5cd2bdb08f915895b51081c041bac22394fdfcfdfbe9f14b77c08bf2"
            ],
            "version": "==0.1.0"
        },
        "black": {
            "hashes": [
                "sha256:09a9dcb7c46ed496a9850b76e4e825d6049ecd38b611f1224857a79bd985a8cf",
                "sha256:68950ffd4d9169716bcb8719a56c07a2f4485354fec061cdd5910aa07369731c"
            ],
            "index": "pypi",
            "version": "==19.3b0"
        },
        "click": {
            "hashes": [
                "sha256:2335065e6395b9e67ca716de5f7526736bfa6ceead690adf616d925bdc622b13",
                "sha256:5b94b49521f6456670fdb30cd82a4eca9412788a93fa6dd6df72c94d5a8ff2d7"
            ],
            "version": "==7.0"
        },
        "csvkit": {
            "hashes": [
                "sha256:1353a383531bee191820edfb88418c13dfe1cdfa9dd3dc46f431c05cd2a260a0"
            ],
            "index": "pypi",
            "version": "==1.0.4"
        },
        "dbfread": {
            "hashes": [
                "sha256:07c8a9af06ffad3f6f03e8fe91ad7d2733e31a26d2b72c4dd4cfbae07ee3b73d",
                "sha256:f604def58c59694fa0160d7be5d0b8d594467278d2bb6a47d46daf7162c84cec"
            ],
            "version": "==2.0.7"
        },
        "decorator": {
            "hashes": [
                "sha256:86156361c50488b84a3f148056ea716ca587df2f0de1d34750d35c21312725de",
                "sha256:f069f3a01830ca754ba5258fde2278454a0b5b79e0d7f5c13b3b97e57d4acff6"
            ],
            "version": "==4.4.0"
        },
        "entrypoints": {
            "hashes": [
                "sha256:589f874b313739ad35be6e0cd7efde2a4e9b6fea91edcc34e58ecbb8dbe56d19",
                "sha256:c70dd71abe5a8c85e55e12c19bd91ccfeec11a6e99044204511f9ed547d48451"
            ],
            "version": "==0.3"
        },
        "et-xmlfile": {
            "hashes": [
                "sha256:614d9722d572f6246302c4491846d2c393c199cfa4edc9af593437691683335b"
            ],
            "version": "==1.0.1"
        },
        "flake8": {
            "hashes": [
                "sha256:19241c1cbc971b9962473e4438a2ca19749a7dd002dd1a946eaba171b4114548",
                "sha256:8e9dfa3cecb2400b3738a42c54c3043e821682b9c840b0448c0503f781130696"
            ],
            "index": "pypi",
            "version": "==3.7.8"
        },
        "future": {
            "hashes": [
                "sha256:67045236dcfd6816dc439556d009594abf643e5eb48992e36beac09c2ca659b8"
            ],
            "version": "==0.17.1"
        },
        "importlib-metadata": {
            "hashes": [
                "sha256:aa18d7378b00b40847790e7c27e11673d7fed219354109d0e7b9e5b25dc3ad26",
                "sha256:d5f18a79777f3aa179c145737780282e27b508fc8fd688cb17c7a813e8bd39af"
            ],
            "markers": "python_version < '3.8'",
            "version": "==0.23"
        },
        "ipython": {
            "hashes": [
                "sha256:c4ab005921641e40a68e405e286e7a1fcc464497e14d81b6914b4fd95e5dee9b",
                "sha256:dd76831f065f17bddd7eaa5c781f5ea32de5ef217592cf019e34043b56895aa1"
            ],
            "index": "pypi",
            "version": "==7.8.0"
        },
        "ipython-genutils": {
            "hashes": [
                "sha256:72dd37233799e619666c9f639a9da83c34013a73e8bbc79a7a6348d93c61fab8",
                "sha256:eb2e116e75ecef9d4d228fdc66af54269afa26ab4463042e33785b887c628ba8"
            ],
            "version": "==0.2.0"
        },
        "isodate": {
            "hashes": [
                "sha256:2e364a3d5759479cdb2d37cce6b9376ea504db2ff90252a2e5b7cc89cc9ff2d8",
                "sha256:aa4d33c06640f5352aca96e4b81afd8ab3b47337cc12089822d6f322ac772c81"
            ],
            "version": "==0.6.0"
        },
        "isort": {
            "hashes": [
                "sha256:54da7e92468955c4fceacd0c86bd0ec997b0e1ee80d97f67c35a78b719dccab1",
                "sha256:6e811fcb295968434526407adb8796944f1988c5b65e8139058f2014cbe100fd"
            ],
            "index": "pypi",
            "version": "==4.3.21"
        },
        "jdcal": {
            "hashes": [
                "sha256:1abf1305fce18b4e8aa248cf8fe0c56ce2032392bc64bbd61b5dff2a19ec8bba",
                "sha256:472872e096eb8df219c23f2689fc336668bdb43d194094b5cc1707e1640acfc8"
            ],
            "version": "==1.4.1"
        },
        "jedi": {
            "hashes": [
                "sha256:786b6c3d80e2f06fd77162a07fed81b8baa22dde5d62896a790a331d6ac21a27",
                "sha256:ba859c74fa3c966a22f2aeebe1b74ee27e2a462f56d3f5f7ca4a59af61bfe42e"
            ],
            "version": "==0.15.1"
        },
        "leather": {
            "hashes": [
                "sha256:076d1603b5281488285718ce1a5ce78cf1027fe1e76adf9c548caf83c519b988",
                "sha256:e0bb36a6d5f59fbf3c1a6e75e7c8bee29e67f06f5b48c0134407dde612eba5e2"
            ],
            "version": "==0.3.3"
        },
        "mccabe": {
            "hashes": [
                "sha256:ab8a6258860da4b6677da4bd2fe5dc2c659cff31b3ee4f7f5d64e79735b80d42",
                "sha256:dd8d182285a0fe56bace7f45b5e7d1a6ebcbf524e8f3bd87eb0f125271b8831f"
            ],
            "version": "==0.6.1"
        },
        "more-itertools": {
            "hashes": [
                "sha256:409cd48d4db7052af495b09dec721011634af3753ae1ef92d2b32f73a745f832",
                "sha256:92b8c4b06dac4f0611c0729b2f2ede52b2e1bac1ab48f089c7ddc12e26bb60c4"
            ],
            "version": "==7.2.0"
        },
        "openpyxl": {
            "hashes": [
                "sha256:340a1ab2069764559b9d58027a43a24db18db0e25deb80f81ecb8ca7ee5253db"
            ],
            "version": "==3.0.0"
        },
        "packaging": {
            "hashes": [
                "sha256:28b924174df7a2fa32c1953825ff29c61e2f5e082343165438812f00d3a7fc47",
                "sha256:d9551545c6d761f3def1677baf08ab2a3ca17c56879e70fecba2fc4dde4ed108"
            ],
            "version": "==19.2"
        },
        "parsedatetime": {
            "hashes": [
                "sha256:3d817c58fb9570d1eec1dd46fa9448cd644eeed4fb612684b02dfda3a79cb84b",
                "sha256:9ee3529454bf35c40a77115f5a596771e59e1aee8c53306f346c461b8e913094"
            ],
            "version": "==2.4"
        },
        "parso": {
            "hashes": [
                "sha256:63854233e1fadb5da97f2744b6b24346d2750b85965e7e399bec1620232797dc",
                "sha256:666b0ee4a7a1220f65d367617f2cd3ffddff3e205f3f16a0284df30e774c2a9c"
            ],
            "version": "==0.5.1"
        },
        "pexpect": {
            "hashes": [
                "sha256:2094eefdfcf37a1fdbfb9aa090862c1a4878e5c7e0e7e7088bdb511c558e5cd1",
                "sha256:9e2c1fd0e6ee3a49b28f95d4b33bc389c89b20af6a1255906e90ff1262ce62eb"
            ],
            "markers": "sys_platform != 'win32'",
            "version": "==4.7.0"
        },
        "pickleshare": {
            "hashes": [
                "sha256:87683d47965c1da65cdacaf31c8441d12b8044cdec9aca500cd78fc2c683afca",
                "sha256:9649af414d74d4df115d5d718f82acb59c9d418196b7b4290ed47a12ce62df56"
            ],
            "version": "==0.7.5"
        },
        "pluggy": {
            "hashes": [
                "sha256:0db4b7601aae1d35b4a033282da476845aa19185c1e6964b25cf324b5e4ec3e6",
                "sha256:fa5fa1622fa6dd5c030e9cad086fa19ef6a0cf6d7a2d12318e10cb49d6d68f34"
            ],
            "version": "==0.13.0"
        },
        "prompt-toolkit": {
            "hashes": [
                "sha256:11adf3389a996a6d45cc277580d0d53e8a5afd281d0c9ec71b28e6f121463780",
                "sha256:2519ad1d8038fd5fc8e770362237ad0364d16a7650fb5724af6997ed5515e3c1",
                "sha256:977c6583ae813a37dc1c2e1b715892461fcbdaa57f6fc62f33a528c4886c8f55"
            ],
            "version": "==2.0.9"
        },
        "ptyprocess": {
            "hashes": [
                "sha256:923f299cc5ad920c68f2bc0bc98b75b9f838b93b599941a6b63ddbc2476394c0",
                "sha256:d7cc528d76e76342423ca640335bd3633420dc1366f258cb31d05e865ef5ca1f"
            ],
            "version": "==0.6.0"
        },
        "py": {
            "hashes": [
                "sha256:64f65755aee5b381cea27766a3a147c3f15b9b6b9ac88676de66ba2ae36793fa",
                "sha256:dc639b046a6e2cff5bbe40194ad65936d6ba360b52b3c3fe1d08a82dd50b5e53"
            ],
            "version": "==1.8.0"
        },
        "pycodestyle": {
            "hashes": [
                "sha256:95a2219d12372f05704562a14ec30bc76b05a5b297b21a5dfe3f6fac3491ae56",
                "sha256:e40a936c9a450ad81df37f549d676d127b1b66000a6c500caa2b085bc0ca976c"
            ],
            "version": "==2.5.0"
        },
        "pyflakes": {
            "hashes": [
                "sha256:17dbeb2e3f4d772725c777fabc446d5634d1038f234e77343108ce445ea69ce0",
                "sha256:d976835886f8c5b31d47970ed689944a0262b5f3afa00a5a7b4dc81e5449f8a2"
            ],
            "version": "==2.1.1"
        },
        "pygments": {
            "hashes": [
                "sha256:71e430bc85c88a430f000ac1d9b331d2407f681d6f6aec95e8bcfbc3df5b0127",
                "sha256:881c4c157e45f30af185c1ffe8d549d48ac9127433f2c380c24b84572ad66297"
            ],
            "version": "==2.4.2"
        },
        "pyparsing": {
            "hashes": [
                "sha256:6f98a7b9397e206d78cc01df10131398f1c8b8510a2f4d97d9abd82e1aacdd80",
                "sha256:d9338df12903bbf5d65a0e4e87c2161968b10d2e489652bb47001d82a9b028b4"
            ],
            "version": "==2.4.2"
        },
        "pytest": {
            "hashes": [
                "sha256:813b99704b22c7d377bbd756ebe56c35252bb710937b46f207100e843440b3c2",
                "sha256:cc6620b96bc667a0c8d4fa592a8c9c94178a1bd6cc799dbb057dfd9286d31a31"
            ],
            "index": "pypi",
            "version": "==5.1.3"
        },
        "pytest-clarity": {
            "hashes": [
                "sha256:3f40d5ae7cb21cc95e622fc4f50d9466f80ae0f91460225b8c95c07afbf93e20"
            ],
            "index": "pypi",
            "version": "==0.2.0a1"
        },
        "python-slugify": {
            "hashes": [
                "sha256:575d03256a132fc1efb4c52966c6eb11c57a13b071618f0b26076057a23f6937"
            ],
            "version": "==3.0.4"
        },
        "pytimeparse": {
            "hashes": [
                "sha256:04b7be6cc8bd9f5647a6325444926c3ac34ee6bc7e69da4367ba282f076036bd",
                "sha256:e86136477be924d7e670646a98561957e8ca7308d44841e21f5ddea757556a0a"
            ],
            "version": "==1.1.8"
        },
        "pytz": {
            "hashes": [
                "sha256:26c0b32e437e54a18161324a2fca3c4b9846b74a8dccddd843113109e1116b32",
                "sha256:c894d57500a4cd2d5c71114aaab77dbab5eabd9022308ce5ac9bb93a60a6f0c7"
            ],
            "version": "==2019.2"
        },
        "six": {
            "hashes": [
                "sha256:3350809f0555b11f552448330d0b52d5f24c91a322ea4a15ef22629740f3761c",
                "sha256:d16a0141ec1a18405cd4ce8b4613101da75da0e9a7aec5bdd4fa804d0e0eba73"
            ],
            "version": "==1.12.0"
        },
        "sqlalchemy": {
            "hashes": [
                "sha256:2f8ff566a4d3a92246d367f2e9cd6ed3edeef670dcd6dda6dfdc9efed88bcd80"
            ],
            "version": "==1.3.8"
        },
        "termcolor": {
            "hashes": [
                "sha256:1d6d69ce66211143803fbc56652b41d73b4a400a2891d7bf7a1cdf4c02de613b"
            ],
            "version": "==1.1.0"
        },
        "text-unidecode": {
            "hashes": [
                "sha256:1311f10e8b895935241623731c2ba64f4c455287888b18189350b67134a822e8",
                "sha256:bad6603bb14d279193107714b288be206cac565dfa49aa5b105294dd5c4aab93"
            ],
            "version": "==1.3"
        },
        "toml": {
            "hashes": [
                "sha256:229f81c57791a41d65e399fc06bf0848bab550a9dfd5ed66df18ce5f05e73d5c",
                "sha256:235682dd292d5899d361a811df37e04a8828a5b1da3115886b73cf81ebc9100e"
            ],
            "version": "==0.10.0"
        },
        "traitlets": {
            "hashes": [
                "sha256:262089114405f22f4833be96b31e143ab906d7764a22c04c71fee0bbda4787ba",
                "sha256:6ad5b30dacd5e2424c46cc94a0aeab990d98ae17d181acea2cc4272ac3409fca"
            ],
            "version": "==4.3.3.dev0"
        },
        "wcwidth": {
            "hashes": [
                "sha256:3df37372226d6e63e1b1e1eda15c594bca98a22d33a23832a90998faa96bc65e",
                "sha256:f4ebe71925af7b40a864553f761ed559b43544f8f71746c2d756c7fe788ade7c"
            ],
            "version": "==0.1.7"
        },
        "xlrd": {
            "hashes": [
                "sha256:546eb36cee8db40c3eaa46c351e67ffee6eeb5fa2650b71bc4c758a29a1b29b2",
                "sha256:e551fb498759fa3a5384a94ccd4c3c02eb7c00ea424426e212ac0c57be9dfbde"
            ],
            "index": "pypi",
            "version": "==1.2.0"
        },
        "zipp": {
            "hashes": [
                "sha256:3718b1cbcd963c7d4c5511a8240812904164b7f381b647143a89d3b98f9bcd8e",
                "sha256:f06903e9f1f43b12d371004b4ac7b06ab39a44adc747266928ae6debfa7b3335"
            ],
            "version": "==0.6.0"
        }
    }
 }
--- a/README.md
+++ b/README.md
@@ -1,7 +1,11 @@
-# CSV Metadata Quality [![Build Status](https://travis-ci.org/ilri/csv-metadata-quality.svg?branch=master)](https://travis-ci.org/ilri/csv-metadata-quality) [![builds.sr.ht status](https://builds.sr.ht/~alanorth/csv-metadata-quality.svg)](https://builds.sr.ht/~alanorth/csv-metadata-quality?)
+# DSpace CSV Metadata Quality Checker ![GitHub Actions](https://github.com/ilri/csv-metadata-quality/workflows/Build%20and%20Test/badge.svg) [![Build Status](https://ci.mjanja.ch/api/badges/alanorth/csv-metadata-quality/status.svg)](https://ci.mjanja.ch/alanorth/csv-metadata-quality)
-A simple, but opinionated metadata quality checker and fixer designed to work with CSVs in the DSpace ecosystem. The implementation is essentially a pipeline of checks and fixes that begins with splitting multi-value fields on the standard DSpace "||" separator, trimming leading/trailing whitespace, and then proceeding to more specialized cases like ISSNs, ISBNs, languages, etc.
+A simple, but opinionated metadata quality checker and fixer designed to work with CSVs in the DSpace ecosystem (though it could theoretically work on any CSV that uses Dublin Core fields as columns). The implementation is essentially a pipeline of checks and fixes that begins with splitting multi-value fields on the standard DSpace "||" separator, trimming leading/trailing whitespace, and then proceeding to more specialized cases like ISSNs, ISBNs, languages, unnecessary Unicode, AGROVOC terms, etc.
-Requires Python 3.6 or greater. CSV and Excel support comes from the [Pandas](https://pandas.pydata.org/) library, though your mileage may vary with Excel because this is much less tested.
+Requires Python 3.7 or greater (3.8 recommended). CSV and Excel support comes from the [Pandas](https://pandas.pydata.org/) library, though your mileage may vary with Excel because this is much less tested.
 If you use the DSpace CSV metadata quality checker please cite:
 *Orth, A. 2019. DSpace CSV metadata quality checker. Nairobi, Kenya: ILRI. https://hdl.handle.net/10568/110997.*
 ## Functionality
@@ -10,23 +14,24 @@ Requires Python 3.6 or greater. CSV and Excel support comes from the [Pandas](ht
 - Experimental validation of titles and abstracts against item's Dublin Core language field
 - Validate subjects against the AGROVOC REST API (see the `--agrovoc-fields` option)
 - Fix leading, trailing, and excessive (ie, more than one) whitespace
-- Fix invalid multi-value separators (`|`) using `--unsafe-fixes`
+- Fix invalid and unnecessary multi-value separators (`|`) using `--unsafe-fixes`
 - Fix problematic newlines (line feeds) using `--unsafe-fixes`
 - Remove unnecessary Unicode like [non-breaking spaces](https://en.wikipedia.org/wiki/Non-breaking_space), [replacement characters](https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character), etc
 - Check for "suspicious" characters that indicate encoding or copy/paste issues, for example "foreˆt" should be "forêt"
 - Remove duplicate metadata values
 - Perform [Unicode normalization](https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html) on strings using `--unsafe-fixes`
 ## Installation
-The easiest way to install CSV Metadata Quality is with [pipenv](https://github.com/pypa/pipenv):
+The easiest way to install CSV Metadata Quality is with [poetry](https://python-poetry.org):
 ```
 $ git clone https://github.com/ilri/csv-metadata-quality.git
 $ cd csv-metadata-quality
-$ pipenv install
+$ poetry install
-$ pipenv shell
+$ poetry shell
 ```
-Otherwise, if you don't have pipenv, you can use a vanilla Python virtual environment:
+Otherwise, if you don't have poetry, you can use a vanilla Python virtual environment:
 ```
 $ git clone https://github.com/ilri/csv-metadata-quality.git
@@ -55,9 +60,19 @@ You can enable several "unsafe" fixes with the `--unsafe-fixes` option. Currentl
 ### Invalid Multi-Value Separators
 This is considered "unsafe" because it is *theoretically* possible for a single `|` character to be used legitimately in a metadata value, though in my experience it is always a typo. For example, if a user mistakenly writes `Kenya|Tanzania` when attempting to indicate two countries, the result will be one metadata value with the literal text `Kenya|Tanzania`. The `--unsafe-fixes` option will correct the invalid multi-value separator so that there are two metadata values, ie `Kenya||Tanzania`.
 This will also remove unnecessary trailing multi-value separators, for example `Kenya||Tanzania||`.
 ### Newlines
 This is considered "unsafe" because some systems give special importance to vertical space and render it properly. DSpace does not support rendering newlines in its XMLUI and has, at times, suffered from parsing errors that cause the import process to fail if an input file had newlines. The `--unsafe-fixes` option strips Unix line feeds (U+000A).
 ### Unicode Normalization
 [Unicode](https://en.wikipedia.org/wiki/Unicode) is a standard for encoding text. As the standard aims to support most of the world's languages, characters can often be represented in different ways and still be valid Unicode. This leads to interesting problems that can be confusing unless you know what's going on behind the scenes. For example, the characters `é` and `é` *look* the same, but are not — technically they refer to different code points in the Unicode standard:
 - `é` is the Unicode code point `U+00E9`
 - `é` is the Unicode code points `U+0065` + `U+0301`
 Read more about [Unicode normalization](https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html).
 ## AGROVOC Validation
 You can enable validation of metadata values in certain fields against the AGROVOC REST API with the `--agrovoc-fields` option. For example, in addition to agricultural subjects, many countries and regions are also present in AGROVOC. Enable this validation by specifying a comma-separated list of fields:
@@ -92,8 +107,10 @@ This currently uses the [Python langid](https://github.com/saffsd/langid.py) lib
 - Validate DOIs? Normalize to https://doi.org format? Or use just the DOI part: 10.1016/j.worlddev.2010.06.006
 - Warn if two items use the same file in `filename` column
 - Add an option to drop invalid AGROVOC subjects?
 - Add check for author names with incorrect spacing after commas, ie "Orth,Alan S."
 - Add tests for application invocation, ie `tests/test_app.py`?
 - Validate ISSNs or journal titles against CrossRef API?
 - Better ISO 8601 date parsing (currently only supports simple dates, perhaps we need to use dateutil.parser.isoparse())
 - Fix lazy date check (assumes field name has "date" but could be dcterms.issued etc!)
 ## License
 This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).
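The "Invalid Multi-Value Separators" section of the README above describes turning `Kenya|Tanzania` into `Kenya||Tanzania` and dropping a trailing `||`. The following is a minimal, self-contained sketch of that behavior. The function name `fix_separators` is hypothetical and the logic is a simplification for illustration, not the project's actual `fix.separators` implementation.

```python
def fix_separators(field: str) -> str:
    """Hypothetical sketch: repair invalid or unnecessary "||" separators."""
    values = []

    for value in field.split("||"):
        # A blank value means there was a leading, trailing, or doubled separator
        if value == "":
            continue

        # Any "|" left after splitting is assumed to be a mistyped separator
        values.extend(part for part in value.split("|") if part != "")

    # Rejoin the cleaned values with the standard DSpace separator
    return "||".join(values)


print(fix_separators("Kenya|Tanzania"))
print(fix_separators("Kenya||Tanzania||"))
```

As the README notes, this kind of fix is "unsafe" because a single `|` could in theory be a legitimate part of a value, which is why the real tool only applies it with `--unsafe-fixes`.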
--- a/csv_metadata_quality/app.py
+++ b/csv_metadata_quality/app.py
@@ -21,7 +21,8 @@ def parse_args(argv):
    parser.add_argument(
        "--experimental-checks",
        "-e",
-        help="Enable experimental checks like language detection", action="store_true"
+        help="Enable experimental checks like language detection",
+        action="store_true",
    )
    parser.add_argument(
        "--input-file",
@@ -81,7 +82,7 @@ def run(argv):
                continue
        # Fix: whitespace
-        df[column] = df[column].apply(fix.whitespace)
+        df[column] = df[column].apply(fix.whitespace, field_name=column)
        # Fix: newlines
        if args.unsafe_fixes:
@@ -94,23 +95,28 @@ def run(argv):
            if match is not None:
                df[column] = df[column].apply(fix.comma_space, field_name=column)
        # Fix: perform Unicode normalization (NFC) to convert decomposed
        # characters into their canonical forms.
        if args.unsafe_fixes:
            df[column] = df[column].apply(fix.normalize_unicode, field_name=column)
        # Fix: unnecessary Unicode
        df[column] = df[column].apply(fix.unnecessary_unicode)
-        # Check: invalid multi-value separator
+        # Check: invalid and unnecessary multi-value separators
-        df[column] = df[column].apply(check.separators)
+        df[column] = df[column].apply(check.separators, field_name=column)
        # Check: suspicious characters
        df[column] = df[column].apply(check.suspicious_characters, field_name=column)
-        # Fix: invalid multi-value separator
+        # Fix: invalid and unnecessary multi-value separators
        if args.unsafe_fixes:
-            df[column] = df[column].apply(fix.separators)
+            df[column] = df[column].apply(fix.separators, field_name=column)
            # Run whitespace fix again after fixing invalid separators
-            df[column] = df[column].apply(fix.whitespace)
+            df[column] = df[column].apply(fix.whitespace, field_name=column)
        # Fix: duplicate metadata values
-        df[column] = df[column].apply(fix.duplicates)
+        df[column] = df[column].apply(fix.duplicates, field_name=column)
        # Check: invalid AGROVOC subject
        if args.agrovoc_fields:
--- a/csv_metadata_quality/check.py
+++ b/csv_metadata_quality/check.py
@@ -1,4 +1,9 @@
 from datetime import datetime, timedelta
 import pandas as pd
 import requests
 import requests_cache
 from pycountry import languages
 def issn(field):
@@ -51,8 +56,12 @@ def isbn(field):
    return field
-def separators(field):
+def separators(field, field_name):
-    """Check for invalid multi-value separators (ie "|" or "|||").
+    """Check for invalid and unnecessary multi-value separators, for example:
        value|value
        value|||value
        value||value||
    Prints the field with the invalid multi-value separator.
    """
@@ -65,12 +74,18 @@ def separators(field):
    # Try to split multi-value field on "||" separator
    for value in field.split("||"):
        # Check if the current value is blank
        if value == "":
            print(f"Unnecessary multi-value separator ({field_name}): {field}")
            continue
        # After splitting, see if there are any remaining "|" characters
        match = re.findall(r"^.*?\|.*$", value)
        # Check if there was a match
        if match:
-            print(f"Invalid multi-value separator: {field}")
+            print(f"Invalid multi-value separator ({field_name}): {field}")
    return field
@@ -85,7 +100,6 @@ def date(field, field_name):
    Prints the date if invalid.
    """
-    from datetime import datetime
    if pd.isna(field):
        print(f"Missing date ({field_name}).")
@@ -121,6 +135,14 @@ def date(field, field_name):
        # Check if date is valid YYYY-MM-DD format
        datetime.strptime(field, "%Y-%m-%d")
        return field
    except ValueError:
+        pass
+    try:
+        # Check if date is valid YYYY-MM-DDTHH:MM:SSZ format
+        datetime.strptime(field, "%Y-%m-%dT%H:%M:%SZ")
+        return field
+    except ValueError:
        print(f"Invalid date ({field_name}): {field}")
@@ -170,8 +192,6 @@ def language(field):
    Prints the value if it is invalid.
    """
-    from pycountry import languages
    # Skip fields with missing values
    if pd.isna(field):
        return
@@ -213,30 +233,23 @@ def agrovoc(field, field_name):
    Prints a warning if the value is invalid.
    """
-    from datetime import timedelta
-    import requests
-    import requests_cache
    # Skip fields with missing values
    if pd.isna(field):
        return
+    # enable transparent request cache with thirty days expiry
+    expire_after = timedelta(days=30)
+    requests_cache.install_cache("agrovoc-response-cache", expire_after=expire_after)
+    # prune old cache entries
+    requests_cache.core.remove_expired_responses()
    # Try to split multi-value field on "||" separator
    for value in field.split("||"):
-        request_url = (
-            f"http://agrovoc.uniroma2.it/agrovoc/rest/v1/agrovoc/search?query={value}"
-        )
-        # enable transparent request cache with thirty days expiry
-        expire_after = timedelta(days=30)
-        requests_cache.install_cache(
-            "agrovoc-response-cache", expire_after=expire_after
-        )
-        request = requests.get(request_url)
-        # prune old cache entries
-        requests_cache.core.remove_expired_responses()
+        request_url = "http://agrovoc.uniroma2.it/agrovoc/rest/v1/agrovoc/search"
+        request_params = {"query": value}
+        request = requests.get(request_url, params=request_params)
        if request.status_code == requests.codes.ok:
            data = request.json()
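The AGROVOC change above stops interpolating the value into the URL and passes it as a query parameter instead. A likely benefit is that reserved characters in a subject term get percent-encoded rather than corrupting the query string. The sketch below illustrates the difference using only the standard library's `urlencode`, which is an assumption made to keep the example free of network access and third-party packages. The sample value `SOIL & WATER` is made up for illustration.

```python
from urllib.parse import urlencode

base_url = "http://agrovoc.uniroma2.it/agrovoc/rest/v1/agrovoc/search"

# Hypothetical subject containing characters that are special in a query string
value = "SOIL & WATER"

# Old approach: raw interpolation leaves the space and "&" unescaped, so the
# server would see "query=SOIL " followed by a second, bogus parameter
naive_url = f"{base_url}?query={value}"

# New approach (sketched with urlencode): the value is encoded as one parameter
encoded_url = f"{base_url}?{urlencode({'query': value})}"

print(naive_url)
print(encoded_url)
```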
--- a/csv_metadata_quality/fix.py
+++ b/csv_metadata_quality/fix.py
@@ -1,9 +1,12 @@
 import re
 from unicodedata import normalize
 import pandas as pd
 from csv_metadata_quality.util import is_nfc
-def whitespace(field):
+
+def whitespace(field, field_name):
    """Fix whitespace issues.
    Return string with leading, trailing, and consecutive whitespace trimmed.
@@ -26,7 +29,7 @@ def whitespace(field):
        match = re.findall(pattern, value)
        if match:
-            print(f"Excessive whitespace: {value}")
+            print(f"Removing excessive whitespace ({field_name}): {value}")
            value = re.sub(pattern, " ", value)
        # Save cleaned value
@@ -38,8 +41,15 @@ def whitespace(field):
    return new_field
-def separators(field):
+def separators(field, field_name):
-    """Fix for invalid multi-value separators (ie "|")."""
+    """Fix for invalid and unnecessary multi-value separators, for example:
        value|value
        value|||value
        value||value||
    Prints the field with the invalid multi-value separator.
    """
    # Skip fields with missing values
    if pd.isna(field):
@@ -50,12 +60,18 @@ def separators(field):
    # Try to split multi-value field on "||" separator
    for value in field.split("||"):
        # Check if the value is blank and skip it
        if value == "":
            print(f"Fixing unnecessary multi-value separator ({field_name}): {field}")
            continue
        # After splitting, see if there are any remaining "|" characters
        pattern = re.compile(r"\|")
        match = re.findall(pattern, value)
        if match:
-            print(f"Fixing invalid multi-value separator: {value}")
+            print(f"Fixing invalid multi-value separator ({field_name}): {value}")
            value = re.sub(pattern, "||", value)
@@ -74,10 +90,10 @@ def unnecessary_unicode(field):
    Removes unnecessary Unicode characters like:
        - Zero-width space (U+200B)
        - Replacement character (U+FFFD)
        - No-break space (U+00A0)
    Replaces unnecessary Unicode characters like:
        - Soft hyphen (U+00AD) → hyphen
        - No-break space (U+00A0) → space
    Return string with characters removed or replaced.
    """
@@ -107,8 +123,8 @@ def unnecessary_unicode(field):
    match = re.findall(pattern, field)
    if match:
-        print(f"Removing unnecessary Unicode (U+00A0): {field}")
+        print(f"Replacing unnecessary Unicode (U+00A0): {field}")
-        field = re.sub(pattern, "", field)
+        field = re.sub(pattern, " ", field)
    # Check for soft hyphens (U+00AD), sometimes preceded with a normal hyphen
    pattern = re.compile(r"\u002D*?\u00AD")
@@ -121,7 +137,7 @@ def unnecessary_unicode(field):
    return field
-def duplicates(field):
+def duplicates(field, field_name):
    """Remove duplicate metadata values."""
    # Skip fields with missing values
@@ -140,7 +156,7 @@ def duplicates(field):
        if value not in new_values:
            new_values.append(value)
        else:
-            print(f"Dropping duplicate value: {value}")
+            print(f"Removing duplicate value ({field_name}): {value}")
    # Create a new field consisting of all values joined with "||"
    new_field = "||".join(new_values)
@@ -201,3 +217,24 @@ def comma_space(field, field_name):
        field = re.sub(r",(\w)", r", \1", field)
    return field
 def normalize_unicode(field, field_name):
    """Fix occurrences of decomposed Unicode characters by normalizing them
    with NFC to their canonical forms, for example:
    Ouédraogo, Mathieu → Ouédraogo, Mathieu
    Return normalized string.
    """
    # Skip fields with missing values
    if pd.isna(field):
        return
    # Check if the current string is using normalized Unicode (NFC)
    if not is_nfc(field):
        print(f"Normalizing Unicode ({field_name}): {field}")
        field = normalize("NFC", field)
    return field
--- a/csv_metadata_quality/util.py
+++ b/csv_metadata_quality/util.py
@@ -0,0 +1,14 @@
 def is_nfc(field):
    """Utility function to check whether a string is using normalized Unicode.
    Python's built-in unicodedata library has the is_normalized() function, but
    it was only introduced in Python 3.8. By using a simple utility function we
    are able to run on Python >= 3.6 again.
    See: https://docs.python.org/3/library/unicodedata.html
    Return boolean.
    """
    from unicodedata import normalize
    return field == normalize("NFC", field)
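The `is_nfc` helper above can be exercised with the two forms of `é` mentioned in the README (`U+00E9` versus `U+0065` + `U+0301`). This is a small standalone sketch that redefines the helper locally so it runs without the package installed. The variable names are illustrative only.

```python
from unicodedata import normalize


def is_nfc(field: str) -> bool:
    """Local copy of the helper: True if the string is already NFC-normalized."""
    return field == normalize("NFC", field)


# Precomposed "é" as a single code point
composed = "\u00e9"
# Decomposed "é": the letter "e" followed by a combining acute accent
decomposed = "e\u0301"

# The two strings render the same but compare unequal until normalized
print(composed == decomposed)
print(is_nfc(composed), is_nfc(decomposed))
print(normalize("NFC", decomposed) == composed)
```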
--- a/csv_metadata_quality/version.py
+++ b/csv_metadata_quality/version.py
@@ -1 +1 @@
-VERSION = "0.3.0"
+VERSION = "0.4.3"
--- a/data/test.csv
+++ b/data/test.csv
@@ -1,4 +1,4 @@
-dc.title,birthdate,dc.identifier.issn,dc.identifier.isbn,dc.language.iso,dc.subject,cg.coverage.country,filename
+dc.title,dc.date.issued,dc.identifier.issn,dc.identifier.isbn,dc.language.iso,dc.subject,cg.coverage.country,filename
 Leading space,2019-07-29,,,,,,
 Trailing space ,2019-07-29,,,,,,
 Excessive  space,2019-07-29,,,,,,
@@ -26,3 +26,6 @@ Unneccesary unicode (U+002D + U+00AD),2019-08-10,,978-92-9043-823-6,,,,
 "Missing space,after comma",2019-08-27,,,,,,
 Incorrect ISO 639-1 language,2019-09-26,,,es,,,
 Incorrect ISO 639-3 language,2019-09-26,,,spa,,,
 Composéd Unicode,2020-01-14,,,,,,
 Decomposéd Unicode,2020-01-14,,,,,,
 Unnecessary multi-value separator,2021-01-03,0378-5955||,,,,,
--- a/poetry.lock
+++ b/poetry.lock
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,31 @@
 [tool.poetry]
 name = "csv-metadata-quality"
 version = "0.4.3"
 description="A simple, but opinionated CSV quality checking and fixing pipeline for CSVs in the DSpace ecosystem."
 authors = ["Alan Orth <alan.orth@gmail.com>"]
 license="GPL-3.0-only"
 repository = "https://github.com/ilri/csv-metadata-quality"
 homepage = "https://github.com/ilri/csv-metadata-quality"
 [tool.poetry.dependencies]
 python = "^3.8"
 pandas = "^1.0.4"
 python-stdnum = "^1.13"
 xlrd = "^1.2.0"
 requests = "^2.23.0"
 requests-cache = "^0.5.2"
 pycountry = "^19.8.18"
 langid = "^1.1.6"
 [tool.poetry.dev-dependencies]
 pytest = "^6.1.1"
 ipython = { version = "^7.18.1", python = "^3.7" }
 flake8 = "^3.8.4"
 pytest-clarity = "^0.3.0-alpha.0"
 black = "20.8b1"
 isort = "^5.5.4"
 csvkit = "^1.0.5"
 [build-system]
 requires = ["poetry>=0.12"]
 build-backend = "poetry.masonry.api"
--- a/pytest.ini
+++ b/pytest.ini
@@ -1,5 +1,5 @@
 [pytest]
-addopts= -rsxX -s -v --strict --capture=sys
+addopts= -rsxX -s -v --strict-markers --capture=sys
 filterwarnings =
    error::UserWarning
    ignore:.*U.* is deprecated:DeprecationWarning
--- a/requirements-dev.txt
+++ b/requirements-dev.txt
@@ -1,57 +1,71 @@
-i https://pypi.org/simple
+agate-dbf==0.2.2
 agate-dbf==0.2.1
 agate-excel==0.2.3
-agate-sql==0.5.4
+agate-sql==0.5.5
 agate==1.6.1
-appdirs==1.4.3
+appdirs==1.4.4; python_version >= "3.6"
-atomicwrites==1.3.0
+appnope==0.1.2; python_version >= "3.7" and python_version < "4.0" and sys_platform == "darwin"
-attrs==19.1.0
+atomicwrites==1.4.0; python_version >= "3.6" and python_full_version < "3.0.0" and sys_platform == "win32" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6") or sys_platform == "win32" and python_version >= "3.6" and python_full_version >= "3.4.0" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6")
-babel==2.7.0
+attrs==20.3.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
-backcall==0.1.0
+babel==2.9.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
-black==19.3b0
+backcall==0.2.0; python_version >= "3.7" and python_version < "4.0"
-click==7.0
+black==20.8b1; python_version >= "3.6"
-csvkit==1.0.4
+certifi==2020.12.5; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
 chardet==4.0.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
 click==7.1.2; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" and python_version >= "3.6"
 colorama==0.4.4; python_version >= "3.7" and python_full_version < "3.0.0" and sys_platform == "win32" and python_version < "4.0" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6") or sys_platform == "win32" and python_version >= "3.7" and python_full_version >= "3.5.0" and python_version < "4.0" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6")
 csvkit==1.0.5
 dbfread==2.0.7
-decorator==4.4.0
+decorator==4.4.2; python_version >= "3.7" and python_full_version < "3.0.0" and python_version < "4.0" or python_version >= "3.7" and python_version < "4.0" and python_full_version >= "3.2.0"
-entrypoints==0.3
+et-xmlfile==1.0.1; python_version >= "3.6"
-et-xmlfile==1.0.1
+flake8==3.8.4; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.4.0")
-flake8==3.7.8
+idna==2.10; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
-future==0.17.1
+iniconfig==1.1.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
-importlib-metadata==0.23 ; python_version < '3.8'
+ipython-genutils==0.2.0; python_version >= "3.7" and python_version < "4.0"
-ipython-genutils==0.2.0
+ipython==7.20.0; python_version >= "3.7" and python_version < "4.0"
 ipython==7.8.0
 isodate==0.6.0
-isort==4.3.21
+isort==5.7.0; python_version >= "3.6" and python_version < "4.0"
-jdcal==1.4.1
+jdcal==1.4.1; python_version >= "3.6"
-jedi==0.15.1
+jedi==0.18.0; python_version >= "3.7" and python_version < "4.0"
 langid==1.1.6
 leather==0.3.3
-mccabe==0.6.1
+mccabe==0.6.1; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
-more-itertools==7.2.0
+mypy-extensions==0.4.3; python_version >= "3.6"
-openpyxl==3.0.0
+numpy==1.20.0; python_version >= "3.7" and python_full_version >= "3.7.1"
-packaging==19.2
+openpyxl==3.0.6; python_version >= "3.6"
-parsedatetime==2.4
+packaging==20.9; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
-parso==0.5.1
+pandas==1.2.1; python_full_version >= "3.7.1"
-pexpect==4.7.0 ; sys_platform != 'win32'
+parsedatetime==2.6
-pickleshare==0.7.5
+parso==0.8.1; python_version >= "3.7" and python_version < "4.0"
-pluggy==0.13.0
+pathspec==0.8.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" and python_version >= "3.6"
-prompt-toolkit==2.0.9
+pexpect==4.8.0; python_version >= "3.7" and python_version < "4.0" and sys_platform != "win32"
-ptyprocess==0.6.0
+pickleshare==0.7.5; python_version >= "3.7" and python_version < "4.0"
-py==1.8.0
+pluggy==0.13.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
-pycodestyle==2.5.0
+prompt-toolkit==3.0.14; python_version >= "3.7" and python_version < "4.0" and python_full_version >= "3.6.1"
-pyflakes==2.1.1
+ptyprocess==0.7.0; python_version >= "3.7" and python_version < "4.0" and sys_platform != "win32"
-pygments==2.4.2
+py==1.10.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
-pyparsing==2.4.2
+pycodestyle==2.6.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
-pytest-clarity==0.2.0a1
+pycountry==19.8.18
-pytest==5.1.3
+pyflakes==2.2.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
-python-slugify==3.0.4
+pygments==2.7.4; python_version >= "3.7" and python_version < "4.0"
 pyparsing==2.4.7; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
 pytest-clarity==0.3.0a0; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.4.0")
 pytest==6.2.2; python_version >= "3.6"
 python-dateutil==2.8.1; python_full_version >= "3.7.1"
 python-slugify==4.0.1; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
 python-stdnum==1.15
 pytimeparse==1.1.8
-pytz==2019.2
+pytz==2021.1; python_full_version >= "3.7.1"
-six==1.12.0
+regex==2020.11.13; python_version >= "3.6"
-sqlalchemy==1.3.8
+requests-cache==0.5.2
-termcolor==1.1.0
+requests==2.25.1; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.5.0")
-text-unidecode==1.3
+six==1.15.0; python_full_version >= "3.7.1"
-toml==0.10.0
+sqlalchemy==1.3.23; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
-traitlets==4.3.3.dev0
+termcolor==1.1.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
-wcwidth==0.1.7
+text-unidecode==1.3; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
-xlrd==1.2.0
+toml==0.10.2; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
-zipp==0.6.0
+traitlets==5.0.5; python_version >= "3.7" and python_version < "4.0"
 typed-ast==1.4.2; python_version >= "3.6"
 typing-extensions==3.7.4.3; python_version >= "3.6"
 urllib3==1.26.3; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" and python_version < "4"
 wcwidth==0.2.5; python_version >= "3.7" and python_version < "4.0" and python_full_version >= "3.6.1"
 xlrd==1.2.0; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.4.0")
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,17 +1,15 @@
-i https://pypi.org/simple
+certifi==2020.12.5; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
-e .
+chardet==4.0.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
-certifi==2019.9.11
+idna==2.10; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
 chardet==3.0.4
 idna==2.8
 langid==1.1.6
-numpy==1.17.2
+numpy==1.20.0; python_version >= "3.7" and python_full_version >= "3.7.1"
-pandas==0.25.1
+pandas==1.2.1; python_full_version >= "3.7.1"
 pycountry==19.8.18
-python-dateutil==2.8.0
+python-dateutil==2.8.1; python_full_version >= "3.7.1"
-python-stdnum==1.11
+python-stdnum==1.15
-pytz==2019.2
+pytz==2021.1; python_full_version >= "3.7.1"
 requests-cache==0.5.2
-requests==2.22.0
+requests==2.25.1; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.5.0")
-six==1.12.0
+six==1.15.0; python_full_version >= "3.7.1"
-urllib3==1.25.6
+urllib3==1.26.3; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" and python_version < "4"
-xlrd==1.2.0
+xlrd==1.2.0; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.4.0")
--- a/setup.py
+++ b/setup.py
@@ -4,17 +4,17 @@ with open("README.md", "r") as fh:
    long_description = fh.read()
 install_requires = [
-    'pandas',
+    "pandas",
-    'python-stdnum',
+    "python-stdnum",
-    'requests',
+    "requests",
-    'requests-cache',
+    "requests-cache",
-    'pycountry',
+    "pycountry",
-    'langid'
+    "langid",
 ]
 setuptools.setup(
    name="csv-metadata-quality",
-    version="0.3.0",
+    version="0.4.3",
    author="Alan Orth",
    author_email="aorth@mjanja.ch",
    description="A simple, but opinionated CSV quality checking and fixing pipeline for CSVs in the DSpace ecosystem.",
@@ -23,17 +23,16 @@ setuptools.setup(
    long_description_content_type="text/markdown",
    url="https://github.com/alanorth/csv-metadata-quality",
    classifiers=[
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
        "Operating System :: OS Independent",
-        "Development Status :: 4 - Beta"
+        "Development Status :: 4 - Beta",
    ],
-    packages=['csv_metadata_quality'],
+    packages=["csv_metadata_quality"],
    entry_points={
-        'console_scripts': [
-            'csv-metadata-quality = csv_metadata_quality.__main__:main'
-        ]
+        "console_scripts": ["csv-metadata-quality = csv_metadata_quality.__main__:main"]
    },
-    install_requires=install_requires
+    install_requires=install_requires,
 )
--- a/tests/test_check.py
+++ b/tests/test_check.py
@@ -1,6 +1,7 @@
-import pandas as pd
 import csv_metadata_quality.check as check
 import csv_metadata_quality.experimental as experimental
+import pandas as pd
 def test_check_invalid_issn(capsys):
@@ -50,10 +51,27 @@ def test_check_invalid_separators(capsys):
    value = "Alan|Orth"
-    check.separators(value)
+    field_name = "dc.contributor.author"
+    check.separators(value, field_name)
    captured = capsys.readouterr()
-    assert captured.out == f"Invalid multi-value separator: {value}\n"
+    assert captured.out == f"Invalid multi-value separator ({field_name}): {value}\n"
 def test_check_unnecessary_separators(capsys):
    """Test checking unnecessary multi-value separators."""
    field = "Alan||Orth||"
    field_name = "dc.contributor.author"
    check.separators(field, field_name)
    captured = capsys.readouterr()
    assert (
        captured.out == f"Unnecessary multi-value separator ({field_name}): {field}\n"
    )
 def test_check_valid_separators():
@@ -61,7 +79,9 @@ def test_check_valid_separators():
    value = "Alan||Orth"
-    result = check.separators(value)
+    field_name = "dc.contributor.author"
+    result = check.separators(value, field_name)
    assert result == value
--- a/tests/test_fix.py
+++ b/tests/test_fix.py
@@ -6,7 +6,9 @@ def test_fix_leading_whitespace():
    value = " Alan"
-    assert fix.whitespace(value) == "Alan"
+    field_name = "dc.contributor.author"
+    assert fix.whitespace(value, field_name) == "Alan"
 def test_fix_trailing_whitespace():
@@ -14,7 +16,9 @@ def test_fix_trailing_whitespace():
    value = "Alan "
-    assert fix.whitespace(value) == "Alan"
+    field_name = "dc.contributor.author"
+    assert fix.whitespace(value, field_name) == "Alan"
 def test_fix_excessive_whitespace():
@@ -22,7 +26,9 @@ def test_fix_excessive_whitespace():
    value = "Alan  Orth"
-    assert fix.whitespace(value) == "Alan Orth"
+    field_name = "dc.contributor.author"
+    assert fix.whitespace(value, field_name) == "Alan Orth"
 def test_fix_invalid_separators():
@@ -30,7 +36,19 @@ def test_fix_invalid_separators():
    value = "Alan|Orth"
-    assert fix.separators(value) == "Alan||Orth"
+    field_name = "dc.contributor.author"
+    assert fix.separators(value, field_name) == "Alan||Orth"
 def test_fix_unnecessary_separators():
    """Test fixing unnecessary multi-value separators."""
    field = "Alan||Orth||"
    field_name = "dc.contributor.author"
    assert fix.separators(field, field_name) == "Alan||Orth"
 def test_fix_unnecessary_unicode():
@@ -46,7 +64,9 @@ def test_fix_duplicates():
    value = "Kenya||Kenya"
-    assert fix.duplicates(value) == "Kenya"
+    field_name = "dc.contributor.author"
+    assert fix.duplicates(value, field_name) == "Kenya"
 def test_fix_newlines():
@@ -66,3 +86,25 @@ def test_fix_comma_space():
    field_name = "dc.contributor.author"
    assert fix.comma_space(value, field_name) == "Orth, Alan S."
 def test_fix_normalized_unicode():
    """Test fixing a string that is already in its normalized (NFC) Unicode form."""
    # string using the normalized canonical form of é
    value = "Ouédraogo, Mathieu"
    field_name = "dc.contributor.author"
    assert fix.normalize_unicode(value, field_name) == "Ouédraogo, Mathieu"
 def test_fix_decomposed_unicode():
    """Test fixing a string that contains decomposed Unicode characters."""
    # string using the decomposed form of é
    value = "Ouédraogo, Mathieu"
    field_name = "dc.contributor.author"
    assert fix.normalize_unicode(value, field_name) == "Ouédraogo, Mathieu"
Author	SHA1	Message	Date
Alan Orth	9f5d2c2c4f	poetry.lock: Run poetry update	2021-02-15 15:13:12 +02:00
Alan Orth	202abf140c	CHANGELOG.md: Add note about poetry	2021-02-04 21:48:12 +02:00
Alan Orth	0cd6d3dfe6	Update requirements Generated with poetry export: $ poetry export --without-hashes -f requirements.txt > requirements.txt $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt I am trying `--without-hashes` to work around an error on pip install when running in CI: ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.	2021-02-04 21:46:49 +02:00
Alan Orth	a458beac55	poetry.lock: Run poetry update	2021-02-04 21:45:30 +02:00
Alan Orth	e62ecb0a8f	CHANGELOG.md: Add note about new date format	2021-02-04 21:43:44 +02:00
Alan Orth	de92f32ab6	csv_metadata_quality/check.py: More date formats We should also allow ISO 8601 extended in combined date and time format. DSpace does not have a problem with dates in this format and I have found some metadata that uses this date format. For example: 2020-08-31T11:04:56Z See: https://en.wikipedia.org/wiki/ISO_8601	2021-02-04 21:39:14 +02:00
Alan Orth	dbbbc0944a	README.md: Add handle to citation	2021-01-27 10:33:37 +02:00
Alan Orth	d17bf3033c	README.md: Add citation	2021-01-27 10:32:26 +02:00
Alan Orth	2ec52f1b73	README.md: Update description	2021-01-26 15:43:41 +02:00
Alan Orth	aa1abf15a7	README.md: Adjust title	2021-01-26 15:35:21 +02:00
Alan Orth	cbf94490f2	Version 0.4.3	2021-01-26 15:22:40 +02:00
Alan Orth	f3d0d5ef07	setup.py: Remove Python 3.6 I actually removed Python 3.6 support a few weeks ago after updating to Pandas 1.2.0, but forgot to update this.	2021-01-26 15:22:08 +02:00
Alan Orth	4b7b99c94c	CHANGELOG.md: Add note about multi-value separators	2021-01-26 15:20:22 +02:00
Alan Orth	df670e81b9	README.md: Use badge from my Drone CI I'm not using SourceHut anymore.	2021-01-26 14:38:50 +02:00
Alan Orth	ae357d8c6c	Revert "Update requirements" This reverts commit `ca80340f7a`. Nope, we still need the --without-hashes because this still fails on Python 3.7, but not 3.8 or 3.9. From looking around it seems that nobody can agree whether poetry should handle this, pip should handle it, or upstream projects should pin their dependencies.	2021-01-26 14:15:31 +02:00
Alan Orth	ca80340f7a	Update requirements Generated with poetry export: $ poetry export -f requirements.txt > requirements.txt $ poetry export --dev -f requirements.txt > requirements-dev.txt Trying to see if we no longer need --without-hashes since we don't support Python 3.6 anymore.	2021-01-26 11:46:05 +02:00
Alan Orth	cc1743b86d	Remove .build.yml I will just use GitHub Actions and Drone.	2021-01-26 11:41:30 +02:00
Alan Orth	bcb9885c6b	Update requirements Generated with poetry export: $ poetry export --without-hashes -f requirements.txt > requirements.txt $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt I am trying `--without-hashes` to work around an error on pip install when running on Python 3.6 in Travis: ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.	2021-01-26 10:36:48 +02:00
Alan Orth	b484b75178	poetry.lock: Run poetry update	2021-01-26 10:36:04 +02:00
Alan Orth	d3880a9dfa	Remove Python 3.6 support Pandas 1.2.0 apparently requires Python 3.7.1+.	2021-01-03 15:51:53 +02:00
Alan Orth	7edb8b19d7	tests/test_check.py: Reformat with black	2021-01-03 15:50:21 +02:00
Alan Orth	a6709c7f82	Update requirements Generated with poetry export: $ poetry export --without-hashes -f requirements.txt > requirements.txt $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt I am trying `--without-hashes` to work around an error on pip install when running on Python 3.6 in Travis: ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.	2021-01-03 15:42:00 +02:00
Alan Orth	d489ea4609	poetry.lock: Run poetry update	2021-01-03 15:41:08 +02:00
Alan Orth	96634cbb67	pytest.ini: Change --strict to --strict-markers This is deprecated since pytest 6.2.0. See: https://docs.pytest.org/en/stable/deprecations.html#the-strict-command-line-option	2021-01-03 15:40:14 +02:00
Alan Orth	29e67a0887	Add tests for unnecessary multi-value separators	2021-01-03 15:37:18 +02:00
Alan Orth	32cea2055f	data/test.csv: Add unnecessary multi-value separator	2021-01-03 15:33:04 +02:00
Alan Orth	0dc66c5c4e	Expand check/fix for multi-value separators I just came across some metadata that had unnecessary multi-value separators at the end of a field, causing a blank value to be used. For example: "Kenya||Tanzania||"	2021-01-03 15:30:03 +02:00
Alan Orth	c26ad83534	.github: Test CLI invocation	2020-12-14 23:47:09 +02:00
Alan Orth	72ca9d99bf	setup.py: Add Python 3.9 [SKIP CI]	2020-12-14 23:44:35 +02:00
Alan Orth	ae33a9b793	Add .drone.yml	2020-12-14 23:42:23 +02:00
Alan Orth	fc0367bfc8	README.md: Update note about Python version	2020-12-08 10:52:24 +02:00
Alan Orth	e33b285034	README.md: Add GitHub Actions badge	2020-12-08 10:48:31 +02:00
Alan Orth	349fca03b8	.github/workflows/python-app.yml: Rename This name is displayed in the badge so it should be something more relevant.	2020-12-08 10:46:39 +02:00
Alan Orth	52d8904870	Remove .travis.yml They changed their free tier and I might as well use GitHub Actions for ILRI stuff anyways.	2020-12-08 10:41:36 +02:00
Alan Orth	971c69e535	Create python-app.yml Try GitHub Actions for Python 3.8 using GitHub's Python example.	2020-12-08 10:38:52 +02:00
Alan Orth	f8cc233e25	.travis.yml: Use Amazon Graviton2 ARM environment These are the new hotness and should have faster build times. See: https://blog.travis-ci.com/2020-09-11-arm-on-aws	2020-12-06 10:49:03 +02:00
Alan Orth	aa7b7a9592	Update requirements Generated with poetry export: $ poetry export --without-hashes -f requirements.txt > requirements.txt $ poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt I am trying `--without-hashes` to work around an error on pip install when running on Python 3.6 in Travis: ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.	2020-11-03 07:42:45 +02:00
Alan Orth	57b455bde7	poetry.lock: Run poetry update	2020-11-03 07:40:56 +02:00
Alan Orth	23b95fa368	.travis.yml: Use Ubuntu 20.04 "Focal" environment	2020-10-29 00:14:54 +03:00
Alan Orth	6985f76aa3	.travis.yml: Bump Python versions Test Python 3.9 now that it was released, and allow tests to fail on nightly builds.	2020-10-29 00:14:36 +03:00
Alan Orth	98a6a19e12	Update requirements-dev.txt Generated with poetry export: $ poetry export --dev -f requirements.txt > requirements-dev.txt	2020-10-06 17:48:46 +03:00
Alan Orth	f4914c414f	Only install ipython on Python 3.7+	2020-10-06 17:48:16 +03:00
Alan Orth	d352fe8017	Update requirements Generated with poetry export: $ poetry export -f requirements.txt > requirements.txt $ poetry export --dev -f requirements.txt > requirements-dev.txt	2020-10-06 17:21:33 +03:00
Alan Orth	f13c360084	Update poetry package dependencies	2020-10-06 17:20:16 +03:00
Alan Orth	7cfd4c0b59	csv_metadata_quality: Move scoped imports to global According to PEP8 we should avoid scoped imports unless you have a good reason. Here there are two cases where we do (issn and isbn), but I will move the others to the global scope.	2020-10-06 17:11:39 +03:00
Alan Orth	826509ddcf	poetry.lock: Run poetry update List of updated modules: - Updating numpy (1.19.1 -> 1.19.2) - Updating pygments (2.6.1 -> 2.7.1) - Updating pandas (1.1.1 -> 1.1.2) All tests still pass according to pytest.	2020-09-26 12:18:23 +03:00
Alan Orth	22b5c0f7a1	CHANGELOG.md: Add note about dependencies update	2020-09-08 15:04:40 +03:00
Alan Orth	774e274b32	poetry.lock: Run poetry update Update dependencies to latest version: - Updating attrs (19.3.0 -> 20.2.0) - Updating more-itertools (8.4.0 -> 8.5.0) - Updating openpyxl (3.0.4 -> 3.0.5) - Updating parso (0.7.0 -> 0.7.1) - Updating sqlalchemy (1.3.18 -> 1.3.19) - Updating urllib3 (1.25.9 -> 1.25.10) - Updating agate-dbf (0.2.1 -> 0.2.2) - Updating agate-sql (0.5.4 -> 0.5.5) - Updating jedi (0.17.1 -> 0.17.2) - Updating numpy (1.19.0 -> 1.19.1) - Updating prompt-toolkit (3.0.5 -> 3.0.7) - Updating regex (2020.6.8 -> 2020.7.14) - Updating traitlets (4.3.3 -> 5.0.4) - Updating ipython (7.16.1 -> 7.18.1) - Updating pandas (1.0.5 -> 1.1.1) - Updating python-stdnum (1.13 -> 1.14) All tests still pass according to pytest.	2020-09-08 15:04:00 +03:00
Alan Orth	db474a802f	README.md: Use badge from travis-ci.com	2020-08-04 11:12:28 +03:00
Alan Orth	e241f8461b	CHANGELOG.md: Add notes	2020-07-06 14:10:46 +03:00
Alan Orth	431e6331c8	csv_metadata_quality/check.py: Format with black	2020-07-06 14:10:19 +03:00
Alan Orth	cb07d357d4	Version 0.4.2	2020-07-06 14:04:34 +03:00
Alan Orth	65cd48a26f	CHANGELOG.md: Update changes	2020-07-06 14:00:21 +03:00
Alan Orth	0f883f640c	Remove pipenv	2020-07-06 13:59:49 +03:00
Alan Orth	f4c5c5781e	README.md: Switch to poetry	2020-07-06 13:59:11 +03:00
Alan Orth	6aa784ad8c	Update requirements Generated with poetry export: $ poetry export -f requirements.txt > requirements.txt $ poetry export --dev -f requirements.txt > requirements-dev.txt	2020-07-06 13:57:07 +03:00
Alan Orth	7b8da94f41	poetry.lock: Update Python dependencies	2020-07-06 13:56:31 +03:00
Alan Orth	2a1566af62	csv_metadata_quality/check.py: Parameterize AGROVOC request	2020-07-06 13:44:46 +03:00
Alan Orth	5fcaa63bd5	csv_metadata_quality/check.py: Prune requests cache once We only need to prune the requests cache once before using it, not for every value we check.	2020-07-06 13:42:19 +03:00
Alan Orth	aa9e23b46c	pyproject.toml: Update license specifier We need to use valid SPDX license identifiers.	2020-06-09 14:22:53 +03:00
Alan Orth	73acb1661f	Update requirements Generated with poetry export: $ poetry export -f requirements.txt > requirements.txt $ poetry export --dev -f requirements.txt > requirements-dev.txt	2020-05-31 17:51:16 +03:00
Alan Orth	2a068fddc4	.build.yml: Fix test	2020-05-31 17:44:37 +03:00
Alan Orth	c6c2f13e88	.build.yml: Fix poetry install invocation Poetry apparently installs dev dependencies by default.	2020-05-31 17:37:09 +03:00
Alan Orth	56f16e37ed	.build.yml: Use poetry in SourceHut CI	2020-05-31 17:35:04 +03:00
Alan Orth	0c44b967b6	Add poetry project file and lock I want to try to use poetry instead of pipenv because pipenv takes forever to do dependency resolution sometimes. Also, I have had a few issues with Python modules like black that don't have releases other than pre-releases, and even including the project itself in the dependencies (pip install -e . ...?). My initial experience is that poetry handles this better.	2020-05-31 17:33:40 +03:00
Alan Orth	8a267bb40b	.travis.yml: Try to build with Python 3.8-dev But allow failures.	2020-03-29 16:40:11 +03:00
Alan Orth	8fda8f1ef1	Pipfile.lock: Run pipenv update All tests still passing.	2020-03-20 16:22:04 +02:00
Alan Orth	5e471813e8	CHANGELOG.md: Add note about python dependencies	2020-01-29 12:41:43 +02:00
Alan Orth	79244b9ac3	Pipfile.lock: Run pipenv update	2020-01-29 12:39:12 +02:00
Alan Orth	5e81a33482	CHANGELOG.md: Add note about field names	2020-01-16 12:37:11 +02:00
Alan Orth	28b5996aa6	Output field name for more fixes and checks This helps identify which field has the error.	2020-01-16 12:35:11 +02:00
Alan Orth	40ba9bae6c	README.md: Adjust heading size	2020-01-15 12:26:11 +02:00
Alan Orth	0b2d211455	Version 0.4.1	2020-01-15 12:19:42 +02:00
Alan Orth	7f1df0b47c	Support Python 3.6 and 3.7 again	2020-01-15 12:19:17 +02:00
Alan Orth	365ecda324	Add utility function to check normalization Python's built-in unicodedata library includes the is_normalized() function starting with Python 3.8. This utility function allows us to do the same thing with earlier Python versions. See: https://docs.python.org/3/library/unicodedata.html	2020-01-15 12:17:52 +02:00
Alan Orth	550ce7fb7e	.travis.yml: Only test Python 3.8 The Unicode normalization feature requires Python 3.8 because the unicodedata.is_normalized() function only appears there. If I find another way to check if a string is normalized without normalizing it first I will drop the requirements back down to Python 3.6. See: https://docs.python.org/3/library/unicodedata.html	2020-01-15 11:57:21 +02:00
Alan Orth	705127fd28	Version 0.4.0	2020-01-15 11:44:56 +02:00
Alan Orth	894e0a196d	setup.py: Change Python requirements The `unicodedata.is_normalized()` function requires Python 3.8. See: https://docs.python.org/3/library/unicodedata.html	2020-01-15 11:43:25 +02:00
Alan Orth	87181bc7b8	Run black, isort, and flake8.	2020-01-15 11:41:31 +02:00
Alan Orth	8de5d862b6	CHANGELOG.md: Add note about Unicode normalization	2020-01-15 11:40:40 +02:00
Alan Orth	49e3543878	Add Unicode normalization This will check all strings for un-normalized Unicode characters. Normalization is done using NFC. This includes tests and updated sample data (data/test.csv). See: https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html	2020-01-15 11:37:54 +02:00
Alan Orth	403b253762	CHANGELOG.md: Update python library versions	2020-01-15 10:58:44 +02:00
Alan Orth	c5fbaf407a	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt $ pipenv lock -r -d > requirements-dev.txt	2020-01-15 10:51:58 +02:00
Alan Orth	4f81f6c83c	Pipfile.lock: Run pipenv update	2020-01-15 10:51:19 +02:00
Alan Orth	4b9d1e060f	setup.py: Add Python 3.8 classifier	2019-12-14 12:56:11 +02:00
Alan Orth	c8a71e3143	Pipfile.lock: Run pipenv update	2019-12-14 12:53:39 +02:00
Alan Orth	7964d98ca5	Pipfile: Specify exact version of black Black only releases pre-release versions, which causes issues with pipenv. Instead of always running pipenv with "--pre" and potentially letting in some other pre-release versions for other dependencies, I would rather specify the latest black version explicitly. See: https://github.com/psf/black/issues/517 See: https://github.com/microsoft/vscode-python/issues/5171	2019-12-14 12:41:28 +02:00
Alan Orth	64ffc2f1da	.travis.yml: Install packages from requirements.txt too	2019-11-14 23:42:28 +02:00
Alan Orth	7b1bc29a92	.travis.yml: Try using pip instead of pipenv The Pipfile knows it was created with Python 3.8, yet we're running with multiple Python versions on Travis. I'm curious if it would work better to use pip to install dependencies instead of pipenv in this case.	2019-11-14 23:37:25 +02:00
Alan Orth	f0110d8e74	CHANGELOG.md: Add note about requirements	2019-11-14 23:30:26 +02:00
Alan Orth	86498deee8	Update python requirements Generated using pipenv: $ pipenv lock -r > requirements.txt $ pipenv lock -r -d > requirements-dev.txt	2019-11-14 23:28:42 +02:00
Alan Orth	251647a15f	CHANGELOG.md: Add TravisCI changes	2019-11-14 23:24:08 +02:00
Alan Orth	0bd28e22ec	.travis.yml: Test Python 3.8	2019-11-14 23:22:37 +02:00
Alan Orth	63fdce7d13	.travis.yml: Use Ubuntu 18.04 "Bionic"	2019-11-14 23:22:19 +02:00
Alan Orth	f068c0e16a	CHANGELOG.md: Use Python 3.8.0 for pipenv	2019-11-14 23:11:43 +02:00
Alan Orth	79b8f62a85	Use Python 3.8 for pipenv Python 3.8.0 entered Arch Linux core repositories now and all tests pass with Python 3.8.0 so it's time...	2019-11-14 23:10:20 +02:00
Alan Orth	6c1e132531	CHANGELOG.md: Add unreleased changes	2019-11-14 09:19:19 +02:00
Alan Orth	c0f3c866bd	Pipfile.lock: Run pipenv update Updates the following dependencies: - numpy 1.17.2→1.17.4 - pandas 0.25.1→0.25.3 - flake8 3.7.8→3.7.9 - pytest 5.1.3→5.2.2 - black 19.3b0→19.10b0	2019-11-14 09:17:31 +02:00
Alan Orth	36d0474b95	CHANGELOG.md: Move unreleased changes to v0.3.1	2019-10-01 17:11:52 +03:00
Alan Orth	efdc3a841a	Version 0.3.1	2019-10-01 17:11:13 +03:00
Alan Orth	fd2ba6845d	CHANGELOG.md: Update unreleased notes	2019-10-01 17:10:23 +03:00
Alan Orth	e55380b4d5	csv_metadata_quality/fix.py: Harmonize language in fix output We should always say if we're removing or replacing something.	2019-10-01 17:09:49 +03:00
Alan Orth	85ae16d9b7	CHANGELOG.md: Add note about non-breaking spaces	2019-10-01 16:56:37 +03:00
Alan Orth	c42f8b4812	csv_metadata_quality/fix.py: Replace non-breaking spaces We should be replacing non-breaking spaces (U+00A0) with normal spaces instead of removing them.	2019-10-01 16:55:04 +03:00
Alan Orth	1c75608d54	README.md: Update introduction text We should mention that this is not DSpace specific. Rather, it is much more realistically Dublin Core specific.	2019-09-26 14:19:13 +03:00
Alan Orth	0b15a8ed3b	README.md: Remove TODO about lack of space after comma This was added as an automatic global fix a few weeks ago.	2019-09-26 14:16:33 +03:00
Alan Orth	9ca266f5f0	data/test.csv: Change birthdate column to dc.date.issued More accurately reflects actual data we will be validating.	2019-09-26 14:15:48 +03:00
Alan Orth	0d3f948708	CHANGELOG.md: Update comment about language validation	2019-09-26 14:14:57 +03:00
Alan Orth	c04207fcfc	CHANGELOG.md: Fix header formatting	2019-09-26 14:13:50 +03:00
Alan Orth	9d4eceddc7	.build.yml: Enable experimental CLI checks on SourceHut	2019-09-26 14:11:35 +03:00