Bump version to 0.4.5

setup.py: Bump version to 0.4.4
I forgot to increase this when I actually released version 0.4.4, so I am doing it now in a separate commit before I bump the version to 0.4.5.
13 changed files with 148 additions and 87 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,10 +4,20 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## Unreleased
+## [0.4.5] - 2021-03-04
+### Added
+- Check dates in dcterms.issued field as well, not just fields that have the
+word "date" in them
+
+### Updated
+- Run `poetry update` to update project dependencies
+
+## [0.4.4] - 2021-02-21
 ### Added
 - Accept dates formatted in ISO 8601 extended with combined date and time, for
 example: 2020-08-31T11:04:56Z
+- Colorized output: red for errors, yellow for warnings and information, green
+for changes

 ### Updated
 - Run `poetry update` to update project dependencies
--- a/README.md
+++ b/README.md
@@ -109,8 +109,13 @@ This currently uses the [Python langid](https://github.com/saffsd/langid.py) lib
 - Add an option to drop invalid AGROVOC subjects?
 - Add tests for application invocation, ie `tests/test_app.py`?
 - Validate ISSNs or journal titles against CrossRef API?
-- Better ISO 8601 date parsing (currently only supports simple dates, perhaps we need to use dateutil.parser.parseiso())
-- Fix lazy date check (assumes field name has "date" but could be dcterms.issued etc!)
+- Add configurable field validation, like specify a field name and a validation file?
+  - Perhaps like --validate=field.name,filename
+- Add some row-based item sanity checks and fixes:
+    - Warn if item is Open Access, but missing a filename or URL
+    - Warn if item is Open Access, but missing a license
+    - Warn if item has an ISSN but no journal title
+    - Update journal titles from ISSN

 ## License
 This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).
--- a/csv_metadata_quality/app.py
+++ b/csv_metadata_quality/app.py
@@ -4,6 +4,7 @@ import signal
 import sys

 import pandas as pd
+from colorama import Fore

 import csv_metadata_quality.check as check
 import csv_metadata_quality.experimental as experimental
@@ -77,7 +78,7 @@ def run(argv):
                if column == exclude and skip is False:
                    skip = True
            if skip:
-                print(f"Skipping {column}")
+                print(f"{Fore.YELLOW}Skipping {Fore.RESET}{column}")

                continue

@@ -141,7 +142,7 @@ def run(argv):
            df[column] = df[column].apply(check.isbn)

        # Check: invalid date
-        match = re.match(r"^.*?date.*$", column)
+        match = re.match(r"^.*?(date|dcterms\.issued).*$", column)
        if match is not None:
            df[column] = df[column].apply(check.date, field_name=column)
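The app.py hunk above widens the date check from "any column whose name contains `date`" to one that also matches `dcterms.issued`. The sketch below runs the new pattern over a few column names to show which ones would be sent to `check.date()`; the column names are hypothetical examples, not taken from a real export. Because `^.*?` and `.*$` surround the group, the pattern behaves like a substring search, so a made-up name such as `dc.description.updated` is selected too, since "updated" contains "date".

```python
import re

# Same pattern as the app.py change: any column whose name contains "date"
# or the literal "dcterms.issued" (the dot is escaped).
DATE_FIELD_PATTERN = r"^.*?(date|dcterms\.issued).*$"


def is_date_field(column: str) -> bool:
    """Return True if the date check would be applied to this column."""
    return re.match(DATE_FIELD_PATTERN, column) is not None


# Hypothetical column names, for illustration only.
for column in [
    "dc.date.issued",
    "dcterms.issued",
    "dcterms.issued[en_US]",
    "dcterms.available",
    "dc.description.updated",
    "dc.title",
]:
    print(f"{column}: {is_date_field(column)}")
```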

--- a/csv_metadata_quality/check.py
+++ b/csv_metadata_quality/check.py
@@ -3,6 +3,7 @@ from datetime import datetime, timedelta
 import pandas as pd
 import requests
 import requests_cache
+from colorama import Fore
 from pycountry import languages


@@ -26,7 +27,7 @@ def issn(field):
    for value in field.split("||"):

        if not issn.is_valid(value):
-            print(f"Invalid ISSN: {value}")
+            print(f"{Fore.RED}Invalid ISSN: {Fore.RESET}{value}")

    return field

@@ -51,7 +52,7 @@ def isbn(field):
    for value in field.split("||"):

        if not isbn.is_valid(value):
-            print(f"Invalid ISBN: {value}")
+            print(f"{Fore.RED}Invalid ISBN: {Fore.RESET}{value}")

    return field

@@ -76,7 +77,9 @@ def separators(field, field_name):
    for value in field.split("||"):
        # Check if the current value is blank
        if value == "":
-            print(f"Unnecessary multi-value separator ({field_name}): {field}")
+            print(
+                f"{Fore.RED}Unnecessary multi-value separator ({field_name}): {Fore.RESET}{field}"
+            )

            continue

@@ -85,7 +88,9 @@ def separators(field, field_name):

        # Check if there was a match
        if match:
-            print(f"Invalid multi-value separator ({field_name}): {field}")
+            print(
+                f"{Fore.RED}Invalid multi-value separator ({field_name}): {Fore.RESET}{field}"
+            )

    return field

@@ -102,7 +107,7 @@ def date(field, field_name):
    """

    if pd.isna(field):
-        print(f"Missing date ({field_name}).")
+        print(f"{Fore.RED}Missing date ({field_name}).{Fore.RESET}")

        return

@@ -111,7 +116,9 @@ def date(field, field_name):

    # We don't allow multi-value date fields
    if len(multiple_dates) > 1:
-        print(f"Multiple dates not allowed ({field_name}): {field}")
+        print(
+            f"{Fore.RED}Multiple dates not allowed ({field_name}): {Fore.RESET}{field}"
+        )

        return field

@@ -145,7 +152,7 @@ def date(field, field_name):

        return field
    except ValueError:
-        print(f"Invalid date ({field_name}): {field}")
+        print(f"{Fore.RED}Invalid date ({field_name}): {Fore.RESET}{field}")

        return field

@@ -178,9 +185,7 @@ def suspicious_characters(field, field_name):
            # character and spanning enough of the rest to give a preview,
            # but not too much to cause the line to break in terminals with
            # a default of 80 characters width.
-            suspicious_character_msg = (
-                f"Suspicious character ({field_name}): {field_subset}"
-            )
+            suspicious_character_msg = f"{Fore.YELLOW}Suspicious character ({field_name}): {Fore.RESET}{field_subset}"
            print(f"{suspicious_character_msg:1.80}")

    return field
@@ -205,16 +210,16 @@ def language(field):
        # can check it against ISO 639-1 or ISO 639-3 accordingly.
        if len(value) == 2:
            if not languages.get(alpha_2=value):
-                print(f"Invalid ISO 639-1 language: {value}")
+                print(f"{Fore.RED}Invalid ISO 639-1 language: {Fore.RESET}{value}")

                pass
        elif len(value) == 3:
            if not languages.get(alpha_3=value):
-                print(f"Invalid ISO 639-3 language: {value}")
+                print(f"{Fore.RED}Invalid ISO 639-3 language: {Fore.RESET}{value}")

                pass
        else:
-            print(f"Invalid language: {value}")
+            print(f"{Fore.RED}Invalid language: {Fore.RESET}{value}")

    return field

@@ -256,7 +261,7 @@ def agrovoc(field, field_name):

            # check if there are any results
            if len(data["results"]) == 0:
-                print(f"Invalid AGROVOC ({field_name}): {value}")
+                print(f"{Fore.RED}Invalid AGROVOC ({field_name}): {Fore.RESET}{value}")

    return field

@@ -309,6 +314,6 @@ def filename_extension(field):
                break

        if filename_extension_match is False:
-            print(f"Filename with uncommon extension: {value}")
+            print(f"{Fore.YELLOW}Filename with uncommon extension: {Fore.RESET}{value}")

    return field
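`suspicious_characters()` prints its message with the format spec `1.80`: a minimum width of 1 and a precision of 80, which truncates a string to at most 80 characters. Now that the message embeds color codes, those escape sequences count toward the 80 characters even though a terminal does not display them. The sketch below uses raw ANSI escape strings equal to what colorama's `Fore.YELLOW` and `Fore.RESET` produce, together with a hypothetical field name and preview, to measure how much visible text survives the truncation.

```python
import re

# Raw ANSI SGR codes, equivalent to colorama's Fore.YELLOW and Fore.RESET.
YELLOW = "\x1b[33m"
RESET = "\x1b[39m"
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

field_name = "dc.title"  # hypothetical field
field_subset = "x" * 100  # hypothetical preview, longer than the terminal

plain = f"Suspicious character ({field_name}): {field_subset}"
colored = f"{YELLOW}Suspicious character ({field_name}): {RESET}{field_subset}"

# ":1.80" = minimum width 1, precision 80, so at most 80 characters survive.
plain_line = f"{plain:1.80}"
colored_line = f"{colored:1.80}"


def visible_length(text: str) -> int:
    """Length of the text as displayed, ignoring ANSI color codes."""
    return len(ANSI_SGR.sub("", text))


print(len(plain_line), visible_length(plain_line))  # 80 80
print(len(colored_line), visible_length(colored_line))  # 80 70
```

Here the two escape sequences use 10 of the 80 characters. In principle, a field name long enough to push `RESET` past the cut-off would have the reset truncated away, and the yellow would carry over into the following output.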
--- a/csv_metadata_quality/experimental.py
+++ b/csv_metadata_quality/experimental.py
@@ -1,4 +1,5 @@
 import pandas as pd
+from colorama import Fore


 def correct_language(row):
@@ -10,10 +11,11 @@ def correct_language(row):
    language and returns the value in the language field if it does match.
    """

-    from pycountry import languages
-    import langid
    import re

+    import langid
+    from pycountry import languages
+
    # Initialize some variables at global scope so that we can set them in the
    # loop scope below and still be able to access them afterwards.
    language = ""
@@ -83,12 +85,12 @@ def correct_language(row):
    detected_language = languages.get(alpha_2=langid_classification[0])
    if len(language) == 2 and language != detected_language.alpha_2:
        print(
-            f"Possibly incorrect language {language} (detected {detected_language.alpha_2}): {title}"
+            f"{Fore.YELLOW}Possibly incorrect language {language} (detected {detected_language.alpha_2}): {Fore.RESET}{title}"
        )

    elif len(language) == 3 and language != detected_language.alpha_3:
        print(
-            f"Possibly incorrect language {language} (detected {detected_language.alpha_3}): {title}"
+            f"{Fore.YELLOW}Possibly incorrect language {language} (detected {detected_language.alpha_3}): {Fore.RESET}{title}"
        )

    else:
--- a/csv_metadata_quality/fix.py
+++ b/csv_metadata_quality/fix.py
@@ -2,6 +2,7 @@ import re
 from unicodedata import normalize

 import pandas as pd
+from colorama import Fore

 from csv_metadata_quality.util import is_nfc

@@ -29,7 +30,9 @@ def whitespace(field, field_name):
        match = re.findall(pattern, value)

        if match:
-            print(f"Removing excessive whitespace ({field_name}): {value}")
+            print(
+                f"{Fore.GREEN}Removing excessive whitespace ({field_name}): {Fore.RESET}{value}"
+            )
            value = re.sub(pattern, " ", value)

        # Save cleaned value
@@ -62,7 +65,9 @@ def separators(field, field_name):
    for value in field.split("||"):
        # Check if the value is blank and skip it
        if value == "":
-            print(f"Fixing unnecessary multi-value separator ({field_name}): {field}")
+            print(
+                f"{Fore.GREEN}Fixing unnecessary multi-value separator ({field_name}): {Fore.RESET}{field}"
+            )

            continue

@@ -71,7 +76,9 @@ def separators(field, field_name):
        match = re.findall(pattern, value)

        if match:
-            print(f"Fixing invalid multi-value separator ({field_name}): {value}")
+            print(
+                f"{Fore.RED}Fixing invalid multi-value separator ({field_name}): {Fore.RESET}{value}"
+            )

            value = re.sub(pattern, "||", value)

@@ -107,7 +114,7 @@ def unnecessary_unicode(field):
    match = re.findall(pattern, field)

    if match:
-        print(f"Removing unnecessary Unicode (U+200B): {field}")
+        print(f"{Fore.GREEN}Removing unnecessary Unicode (U+200B): {Fore.RESET}{field}")
        field = re.sub(pattern, "", field)

    # Check for replacement characters (U+FFFD)
@@ -115,7 +122,7 @@ def unnecessary_unicode(field):
    match = re.findall(pattern, field)

    if match:
-        print(f"Removing unnecessary Unicode (U+FFFD): {field}")
+        print(f"{Fore.GREEN}Removing unnecessary Unicode (U+FFFD): {Fore.RESET}{field}")
        field = re.sub(pattern, "", field)

    # Check for no-break spaces (U+00A0)
@@ -123,7 +130,9 @@ def unnecessary_unicode(field):
    match = re.findall(pattern, field)

    if match:
-        print(f"Replacing unnecessary Unicode (U+00A0): {field}")
+        print(
+            f"{Fore.GREEN}Replacing unnecessary Unicode (U+00A0): {Fore.RESET}{field}"
+        )
        field = re.sub(pattern, " ", field)

    # Check for soft hyphens (U+00AD), sometimes preceeded with a normal hyphen
@@ -131,7 +140,9 @@ def unnecessary_unicode(field):
    match = re.findall(pattern, field)

    if match:
-        print(f"Replacing unnecessary Unicode (U+00AD): {field}")
+        print(
+            f"{Fore.GREEN}Replacing unnecessary Unicode (U+00AD): {Fore.RESET}{field}"
+        )
        field = re.sub(pattern, "-", field)

    return field
@@ -156,7 +167,9 @@ def duplicates(field, field_name):
        if value not in new_values:
            new_values.append(value)
        else:
-            print(f"Removing duplicate value ({field_name}): {value}")
+            print(
+                f"{Fore.GREEN}Removing duplicate value ({field_name}): {Fore.RESET}{value}"
+            )

    # Create a new field consisting of all values joined with "||"
    new_field = "||".join(new_values)
@@ -189,7 +202,7 @@ def newlines(field):
    match = re.findall(r"\n", field)

    if match:
-        print(f"Removing newline: {field}")
+        print(f"{Fore.GREEN}Removing newline: {Fore.RESET}{field}")
        field = field.replace("\n", "")

    return field
@@ -213,7 +226,9 @@ def comma_space(field, field_name):
    match = re.findall(r",\w", field)

    if match:
-        print(f"Adding space after comma ({field_name}): {field}")
+        print(
+            f"{Fore.GREEN}Adding space after comma ({field_name}): {Fore.RESET}{field}"
+        )
        field = re.sub(r",(\w)", r", \1", field)

    return field
@@ -234,7 +249,7 @@ def normalize_unicode(field, field_name):

    # Check if the current string is using normalized Unicode (NFC)
    if not is_nfc(field):
-        print(f"Normalizing Unicode ({field_name}): {field}")
+        print(f"{Fore.GREEN}Normalizing Unicode ({field_name}): {Fore.RESET}{field}")
        field = normalize("NFC", field)

    return field
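Every hunk in fix.py and check.py repeats the same pattern: color the label, reset, then print the offending value uncolored, with red for errors, yellow for warnings and information, and green for fixes. A small helper could centralize that convention. The sketch below is hypothetical (the project does not define `Severity`, `format_message` or `report`) and uses raw ANSI codes equal to colorama's `Fore.RED`, `Fore.YELLOW`, `Fore.GREEN` and `Fore.RESET` so that it runs without colorama installed.

```python
from enum import Enum

RESET = "\x1b[39m"  # same string as colorama's Fore.RESET


class Severity(Enum):
    """Hypothetical mapping of message severity to ANSI foreground colors."""

    ERROR = "\x1b[31m"  # red, like Fore.RED
    WARNING = "\x1b[33m"  # yellow, like Fore.YELLOW (warnings and information)
    FIX = "\x1b[32m"  # green, like Fore.GREEN (changes made by fix.py)


def format_message(severity: Severity, label: str, value: str) -> str:
    """Color only the label and leave the value uncolored, as the diff does."""
    return f"{severity.value}{label}: {RESET}{value}"


def report(severity: Severity, label: str, value: str) -> None:
    print(format_message(severity, label, value))


report(Severity.FIX, "Removing excessive whitespace (dc.title)", "Example  title")
report(Severity.ERROR, "Invalid ISSN", "1234-5678")
```

With one helper, the color choice lives in a single place. For example, the "Fixing invalid multi-value separator" message in `fix.separators()` appears to use `Fore.RED` while the other fix messages use `Fore.GREEN`; routing every fix through something like `Severity.FIX` would make that kind of mismatch easier to spot.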
--- a/csv_metadata_quality/version.py
+++ b/csv_metadata_quality/version.py
@@ -1 +1 @@
-VERSION = "0.4.3"
+VERSION = "0.4.5"
--- a/poetry.lock
+++ b/poetry.lock
@@ -159,7 +159,7 @@ python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
 name = "colorama"
 version = "0.4.4"
 description = "Cross-platform colored terminal text."
-category = "dev"
+category = "main"
 optional = false
 python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"

@@ -233,7 +233,7 @@ python-versions = "*"

 [[package]]
 name = "ipython"
-version = "7.20.0"
+version = "7.21.0"
 description = "IPython: Productive Interactive Computing"
 category = "dev"
 optional = false
@@ -388,7 +388,7 @@ pyparsing = ">=2.0.2"

 [[package]]
 name = "pandas"
-version = "1.2.2"
+version = "1.2.3"
 description = "Powerful data structures for data analysis, time series, and statistics"
 category = "main"
 optional = false
@@ -765,7 +765,7 @@ python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
 [metadata]
 lock-version = "1.1"
 python-versions = "^3.8"
-content-hash = "63f2c6ef09652c4f8407660ff7b4690c8a07e5501eb8fc8c477f485de5888fcf"
+content-hash = "8c4ba410bbdc930d2d74f7864470a18827029a5697869833959708d7425460ae"

 [metadata.files]
 agate = [
@@ -851,8 +851,8 @@ iniconfig = [
    {file = "iniconfig-1.1.1.tar.gz", hash = "sha256:bc3af051d7d14b2ee5ef9969666def0cd1a000e121eaea580d4a313df4b37f32"},
 ]
 ipython = [
-    {file = "ipython-7.20.0-py3-none-any.whl", hash = "sha256:1918dea4bfdc5d1a830fcfce9a710d1d809cbed123e85eab0539259cb0f56640"},
-    {file = "ipython-7.20.0.tar.gz", hash = "sha256:1923af00820a8cf58e91d56b89efc59780a6e81363b94464a0f17c039dffff9e"},
+    {file = "ipython-7.21.0-py3-none-any.whl", hash = "sha256:34207ffb2f653bced2bc8e3756c1db86e7d93e44ed049daae9814fed66d408ec"},
+    {file = "ipython-7.21.0.tar.gz", hash = "sha256:04323f72d5b85b606330b6d7e2dc8d2683ad46c3905e955aa96ecc7a99388e70"},
 ]
 ipython-genutils = [
    {file = "ipython_genutils-0.2.0-py2.py3-none-any.whl", hash = "sha256:72dd37233799e619666c9f639a9da83c34013a73e8bbc79a7a6348d93c61fab8"},
@@ -924,24 +924,22 @@ packaging = [
    {file = "packaging-20.9.tar.gz", hash = "sha256:5b327ac1320dc863dca72f4514ecc086f31186744b84a230374cc1fd776feae5"},
 ]
 pandas = [
-    {file = "pandas-1.2.2-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:c76a108272a4de63189b8f64086bbaf8348841d7e610b52f50959fbbf401524f"},
-    {file = "pandas-1.2.2-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:e61a089151f1ed78682aa77a3bcae0495cf8e585546c26924857d7e8a9960568"},
-    {file = "pandas-1.2.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:fc351cd2df318674669481eb978a7799f24fd14ef26987a1aa75105b0531d1a1"},
-    {file = "pandas-1.2.2-cp37-cp37m-win32.whl", hash = "sha256:05ca6bda50123158eb15e716789083ca4c3b874fd47688df1716daa72644ee1c"},
-    {file = "pandas-1.2.2-cp37-cp37m-win_amd64.whl", hash = "sha256:08b6bbe74ae2b3e4741a744d2bce35ce0868a6b4189d8b84be26bb334f73da4c"},
-    {file = "pandas-1.2.2-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:230de25bd9791748b2638c726a5f37d77a96a83854710110fadd068d1e2c2c9f"},
-    {file = "pandas-1.2.2-cp38-cp38-manylinux1_i686.whl", hash = "sha256:a50cf3110a1914442e7b7b9cef394ef6bed0d801b8a34d56f4c4e927bbbcc7d0"},
-    {file = "pandas-1.2.2-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:4d33537a375cfb2db4d388f9a929b6582a364137ea6c6b161b0166440d6ffe36"},
-    {file = "pandas-1.2.2-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:8ac028cd9a6e1efe43f3dc36f708263838283535cc45430a98b9803f44f4c84b"},
-    {file = "pandas-1.2.2-cp38-cp38-win32.whl", hash = "sha256:c43d1beb098a1da15934262009a7120aac8dafa20d042b31dab48c28868eb5a4"},
-    {file = "pandas-1.2.2-cp38-cp38-win_amd64.whl", hash = "sha256:69a70d79a791fa1fd5f6e84b8b6dec2ec92369bde4ab2e18d43fc8a1825f51d1"},
-    {file = "pandas-1.2.2-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:cbad4155028b8ca66aa19a8b13f593ebbf51bfb6c3f2685fe64f04d695a81864"},
-    {file = "pandas-1.2.2-cp39-cp39-manylinux1_i686.whl", hash = "sha256:fbddbb20f30308ba2546193d64e18c23b69f59d48cdef73676cbed803495c8dc"},
-    {file = "pandas-1.2.2-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:214ae60b1f863844e97c87f758c29940ffad96c666257323a4bb2a33c58719c2"},
-    {file = "pandas-1.2.2-cp39-cp39-manylinux2014_aarch64.whl", hash = "sha256:26b4919eb3039a686a86cd4f4a74224f8f66e3a419767da26909dcdd3b37c31e"},
-    {file = "pandas-1.2.2-cp39-cp39-win32.whl", hash = "sha256:e3c250faaf9979d0ec836d25e420428db37783fa5fed218da49c9fc06f80f51c"},
-    {file = "pandas-1.2.2-cp39-cp39-win_amd64.whl", hash = "sha256:e9bbcc7b5c432600797981706f5b54611990c6a86b2e424329c995eea5f9c42b"},
-    {file = "pandas-1.2.2.tar.gz", hash = "sha256:14ed84b463e9b84c8ff9308a79b04bf591ae3122a376ee0f62c68a1bd917a773"},
+    {file = "pandas-1.2.3-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:4d821b9b911fc1b7d428978d04ace33f0af32bb7549525c8a7b08444bce46b74"},
+    {file = "pandas-1.2.3-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:9f5829e64507ad10e2561b60baf285c470f3c4454b007c860e77849b88865ae7"},
+    {file = "pandas-1.2.3-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:97b1954533b2a74c7e20d1342c4f01311d3203b48f2ebf651891e6a6eaf01104"},
+    {file = "pandas-1.2.3-cp37-cp37m-win32.whl", hash = "sha256:5e3c8c60541396110586bcbe6eccdc335a38e7de8c217060edaf4722260b158f"},
+    {file = "pandas-1.2.3-cp37-cp37m-win_amd64.whl", hash = "sha256:8a051e957c5206f722e83f295f95a2cf053e890f9a1fba0065780a8c2d045f5d"},
+    {file = "pandas-1.2.3-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:a93e34f10f67d81de706ce00bf8bb3798403cabce4ccb2de10c61b5ae8786ab5"},
+    {file = "pandas-1.2.3-cp38-cp38-manylinux1_i686.whl", hash = "sha256:46fc671c542a8392a4f4c13edc8527e3a10f6cb62912d856f82248feb747f06e"},
+    {file = "pandas-1.2.3-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:43e00770552595c2250d8d712ec8b6e08ca73089ac823122344f023efa4abea3"},
+    {file = "pandas-1.2.3-cp38-cp38-win32.whl", hash = "sha256:475b7772b6e18a93a43ea83517932deff33954a10d4fbae18d0c1aba4182310f"},
+    {file = "pandas-1.2.3-cp38-cp38-win_amd64.whl", hash = "sha256:72ffcea00ae8ffcdbdefff800284311e155fbb5ed6758f1a6110fc1f8f8f0c1c"},
+    {file = "pandas-1.2.3-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:621c044a1b5e535cf7dcb3ab39fca6f867095c3ef223a524f18f60c7fee028ea"},
+    {file = "pandas-1.2.3-cp39-cp39-manylinux1_i686.whl", hash = "sha256:0f27fd1adfa256388dc34895ca5437eaf254832223812afd817a6f73127f969c"},
+    {file = "pandas-1.2.3-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:dbb255975eb94143f2e6ec7dadda671d25147939047839cd6b8a4aff0379bb9b"},
+    {file = "pandas-1.2.3-cp39-cp39-win32.whl", hash = "sha256:d59842a5aa89ca03c2099312163ffdd06f56486050e641a45d926a072f04d994"},
+    {file = "pandas-1.2.3-cp39-cp39-win_amd64.whl", hash = "sha256:09761bf5f8c741d47d4b8b9073288de1be39bbfccc281d70b889ade12b2aad29"},
+    {file = "pandas-1.2.3.tar.gz", hash = "sha256:df6f10b85aef7a5bb25259ad651ad1cc1d6bb09000595cab47e718cbac250b1d"},
 ]
 parsedatetime = [
    {file = "parsedatetime-2.6-py3-none-any.whl", hash = "sha256:cb96edd7016872f58479e35879294258c71437195760746faffedb692aef000b"},
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "csv-metadata-quality"
-version = "0.4.3"
+version = "0.4.5"
 description="A simple, but opinionated CSV quality checking and fixing pipeline for CSVs in the DSpace ecosystem."
 authors = ["Alan Orth <alan.orth@gmail.com>"]
 license="GPL-3.0-only"
@@ -16,6 +16,7 @@ requests = "^2.23.0"
 requests-cache = "^0.5.2"
 pycountry = "^19.8.18"
 langid = "^1.1.6"
+colorama = "^0.4.4"

 [tool.poetry.dev-dependencies]
 pytest = "^6.1.1"
--- a/requirements-dev.txt
+++ b/requirements-dev.txt
@@ -12,7 +12,7 @@ black==20.8b1; python_version >= "3.6"
 certifi==2020.12.5; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
 chardet==4.0.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
 click==7.1.2; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" and python_version >= "3.6"
-colorama==0.4.4; python_version >= "3.7" and python_full_version < "3.0.0" and sys_platform == "win32" and python_version < "4.0" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6") or sys_platform == "win32" and python_version >= "3.7" and python_full_version >= "3.5.0" and python_version < "4.0" and (python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6")
+colorama==0.4.4; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.5.0")
 csvkit==1.0.5
 dbfread==2.0.7
 decorator==4.4.2; python_version >= "3.7" and python_full_version < "3.0.0" and python_version < "4.0" or python_version >= "3.7" and python_version < "4.0" and python_full_version >= "3.2.0"
@@ -21,7 +21,7 @@ flake8==3.8.4; (python_version >= "2.7" and python_full_version < "3.0.0") or (p
 idna==2.10; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
 iniconfig==1.1.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
 ipython-genutils==0.2.0; python_version >= "3.7" and python_version < "4.0"
-ipython==7.20.0; python_version >= "3.7" and python_version < "4.0"
+ipython==7.21.0; python_version >= "3.7" and python_version < "4.0"
 isodate==0.6.0
 isort==5.7.0; python_version >= "3.6" and python_version < "4.0"
 jdcal==1.4.1; python_version >= "3.6"
@@ -30,29 +30,29 @@ langid==1.1.6
 leather==0.3.3
 mccabe==0.6.1; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
 mypy-extensions==0.4.3; python_version >= "3.6"
-numpy==1.20.0; python_version >= "3.7" and python_full_version >= "3.7.1"
+numpy==1.20.1; python_version >= "3.7" and python_full_version >= "3.7.1"
 openpyxl==3.0.6; python_version >= "3.6"
 packaging==20.9; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
-pandas==1.2.1; python_full_version >= "3.7.1"
+pandas==1.2.3; python_full_version >= "3.7.1"
 parsedatetime==2.6
 parso==0.8.1; python_version >= "3.7" and python_version < "4.0"
 pathspec==0.8.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" and python_version >= "3.6"
 pexpect==4.8.0; python_version >= "3.7" and python_version < "4.0" and sys_platform != "win32"
 pickleshare==0.7.5; python_version >= "3.7" and python_version < "4.0"
 pluggy==0.13.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
-prompt-toolkit==3.0.14; python_version >= "3.7" and python_version < "4.0" and python_full_version >= "3.6.1"
+prompt-toolkit==3.0.16; python_version >= "3.7" and python_version < "4.0" and python_full_version >= "3.6.1"
 ptyprocess==0.7.0; python_version >= "3.7" and python_version < "4.0" and sys_platform != "win32"
 py==1.10.0; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
 pycodestyle==2.6.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
 pycountry==19.8.18
 pyflakes==2.2.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
-pygments==2.7.4; python_version >= "3.7" and python_version < "4.0"
+pygments==2.8.0; python_version >= "3.7" and python_version < "4.0"
 pyparsing==2.4.7; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.6"
 pytest-clarity==0.3.0a0; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.4.0")
 pytest==6.2.2; python_version >= "3.6"
 python-dateutil==2.8.1; python_full_version >= "3.7.1"
 python-slugify==4.0.1; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
-python-stdnum==1.15
+python-stdnum==1.16
 pytimeparse==1.1.8
 pytz==2021.1; python_full_version >= "3.7.1"
 regex==2020.11.13; python_version >= "3.6"
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,12 +1,13 @@
 certifi==2020.12.5; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
 chardet==4.0.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
+colorama==0.4.4; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.5.0")
 idna==2.10; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0"
 langid==1.1.6
-numpy==1.20.0; python_version >= "3.7" and python_full_version >= "3.7.1"
-pandas==1.2.1; python_full_version >= "3.7.1"
+numpy==1.20.1; python_version >= "3.7" and python_full_version >= "3.7.1"
+pandas==1.2.3; python_full_version >= "3.7.1"
 pycountry==19.8.18
 python-dateutil==2.8.1; python_full_version >= "3.7.1"
-python-stdnum==1.15
+python-stdnum==1.16
 pytz==2021.1; python_full_version >= "3.7.1"
 requests-cache==0.5.2
 requests==2.25.1; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.5.0")
--- a/setup.py
+++ b/setup.py
@@ -14,7 +14,7 @@ install_requires = [

 setuptools.setup(
    name="csv-metadata-quality",
-    version="0.4.3",
+    version="0.4.5",
    author="Alan Orth",
    author_email="aorth@mjanja.ch",
    description="A simple, but opinionated CSV quality checking and fixing pipeline for CSVs in the DSpace ecosystem.",
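The separate 0.4.4 commit in this range exists only because setup.py was not bumped together with the other version strings, which now live in pyproject.toml, setup.py and csv_metadata_quality/version.py. A small pre-release check could catch that. The sketch below is hypothetical (neither the script nor its function names are part of the project) and assumes each file's first `version = "X.Y.Z"` style assignment, in any letter case, is the package version.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Matches version = "0.4.5", version="0.4.5" and VERSION = "0.4.5".
# The leading \b keeps it from matching inside names like python_version.
VERSION_RE = re.compile(r'\bversion\s*=\s*"([^"]+)"', re.IGNORECASE)

VERSION_FILES = ["pyproject.toml", "setup.py", "csv_metadata_quality/version.py"]


def find_version(text: str) -> Optional[str]:
    """Return the first version string in the text, or None if there is none."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def collect_versions(contents: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each file name to the version string found in its contents."""
    return {name: find_version(text) for name, text in contents.items()}


def versions_agree(contents: Dict[str, str]) -> bool:
    """True when every file declares the same, non-missing version."""
    versions = set(collect_versions(contents).values())
    return len(versions) == 1 and None not in versions


if __name__ == "__main__":
    root = Path(".")
    contents = {
        name: (root / name).read_text(encoding="utf-8")
        for name in VERSION_FILES
        if (root / name).exists()
    }
    for name, version in collect_versions(contents).items():
        print(f"{name}: {version}")
    if contents and not versions_agree(contents):
        raise SystemExit("Version strings disagree between files")
```

Run from the repository root (for example in CI before tagging a release), it exits with an error message whenever one file still carries an old version, as setup.py did here with 0.4.3.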
--- a/tests/test_check.py
+++ b/tests/test_check.py
@@ -1,4 +1,5 @@
 import pandas as pd
+from colorama import Fore

 import csv_metadata_quality.check as check
 import csv_metadata_quality.experimental as experimental
@@ -12,7 +13,7 @@ def test_check_invalid_issn(capsys):
    check.issn(value)

    captured = capsys.readouterr()
-    assert captured.out == f"Invalid ISSN: {value}\n"
+    assert captured.out == f"{Fore.RED}Invalid ISSN: {Fore.RESET}{value}\n"


 def test_check_valid_issn():
@@ -33,7 +34,7 @@ def test_check_invalid_isbn(capsys):
    check.isbn(value)

    captured = capsys.readouterr()
-    assert captured.out == f"Invalid ISBN: {value}\n"
+    assert captured.out == f"{Fore.RED}Invalid ISBN: {Fore.RESET}{value}\n"


 def test_check_valid_isbn():
@@ -56,7 +57,10 @@ def test_check_invalid_separators(capsys):
    check.separators(value, field_name)

    captured = capsys.readouterr()
-    assert captured.out == f"Invalid multi-value separator ({field_name}): {value}\n"
+    assert (
+        captured.out
+        == f"{Fore.RED}Invalid multi-value separator ({field_name}): {Fore.RESET}{value}\n"
+    )


 def test_check_unnecessary_separators(capsys):
@@ -70,7 +74,8 @@ def test_check_unnecessary_separators(capsys):

    captured = capsys.readouterr()
    assert (
-        captured.out == f"Unnecessary multi-value separator ({field_name}): {field}\n"
+        captured.out
+        == f"{Fore.RED}Unnecessary multi-value separator ({field_name}): {Fore.RESET}{field}\n"
    )


@@ -96,7 +101,7 @@ def test_check_missing_date(capsys):
    check.date(value, field_name)

    captured = capsys.readouterr()
-    assert captured.out == f"Missing date ({field_name}).\n"
+    assert captured.out == f"{Fore.RED}Missing date ({field_name}).{Fore.RESET}\n"


 def test_check_multiple_dates(capsys):
@@ -109,7 +114,10 @@ def test_check_multiple_dates(capsys):
    check.date(value, field_name)

    captured = capsys.readouterr()
-    assert captured.out == f"Multiple dates not allowed ({field_name}): {value}\n"
+    assert (
+        captured.out
+        == f"{Fore.RED}Multiple dates not allowed ({field_name}): {Fore.RESET}{value}\n"
+    )


 def test_check_invalid_date(capsys):
@@ -122,7 +130,9 @@ def test_check_invalid_date(capsys):
    check.date(value, field_name)

    captured = capsys.readouterr()
-    assert captured.out == f"Invalid date ({field_name}): {value}\n"
+    assert (
+        captured.out == f"{Fore.RED}Invalid date ({field_name}): {Fore.RESET}{value}\n"
+    )


 def test_check_valid_date():
@@ -147,7 +157,10 @@ def test_check_suspicious_characters(capsys):
    check.suspicious_characters(value, field_name)

    captured = capsys.readouterr()
-    assert captured.out == f"Suspicious character ({field_name}): ˆt\n"
+    assert (
+        captured.out
+        == f"{Fore.YELLOW}Suspicious character ({field_name}): {Fore.RESET}ˆt\n"
+    )


 def test_check_valid_iso639_1_language():
@@ -178,7 +191,9 @@ def test_check_invalid_iso639_1_language(capsys):
    check.language(value)

    captured = capsys.readouterr()
-    assert captured.out == f"Invalid ISO 639-1 language: {value}\n"
+    assert (
+        captured.out == f"{Fore.RED}Invalid ISO 639-1 language: {Fore.RESET}{value}\n"
+    )


 def test_check_invalid_iso639_3_language(capsys):
@@ -189,7 +204,9 @@ def test_check_invalid_iso639_3_language(capsys):
    check.language(value)

    captured = capsys.readouterr()
-    assert captured.out == f"Invalid ISO 639-3 language: {value}\n"
+    assert (
+        captured.out == f"{Fore.RED}Invalid ISO 639-3 language: {Fore.RESET}{value}\n"
+    )


 def test_check_invalid_language(capsys):
@@ -200,7 +217,7 @@ def test_check_invalid_language(capsys):
    check.language(value)

    captured = capsys.readouterr()
-    assert captured.out == f"Invalid language: {value}\n"
+    assert captured.out == f"{Fore.RED}Invalid language: {Fore.RESET}{value}\n"


 def test_check_invalid_agrovoc(capsys):
@@ -212,7 +229,10 @@ def test_check_invalid_agrovoc(capsys):
    check.agrovoc(value, field_name)

    captured = capsys.readouterr()
-    assert captured.out == f"Invalid AGROVOC ({field_name}): {value}\n"
+    assert (
+        captured.out
+        == f"{Fore.RED}Invalid AGROVOC ({field_name}): {Fore.RESET}{value}\n"
+    )


 def test_check_valid_agrovoc():
@@ -234,7 +254,10 @@ def test_check_uncommon_filename_extension(capsys):
    check.filename_extension(value)

    captured = capsys.readouterr()
-    assert captured.out == f"Filename with uncommon extension: {value}\n"
+    assert (
+        captured.out
+        == f"{Fore.YELLOW}Filename with uncommon extension: {Fore.RESET}{value}\n"
+    )


 def test_check_common_filename_extension():
@@ -262,7 +285,7 @@ def test_check_incorrect_iso_639_1_language(capsys):
    captured = capsys.readouterr()
    assert (
        captured.out
-        == f"Possibly incorrect language {language} (detected en): {title}\n"
+        == f"{Fore.YELLOW}Possibly incorrect language {language} (detected en): {Fore.RESET}{title}\n"
    )


@@ -281,7 +304,7 @@ def test_check_incorrect_iso_639_3_language(capsys):
    captured = capsys.readouterr()
    assert (
        captured.out
-        == f"Possibly incorrect language {language} (detected eng): {title}\n"
+        == f"{Fore.YELLOW}Possibly incorrect language {language} (detected eng): {Fore.RESET}{title}\n"
    )
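The updated tests hard-code the colorama constants in every expected string. An alternative is to strip ANSI color sequences from the captured output and compare against the plain message, so the assertions keep passing if a message's color changes. The `strip_ansi` helper and the sample test below are hypothetical; the test prints a made-up colored line itself rather than calling the project's checks, and `capsys` is pytest's built-in output-capturing fixture.

```python
import re

# ANSI SGR sequences such as "\x1b[31m" (colorama's Fore.RED) or
# "\x1b[39m" (Fore.RESET): ESC, "[", optional numeric parameters, "m".
ANSI_SGR_RE = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    """Remove color/style escape sequences, leaving the visible text."""
    return ANSI_SGR_RE.sub("", text)


def test_strip_ansi_from_captured_output(capsys):
    # Hypothetical colored message in the same shape as check.issn() prints.
    print("\x1b[31mInvalid ISSN: \x1b[39m1234-5678")

    captured = capsys.readouterr()
    assert strip_ansi(captured.out) == "Invalid ISSN: 1234-5678\n"
```

The trade-off is that a plain-text comparison no longer verifies the color itself, so a suite could keep a few color-specific assertions and use plain comparisons everywhere else.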
Author	SHA1	Message	Date
Alan Orth	202bda862a	Bump version to 0.4.5	2021-03-04 21:38:10 +02:00
Alan Orth	7479310ac0	setup.py: Bump version to 0.4.4. I forgot to increase this when I actually released version 0.4.4, so I am doing it now in a separate commit before I bump the version to 0.4.5.	2021-03-04 21:35:08 +02:00
Alan Orth	98a91bc9c2	Update requirements. Generated with poetry export: `poetry export --without-hashes -f requirements.txt > requirements.txt` and `poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt`. I am trying `--without-hashes` to work around an error on pip install when running in CI: `ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.`	2021-03-04 21:33:33 +02:00
Alan Orth	fc5bedcc5c	CHANGELOG.md: Add poetry update	2021-03-04 21:32:46 +02:00
Alan Orth	44d12d771a	poetry.lock: Run poetry update	2021-03-04 21:32:21 +02:00
Alan Orth	4a7000e975	README.md: Add more ideas to do	2021-03-04 21:26:53 +02:00
Alan Orth	27b2d81ca8	CHANGELOG.md: Add note about dcterms.issued	2021-02-28 15:14:39 +02:00
Alan Orth	91ebd0f606	README.md: Update TODOs. A few of these date things have been addressed.	2021-02-28 15:13:36 +02:00
Alan Orth	dd2cfae047	csv_metadata_quality/app.py: Match dcterms.issued for dates. We used to only check fields that had "date" in their name because we were using DSpace's default dc.date.* fields. Now we are using dcterms.issued so I will add that one as well.	2021-02-28 15:11:06 +02:00
Alan Orth	d76e72532a	Move unreleased changes to v0.4.4	2021-02-21 13:25:22 +02:00
Alan Orth	13980d2dde	CHANGELOG.md: Add note about colored output	2021-02-21 13:12:26 +02:00
Alan Orth	9aaaa62461	Update requirements. Generated with poetry export: `poetry export --without-hashes -f requirements.txt > requirements.txt` and `poetry export --without-hashes --dev -f requirements.txt > requirements-dev.txt`. I am trying `--without-hashes` to work around an error on pip install when running in CI: `ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==.`	2021-02-21 13:10:52 +02:00
Alan Orth	a7fc5a246c	Colorize output. Messages will be colorized: red for errors, yellow for warnings or information, green for fixes.	2021-02-21 13:01:25 +02:00
Alan Orth	7fb8acb866	Add colorama for colored output. Red for errors, yellow for warnings or information, and green for fixes.	2021-02-21 13:00:31 +02:00