mirror of
https://github.com/ilri/csv-metadata-quality.git
synced 2024-12-30 16:04:29 +01:00
Remove Excel support
I never used this and it seems xlrd doesn't even support .xlsx anymore anyways. If this was needed I could theoretically use openpyxl but I'd rather just stick to CSV.
parent 41b813be6e
commit 566c2b45cf
@@ -21,6 +21,10 @@ with `-a <field.name>`
 - Ability to add missing UN M.49 regions when both country and region columns
 are present. Enable with `-u` (unsafe fixes) for now.
 
+### Removed
+- Support for reading Excel files (both `.xls` and `.xlsx`) as it was completely
+untested
+
 ## [0.5.0] - 2021-12-08
 ### Added
 - Ability to check for, and fix, "mojibake" characters using [ftfy](https://github.com/LuminosoInsight/python-ftfy)
@@ -8,7 +8,7 @@
 
 A simple, but opinionated metadata quality checker and fixer designed to work with CSVs in the DSpace ecosystem (though it could theoretically work on any CSV that uses Dublin Core fields as columns). The implementation is essentially a pipeline of checks and fixes that begins with splitting multi-value fields on the standard DSpace "||" separator, trimming leading/trailing whitespace, and then proceeding to more specialized cases like ISSNs, ISBNs, languages, unnecessary Unicode, AGROVOC terms, etc.
 
-Requires Python 3.8 or greater. CSV and Excel support comes from the [Pandas](https://pandas.pydata.org/) library, though your mileage may vary with Excel because this is much less tested.
+Requires Python 3.8 or greater. CSV support comes from the [Pandas](https://pandas.pydata.org/) library.
 
 If you use the DSpace CSV metadata quality checker please cite:
 
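The README paragraph in the hunk above describes a pipeline whose first steps split multi-value fields on the DSpace "||" separator and trim surrounding whitespace. A minimal sketch of just those two steps follows, using only the standard library. `split_and_trim` is a hypothetical helper written for illustration and is not taken from the project; dropping values that end up empty is also an assumption of this sketch.

```python
def split_and_trim(value: str, separator: str = "||") -> str:
    """Split a multi-value field, strip each value, and rejoin the result."""
    parts = [part.strip() for part in value.split(separator)]
    # Assumption of this sketch: values that are empty after trimming are dropped.
    return separator.join(part for part in parts if part)


# Prints: Kenya||Ethiopia
print(split_and_trim(" Kenya ||Ethiopia  || "))
```

The real tool runs many more checks after this point (ISSNs, ISBNs, languages, and so on); the sketch stops at the two steps the paragraph names first.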
@@ -36,7 +36,7 @@ def parse_args(argv):
     parser.add_argument(
         "--input-file",
         "-i",
-        help="Path to input file. Can be UTF-8 CSV or Excel XLSX.",
+        help="Path to input file. Must be a UTF-8 CSV.",
         required=True,
         type=argparse.FileType("r", encoding="UTF-8"),
     )
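To see how the CSV-only option in the hunk above behaves, here is a small self-contained sketch. The `--input-file` argument is copied from the diff; everything else is an assumption for illustration: the parser description, the `read_header` helper, and the use of the standard `csv` module as a stand-in for Pandas, which the project actually uses.

```python
import argparse
import csv
import io


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Sketch of the CSV-only input option.")
    parser.add_argument(
        "--input-file",
        "-i",
        help="Path to input file. Must be a UTF-8 CSV.",
        required=True,
        # argparse opens the file itself and decodes it as UTF-8 text.
        type=argparse.FileType("r", encoding="UTF-8"),
    )
    return parser.parse_args(argv)


def read_header(handle):
    """Return the column names from an already-open CSV file handle (hypothetical helper)."""
    return next(csv.reader(handle), [])


# Demonstrate the helper on an in-memory file so no path is needed.
# Prints: ['dc.title', 'dc.language.iso']
print(read_header(io.StringIO("dc.title,dc.language.iso\nA title,en\n")))
```

With a real file, `parse_args(["-i", "metadata.csv"]).input_file` would be an open text handle that can be passed to `read_header`; `metadata.csv` is a placeholder name. Because the handle is opened in text mode with UTF-8 decoding, a binary `.xlsx` file would generally fail to decode when read, which fits the updated help text.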
@@ -14,7 +14,6 @@ csv-metadata-quality = 'csv_metadata_quality.__main__:main'
 python = "^3.8"
 pandas = "^1.4.0"
 python-stdnum = "^1.13"
-xlrd = "^1.2.0"
 requests = "^2.27.1"
 requests-cache = "^0.9.1"
 pycountry = "^22.1.10"