Alan Orth
566c2b45cf
Remove Excel support
...
I never used this and it seems xlrd doesn't even support .xlsx anymore
anyways. If this was needed I could theoretically use openpyxl but I'd
rather just stick to CSV.
2022-09-02 16:14:24 +03:00
Alan Orth
032a1db392
README.md: Add note about missing regions
2022-07-28 16:58:01 +03:00
Alan Orth
e7ea8ef9f0
README.md: add note about spdx-license-list
...
This Python module was deprecated in favor of using the SPDX license
data directly.
See: https://github.com/spdx/license-list-data
2022-01-30 13:27:20 +03:00
Alan Orth
d126304534
README.md: update note about Python version
2022-01-30 13:05:36 +03:00
Alan Orth
ad33195ba3
README.md: adjust intro
...
Makes the badges not wrap and looks better in my opinion.
2021-12-08 11:36:34 +02:00
Alan Orth
28f9026286
README.md: Minor edit
2021-03-19 16:26:31 +02:00
Alan Orth
a04dbc50db
Add notes about checking and fixing mojibake
2021-03-19 11:48:27 +02:00
Alan Orth
e92ec5d371
README.md: Add note about duplicate checking
2021-03-17 10:12:03 +02:00
Alan Orth
9a5e3fd6ef
README.md: Add TODO about detecting duplicates
2021-03-16 14:03:26 +02:00
Alan Orth
1008acf35e
Always fix invalid multi-value separators
...
This is no longer classified as "unsafe" as I have yet to see a
case where this was intentional, and it always causes issues when
you import the data into a DSpace repository.
2021-03-13 12:59:45 +02:00
Alan Orth
f00a07e2cd
README.md: Reorganize unsafe functionality
2021-03-13 11:56:52 +02:00
Alan Orth
6cc1401f88
pyproject.toml: Minimum Python is technically 3.7.1
...
See: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.2.0.html
2021-03-11 13:41:58 +02:00
Alan Orth
ad2cda8a41
README.md: Add note about SPDX license identifiers
2021-03-11 12:21:34 +02:00
Alan Orth
6ca449d8ed
README.md: Update note about Python 3.8 to 3.8+
...
Currently the lower bound on Python version support is 3.7 because
of Pandas 1.2.0 requiring it, but I use 3.9 on my development box.
2021-03-11 12:16:07 +02:00
Alan Orth
6e4b0e5c1b
Add validation of SPDX license identifiers
...
Currently this only checks the dcterms.license field and the result
will only be a warning.
2021-03-11 10:33:16 +02:00
Alan Orth
4a7000e975
README.md: Add more ideas to do
2021-03-04 21:26:53 +02:00
Alan Orth
91ebd0f606
README.md: Update TODOs
...
A few of these date things have been addressed.
2021-02-28 15:13:36 +02:00
Alan Orth
dbbbc0944a
README.md: Add handle to citation
2021-01-27 10:33:37 +02:00
Alan Orth
d17bf3033c
README.md: Add citation
2021-01-27 10:32:26 +02:00
Alan Orth
2ec52f1b73
README.md: Update description
2021-01-26 15:43:41 +02:00
Alan Orth
aa1abf15a7
README.md: Adjust title
2021-01-26 15:35:21 +02:00
Alan Orth
df670e81b9
README.md: Use badge from my Drone CI
...
I'm not using SourceHut anymore.
2021-01-26 14:38:50 +02:00
Alan Orth
d3880a9dfa
Remove Python 3.6 support
...
Pandas 1.2.0 apparently requires Python 3.7.1+.
2021-01-03 15:51:53 +02:00
Alan Orth
0dc66c5c4e
Expand check/fix for multi-value separators
...
I just came across some metadata that had unnecessary multi-value
separators at the end of a field, causing a blank value to be used.
For example: "Kenya||Tanzania||"
2021-01-03 15:30:03 +02:00
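The trailing-separator case described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the function name `fix_separators` is hypothetical, and `||` is assumed to be the multi-value separator, as in the "Kenya||Tanzania||" example.

```python
def fix_separators(value: str, separator: str = "||") -> str:
    """Drop blank values left behind by stray multi-value separators.

    Hypothetical helper for illustration; the name and signature are
    assumptions and do not come from the project itself.
    """
    # Splitting "Kenya||Tanzania||" yields a trailing empty string,
    # which is the blank value the commit message describes.
    parts = value.split(separator)

    # Keep only parts that contain something other than whitespace.
    return separator.join(part for part in parts if part.strip())


fixed = fix_separators("Kenya||Tanzania||")
```

Because empty parts are filtered out wherever they occur, the same approach would also cover a leading separator or a doubled separator in the middle of a field.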
Alan Orth
fc0367bfc8
README.md: Update note about Python version
2020-12-08 10:52:24 +02:00
Alan Orth
e33b285034
README.md: Add GitHub Actions badge
2020-12-08 10:48:31 +02:00
Alan Orth
db474a802f
README.md: Use badge from travis-ci.com
2020-08-04 11:12:28 +03:00
Alan Orth
f4c5c5781e
README.md: Switch to poetry
2020-07-06 13:59:11 +03:00
Alan Orth
40ba9bae6c
README.md: Adjust heading size
2020-01-15 12:26:11 +02:00
Alan Orth
49e3543878
Add Unicode normalization
...
This will check all strings for un-normalized Unicode characters.
Normalization is done using NFC. This includes tests and updated
sample data (data/test.csv).
See: https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html
2020-01-15 11:37:54 +02:00
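As a rough sketch of what the NFC check in the commit above amounts to, the standard library's `unicodedata.normalize` can be compared against the original string. The helper names `is_nfc` and `to_nfc` are hypothetical and chosen only for this example; the project's own functions may look different.

```python
import unicodedata


def is_nfc(value: str) -> bool:
    """Return True if the string is already in NFC form (hypothetical helper)."""
    return unicodedata.normalize("NFC", value) == value


def to_nfc(value: str) -> str:
    """Return the NFC-normalized form of the string (hypothetical helper)."""
    return unicodedata.normalize("NFC", value)


# "e" followed by U+0301 COMBINING ACUTE ACCENT: renders like "é",
# but is two code points instead of one.
decomposed = "Ame\u0301lie"
composed = to_nfc(decomposed)
```

The two strings look identical when displayed, but only the composed one compares equal to a value typed with a precomposed "é", which is why un-normalized values cause trouble in searches and duplicate checks.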
Alan Orth
1c75608d54
README.md: Update introduction text
...
We should mention that this is not DSpace-specific. Realistically, it
is Dublin Core-specific.
2019-09-26 14:19:13 +03:00
Alan Orth
0b15a8ed3b
README.md: Remove TODO about lack of space after comma
...
This was added as an automatic global fix a few weeks ago.
2019-09-26 14:16:33 +03:00
Alan Orth
e7c220039b
README.md: Add note about experimental language validation
2019-09-26 13:59:50 +03:00
Alan Orth
7ac1c6f554
README.md: Update comment about ISO 639-3
...
The pycountry library is actually using ISO 639-3 apparently.
See: https://pypi.org/project/pycountry/
2019-09-26 07:51:41 +03:00
Alan Orth
d9fc09f121
Fix references to ISO 639
...
It turns out that ISO 639-1 defines the two-letter codes and ISO 639-2
defines the three-letter codes, aka alpha2 and alpha3.
See: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
2019-09-11 16:36:53 +03:00
Alan Orth
2af714fb05
README.md: Add a handful of TODOs
2019-08-27 00:12:41 +03:00
Alan Orth
bd984f3db5
README.md: Update TravisCI badge
2019-08-22 15:07:03 +03:00
Alan Orth
3f4e84a638
README.md: Use ILRI GitHub remote
2019-08-22 14:54:12 +03:00
Alan Orth
a00d3d7ea5
README.md: Simplify installation instructions
...
Pipenv has captured the local dependency with `-e .` so now it gets
installed by the Pipfile or requirements.txt.
2019-08-02 11:02:50 +03:00
Alan Orth
0ed390dbd5
README.md: Update AGROVOC information
...
Now details the new `--agrovoc-fields` option.
2019-08-01 23:54:40 +03:00
Alan Orth
fd3861e7cd
README.md: Update installation and usage instructions
...
It is much easier now that I have created a proper package.
2019-07-31 17:41:18 +03:00
Alan Orth
4c4f4a3ba2
README.md: Update todos
2019-07-31 16:33:49 +03:00
Alan Orth
22cc7bc793
README.md: Improve section on unsafe fixes
2019-07-31 16:00:05 +03:00
Alan Orth
40d5f7d81b
Add support for removing newlines
...
This was tricky because of the nature of newlines. In actuality we
are removing Unix line feeds here (U+000A) because Windows carriage
returns are actually already removed by the string stripping in the
whitespace fix.
Creating the test case in Vim was difficult because I couldn't figure
out how to manually enter a line feed character. In the end I used a
search and replace on a known pattern like "ALAN", replacing it with
\r (which Vim's substitute command inserts as a line break). Neither
entering the Unicode code point (U+000A) directly nor typing an
"Enter" character after ^V worked. Grrr.
2019-07-30 20:05:12 +03:00
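The line feed removal described in the commit above comes down to something like the following sketch. The function name `remove_newlines` is hypothetical and used only for illustration; the project's real fix may differ in detail.

```python
def remove_newlines(value: str) -> str:
    """Remove Unix line feeds (U+000A) from a string (hypothetical helper)."""
    return value.replace("\n", "")


# A field value with an embedded line feed, as might appear inside a
# quoted CSV cell.
cleaned = remove_newlines("Ken\nya")
```

Note that this removes the character outright instead of replacing it with a space, so words that were separated only by a line feed would be joined together.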
Alan Orth
346e66ca98
README.md: Add more information to introduction
2019-07-30 17:44:30 +03:00
Alan Orth
a85b410ab9
README.md: Improve introduction and functionality
2019-07-30 16:09:15 +03:00
Alan Orth
1f65a28307
Add support for validating subjects against AGROVOC
...
Checks values in the dc.subject or dcterms.subject field against the
AGROVOC REST API hosted by FAO. Code borrowed from agrovoc-lookup.py.
See: http://agrovoc.uniroma2.it/agrovoc/agrovoc/en/
See: https://github.com/ilri/DSpace/blob/5_x-prod/agrovoc-lookup.py
2019-07-30 00:30:31 +03:00
Alan Orth
a36454a3ac
Add support for validating languages
...
Will validate against ISO 639-2 or ISO 639-3 depending on how long
the language field is. Otherwise will return that the language is
invalid.
Does not currently have any support for generic values like "Other".
2019-07-29 18:59:42 +03:00
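The length-based dispatch described in the commit above can be sketched like this. Everything here is an assumption made for illustration: the function name `check_language` is hypothetical, and the two code sets are tiny hand-picked samples standing in for a full ISO 639 registry.

```python
# Tiny illustrative samples only; a real checker would consult a
# complete ISO 639 code list instead of these hard-coded sets.
TWO_LETTER_CODES = {"en", "fr", "sw"}
THREE_LETTER_CODES = {"eng", "fra", "swa"}


def check_language(value: str) -> bool:
    """Validate a language code based on its length (hypothetical helper)."""
    code = value.strip()

    if len(code) == 2:
        return code in TWO_LETTER_CODES
    if len(code) == 3:
        return code in THREE_LETTER_CODES

    # Any other length is reported as invalid, which is why a generic
    # value like "Other" is not accepted.
    return False
```

For example, `check_language("en")` and `check_language("swa")` are accepted, while `check_language("Other")` is rejected because it is neither two nor three characters long.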
Alan Orth
e49b4e8f22
README.md: Try to simplify list of functionality
2019-07-29 18:25:38 +03:00
Alan Orth
0eb852a65b
README.md: Improve note about unsafe options
2019-07-29 18:14:50 +03:00