Alan Orth
4f3174a543
CHANGELOG.md: add note about SPDX license list
2024-03-02 10:39:00 +03:00
Alan Orth
a21ffb0fa8
Use py3langid instead of langid
Faster and more modern code for Python 3 as a drop-in replacement.
See: https://adrien.barbaresi.eu/blog/language-detection-langid-py-faster.html
2023-12-28 14:11:21 +03:00
Alan Orth
1f637f32cd
Rework requests-cache
We should only be running this once per invocation, not for every
row we check. This should be more efficient, but it means that we
don't cache responses when running via pytest, which is actually
probably a good thing.
2023-10-15 23:37:38 +03:00
Alan Orth
f3fb1ff7fb
Don't crash when title is missing
We shouldn't crash the country/region checker/fixer when the title
field is missing, since we only use it to show status to the user.
2023-06-12 10:42:50 +03:00
Alan Orth
8d4295b2b3
CHANGELOG.md: add note about description field
2023-04-22 12:17:44 -07:00
Alan Orth
c64b7eb1f1
CHANGELOG.md: add note about Pandas 2.0.0
2023-04-05 11:17:48 +03:00
Alan Orth
20a2cce34b
CHANGELOG.md: add fixes
2023-03-10 16:17:20 +03:00
Alan Orth
fdccdf7318
Version 0.6.1
2023-02-23 13:46:56 +03:00
Alan Orth
8bc4cd419c
Strip filename descriptions before checking
When checking for uncommon file extensions in the filename field
we should strip descriptions that are meant for SAF Bundler, for
example: Annual_Report_2020.pdf__description:Report. This ends up
as a false positive that spams the output with warnings.
2023-02-13 11:00:57 +03:00
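The stripping described in commit 8bc4cd419c can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names and the COMMON_EXTENSIONS set are hypothetical, and only the `__description:` suffix format is taken from the commit message.

```python
import os

# Hypothetical set of "common" extensions; the project's real list is not shown here.
COMMON_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}


def strip_description(filename: str) -> str:
    """Drop a trailing SAF Bundler '__description:...' suffix, if present."""
    return filename.split("__description:", 1)[0].strip()


def has_uncommon_extension(filename: str) -> bool:
    """Return True if the bare filename's extension is not in COMMON_EXTENSIONS."""
    name = strip_description(filename)
    _, extension = os.path.splitext(name)
    return extension.lower() not in COMMON_EXTENSIONS


if __name__ == "__main__":
    value = "Annual_Report_2020.pdf__description:Report"
    print(strip_description(value))
    print(has_uncommon_extension(value))
```

Without the stripping step, the extension would be read as everything after the last dot, including the description, which is what produced the false positives.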
Alan Orth
bde38e9ed4
CHANGELOG.md: add notes about abstracts
2023-02-13 10:39:03 +03:00
Alan Orth
fbb625be5c
Ignore common non-SPDX licenses
This is meant to catch licenses that are supposed to be SPDX but
aren't, not licenses that *aren't* supposed to be SPDX. We have so
many free-text license descriptions like "Copyrighted" and "Other"
that I'm sick of seeing warnings for them!
2023-02-07 17:01:56 +03:00
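The idea in commit fbb625be5c might look something like the sketch below. The names IGNORED_LICENSES, KNOWN_SPDX_IDS, and should_warn are hypothetical, and the SPDX set is a tiny illustrative subset rather than the full license list; only "Copyrighted" and "Other" come from the commit message.

```python
# Free-text values that are not meant to be SPDX identifiers (from the commit message).
IGNORED_LICENSES = {"copyrighted", "other"}

# Tiny illustrative subset of SPDX identifiers; a real check would use the full list.
KNOWN_SPDX_IDS = {"CC-BY-4.0", "MIT", "GPL-3.0-only"}


def should_warn(license_value: str) -> bool:
    """Warn only for values that look like failed SPDX identifiers."""
    value = license_value.strip()
    if value.lower() in IGNORED_LICENSES:
        # Deliberately non-SPDX free text: stay quiet.
        return False
    return value not in KNOWN_SPDX_IDS


if __name__ == "__main__":
    for candidate in ["Copyrighted", "Other", "CC-BY-4.0", "CC-BY 4.0"]:
        print(candidate, should_warn(candidate))
```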
Alan Orth
084b970798
CHANGELOG.md: add note about abstract field
2023-02-07 16:52:34 +03:00
Alan Orth
c4a2ee8563
CHANGELOG.md: add note about fix.separators()
2023-01-24 14:16:23 +03:00
Alan Orth
5abd32a41f
CHANGELOG.md: run poetry update
2022-12-20 15:09:58 +02:00
Alan Orth
f640161d87
CHANGELOG.md: add notes about SPDX and Python
2022-12-13 10:45:36 +03:00
Alan Orth
051777bcec
Ignore subregion field for missing region checks
Due to a sloppy regex I was sometimes matching the subregion field
when checking for missing UN M.49 regions in the region field.
2022-12-07 23:18:47 +01:00
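The kind of regex mistake described in commit 051777bcec can be illustrated with a short sketch. The column names and both patterns are assumptions made for illustration; the commit message does not show the actual regex.

```python
import re

# Sloppy: matches "region" anywhere, so "subregion" columns match too.
SLOPPY = re.compile(r"region")

# Stricter: "region" must not be preceded by a lowercase letter,
# which excludes "subregion" but keeps ".region".
STRICT = re.compile(r"(?<![a-z])region")

# Hypothetical column names.
COLUMNS = ["cg.coverage.region", "cg.coverage.subregion", "dcterms.title"]


def region_columns(names, pattern):
    """Return the column names that the pattern treats as region fields."""
    return [name for name in names if pattern.search(name)]


if __name__ == "__main__":
    print(region_columns(COLUMNS, SLOPPY))
    print(region_columns(COLUMNS, STRICT))
```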
Alan Orth
8f3db86a36
CHANGELOG.md: fix header
2022-10-31 11:43:14 +03:00
Alan Orth
58b7b6e9d8
Version 0.6.0
2022-09-02 16:35:58 +03:00
Alan Orth
566c2b45cf
Remove Excel support
I never used this, and it seems xlrd doesn't even support .xlsx
anymore anyway. If this were needed I could theoretically use openpyxl,
but I'd rather just stick to CSV.
2022-09-02 16:14:24 +03:00
Alan Orth
41b813be6e
CHANGELOG.md: add note about exclude logic
2022-09-02 16:03:51 +03:00
Alan Orth
da87531779
CHANGELOG.md: Add note about adding missing regions
2022-07-28 16:54:05 +03:00
Alan Orth
e1b270cf83
CHANGELOG.md: add note about dropping invalid AGROVOC values
2021-12-23 12:47:42 +02:00
Alan Orth
a351ba9706
CHANGELOG.md: add notes about ftfy
2021-12-15 22:09:01 +02:00
Alan Orth
5854f8e865
CHANGELOG.md: add note about unnecessary Unicode
2021-12-15 13:56:31 +02:00
Alan Orth
cef6c66b30
CHANGELOG.md: start next changes
2021-12-09 23:21:58 +02:00
Alan Orth
cc34db7ff8
Version 0.5.0
2021-12-08 15:29:46 +02:00
Alan Orth
b79e07b814
CHANGELOG.md: Add note about countries without regions
2021-12-08 15:21:45 +02:00
Alan Orth
f5fa33bbc6
CHANGELOG.md: add title in citation note
2021-12-05 16:23:39 +02:00
Alan Orth
c95261f522
CHANGELOG.md: Add note about fix.newlines
2021-10-08 14:37:12 +03:00
Alan Orth
831ce979c3
CHANGELOG.md: Clarify regex fixes
2021-10-06 21:23:35 +03:00
Alan Orth
72dd3e7272
CHANGELOG.md: Add notes about regexes
2021-10-06 19:35:59 +03:00
Alan Orth
81069259ba
CHANGELOG.md: Add note about bibliographicCitation
2021-10-06 16:16:51 +03:00
Alan Orth
dbc0437d59
CHANGELOG.md: Add note about Python deps
2021-04-14 16:16:02 +03:00
Alan Orth
a04dbc50db
Add notes about checking and fixing mojibake
2021-03-19 11:48:27 +02:00
Alan Orth
f816e17fe7
Version 0.4.7
2021-03-17 10:00:34 +02:00
Alan Orth
652b7ea98c
CHANGELOG.md: Add note about poetry dependencies
2021-03-17 09:58:02 +02:00
Alan Orth
a313b7527a
CHANGELOG.md: Add note about duplicate items
2021-03-17 09:55:07 +02:00
Alan Orth
1aa2084230
CHANGELOG.md: Add note about checks
2021-03-16 16:11:24 +02:00
Alan Orth
ed084da08c
CHANGELOG.md: Add note about multi-value separators
2021-03-14 21:04:19 +02:00
Alan Orth
fb35afd937
CHANGELOG.md: Add note about requests cache
2021-03-14 09:13:51 +02:00
Alan Orth
1008acf35e
Always fix invalid multi-value separators
This is no longer classified as "unsafe" as I have yet to see a
case where this was intentional, and it always causes issues when
you import the data in a DSpace repository.
2021-03-13 12:59:45 +02:00
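A separator fix like the one in commit 1008acf35e could be sketched as below. This assumes the valid multi-value separator is a double pipe ("||") and that any other run of pipes is invalid; the function name and the exact normalization rules (including trimming whitespace and dropping empty values) are illustrative assumptions, not the project's implementation.

```python
import re


def fix_separators(value: str) -> str:
    """Normalize any run of pipes to '||' and drop empty values."""
    # Split on one or more consecutive pipes, then trim each value.
    parts = [part.strip() for part in re.split(r"\|+", value)]
    # Rejoin the non-empty values with the assumed valid separator.
    return "||".join(part for part in parts if part)


if __name__ == "__main__":
    for raw in ["Kenya|Uganda", "Kenya|||Uganda", "Kenya||Uganda||", "Kenya"]:
        print(repr(raw), "->", repr(fix_separators(raw)))
```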
Alan Orth
1554cfd5c9
Version 0.4.6
2021-03-11 12:14:54 +02:00
Alan Orth
00b8faad6d
CHANGELOG.md: Fix headers
2021-03-11 12:13:22 +02:00
Alan Orth
7ad821dcad
CHANGELOG.md: Add note about poetry dependencies
2021-03-11 11:10:27 +02:00
Alan Orth
e0e3ca6c58
CHANGELOG.md: Add notes about DCTERMS in data/test.csv
2021-03-11 10:50:52 +02:00
Alan Orth
d7d4d4efca
CHANGELOG.md: Add note about SPDX license identifiers
2021-03-11 10:37:27 +02:00
Alan Orth
202bda862a
Bump version to 0.4.5
2021-03-04 21:38:10 +02:00
Alan Orth
fc5bedcc5c
CHANGELOG.md: Add poetry update
2021-03-04 21:32:46 +02:00
Alan Orth
27b2d81ca8
CHANGELOG.md: Add note about dcterms.issued
2021-02-28 15:14:39 +02:00
Alan Orth
d76e72532a
Move unreleased changes to v0.4.4
2021-02-21 13:25:22 +02:00