
70 Commits

SHA1 Message Date
692a62b454 src/main/java: update curation tasks README.md
Add eperson ID to curation invocation. DSpace 7 requires this.
2024-04-29 09:33:39 +03:00
d4ca92066a Version 7.6.1.2 2024-04-25 12:58:07 +03:00
5ad8c556e9 src/main/java: simplify curation task results
We don't need to print the Handle: some items may still be in the
workflow, so their Handle will be null, and DSpace already shows the
Handle in the log before printing the result.
2024-04-25 12:53:15 +03:00
77425c13bf src/main/java: remove report() from curation tasks
Results are a single-line status that shows the result of the task,
but reports are like a running log of changes to the item and have
more complicated use cases and configuration requirements.

For now I will disable reports since I'm not using them.
2024-04-25 12:51:30 +03:00
9050caf37f Version 7.6.1.1
Unsure of the versioning, but something tells me I should follow
the upstream DSpace versioning to keep things simple.
2024-04-23 13:11:12 +03:00
639148dc19 src/main/java: minor update to ctasks README.md 2024-04-23 13:08:52 +03:00
7a91305742 Add new NormalizeDOIs curation task 2024-04-23 13:07:55 +03:00
0cb533b2c4 Fix license headers
I meant to use GPL-3.0-only.
2024-04-22 16:59:12 +03:00
ee6518035e Bump version to 7.6.1 2024-01-02 20:34:14 +03:00
9faf657c59 Bump version to 7.6-SNAPSHOT 2024-01-02 19:54:46 +03:00
7fb78c2722 src/main/java: minor refactoring
Suggested by IntelliJ.
2024-01-02 19:34:51 +03:00
6ef9f521bf src/main/resources: fix trailing comma in JSON 2024-01-02 18:03:52 +03:00
f9d7e5f6a2 src/main/java: minor refactor
Use isEmpty() instead of checking size.
2023-12-28 10:26:11 +03:00
9e965afdb7 src/main/java: change getSize() to getSizeBytes()
Apparently this changed in DSpace 7. Untested, but it compiles now.
2023-12-28 10:18:40 +03:00
408a0e1c19 src/main/java: update log4j usage
Untested, but compiles.
2023-12-28 10:17:24 +03:00
0a7cf7bf59 Import iso-codes snapshot
After my merge request to Debian's iso-codes package was merged, we
no longer need to maintain local overrides for Iran, Laos, and Syria,
as those are now official in iso-codes.

See: https://salsa.debian.org/iso-codes-team/iso-codes/-/merge_requests/32
2023-02-26 21:13:44 +03:00
8c0a8fbcd1 Bump version to 6.2-SNAPSHOT
I can't figure out how to get non-snapshot releases on Central.
2023-02-21 10:59:54 +03:00
c05a2e4f96 Version 6.2 2023-02-20 20:37:40 +03:00
1f6ba4af67 src: import iso-codes 4.12.0
This updates the name for TR from "Turkey" to "Türkiye".

See: https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4120-2022-11-06
2022-11-07 12:21:39 +03:00
dfaa234a90 src/main/resources: sync cgspace-countries.json with iso-codes
Not sure this is needed, but since we copy the JSON objects from
iso-codes, we should keep them in sync when there are changes to
countries we override.
2022-10-14 20:49:23 +03:00
f46e81b8cd src/main/resources: import iso-codes 4.11.0
This release is a bit old by now, but it contains two changes:

- South Korea
- North Korea
2022-10-14 20:47:26 +03:00
dbd8721579 src: add better status messages to FixLowQualityThumbnails 2022-10-07 15:33:13 +03:00
80a336f94d src: fix context commit in scripts
I was wondering why the same bitstreams appeared to be getting
deleted on every single run. It turns out that we were only
committing the context in single item mode. If the argument was a
site, community, or collection, we were updating the item but not
actually committing the changes!
2022-10-07 14:49:58 +03:00
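The bug fixed in 80a336f94d can be shown with a simplified model. This is a sketch, not DSpace code: `Target`, `FakeContext`, and `process()` are hypothetical stand-ins for the script's DSpace object type switch and `org.dspace.core.Context`. The cases run from the top of the hierarchy down (the ordering from 5ebf4930cf), and the single commit after the switch means every mode persists its changes, not only single item mode.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a script that accepts a site, community, collection, or item. */
final class CommitSketch {
    // Hypothetical stand-in for the DSpace object type given on the command line.
    enum Target { SITE, COMMUNITY, COLLECTION, ITEM }

    // Hypothetical stand-in for a DSpace Context: tracks pending and committed changes.
    static final class FakeContext {
        final List<String> pending = new ArrayList<>();
        final List<String> committed = new ArrayList<>();

        void update(String change) {
            pending.add(change);
        }

        void commit() {
            committed.addAll(pending);
            pending.clear();
        }
    }

    static void process(FakeContext context, Target target) {
        // Start from the top level of the hierarchy and work down.
        switch (target) {
            case SITE:
                context.update("all items in site");
                break;
            case COMMUNITY:
                context.update("all items in community");
                break;
            case COLLECTION:
                context.update("all items in collection");
                break;
            case ITEM:
                context.update("single item");
                break;
        }
        // Committing after the switch covers every branch, so no mode
        // updates items without persisting the changes.
        context.commit();
    }
}
```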
5ebf4930cf src: re-organize switch statements in scripts
It makes more sense to me to start from the top level of the
hierarchy.
2022-10-07 13:11:03 +03:00
b396fba043 src: format Java files with google-java-format
Using AOSP format so we get four spaces instead of two.
2022-10-06 14:27:51 +03:00
38a9cc5188 src: organize imports in VS Code 2022-10-06 14:26:44 +03:00
16db38967b src: handle null descriptions in FixJpgJpgThumbnails 2022-10-06 14:17:41 +03:00
2604dc3cce src: skip Infographics and Maps in FixJpgJpgThumbnails
Instead of checking whether they exist and then skipping them only
at the moment when we want to swap the bitstreams, let's bail early
when we know an item is an Infographic or a Map.
2022-10-06 14:15:58 +03:00
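The early bail-out from 2604dc3cce (combined with the switch to `dcterms.type` in 26597e2f8f) amounts to a guard at the top of the per-item logic. A minimal sketch, assuming the type values are literally "Infographic" and "Map" and that the item's `dcterms.type` values have already been read into a list; `EarlyBailSketch` and `shouldProcess()` are hypothetical names.

```java
import java.util.List;

final class EarlyBailSketch {
    // Assumed dcterms.type values to skip; the real script reads types from item metadata.
    private static final List<String> SKIPPED_TYPES = List.of("Infographic", "Map");

    /** Returns false as soon as the item has a skipped type, before any bitstreams are touched. */
    static boolean shouldProcess(List<String> dctermsTypes) {
        for (String type : dctermsTypes) {
            if (SKIPPED_TYPES.contains(type)) {
                return false;
            }
        }
        return true;
    }
}
```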
f0754ab419 src: fix NPE on null description
In FixLowQualityThumbnails we need to make sure that bitstream
descriptions are not null or empty before trying to evaluate them.
2022-10-05 21:00:14 +03:00
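The null check from f0754ab419 (and the similar fix in 16db38967b) follows a common pattern: test for null or empty before calling any method on the description. A minimal sketch; the "contains thumbnail" evaluation is a hypothetical example of what the script might check, and `DescriptionGuard` is not a name from the repository.

```java
import java.util.Locale;

final class DescriptionGuard {
    /** True only if the description is present and (hypothetically) mentions "thumbnail". */
    static boolean describesThumbnail(String description) {
        // Check null/empty first so the evaluation below cannot throw a NullPointerException.
        if (description == null || description.isEmpty()) {
            return false;
        }
        // Locale.ROOT avoids locale-dependent lowercasing (e.g. the Turkish dotless i).
        return description.toLowerCase(Locale.ROOT).contains("thumbnail");
    }
}
```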
6772145bec src: fix SPDX license header
Use GPL-3.0-or-later instead of GPL-3.0-only. I had specified this
in pom.xml already.
2022-10-05 16:53:00 +03:00
095f843067 src: add SPDX license headers 2022-10-05 15:48:57 +03:00
922e3892a7 Update README.md files 2022-10-05 15:24:08 +03:00
6b648c2c85 src: add FixLowQualityThumbnails.java
This adds another script to detect and remove more low-quality
thumbnails. For example:

- If an item has an "IM Thumbnail" and a "Generated Thumbnail" in the
  THUMBNAIL bundle, remove the "Generated Thumbnail"
- If an item has a PDF bitstream and a JPEG bitstream with a name or
  description "thumbnail" in the ORIGINAL bundle, remove the
  "thumbnail" bitstream in the ORIGINAL bundle and try to remove the
  "thumbnail.jpg" bitstream in the THUMBNAIL bundle

The idea is that we should *always* prefer thumbnails generated by
ImageMagick from PDFs in the ORIGINAL bundle and should remove any
other manually uploaded thumbnails.
2022-10-05 15:07:56 +03:00
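The two rules listed in 6b648c2c85 can be sketched as pure functions over simplified bundles. This is not the script's code: the `Bitstream` class is a hypothetical model (name, description, MIME type), and matching "thumbnail" as a case-insensitive substring of the name or description is an assumption about how the script interprets "a name or description 'thumbnail'".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class LowQualityThumbnailRules {
    // Hypothetical minimal bitstream model; not the DSpace Bitstream class.
    static final class Bitstream {
        final String name;
        final String description; // may be null
        final String mimeType;

        Bitstream(String name, String description, String mimeType) {
            this.name = name;
            this.description = description;
            this.mimeType = mimeType;
        }
    }

    /** Rule 1: if THUMBNAIL has an "IM Thumbnail", drop any "Generated Thumbnail". */
    static List<Bitstream> applyRule1(List<Bitstream> thumbnailBundle) {
        boolean hasIm = thumbnailBundle.stream()
                .anyMatch(b -> "IM Thumbnail".equals(b.description));
        List<Bitstream> kept = new ArrayList<>();
        for (Bitstream b : thumbnailBundle) {
            if (hasIm && "Generated Thumbnail".equals(b.description)) {
                continue; // prefer the ImageMagick thumbnail
            }
            kept.add(b);
        }
        return kept;
    }

    private static boolean mentionsThumbnail(String s) {
        return s != null && s.toLowerCase(Locale.ROOT).contains("thumbnail");
    }

    /** Rule 2: with a PDF in ORIGINAL, return "BUNDLE/name" entries to remove. */
    static List<String> applyRule2(List<Bitstream> original, List<Bitstream> thumbnails) {
        List<String> removals = new ArrayList<>();
        boolean hasPdf = original.stream().anyMatch(b -> "application/pdf".equals(b.mimeType));
        if (!hasPdf) {
            return removals;
        }
        boolean removedAny = false;
        for (Bitstream b : original) {
            if ("image/jpeg".equals(b.mimeType)
                    && (mentionsThumbnail(b.name) || mentionsThumbnail(b.description))) {
                removals.add("ORIGINAL/" + b.name);
                removedAny = true;
            }
        }
        // Then try to remove "thumbnail.jpg" from THUMBNAIL, if it exists.
        if (removedAny) {
            for (Bitstream t : thumbnails) {
                if ("thumbnail.jpg".equals(t.name)) {
                    removals.add("THUMBNAIL/" + t.name);
                }
            }
        }
        return removals;
    }
}
```

Keeping the rules as functions that return what to keep or remove, rather than mutating bundles directly, makes them easy to dry-run before deleting anything.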
3aa1503163 src: bump version of FixJpgJpgThumbnails.java 2022-10-04 21:13:24 +03:00
26597e2f8f Use dcterms.type in FixJpgJpgThumbnails script
We are now using dcterms.type instead of dc.type.
2022-10-04 16:16:43 +03:00
2e779efb14 src/main/java: Adjust curation README
DSpace 6 doesn't have the `-l` option to limit the cache size.
2020-08-10 20:04:46 +03:00
735e759033 Adjust READMEs again... 2020-08-10 17:16:14 +03:00
271a9ce970 Adjust README.md files 2020-08-10 15:55:11 +03:00
4bc7971ecb src/main/java: Remove debug comment 2020-08-07 22:55:35 +03:00
da1ecad238 src/main/java: DSpace 6 port of FixJpgJpgThumbnails.java
Need to use the new DSpace 6 service model in most places. Not sure
why addBitstream is no longer public, but removeBitstream is...
2020-08-07 22:45:07 +03:00
f3ab89f7a1 CountryCodeTagger.java: Port to DSpace 6
We need to use the new DSpace 6 service API. Also, the way we read
task properties has changed because of the configuration changes.

See: https://wiki.lyrasis.org/display/DSDOC6x/Curation+System
See: https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference
2020-08-05 12:28:37 +03:00
7251b85436 cgspace-countries.json: Remove Palestine
It's the same as in the ISO 3166-1 list.
2020-08-04 14:52:36 +03:00
dcb0532be2 Change groupId to prepare for upload to Central
It's much easier to get your package verified on Central if it uses
a GitHub groupId. Otherwise you need to use DNS verification! This
changes the groupId:

- from: org.cgiar.cgspace.ctask
- to: io.github.ilri.cgspace

The Java package changed as well.

See: https://central.sonatype.org/pages/producers.html
2020-08-02 23:48:13 +03:00
ca7deaac8f CountryCodeTagger.java: Remove unused variable
Some of the other curation tasks use an array of results.
2020-08-02 22:03:10 +03:00
e158e4bc98 CountryCodeTagger.java: Refactor adding of alpha2 codes
We can append the codes we will add to a List of Strings and then
actually apply them later in one addMetadata call, and update the
item with one item.update() call. This reduces duplicated code
and is more efficient.

Note that when testing this on a collection with thousands of items
I realized that it is really important to both limit the cache size
and set the database transaction model to be per object/item, or
else you will crash due to Java heap issues. For example:

    $ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object

See: https://wiki.lyrasis.org/display/DSPACE/Curation+Task+Cookbook
2020-08-02 18:33:32 +03:00
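The collect-then-apply refactor in e158e4bc98 can be illustrated without DSpace. In this sketch `FakeItem` is a hypothetical stand-in that only counts `addMetadata()` and `update()` calls, and the country-name-to-alpha2 map stands in for the task's mapping; neither is the real DSpace API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CollectThenApply {
    // Hypothetical stand-in for a DSpace Item that counts metadata writes and updates.
    static final class FakeItem {
        final List<String> countryCodes = new ArrayList<>();
        int addMetadataCalls = 0;
        int updateCalls = 0;

        void addMetadata(List<String> values) {
            addMetadataCalls++;
            countryCodes.addAll(values);
        }

        void update() {
            updateCalls++;
        }
    }

    static void tag(FakeItem item, List<String> countries, Map<String, String> nameToAlpha2) {
        // First collect every code we intend to add...
        List<String> codes = new ArrayList<>();
        for (String country : countries) {
            String code = nameToAlpha2.get(country);
            if (code != null) {
                codes.add(code);
            }
        }
        // ...then write them with a single addMetadata() and a single update().
        if (!codes.isEmpty()) {
            item.addMetadata(codes);
            item.update();
        }
    }
}
```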
1c866bdf64 src/main/java: Remove unnecessary comments and prints 2020-08-02 18:32:04 +03:00
e5d45e62be src/main/java: Refactor CountryCodeTagger.java
It is now much more modular and can easily and cleanly be extended
to do ISO 3166-1 Alpha3, numeric, etc.
2020-08-02 15:51:18 +03:00
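One way to get the extensibility described in e5d45e62be is to let each code type know which field it reads, so supporting alpha-3 or numeric codes means adding an enum constant rather than another code path. This is a hypothetical sketch, not the refactored CountryCodeTagger: `Country` mirrors the alpha-2, alpha-3, and numeric fields that iso-codes publishes for ISO 3166-1, and `CodeType` and `lookup()` are illustrative names.

```java
import java.util.Map;
import java.util.Optional;

final class CountryCodes {
    // Hypothetical entry mirroring the ISO 3166-1 fields in iso-codes.
    static final class Country {
        final String name;
        final String alpha2;
        final String alpha3;
        final String numeric;

        Country(String name, String alpha2, String alpha3, String numeric) {
            this.name = name;
            this.alpha2 = alpha2;
            this.alpha3 = alpha3;
            this.numeric = numeric;
        }
    }

    // Each constant selects one field, so a new code type is one new constant.
    enum CodeType {
        ALPHA2 { @Override String of(Country c) { return c.alpha2; } },
        ALPHA3 { @Override String of(Country c) { return c.alpha3; } },
        NUMERIC { @Override String of(Country c) { return c.numeric; } };

        abstract String of(Country c);
    }

    /** Look up a country by name and return the requested code, if the country is known. */
    static Optional<String> lookup(Map<String, Country> byName, String name, CodeType type) {
        return Optional.ofNullable(byName.get(name)).map(type::of);
    }
}
```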
6228f337e9 src/main/java: Skip items that have country codes
Originally I wasn't sure if I was going to try to parse each code,
check them against the mapping, and possibly correct them, but it's
easier to just skip items with codes unless we're in "force" mode.
2020-08-01 23:14:19 +03:00
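The decision made in 6228f337e9 reduces to three outcomes per item. A minimal sketch with hypothetical names (`ForceModeSketch`, `Action`, `decide()`); the RETAG outcome reflects the "force" behaviour described in 4b553676dd, where existing country tags are cleared and updated.

```java
import java.util.List;

final class ForceModeSketch {
    enum Action { SKIP, TAG, RETAG }

    /** Decide what to do with an item given its existing country codes and the force setting. */
    static Action decide(List<String> existingCodes, boolean force) {
        if (existingCodes.isEmpty()) {
            return Action.TAG; // no codes yet: tag normally
        }
        // Existing codes: re-tag only in force mode, otherwise leave the item alone.
        return force ? Action.RETAG : Action.SKIP;
    }
}
```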
4b553676dd src/main/java: Implement task "profiles"
The DSpace curation system has task properties that can be used to
create "profiles" of sorts. For example, if you set a custom task
name in curate.cfg:

    plugin.named.org.dspace.curate.CurationTask = \
        org.cgiar.cgspace.ctasks.CountryCodeTagger = countrycodetagger \
        org.cgiar.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force

... then DSpace will look for countrycodetagger.cfg by default, and
countrycodetagger.force.cfg for the second task. We can set different
properties in each one, for example "force=true", and then operate
accordingly in the task when we check the value using taskProperty().

I will use this to force all country tags to be cleared and updated,
where by default we only tag if there are no existing country tags.

See: https://wiki.lyrasis.org/display/DSDOC5x/Curation+System
2020-08-01 23:04:35 +03:00
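To make the profile idea from 4b553676dd concrete outside DSpace, the sketch below parses key=value contents like those of a countrycodetagger.force.cfg file with `java.util.Properties` and reads `force`, treating a missing key as false. In the real task the curation framework loads the file and the value comes from `taskProperty()`; `TaskProfileSketch`, `loadProfile()`, and `isForce()` are hypothetical stand-ins.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class TaskProfileSketch {
    /** Parse key=value profile contents, e.g. the body of countrycodetagger.force.cfg. */
    static Properties loadProfile(String cfgContents) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(cfgContents));
        return props;
    }

    /** A missing "force" key means false, so the default profile only tags untagged items. */
    static boolean isForce(Properties profile) {
        return Boolean.parseBoolean(profile.getProperty("force", "false"));
    }
}
```

With this split, the same task class behaves differently depending only on which named task (and therefore which .cfg file) was invoked.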
d4cd5bfd61 src/main/java: Optimize imports 2020-08-01 23:03:51 +03:00