Alan Orth 26d3cbd778
src/main/java: Tune FixJpgJpgThumbnails a bit
Make sure we don't modify thumbnails if the item is an Infographic
because the JPG in the ORIGINAL bundle might actually be the "real"
file, in which case the THUMBNAIL bundle would have a legitimate
".jpg.jpg" file.

Also, limit the criteria for replacement to original bitstreams
that are less than 100KiB. In my tests I found that we had 4,022
items with ".jpg.jpg" thumbnails, and the average file size of the
originals in those items was 98KiB. Without considering the large
infographics, which are several megabytes apiece, the average of
the remaining 3,765 originals was ~20KiB so 100KiB should be very
safe.
2020-08-07 09:50:03 +03:00

CGSpace Java Helpers

DSpace curation tasks and other Java-based helpers used on the CGSpace institutional repository:

  • CountryCodeTagger: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
  • FixJpgJpgThumbnails: fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals

Tested on DSpace 5.8. Read more about the DSpace curation system.
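
The latest commit message describes when FixJpgJpgThumbnails should leave a thumbnail alone: the item is an Infographic, or the original bitstream is 100KiB or larger. The following is a minimal sketch of that decision as a plain Java predicate. It does not use the DSpace API, and the class name, method name, and parameters are hypothetical, as is the assumption that the item type is available as a simple string:

```java
import java.util.Locale;

// Sketch only: class, method, and parameter names are hypothetical and are
// not taken from the FixJpgJpgThumbnails source.
final class JpgJpgCriteriaSketch {
    // Upper bound for originals (100KiB), as stated in the commit message.
    static final long MAX_ORIGINAL_BYTES = 100L * 1024L;

    /**
     * Returns true when a ".jpg.jpg" thumbnail looks safe to replace with its original.
     *
     * @param itemType      the item's type, e.g. "Infographic" (assumed to come from item metadata)
     * @param thumbnailName file name of the bitstream in the THUMBNAIL bundle
     * @param originalBytes size in bytes of the matching bitstream in the ORIGINAL bundle
     */
    static boolean shouldReplace(String itemType, String thumbnailName, long originalBytes) {
        // An Infographic's JPG may be the "real" file, so its ".jpg.jpg" thumbnail is legitimate.
        if ("Infographic".equalsIgnoreCase(itemType)) {
            return false;
        }
        // Only thumbnails named ".jpg.jpg" are candidates at all.
        if (thumbnailName == null
                || !thumbnailName.toLowerCase(Locale.ROOT).endsWith(".jpg.jpg")) {
            return false;
        }
        // Originals of 100KiB or more are left untouched.
        return originalBytes < MAX_ORIGINAL_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(shouldReplace("Report", "cover.jpg.jpg", 20L * 1024L));
        System.out.println(shouldReplace("Infographic", "poster.jpg.jpg", 20L * 1024L));
        System.out.println(shouldReplace("Report", "cover.jpg.jpg", 3L * 1024L * 1024L));
    }
}
```

Keeping the check in one small predicate would make the threshold easy to adjust if the file-size statistics change.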

Build and Install

Integrate into DSpace Build

To use these curation tasks in a DSpace project add the following dependency to dspace/modules/additions/pom.xml:

<dependency>
  <groupId>io.github.ilri.cgspace</groupId>
  <artifactId>cgspace-java-helpers</artifactId>
  <version>5.2</version>
</dependency>

The jar will be copied to all DSpace applications.

Manual Build and Install

To build the standalone jar:

$ mvn package

Copy the resulting jar to the DSpace lib directory:

$ cp target/cgspace-java-helpers-5.2.jar ~/dspace/lib

Configuration

Add the curation task to DSpace's config/modules/curate.cfg:

plugin.named.org.dspace.curate.CurationTask = \
...
    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger, \
    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force

And then add a configuration file for the task in config/modules/countrycodetagger.cfg:

# name of the field containing ISO 3166-1 country names
iso3166.field = cg.coverage.country

# name of the field containing ISO 3166-1 Alpha2 country codes
iso3166-alpha2.field = cg.coverage.iso3166-alpha2

# replace existing country codes (default false: only add country codes if an item doesn't have any)
#forceupdate = false

Note: DSpace's curation system supports "profiles", which let you run the same task with different options. In the example above there is a normal country code tagger and a "force" variant. To use the "force" variant, create a new configuration file with the overridden options in config/modules/countrycodetagger.force.cfg. The "force" profile clears all existing country codes and updates everything.
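
To illustrate the difference between the normal and "force" profiles, here is a rough sketch of the tagging logic in plain Java. It is not the CountryCodeTagger implementation: the class and method names are hypothetical, and the JDK's built-in Locale data stands in for the iso-codes data that the real task uses, so the recognised country names may differ.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch only: names are hypothetical, and the JDK's Locale data is a
// stand-in for the iso-codes data used by the real curation task.
final class CountryCodeSketch {
    // Case-insensitive map from English country name to ISO 3166-1 Alpha2 code.
    static final Map<String, String> NAME_TO_ALPHA2 = buildLookup();

    static Map<String, String> buildLookup() {
        Map<String, String> lookup = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String code : Locale.getISOCountries()) {
            Locale country = new Locale.Builder().setRegion(code).build();
            lookup.put(country.getDisplayCountry(Locale.ENGLISH), code);
        }
        return lookup;
    }

    /**
     * Returns the Alpha2 codes an item would carry after tagging.
     *
     * @param countryNames  values of the country name field (iso3166.field)
     * @param existingCodes values already in the Alpha2 field (iso3166-alpha2.field)
     * @param forceUpdate   true for the "force" profile
     */
    static Set<String> tag(List<String> countryNames, Set<String> existingCodes, boolean forceUpdate) {
        // Without forceupdate, items that already have codes are left as they are.
        if (!forceUpdate && !existingCodes.isEmpty()) {
            return new LinkedHashSet<>(existingCodes);
        }
        // Otherwise discard any old codes and add one code per recognised name.
        Set<String> codes = new LinkedHashSet<>();
        for (String name : countryNames) {
            if (name == null) {
                continue;
            }
            String code = NAME_TO_ALPHA2.get(name.trim());
            if (code != null) {
                codes.add(code);
            }
        }
        return codes;
    }
}
```

For example, `CountryCodeSketch.tag(Arrays.asList("Kenya", "Ethiopia"), existingCodes, true)` ignores whatever is in `existingCodes`, whereas passing `false` returns the existing codes unchanged whenever there are any.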

Invocation

Once the jar is installed and you have added appropriate configuration in ~/dspace/config/modules:

$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object

Note: it is very important to set the cache limit (-l) and the database transaction scope (-s) to something sensible, such as 500 and object in the command above, if you're curating a community or collection with more than a few hundred items.

Notes

This project was initially created according to the Maven Getting Started Guide:

$ mvn -B archetype:generate -DgroupId=io.github.ilri.cgspace -DartifactId=cgspace-java-helpers -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4

TODO

  • Make sure the tasks don't operate on items in the workflow
  • Check for existence of metadata field before trying to add metadata
  • Add tests

License

This work is licensed under the GPLv3.

This repository contains data from the Debian iso-codes project, which is licensed under the GNU Lesser General Public License v2.1.
