Mirror of https://github.com/ilri/cgspace-java-helpers.git (synced 2025-05-13 16:37:50 +02:00)

Compare commits: 11 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | d5cf51c464 | | |
| | 98c7cfb3a5 | | |
| | 58365cdfda | | |
| | 7190b751e1 | | |
| | 34acc351a5 | | |
| | ec293b3b28 | | |
| | 31cd979b61 | | |
| | fce81c6003 | | |
| | 26d3cbd778 | | |
| | fdc910f93b | | |
| | e0d514e797 | | |
19 CHANGELOG.md Normal file
@@ -0,0 +1,19 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [5.3] - 2020-08-07
+### Changed
+- Make sure `FixJpgJpgThumbnails` only replaces thumbnails where the original is less than ~100KiB
+- Make sure `FixJpgJpgThumbnails` only replaces thumbnails if the item type is not `Infographic` (because the JPG in the ORIGINAL bundle is the "real" file and it's OK that the thumbnail is ".jpg.jpg")
+
+## [5.2] - 2020-08-06
+### Changed
+- Make `FixJpgJpgThumbnails` helper check for files named "JPG" as well as "jpg" (case insensitive)
+- Make `FixJpgJpgThumbnails` helper replace thumbnails with description `IM Thumbnail` as well as `Generated Thumbnail`
+
+## [5.1] - 2020-08-06
+### Added
+- Add `FixJpgJpgThumbnails` helper to replace ".jpg.jpg" thumbnails with their originals
38 README.md
@@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
 <dependency>
   <groupId>io.github.ilri.cgspace</groupId>
   <artifactId>cgspace-java-helpers</artifactId>
-  <version>5.1</version>
+  <version>5.4-SNAPSHOT</version>
 </dependency>
 ```
 
@@ -31,42 +31,14 @@ $ mvn package
 Copy the resulting jar to the DSpace `lib` directory:
 
 ```
-$ cp target/cgspace-java-helpers-5.1.jar ~/dspace/lib
+$ cp target/cgspace-java-helpers-5.4-SNAPSHOT.jar ~/dspace/lib
 ```
 
 ## Configuration
-Add the curation task to DSpace's `config/modules/curate.cfg`:
+Please refer to the appropriate README.md file:
 
-```
-plugin.named.org.dspace.curate.CurationTask = \
-    ...
-    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger \
-    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
-```
-
-And then add a configuration file for the task in `config/modules/countrycodetagger.cfg`:
-
-```
-# name of the field containing ISO 3166-1 country names
-iso3166.field = cg.coverage.country
-
-# name of the field containing ISO 3166-1 Alpha2 country codes
-iso3166-alpha2.field = cg.coverage.iso3166-alpha2
-
-# only add country codes if an item doesn't have any (default false)
-#forceupdate = false
-```
-
-*Note*: DSpace's curation system supports "profiles" where you can use the same task with different options, for example above I have a normal country code tagger and a "force" variant. To use the "force" variant you create a new configuration file with the overridden options in `config/modules/countrycodetagger.force.cfg`. The "force" profile clears all existing country codes and updates everything.
-
-## Invocation
-Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:
-
-```
-$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object
-```
-
-*Note*: it is very important to set the cache limit (`-l`) and the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.
+- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace5/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
+- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace5/src/main/java/io/github/ilri/cgspace/scripts/README.md)
 
 ## Notes
 This project was initially created according to the [Maven Getting Started Guide](https://maven.apache.org/guides/getting-started/):
2 pom.xml
@@ -6,7 +6,7 @@
 
   <groupId>io.github.ilri.cgspace</groupId>
   <artifactId>cgspace-java-helpers</artifactId>
-  <version>5.2</version>
+  <version>5.4-SNAPSHOT</version>
 
   <name>cgspace-java-helpers</name>
   <url>https://github.com/ilri/cgspace-java-helpers</url>
@@ -35,6 +35,11 @@ import java.sql.SQLException;
 import java.util.ArrayList;
 import java.util.List;
 
+/**
+ * @author Alan Orth for the International Livestock Research Institute
+ * @version 5.1
+ * @since 1.0
+ */
 public class CountryCodeTagger extends AbstractCurationTask
 {
     public class CountryCodeTaggerConfig {
74 src/main/java/io/github/ilri/cgspace/ctasks/README.md Normal file
@@ -0,0 +1,74 @@
+# Curation Tasks
+DSpace curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
+
+- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
+
+Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
+
+## Build and Install
+
+### Integrate into DSpace Build
+To use these curation tasks in a DSpace project add the following dependency to `dspace/modules/additions/pom.xml`:
+
+```
+<dependency>
+  <groupId>io.github.ilri.cgspace</groupId>
+  <artifactId>cgspace-java-helpers</artifactId>
+  <version>5.3</version>
+</dependency>
+```
+
+The jar will be copied to all DSpace applications.
+
+### Manual Build and Install
+To build the standalone jar:
+
+```
+$ mvn package
+```
+
+Copy the resulting jar to the DSpace `lib` directory:
+
+```
+$ cp target/cgspace-java-helpers-5.3.jar ~/dspace/lib
+```
+
+## Configuration
+Add the curation task to DSpace's `config/modules/curate.cfg`:
+
+```
+plugin.named.org.dspace.curate.CurationTask = \
+    ...
+    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger \
+    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
+```
+
+And then add a configuration file for the task in `config/modules/countrycodetagger.cfg`:
+
+```
+# name of the field containing ISO 3166-1 country names
+iso3166.field = cg.coverage.country
+
+# name of the field containing ISO 3166-1 Alpha2 country codes
+iso3166-alpha2.field = cg.coverage.iso3166-alpha2
+
+# only add country codes if an item doesn't have any (default false)
+#forceupdate = false
+```
+
+*Note*: DSpace's curation system supports "profiles" where you can use the same task with different options, for example above I have a normal country code tagger and a "force" variant. To use the "force" variant you create a new configuration file with the overridden options in `config/modules/countrycodetagger.force.cfg`. The "force" profile clears all existing country codes and updates everything.
+
+## Invocation
+Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:
+
+```
+$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object
+```
+
+*Note*: it is very important to set the cache limit (`-l`) and the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.
+
+## TODO
+
+- Make sure this doesn't work on items in the workflow
+- Check for existence of metadata field before trying to add metadata
+- Add tests
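The curation-task README above describes two behaviours: mapping a country name to its ISO 3166-1 Alpha2 code, and leaving existing codes alone unless `forceupdate` is enabled. The following is a minimal sketch of that logic, not the project's implementation. The class `CountryCodeSketch` and its methods are hypothetical names, and the lookup uses the JDK's `java.util.Locale` English display names as a stand-in for whatever country list `CountryCodeTagger` actually uses, so names found in real metadata may not all match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of cgspace-java-helpers.
public class CountryCodeSketch {
    // Lower-cased English country name -> ISO 3166-1 Alpha2 code (from JDK locale data).
    private static final Map<String, String> NAME_TO_ALPHA2 = new HashMap<>();

    static {
        for (String code : Locale.getISOCountries()) {
            String name = new Locale("", code).getDisplayCountry(Locale.ENGLISH);
            NAME_TO_ALPHA2.put(name.toLowerCase(Locale.ROOT), code);
        }
    }

    // Returns the Alpha2 code for a country name, or null if the name is unknown.
    public static String alpha2For(String countryName) {
        return NAME_TO_ALPHA2.get(countryName.trim().toLowerCase(Locale.ROOT));
    }

    // Without forceUpdate, an item that already has codes keeps them unchanged.
    // With forceUpdate, existing codes are discarded and recomputed from the names.
    public static List<String> tag(List<String> countryNames,
                                   List<String> existingCodes,
                                   boolean forceUpdate) {
        if (!forceUpdate && !existingCodes.isEmpty()) {
            return new ArrayList<>(existingCodes);
        }
        List<String> codes = new ArrayList<>();
        for (String name : countryNames) {
            String code = alpha2For(name);
            if (code != null && !codes.contains(code)) {
                codes.add(code);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Kenya", "Ethiopia");
        // Item without codes: codes are added.
        System.out.println(tag(names, Collections.<String>emptyList(), false));
        // Item with an existing code: untouched by default, replaced when forced.
        System.out.println(tag(names, Arrays.asList("XX"), false));
        System.out.println(tag(names, Arrays.asList("XX"), true));
    }
}
```

The `forceUpdate` flag corresponds to the `forceupdate` option in `countrycodetagger.cfg`; in the real task the "force" profile supplies it through `countrycodetagger.force.cfg`.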
@@ -13,8 +13,8 @@ import java.sql.SQLException;
 /**
  * @author Andrea Schweer schweer@waikato.ac.nz for the LCoNZ Institutional Research Repositories
  * @author Alan Orth for the International Livestock Research Institute
- * @version 5.1-SNAPSHOT
- * @since 5.1-SNAPSHOT
+ * @version 5.4
+ * @since 5.1
  */
 public class FixJpgJpgThumbnails {
 
@@ -73,6 +73,17 @@ public class FixJpgJpgThumbnails {
     }
 
     private static void processItem(Item item) throws SQLException, AuthorizeException, IOException {
+        // Some bitstreams like Infographics are large JPGs and put in the ORIGINAL bundle on purpose so we shouldn't
+        // swap them.
+        Metadatum[] itemTypes = item.getMetadataByMetadataString("dc.type");
+        boolean itemHasInfographic = false;
+        for (Metadatum itemType: itemTypes) {
+            if (itemType.value.equals("Infographic")) {
+                itemHasInfographic = true;
+                break;
+            }
+        }
+
         Bundle[] thumbnailBundles = item.getBundles("THUMBNAIL");
         for (Bundle thumbnailBundle : thumbnailBundles) {
             Bitstream[] thumbnailBundleBitstreams = thumbnailBundle.getBitstreams();
@@ -84,11 +95,25 @@ public class FixJpgJpgThumbnails {
                 for (Bundle originalBundle : originalBundles) {
                     Bitstream[] originalBundleBitstreams = originalBundle.getBitstreams();
 
-                    for(Bitstream originalBitstream : originalBundleBitstreams) {
+                    for (Bitstream originalBitstream : originalBundleBitstreams) {
                         String originalName = originalBitstream.getName();
 
-                        //check if the original file name is the same as the thumbnail name minus the extra ".jpg"
-                        if (originalName.equalsIgnoreCase(StringUtils.removeEndIgnoreCase(thumbnailName, ".jpg")) && ("Generated Thumbnail".equals(thumbnailBitstream.getDescription()) || "IM Thumbnail".equals(thumbnailBitstream.getDescription()))) {
+                        long originalBitstreamBytes = originalBitstream.getSize();
+
+                        /*
+                        - check if the original file name is the same as the thumbnail name minus the extra ".jpg"
+                        - check if the thumbnail description indicates it was automatically generated
+                        - check if the item has dc.type Infographic (JPG could be the "real" item!)
+                        - check if the original bitstream is less than ~100KiB
+                            - Note: in my tests there were 4022 items with ".jpg.jpg" thumbnails totaling 394549249
+                              bytes for an average of about 98KiB so ~100KiB seems like a good cut off
+                        */
+                        if (
+                                originalName.equalsIgnoreCase(StringUtils.removeEndIgnoreCase(thumbnailName, ".jpg"))
+                                && ("Generated Thumbnail".equals(thumbnailBitstream.getDescription()) || "IM Thumbnail".equals(thumbnailBitstream.getDescription()))
+                                && !itemHasInfographic
+                                && originalBitstreamBytes < 100000
+                        ) {
                             System.out.println(item.getHandle() + ": replacing " + thumbnailName + " with " + originalName);
 
                             //add the original bitstream to the THUMBNAIL bundle
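The four checks in the new `if` condition above can be pulled into a single predicate, which makes them easy to exercise without a DSpace instance. The following is a minimal sketch, not code from the repository. The class `ThumbnailRule`, the method `shouldReplace`, and the helper `stripJpgSuffix` (a stand-in for Commons Lang's `StringUtils.removeEndIgnoreCase`) are hypothetical names, and plain strings and a `long` stand in for DSpace's `Bitstream` and `Metadatum` objects. It covers only the conditions visible in this hunk.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of cgspace-java-helpers.
public class ThumbnailRule {
    // Cut-off used in the diff, in bytes.
    static final long MAX_ORIGINAL_BYTES = 100000L;

    // Descriptions that mark a thumbnail as automatically generated.
    static final List<String> GENERATED_DESCRIPTIONS =
            Arrays.asList("Generated Thumbnail", "IM Thumbnail");

    // Removes one trailing ".jpg" (any case), like StringUtils.removeEndIgnoreCase(name, ".jpg").
    public static String stripJpgSuffix(String name) {
        int start = name.length() - 4;
        if (name.regionMatches(true, start, ".jpg", 0, 4)) {
            return name.substring(0, start);
        }
        return name;
    }

    // True only when all four conditions from the diff hold.
    public static boolean shouldReplace(String originalName,
                                        String thumbnailName,
                                        String thumbnailDescription,
                                        List<String> itemTypes,
                                        long originalBytes) {
        return originalName.equalsIgnoreCase(stripJpgSuffix(thumbnailName))
                && GENERATED_DESCRIPTIONS.contains(thumbnailDescription)
                && !itemTypes.contains("Infographic")
                && originalBytes < MAX_ORIGINAL_BYTES;
    }

    public static void main(String[] args) {
        // Small original with a generated ".jpg.jpg" thumbnail.
        System.out.println(shouldReplace("photo.JPG", "photo.jpg.jpg",
                "IM Thumbnail", Arrays.asList("Poster"), 50000L));
        // Same files, but the item is an Infographic.
        System.out.println(shouldReplace("photo.jpg", "photo.jpg.jpg",
                "Generated Thumbnail", Arrays.asList("Infographic"), 50000L));
    }
}
```

On the size check: the cut-off in the diff is 100000 bytes, which is about 97.7 KiB. The average quoted in the code comment, 394549249 / 4022 ≈ 98098 bytes, is about 95.8 KiB (98.1 kB), so the "~100KiB" figure is an approximation in both places.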
41 src/main/java/io/github/ilri/cgspace/scripts/README.md Normal file
@@ -0,0 +1,41 @@
+# Scripts
+Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
+
+- **FixJpgJpgThumbnails**: Fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
+
+Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
+
+## Build and Install
+
+### Integrate into DSpace Build
+To use these curation tasks in a DSpace project add the following dependency to `dspace/modules/additions/pom.xml`:
+
+```
+<dependency>
+  <groupId>io.github.ilri.cgspace</groupId>
+  <artifactId>cgspace-java-helpers</artifactId>
+  <version>5.3</version>
+</dependency>
+```
+
+The jar will be copied to all DSpace applications.
+
+### Manual Build and Install
+To build the standalone jar:
+
+```
+$ mvn package
+```
+
+Copy the resulting jar to the DSpace `lib` directory:
+
+```
+$ cp target/cgspace-java-helpers-5.3.jar ~/dspace/lib
+```
+
+### Invocation
+The script only takes one argument, which is a community, collection, or item:
+
+```
+$ dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/83389
+```