mirror of
https://github.com/ilri/cgspace-java-helpers.git
synced 2025-05-10 23:26:05 +02:00
Compare commits
25 Commits
| SHA1 |
|---|
| d5cf51c464 |
| 98c7cfb3a5 |
| 58365cdfda |
| 7190b751e1 |
| 34acc351a5 |
| ec293b3b28 |
| 31cd979b61 |
| fce81c6003 |
| 26d3cbd778 |
| fdc910f93b |
| e0d514e797 |
| fd893d8c4e |
| 2263ac27e8 |
| cf7012d698 |
| 7edc60e6ca |
| fe2abc86c6 |
| e1d92ef2c7 |
| 3e3c544cfa |
| db9881faf6 |
| fa5fb60b5b |
| 44fb9a9f4d |
| b790d5e4db |
| 08e7546a87 |
| ff076ecf50 |
| 7a5dd1c094 |
19
CHANGELOG.md
Normal file
@@ -0,0 +1,19 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [5.3] - 2020-08-07
+### Changed
+- Make sure `FixJpgJpgThumbnails` only replaces thumbnails where the original is less than ~100KiB
+- Make sure `FixJpgJpgThumbnails` only replaces thumbnails if the item type is not `Infographic` (because the JPG in the ORIGINAL bundle is the "real" file and it's OK that the thumbnail is ".jpg.jpg")
+
+## [5.2] - 2020-08-06
+### Changed
+- Make `FixJpgJpgThumbnails` helper check for files named "JPG" as well as "jpg" (case insensitive)
+- Make `FixJpgJpgThumbnails` helper replace thumbnails with description `IM Thumbnail` as well as `Generated Thumbnail`
+
+## [5.1] - 2020-08-06
+### Added
+- Add `FixJpgJpgThumbnails` helper to replace ".jpg.jpg" thumbnails with their originals
49
README.md
@@ -1,7 +1,8 @@
-# DSpace Curation Tasks [](https://travis-ci.org/ilri/dspace-curation-tasks)
-Metadata curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
+# CGSpace Java Helpers [](https://travis-ci.org/ilri/dspace-curation-tasks)
+DSpace curation tasks and other Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:

 - **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
+- **FixJpgJpgThumbnails**: Fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals

 Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).

@@ -13,8 +14,8 @@ To use these curation tasks in a DSpace project add the following dependency to
 ```
 <dependency>
   <groupId>io.github.ilri.cgspace</groupId>
-  <artifactId>dspace-curation-tasks</artifactId>
-  <version>1.0-SNAPSHOT</version>
+  <artifactId>cgspace-java-helpers</artifactId>
+  <version>5.4-SNAPSHOT</version>
 </dependency>
 ```

@@ -30,55 +31,25 @@ $ mvn package
 Copy the resulting jar to the DSpace `lib` directory:

 ```
-$ cp target/dspace-curation-tasks-1.0-SNAPSHOT.jar ~/dspace/lib/dspace-curation-tasks-1.0-SNAPSHOT.jar
+$ cp target/cgspace-java-helpers-5.4-SNAPSHOT.jar ~/dspace/lib
 ```

 ## Configuration
-Add the curation task to DSpace's `config/modules/curate.cfg`:
+Please refer to the appropriate README.md file:

-```
-plugin.named.org.dspace.curate.CurationTask = \
-    ...
-    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger \
-    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
-```
-
-And then add a configuration file for the task in `config/modules/countrycodetagger.cfg`:
-
-```
-# name of the field containing ISO 3166-1 country names
-iso3166.field = cg.coverage.country
-
-# name of the field containing ISO 3166-1 Alpha2 country codes
-iso3166-alpha2.field = cg.coverage.iso3166-alpha2
-
-# only add country codes if an item doesn't have any (default false)
-#forceupdate = false
-```
-
-*Note*: DSpace's curation system supports "profiles" where you can use the same task with different options, for example above I have a normal country code tagger and a "force" variant. To use the "force" variant you create a new configuration file with the overridden options in `config/modules/countrycodetagger.force.cfg`. The "force" profile clears all existing country codes and updates everything.
-
-## Invocation
-Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:
-
-```
-$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object
-```
-
-*Note*: it is very important to set the cache limit (`-l`) and the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.
+- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace5/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
+- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace5/src/main/java/io/github/ilri/cgspace/scripts/README.md)

 ## Notes
 This project was initially created according to the [Maven Getting Started Guide](https://maven.apache.org/guides/getting-started/):

 ```console
-$ mvn -B archetype:generate -DgroupId=io.github.ilri.cgspace -DartifactId=dspace-curation-tasks -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4
+$ mvn -B archetype:generate -DgroupId=io.github.ilri.cgspace -DartifactId=cgspace-java-helpers -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4
 ```

 ## TODO

 - Make sure this doesn't work on items in the workflow
-- Port to DSpace 6
-- Remember to bump Gson version!
 - Check for existence of metadata field before trying to add metadata
 - Add tests

14
pom.xml
@@ -5,11 +5,11 @@
   <modelVersion>4.0.0</modelVersion>

   <groupId>io.github.ilri.cgspace</groupId>
-  <artifactId>dspace-curation-tasks</artifactId>
-  <version>1.0-SNAPSHOT</version>
+  <artifactId>cgspace-java-helpers</artifactId>
+  <version>5.4-SNAPSHOT</version>

-  <name>dspace-curation-tasks</name>
-  <url>https://github.com/ilri/dspace-curation-tasks</url>
+  <name>cgspace-java-helpers</name>
+  <url>https://github.com/ilri/cgspace-java-helpers</url>

   <licenses>
     <license>
@@ -53,9 +53,9 @@
   </dependencies>

   <scm>
-    <connection>scm:git:git://github.com/ilri/dspace-curation-tasks.git</connection>
-    <developerConnection>scm:git:ssh://github.com:nanosai/dspace-curation-tasks.git</developerConnection>
-    <url>http://github.com/ilri/dspace-curation-tasks</url>
+    <connection>scm:git:git://github.com/ilri/cgspace-java-helpers.git</connection>
+    <developerConnection>scm:git:ssh://github.com:nanosai/cgspace-java-helpers.git</developerConnection>
+    <url>http://github.com/ilri/cgspace-java-helpers</url>
   </scm>

   <distributionManagement>
@@ -35,6 +35,11 @@ import java.sql.SQLException;
 import java.util.ArrayList;
 import java.util.List;

+/**
+ * @author Alan Orth for the International Livestock Research Institute
+ * @version 5.1
+ * @since 1.0
+ */
 public class CountryCodeTagger extends AbstractCurationTask
 {
     public class CountryCodeTaggerConfig {
74
src/main/java/io/github/ilri/cgspace/ctasks/README.md
Normal file
@@ -0,0 +1,74 @@
+# Curation Tasks
+DSpace curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
+
+- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
+
+Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
+
+## Build and Install
+
+### Integrate into DSpace Build
+To use these curation tasks in a DSpace project add the following dependency to `dspace/modules/additions/pom.xml`:
+
+```
+<dependency>
+  <groupId>io.github.ilri.cgspace</groupId>
+  <artifactId>cgspace-java-helpers</artifactId>
+  <version>5.3</version>
+</dependency>
+```
+
+The jar will be copied to all DSpace applications.
+
+### Manual Build and Install
+To build the standalone jar:
+
+```
+$ mvn package
+```
+
+Copy the resulting jar to the DSpace `lib` directory:
+
+```
+$ cp target/cgspace-java-helpers-5.3.jar ~/dspace/lib
+```
+
+## Configuration
+Add the curation task to DSpace's `config/modules/curate.cfg`:
+
+```
+plugin.named.org.dspace.curate.CurationTask = \
+    ...
+    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger \
+    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
+```
+
+And then add a configuration file for the task in `config/modules/countrycodetagger.cfg`:
+
+```
+# name of the field containing ISO 3166-1 country names
+iso3166.field = cg.coverage.country
+
+# name of the field containing ISO 3166-1 Alpha2 country codes
+iso3166-alpha2.field = cg.coverage.iso3166-alpha2
+
+# only add country codes if an item doesn't have any (default false)
+#forceupdate = false
+```
+
+*Note*: DSpace's curation system supports "profiles" where you can use the same task with different options, for example above I have a normal country code tagger and a "force" variant. To use the "force" variant you create a new configuration file with the overridden options in `config/modules/countrycodetagger.force.cfg`. The "force" profile clears all existing country codes and updates everything.
+
+## Invocation
+Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:
+
+```
+$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object
+```
+
+*Note*: it is very important to set the cache limit (`-l`) and the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.
+
+## TODO
+
+- Make sure this doesn't work on items in the workflow
+- Check for existence of metadata field before trying to add metadata
+- Add tests
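The README above describes `CountryCodeTagger` as deriving ISO 3166-1 Alpha2 codes from existing country names. The diff does not show how the task resolves names, so the following is only a rough, DSpace-free sketch of that kind of lookup. It assumes the JDK's English display names are an acceptable name source; the class `CountryCodeLookup` and the method `alpha2For` are hypothetical, and the values stored in `cg.coverage.country` may well be spelled differently from the JDK's names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: maps English country names to ISO 3166-1 alpha-2 codes using JDK locale data. */
final class CountryCodeLookup {

    // Lower-cased English display name -> alpha-2 code, built once from the JDK's country list.
    private static final Map<String, String> CODES_BY_NAME = new HashMap<>();

    static {
        for (String code : Locale.getISOCountries()) {
            Locale region = new Locale.Builder().setRegion(code).build();
            String name = region.getDisplayCountry(Locale.ENGLISH);
            CODES_BY_NAME.put(name.toLowerCase(Locale.ROOT), code);
        }
    }

    /** Returns the alpha-2 code for a country name, or empty if the name is unknown. */
    static Optional<String> alpha2For(String countryName) {
        if (countryName == null) {
            return Optional.empty();
        }
        String key = countryName.trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(CODES_BY_NAME.get(key));
    }

    public static void main(String[] args) {
        System.out.println(alpha2For("Kenya").orElse("unknown"));
        System.out.println(alpha2For("Atlantis").orElse("unknown"));
    }
}
```

Returning an `Optional` keeps the "no match" case explicit, which lines up with the TODO item about checking before adding metadata: a caller would skip the item rather than write an empty code.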
@@ -0,0 +1,132 @@
+package io.github.ilri.cgspace.scripts;
+
+import org.apache.commons.lang.StringUtils;
+import org.dspace.authorize.AuthorizeException;
+import org.dspace.content.*;
+import org.dspace.core.Constants;
+import org.dspace.core.Context;
+import org.dspace.handle.HandleManager;
+
+import java.io.IOException;
+import java.sql.SQLException;
+
+/**
+ * @author Andrea Schweer schweer@waikato.ac.nz for the LCoNZ Institutional Research Repositories
+ * @author Alan Orth for the International Livestock Research Institute
+ * @version 5.4
+ * @since 5.1
+ */
+public class FixJpgJpgThumbnails {
+
+    public static void main(String[] args) {
+        String parentHandle = null;
+        if (args.length >= 1) {
+            parentHandle = args[0];
+        }
+
+        Context context = null;
+        try {
+            context = new Context();
+            context.turnOffAuthorisationSystem();
+
+            if (StringUtils.isBlank(parentHandle)) {
+                process(context, Item.findAll(context));
+            } else {
+                DSpaceObject parent = HandleManager.resolveToObject(context, parentHandle);
+                if (parent != null) {
+                    switch (parent.getType()) {
+                        case Constants.COLLECTION:
+                            process(context, ((Collection) parent).getAllItems()); // getAllItems because we want to work on non-archived ones as well
+                            break;
+                        case Constants.COMMUNITY:
+                            Collection[] collections = ((Community) parent).getCollections();
+                            for (Collection collection : collections) {
+                                process(context, collection.getAllItems()); // getAllItems because we want to work on non-archived ones as well
+                            }
+                            break;
+                        case Constants.SITE:
+                            process(context, Item.findAll(context));
+                            break;
+                        case Constants.ITEM:
+                            processItem((Item) parent);
+                            context.commit();
+                            break;
+                    }
+                }
+            }
+        } catch (SQLException | AuthorizeException | IOException e) {
+            e.printStackTrace(System.err);
+        } finally {
+            if (context != null && context.isValid()) {
+                context.abort();
+            }
+        }
+    }
+
+    private static void process(Context context, ItemIterator items) throws SQLException, IOException, AuthorizeException {
+        while (items.hasNext()) {
+            Item item = items.next();
+            processItem(item);
+            context.commit();
+            item.decache();
+        }
+    }
+
+    private static void processItem(Item item) throws SQLException, AuthorizeException, IOException {
+        // Some bitstreams like Infographics are large JPGs and put in the ORIGINAL bundle on purpose so we shouldn't
+        // swap them.
+        Metadatum[] itemTypes = item.getMetadataByMetadataString("dc.type");
+        boolean itemHasInfographic = false;
+        for (Metadatum itemType: itemTypes) {
+            if (itemType.value.equals("Infographic")) {
+                itemHasInfographic = true;
+                break;
+            }
+        }
+
+        Bundle[] thumbnailBundles = item.getBundles("THUMBNAIL");
+        for (Bundle thumbnailBundle : thumbnailBundles) {
+            Bitstream[] thumbnailBundleBitstreams = thumbnailBundle.getBitstreams();
+            for (Bitstream thumbnailBitstream : thumbnailBundleBitstreams) {
+                String thumbnailName = thumbnailBitstream.getName();
+
+                if (thumbnailName.toLowerCase().contains(".jpg.jpg")) {
+                    Bundle[] originalBundles = item.getBundles("ORIGINAL");
+                    for (Bundle originalBundle : originalBundles) {
+                        Bitstream[] originalBundleBitstreams = originalBundle.getBitstreams();
+
+                        for (Bitstream originalBitstream : originalBundleBitstreams) {
+                            String originalName = originalBitstream.getName();
+
+                            long originalBitstreamBytes = originalBitstream.getSize();
+
+                            /*
+                            - check if the original file name is the same as the thumbnail name minus the extra ".jpg"
+                            - check if the thumbnail description indicates it was automatically generated
+                            - check if the item has dc.type Infographic (JPG could be the "real" item!)
+                            - check if the original bitstream is less than ~100KiB
+                                - Note: in my tests there were 4022 items with ".jpg.jpg" thumbnails totaling 394549249
+                                  bytes for an average of about 98KiB so ~100KiB seems like a good cut off
+                            */
+                            if (
+                                    originalName.equalsIgnoreCase(StringUtils.removeEndIgnoreCase(thumbnailName, ".jpg"))
+                                    && ("Generated Thumbnail".equals(thumbnailBitstream.getDescription()) || "IM Thumbnail".equals(thumbnailBitstream.getDescription()))
+                                    && !itemHasInfographic
+                                    && originalBitstreamBytes < 100000
+                            ) {
+                                System.out.println(item.getHandle() + ": replacing " + thumbnailName + " with " + originalName);
+
+                                //add the original bitstream to the THUMBNAIL bundle
+                                thumbnailBundle.addBitstream(originalBitstream);
+                                //remove the original bitstream from the ORIGINAL bundle
+                                originalBundle.removeBitstream(originalBitstream);
+                                //remove the JpgJpg bitstream from the THUMBNAIL bundle
+                                thumbnailBundle.removeBitstream(thumbnailBitstream);
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+}
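The replacement condition in `processItem` combines four checks inside deeply nested loops, which makes it hard to exercise without a running DSpace. As a rough sketch only, the same rule can be restated as a pure function. The class `ThumbnailRule` and its method names are hypothetical and not part of this change; `stripTrailingJpg` stands in for `StringUtils.removeEndIgnoreCase`, and the null guard is an added assumption. For reference, the figures in the source comment work out to 394549249 / 4022 ≈ 98,098 bytes per item (about 95.8 KiB), and the `100000` byte cut-off is about 97.7 KiB.

```java
import java.util.Locale;

/** Hypothetical, DSpace-free restatement of the checks in FixJpgJpgThumbnails.processItem. */
final class ThumbnailRule {

    // Cut-off taken from the helper: the original must be smaller than 100000 bytes.
    static final long MAX_ORIGINAL_BYTES = 100_000L;

    /** Removes one trailing ".jpg", ignoring case (stand-in for StringUtils.removeEndIgnoreCase). */
    static String stripTrailingJpg(String name) {
        int start = name.length() - 4;
        if (start >= 0 && name.regionMatches(true, start, ".jpg", 0, 4)) {
            return name.substring(0, start);
        }
        return name;
    }

    /** True when the ".jpg.jpg" thumbnail should be swapped for its original. */
    static boolean shouldReplace(String originalName, String thumbnailName,
                                 String thumbnailDescription, boolean isInfographic,
                                 long originalBytes) {
        if (originalName == null || thumbnailName == null) {
            return false; // assumption: unnamed bitstreams never match
        }
        boolean doubleExtension = thumbnailName.toLowerCase(Locale.ROOT).contains(".jpg.jpg");
        boolean namesMatch = originalName.equalsIgnoreCase(stripTrailingJpg(thumbnailName));
        boolean generated = "Generated Thumbnail".equals(thumbnailDescription)
                || "IM Thumbnail".equals(thumbnailDescription);
        return doubleExtension && namesMatch && generated
                && !isInfographic && originalBytes < MAX_ORIGINAL_BYTES;
    }

    public static void main(String[] args) {
        // Small generated thumbnail on an ordinary item: prints true
        System.out.println(shouldReplace("cover.jpg", "cover.jpg.jpg", "Generated Thumbnail", false, 42_000L));
        // Same shape, but the item is an Infographic: prints false
        System.out.println(shouldReplace("poster.JPG", "poster.JPG.jpg", "IM Thumbnail", true, 42_000L));
    }
}
```

Pulling the rule out this way would also give the "Add tests" TODO something to target, since each changelog entry from 5.1 to 5.3 maps onto one term of the boolean expression.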
41
src/main/java/io/github/ilri/cgspace/scripts/README.md
Normal file
@@ -0,0 +1,41 @@
+# Scripts
+Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
+
+- **FixJpgJpgThumbnails**: Fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
+
+Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
+
+## Build and Install
+
+### Integrate into DSpace Build
+To use these curation tasks in a DSpace project add the following dependency to `dspace/modules/additions/pom.xml`:
+
+```
+<dependency>
+  <groupId>io.github.ilri.cgspace</groupId>
+  <artifactId>cgspace-java-helpers</artifactId>
+  <version>5.3</version>
+</dependency>
+```
+
+The jar will be copied to all DSpace applications.
+
+### Manual Build and Install
+To build the standalone jar:
+
+```
+$ mvn package
+```
+
+Copy the resulting jar to the DSpace `lib` directory:
+
+```
+$ cp target/cgspace-java-helpers-5.3.jar ~/dspace/lib
+```
+
+### Invocation
+The script only takes one argument, which is a community, collection, or item:
+
+```
+$ dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/83389
+```
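The README describes the argument as a community, collection, or item, while `FixJpgJpgThumbnails.main` shows two further details: the argument is a handle string, and when it is missing or blank the script processes every item in the repository. A minimal sketch of that argument handling follows. The class `HandleArgument` is hypothetical, and `trim().isEmpty()` is used here as an approximation of `StringUtils.isBlank`.

```java
import java.util.Optional;

/** Hypothetical sketch of how the script interprets its single optional argument. */
final class HandleArgument {

    /**
     * Returns the handle to process, or empty when no usable argument was given,
     * in which case the script falls back to all items in the repository.
     * Extra arguments after the first are ignored, as in the script.
     */
    static Optional<String> parse(String[] args) {
        if (args.length >= 1 && args[0] != null && !args[0].trim().isEmpty()) {
            return Optional.of(args[0]);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String scope = parse(args).map(h -> "handle " + h).orElse("all items");
        System.out.println("Would process: " + scope);
    }
}
```

Because an omitted argument silently widens the scope to the whole repository, passing an explicit handle, as in the README example, is the safer habit.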