12 Commits

Author SHA1 Message Date
cea97aebe5 Version 6.0 2020-08-08 13:13:28 +03:00
4bc7971ecb src/main/java: Remove debug comment 2020-08-07 22:55:35 +03:00
197aad0124 README.md: Add FixJpgJpgThumbnails 2020-08-07 22:48:09 +03:00
da1ecad238 src/main/java: DSpace 6 port of FixJpgJpgThumbnails.java
Need to use the new DSpace 6 service model in most places. Not sure
why addBitstream is no longer public, but removeBitstream is...
2020-08-07 22:45:07 +03:00
307480f249 Rename to cgspace-java-helpers again
I don't know what the hell happened.
2020-08-07 22:37:40 +03:00
4698b6eb38 README.md: Use badge from dspace6 branch 2020-08-06 15:49:25 +03:00
f1629f65fe README.md: Rename to CGSpace Java Helpers
Will eventually include more than just curation tasks.
2020-08-06 15:25:04 +03:00
29f6aff35e README.md: Update notes for DSpace 6 2020-08-05 12:40:55 +03:00
9bf487a336 pom.xml: Use 6.0-SNAPSHOT for DSpace 6 version
I think the most easily understandable versioning scheme is to use
the major number from the compatible DSpace version.
2020-08-05 12:33:25 +03:00
f50357b7cc README.md: Remove DSpace 6 TODO 2020-08-05 12:31:30 +03:00
f3ab89f7a1 CountryCodeTagger.java: Port to DSpace 6
We need to use the new DSpace 6 service API. Also, the way we read
task properties changes because of the configuration changes.

See: https://wiki.lyrasis.org/display/DSDOC6x/Curation+System
See: https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference
2020-08-05 12:28:37 +03:00
5a467f92e0 pom.xml: Bump dependencies for DSpace 6 2020-08-04 15:37:39 +03:00
7 changed files with 84 additions and 187 deletions


@ -1,19 +0,0 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [5.3] - 2020-08-07
### Changed
- Make sure `FixJpgJpgThumbnails` only replaces thumbnails where the original is less than ~100KiB
- Make sure `FixJpgJpgThumbnails` only replaces thumbnails if the item type is not `Infographic` (because the JPG in the ORIGINAL bundle is the "real" file and it's OK that the thumbnail is ".jpg.jpg")
## [5.2] - 2020-08-06
### Changed
- Make `FixJpgJpgThumbnails` helper check for files named "JPG" as well as "jpg" (case insensitive)
- Make `FixJpgJpgThumbnails` helper replace thumbnails with description `IM Thumbnail` as well as `Generated Thumbnail`
## [5.1] - 2020-08-06
### Added
- Add `FixJpgJpgThumbnails` helper to replace ".jpg.jpg" thumbnails with their originals
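The rules accumulated across versions 5.1 to 5.3 can be read as a single predicate. The sketch below is an illustration in plain Java with no DSpace dependency. The class name `ThumbnailRules`, the method `shouldReplace`, its parameters, and the exact 100 KiB cutoff (the changelog only says "~100KiB") are assumptions, not the project's actual code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of cgspace-java-helpers.
final class ThumbnailRules {
    // Assumed cutoff: the changelog only says "less than ~100KiB".
    static final long MAX_ORIGINAL_BYTES = 100L * 1024L;

    // Thumbnail descriptions named in the 5.2 changelog entry.
    static final List<String> REPLACEABLE_DESCRIPTIONS =
            Arrays.asList("Generated Thumbnail", "IM Thumbnail");

    static boolean shouldReplace(String thumbnailName, String thumbnailDescription,
                                 long originalSizeBytes, List<String> itemTypes) {
        // 5.1 and 5.2: only ".jpg.jpg" thumbnails, matched case-insensitively
        if (thumbnailName == null
                || !thumbnailName.toLowerCase(Locale.ROOT).contains(".jpg.jpg")) {
            return false;
        }
        // 5.2: only thumbnails carrying one of the known descriptions
        if (!REPLACEABLE_DESCRIPTIONS.contains(thumbnailDescription)) {
            return false;
        }
        // 5.3: only when the original is small enough to serve as a thumbnail
        if (originalSizeBytes >= MAX_ORIGINAL_BYTES) {
            return false;
        }
        // 5.3: never touch items typed "Infographic"
        return !itemTypes.contains("Infographic");
    }

    public static void main(String[] args) {
        // Example call with made-up values
        System.out.println(shouldReplace("cover.jpg.jpg", "Generated Thumbnail",
                40_000L, Arrays.asList("Brief")));
    }
}
```

Keeping the checks in one pure function makes each changelog rule testable without a running DSpace instance.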


@ -1,10 +1,10 @@
# CGSpace Java Helpers [![Build Status](https://travis-ci.org/ilri/cgspace-java-helpers.svg?branch=dspace5)](https://travis-ci.org/ilri/dspace-curation-tasks)
# CGSpace Java Helpers [![Build Status](https://travis-ci.org/ilri/cgspace-java-helpers.svg?branch=dspace6)](https://travis-ci.org/ilri/cgspace-java-helpers)
DSpace curation tasks and other Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **FixJpgJpgThumbnails**: Fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
Tested on DSpace 6.3. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC6x/Curation+System).
## Build and Install
@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>5.4-SNAPSHOT</version>
<version>6.0</version>
</dependency>
```
@ -31,14 +31,40 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```
$ cp target/cgspace-java-helpers-5.4-SNAPSHOT.jar ~/dspace/lib
$ cp target/cgspace-java-helpers-6.0.jar ~/dspace/lib/
```
## Configuration
Please refer to the appropriate README.md file:
Add the curation task to DSpace's `config/modules/curate.cfg`:
- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace5/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace5/src/main/java/io/github/ilri/cgspace/scripts/README.md)
```
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
```
And then add the following variables to your `local.cfg` or some other [configuration file that is included](https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference#ConfigurationReference-IncludingotherPropertyFiles):
```
# name of the field containing ISO 3166-1 country names
countrycodetagger.iso3166.field = cg.coverage.country
# name of the field containing ISO 3166-1 Alpha2 country codes
countrycodetagger.iso3166-alpha2.field = cg.coverage.iso3166-alpha2
# set to true to clear existing country codes and re-tag every item (default false: only add codes to items that have none)
#countrycodetagger.forceupdate = false
```
*Note*: DSpace's curation system supports "profiles", which let you run the same task with different options. The example above registers a normal country code tagger task and a "force" variant. The "force" variant is the same task, but it looks up its configuration variables under the `countrycodetagger.force` prefix instead. To use it, add those variables, with `forceupdate` overridden, to the same configuration file where you put the other variables. The "force" profile clears all existing country codes and updates everything.
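To make the prefix lookup concrete, here is a minimal sketch that uses a plain `java.util.Properties` object as a stand-in for DSpace's configuration. The class `ProfileConfigSketch` and its methods are hypothetical and only mimic the idea that a task reads its variables under its own task name; the `countrycodetagger.force.*` entries are the ones you would add yourself.

```java
import java.util.Properties;

// Hypothetical stand-in for illustration; this is not DSpace's configuration API.
final class ProfileConfigSketch {
    // Look up "<taskName>.<key>", so each profile reads its own variables.
    static String taskProperty(Properties cfg, String taskName, String key) {
        return cfg.getProperty(taskName + "." + key);
    }

    // A missing or non-"true" value is treated as false, matching the documented default.
    static boolean forceUpdate(Properties cfg, String taskName) {
        return Boolean.parseBoolean(taskProperty(cfg, taskName, "forceupdate"));
    }

    static Properties exampleConfig() {
        Properties cfg = new Properties();
        // normal profile
        cfg.setProperty("countrycodetagger.iso3166.field", "cg.coverage.country");
        cfg.setProperty("countrycodetagger.iso3166-alpha2.field", "cg.coverage.iso3166-alpha2");
        // "force" profile: same fields, forceupdate overridden
        cfg.setProperty("countrycodetagger.force.iso3166.field", "cg.coverage.country");
        cfg.setProperty("countrycodetagger.force.iso3166-alpha2.field", "cg.coverage.iso3166-alpha2");
        cfg.setProperty("countrycodetagger.force.forceupdate", "true");
        return cfg;
    }

    public static void main(String[] args) {
        Properties cfg = exampleConfig();
        System.out.println("normal: " + forceUpdate(cfg, "countrycodetagger"));
        System.out.println("force:  " + forceUpdate(cfg, "countrycodetagger.force"));
    }
}
```

Because both profiles live in one file, the only thing that distinguishes them is the key prefix, which is why every variable has to be repeated for the "force" variant.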
## Invocation
Once the jar is installed and you have added the appropriate configuration in `~/dspace/config/modules`, run the task:
```
$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object
```
*Note*: it is very important to set the cache limit (`-l`) and the database transaction scope (`-s`) to something sensible (such as `object`) if you're curating a community or collection with more than a few hundred items.
## Notes
This project was initially created according to the [Maven Getting Started Guide](https://maven.apache.org/guides/getting-started/):


@ -6,7 +6,7 @@
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>5.4-SNAPSHOT</version>
<version>6.0</version>
<name>cgspace-java-helpers</name>
<url>https://github.com/ilri/cgspace-java-helpers</url>
@ -42,12 +42,12 @@
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.2.1</version>
<version>2.6.1</version>
</dependency>
<dependency>
<groupId>org.dspace</groupId>
<artifactId>dspace-api</artifactId>
<version>5.8</version>
<version>6.3</version>
<scope>provided</scope>
</dependency>
</dependencies>


@ -23,7 +23,7 @@ import org.apache.log4j.Logger;
import org.dspace.authorize.AuthorizeException;
import org.dspace.content.DSpaceObject;
import org.dspace.content.Item;
import org.dspace.content.Metadatum;
import org.dspace.content.MetadataValue;
import org.dspace.core.Constants;
import org.dspace.curate.AbstractCurationTask;
import org.dspace.curate.Curator;
@ -35,11 +35,6 @@ import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
/**
* @author Alan Orth for the International Livestock Research Institute
* @version 5.1
* @since 1.0
*/
public class CountryCodeTagger extends AbstractCurationTask
{
public class CountryCodeTaggerConfig {
@ -86,7 +81,11 @@ public class CountryCodeTagger extends AbstractCurationTask
Item item = (Item)dso;
alpha2Result = performAlpha2(item, config);
try {
alpha2Result = performAlpha2(item, config);
} catch (SQLException throwables) {
throwables.printStackTrace();
}
setResult(alpha2Result.getResult());
report(alpha2Result.getResult());
@ -95,15 +94,14 @@ public class CountryCodeTagger extends AbstractCurationTask
return alpha2Result.getStatus();
}
public CountryCodeTaggerResult performAlpha2(Item item, CountryCodeTaggerConfig config) throws IOException
{
public CountryCodeTaggerResult performAlpha2(Item item, CountryCodeTaggerConfig config) throws IOException, SQLException {
CountryCodeTaggerResult alpha2Result = new CountryCodeTaggerResult();
String itemHandle = item.getHandle();
Metadatum[] itemCountries = item.getMetadataByMetadataString(config.iso3166Field);
List<MetadataValue> itemCountries = itemService.getMetadataByMetadataString(item, config.iso3166Field);
// skip items that don't have country metadata
if (itemCountries.length == 0) {
if (itemCountries.size() == 0) {
alpha2Result.setResult(itemHandle + ": no countries, skipping.");
alpha2Result.setStatus(Curator.CURATE_SKIP);
} else {
@ -122,25 +120,25 @@ public class CountryCodeTagger extends AbstractCurationTask
String[] iso3166Alpha2FieldParts = config.iso3166Alpha2Field.split("\\.");
if (config.forceupdate) {
item.clearMetadata(iso3166Alpha2FieldParts[0], iso3166Alpha2FieldParts[1], iso3166Alpha2FieldParts[2], Item.ANY);
itemService.clearMetadata(Curator.curationContext(), item, iso3166Alpha2FieldParts[0], iso3166Alpha2FieldParts[1], iso3166Alpha2FieldParts[2], Item.ANY);
}
// check the item's country codes, if any
Metadatum[] itemAlpha2CountryCodes = item.getMetadataByMetadataString(config.iso3166Alpha2Field);
List<MetadataValue> itemAlpha2CountryCodes = itemService.getMetadataByMetadataString(item, config.iso3166Alpha2Field);
if (itemAlpha2CountryCodes.length == 0) {
if (itemAlpha2CountryCodes.size() == 0) {
List<String> newAlpha2Codes = new ArrayList<String>();
for (Metadatum itemCountry : itemCountries) {
for (MetadataValue itemCountry : itemCountries) {
//check ISO 3166-1 countries
for (CountriesVocabulary.Country country : isocodesCountriesJson.countries) {
if (itemCountry.value.equalsIgnoreCase(country.getName()) || itemCountry.value.equalsIgnoreCase(country.get_official_name()) || itemCountry.value.equalsIgnoreCase(country.get_common_name())) {
if (itemCountry.getValue().equalsIgnoreCase(country.getName()) || itemCountry.getValue().equalsIgnoreCase(country.get_official_name()) || itemCountry.getValue().equalsIgnoreCase(country.get_common_name())) {
newAlpha2Codes.add(country.getAlpha_2());
}
}
//check CGSpace countries
for (CountriesVocabulary.Country country : cgspaceCountriesJson.countries) {
if (itemCountry.value.equalsIgnoreCase(country.getCgspace_name())) {
if (itemCountry.getValue().equalsIgnoreCase(country.getCgspace_name())) {
newAlpha2Codes.add(country.getAlpha_2());
}
}
@ -148,9 +146,8 @@ public class CountryCodeTagger extends AbstractCurationTask
if (newAlpha2Codes.size() > 0) {
try {
// add metadata values (casting the List<String> to an array)
item.addMetadata(iso3166Alpha2FieldParts[0], iso3166Alpha2FieldParts[1], iso3166Alpha2FieldParts[2], "en_US", newAlpha2Codes.toArray(new String[0]));
item.update();
itemService.addMetadata(Curator.curationContext(), item, iso3166Alpha2FieldParts[0], iso3166Alpha2FieldParts[1], iso3166Alpha2FieldParts[2], "en_US", newAlpha2Codes);
itemService.update(Curator.curationContext(), item);
} catch (SQLException | AuthorizeException sqle) {
config.log.debug(sqle.getMessage());
alpha2Result.setResult(itemHandle + ": error");
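The matching loop in `performAlpha2` compares each country value case-insensitively against several names per vocabulary entry. The sketch below isolates that idea in plain Java, assuming a simplified `Country` class and a two-entry, hand-written sample vocabulary. The names `CountryMatchSketch`, `alpha2Codes`, and `sampleVocabulary` are hypothetical, and the real task reads its vocabularies from JSON rather than from code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, DSpace-free illustration of the case-insensitive country lookup.
final class CountryMatchSketch {
    static final class Country {
        final String name;
        final String officialName;
        final String commonName; // may be null when an entry has no common name
        final String alpha2;

        Country(String name, String officialName, String commonName, String alpha2) {
            this.name = name;
            this.officialName = officialName;
            this.commonName = commonName;
            this.alpha2 = alpha2;
        }
    }

    // Null-safe, case-insensitive comparison
    static boolean matches(String value, String candidate) {
        return candidate != null && candidate.equalsIgnoreCase(value);
    }

    // Collect the Alpha2 code of every vocabulary entry matching an item country.
    static List<String> alpha2Codes(List<String> itemCountries, List<Country> vocabulary) {
        List<String> codes = new ArrayList<>();
        for (String itemCountry : itemCountries) {
            for (Country country : vocabulary) {
                if (matches(itemCountry, country.name)
                        || matches(itemCountry, country.officialName)
                        || matches(itemCountry, country.commonName)) {
                    codes.add(country.alpha2);
                }
            }
        }
        return codes;
    }

    // Hand-written sample data for this sketch only
    static List<Country> sampleVocabulary() {
        return Arrays.asList(
                new Country("Kenya", "Republic of Kenya", null, "KE"),
                new Country("Tanzania, United Republic of", "United Republic of Tanzania",
                        "Tanzania", "TZ"));
    }

    public static void main(String[] args) {
        System.out.println(alpha2Codes(Arrays.asList("KENYA", "tanzania"), sampleVocabulary()));
    }
}
```

Values that match no entry are skipped silently, so an item whose countries are all unrecognized yields an empty list and no metadata is added.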


@ -1,74 +0,0 @@
# Curation Tasks
DSpace curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
## Build and Install
### Integrate into DSpace Build
To use these curation tasks in a DSpace project add the following dependency to `dspace/modules/additions/pom.xml`:
```
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>5.3</version>
</dependency>
```
The jar will be copied to all DSpace applications.
### Manual Build and Install
To build the standalone jar:
```
$ mvn package
```
Copy the resulting jar to the DSpace `lib` directory:
```
$ cp target/cgspace-java-helpers-5.3.jar ~/dspace/lib
```
## Configuration
Add the curation task to DSpace's `config/modules/curate.cfg`:
```
plugin.named.org.dspace.curate.CurationTask = \
...
io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger \
io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
```
And then add a configuration file for the task in `config/modules/countrycodetagger.cfg`:
```
# name of the field containing ISO 3166-1 country names
iso3166.field = cg.coverage.country
# name of the field containing ISO 3166-1 Alpha2 country codes
iso3166-alpha2.field = cg.coverage.iso3166-alpha2
# only add country codes if an item doesn't have any (default false)
#forceupdate = false
```
*Note*: DSpace's curation system supports "profiles" where you can use the same task with different options, for example above I have a normal country code tagger and a "force" variant. To use the "force" variant you create a new configuration file with the overridden options in `config/modules/countrycodetagger.force.cfg`. The "force" profile clears all existing country codes and updates everything.
## Invocation
Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:
```
$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object
```
*Note*: it is very important to set the cache limit (`-l`) and the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.
## TODO
- Make sure this doesn't work on items in the workflow
- Check for existence of metadata field before trying to add metadata
- Add tests


@ -5,18 +5,28 @@ import org.dspace.authorize.AuthorizeException;
import org.dspace.content.*;
import org.dspace.core.Constants;
import org.dspace.core.Context;
import org.dspace.handle.HandleManager;
import org.dspace.content.factory.ContentServiceFactory;
import org.dspace.content.service.ItemService;
import org.dspace.handle.factory.HandleServiceFactory;
import org.dspace.handle.service.HandleService;
import org.dspace.content.service.BundleService;
import java.io.IOException;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.List;
/**
* @author Andrea Schweer schweer@waikato.ac.nz for the LCoNZ Institutional Research Repositories
* @author Alan Orth for the International Livestock Research Institute
* @version 5.4
* @version 6.0
* @since 5.1
*/
public class FixJpgJpgThumbnails {
//note: static members belong to the class itself, not any one instance
public static ItemService itemService = ContentServiceFactory.getInstance().getItemService();
public static HandleService handleService = HandleServiceFactory.getInstance().getHandleService();
public static BundleService bundleService = ContentServiceFactory.getInstance().getBundleService();
public static void main(String[] args) {
String parentHandle = null;
@ -30,25 +40,25 @@ public class FixJpgJpgThumbnails {
context.turnOffAuthorisationSystem();
if (StringUtils.isBlank(parentHandle)) {
process(context, Item.findAll(context));
process(context, itemService.findAll(context));
} else {
DSpaceObject parent = HandleManager.resolveToObject(context, parentHandle);
DSpaceObject parent = handleService.resolveToObject(context, parentHandle);
if (parent != null) {
switch (parent.getType()) {
case Constants.COLLECTION:
process(context, ((Collection) parent).getAllItems()); // getAllItems because we want to work on non-archived ones as well
process(context, itemService.findByCollection(context, (Collection) parent));
break;
case Constants.COMMUNITY:
Collection[] collections = ((Community) parent).getCollections();
List<Collection> collections = ((Community) parent).getCollections();
for (Collection collection : collections) {
process(context, collection.getAllItems()); // getAllItems because we want to work on non-archived ones as well
process(context, itemService.findAllByCollection(context, collection));
}
break;
case Constants.SITE:
process(context, Item.findAll(context));
process(context, itemService.findAll(context));
break;
case Constants.ITEM:
processItem((Item) parent);
processItem(context, (Item) parent);
context.commit();
break;
}
@ -63,37 +73,35 @@ public class FixJpgJpgThumbnails {
}
}
private static void process(Context context, ItemIterator items) throws SQLException, IOException, AuthorizeException {
private static void process(Context context, Iterator<Item> items) throws SQLException, IOException, AuthorizeException {
while (items.hasNext()) {
Item item = items.next();
processItem(item);
context.commit();
item.decache();
processItem(context, item);
itemService.update(context, item);
}
}
private static void processItem(Item item) throws SQLException, AuthorizeException, IOException {
private static void processItem(Context context, Item item) throws SQLException, AuthorizeException, IOException {
// Some bitstreams like Infographics are large JPGs and put in the ORIGINAL bundle on purpose so we shouldn't
// swap them.
Metadatum[] itemTypes = item.getMetadataByMetadataString("dc.type");
List<MetadataValue> itemTypes = itemService.getMetadataByMetadataString(item, "dc.type");
boolean itemHasInfographic = false;
for (Metadatum itemType: itemTypes) {
if (itemType.value.equals("Infographic")) {
for (MetadataValue itemType: itemTypes) {
if (itemType.getValue().equals("Infographic")) {
itemHasInfographic = true;
break;
}
}
Bundle[] thumbnailBundles = item.getBundles("THUMBNAIL");
List<Bundle> thumbnailBundles = item.getBundles("THUMBNAIL");
for (Bundle thumbnailBundle : thumbnailBundles) {
Bitstream[] thumbnailBundleBitstreams = thumbnailBundle.getBitstreams();
List<Bitstream> thumbnailBundleBitstreams = thumbnailBundle.getBitstreams();
for (Bitstream thumbnailBitstream : thumbnailBundleBitstreams) {
String thumbnailName = thumbnailBitstream.getName();
if (thumbnailName.toLowerCase().contains(".jpg.jpg")) {
Bundle[] originalBundles = item.getBundles("ORIGINAL");
List<Bundle> originalBundles = item.getBundles("ORIGINAL");
for (Bundle originalBundle : originalBundles) {
Bitstream[] originalBundleBitstreams = originalBundle.getBitstreams();
List<Bitstream> originalBundleBitstreams = originalBundle.getBitstreams();
for (Bitstream originalBitstream : originalBundleBitstreams) {
String originalName = originalBitstream.getName();
@ -117,7 +125,7 @@ public class FixJpgJpgThumbnails {
System.out.println(item.getHandle() + ": replacing " + thumbnailName + " with " + originalName);
//add the original bitstream to the THUMBNAIL bundle
thumbnailBundle.addBitstream(originalBitstream);
bundleService.addBitstream(context, thumbnailBundle, originalBitstream);
//remove the original bitstream from the ORIGINAL bundle
originalBundle.removeBitstream(originalBitstream);
//remove the JpgJpg bitstream from the THUMBNAIL bundle


@ -1,41 +0,0 @@
# Scripts
Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
- **FixJpgJpgThumbnails**: Fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
## Build and Install
### Integrate into DSpace Build
To use these curation tasks in a DSpace project add the following dependency to `dspace/modules/additions/pom.xml`:
```
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>5.3</version>
</dependency>
```
The jar will be copied to all DSpace applications.
### Manual Build and Install
To build the standalone jar:
```
$ mvn package
```
Copy the resulting jar to the DSpace `lib` directory:
```
$ cp target/cgspace-java-helpers-5.3.jar ~/dspace/lib
```
### Invocation
The script takes a single argument, the handle of a community, collection, or item:
```
$ dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/83389
```