18 Commits
v6.0 ... v5.3

Author SHA1 Message Date
fce81c6003 Version 5.3 2020-08-07 09:57:19 +03:00
26d3cbd778 src/main/java: Tune FixJpgJpgThumbnails a bit
Make sure we don't modify thumbnails if the item is an Infographic
because the JPG in the ORIGINAL bundle might actually be the "real"
file, in which case the THUMBNAIL bundle would have a legitimate
".jpg.jpg" file.

Also, limit the criteria for replacement to original bitstreams
that are less than 100KiB. In my tests I found that we had 4,022
items with ".jpg.jpg" thumbnails, and the average file size of the
originals in those items was 98KiB. Without considering the large
infographics, which are several megabytes apiece, the average of
the remaining 3,765 originals was ~20KiB so 100KiB should be very
safe.
2020-08-07 09:50:03 +03:00
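
The two guards described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the class name, method name, and constant are hypothetical, and it assumes that "100KiB" means 100 × 1024 bytes.

```java
// Hypothetical sketch of the replacement guards described in the commit message.
class ReplacementGuardSketch {
    // Assumption: "100KiB" is taken as 100 * 1024 bytes.
    static final long MAX_ORIGINAL_BYTES = 100L * 1024L;

    // Only replace when the item is not an Infographic and the original
    // bitstream is smaller than the size limit.
    static boolean shouldReplace(boolean itemHasInfographic, long originalBytes) {
        return !itemHasInfographic && originalBytes < MAX_ORIGINAL_BYTES;
    }

    public static void main(String[] args) {
        // A ~20KiB original on a normal item passes both guards.
        System.out.println(shouldReplace(false, 20L * 1024L));
        // A multi-megabyte JPG on an Infographic item is left alone.
        System.out.println(shouldReplace(true, 3L * 1024L * 1024L));
    }
}
```

Keeping both checks in one predicate means a large JPG is skipped even if its item type is not set to Infographic.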
fdc910f93b README.md: Update versions 2020-08-06 16:23:17 +03:00
e0d514e797 pom.xml: Move version to 5.3-SNAPSHOT 2020-08-06 16:17:05 +03:00
fd893d8c4e pom.xml: Release version 5.2 2020-08-06 16:16:13 +03:00
2263ac27e8 src/main/java: Handle more corner cases in FixJpgJpgThumbnails.java
We should make sure we are catching .JPG and .jpg. Also, we should
check for Generated Thumbnails as well as IM Thumbnail.
2020-08-06 16:13:51 +03:00
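
A rough sketch of the two corner cases mentioned here, under stated assumptions: the helper names are hypothetical, and "Generated Thumbnail" and "IM Thumbnail" are assumed to be the bitstream descriptions set by DSpace's thumbnail filters. The name check below uses `endsWith`, whereas the diff further down uses `contains`.

```java
import java.util.Locale;

// Hypothetical helpers illustrating the corner cases in the commit message.
class ThumbnailCriteriaSketch {
    // Match ".jpg.jpg" regardless of case, so "photo.JPG.jpg" is caught too.
    static boolean isJpgJpgName(String name) {
        return name != null && name.toLowerCase(Locale.ROOT).endsWith(".jpg.jpg");
    }

    // Assumption: these two strings are the descriptions of automatically
    // generated thumbnails. Comparing constant-first is null-safe.
    static boolean isGeneratedThumbnail(String description) {
        return "Generated Thumbnail".equals(description)
                || "IM Thumbnail".equals(description);
    }

    public static void main(String[] args) {
        System.out.println(isJpgJpgName("photo.JPG.jpg"));
        System.out.println(isGeneratedThumbnail("IM Thumbnail"));
    }
}
```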
cf7012d698 pom.xml: Change version to 5.2-SNAPSHOT 2020-08-06 16:13:27 +03:00
7edc60e6ca README.md: Use badge for dspace5 branch 2020-08-06 15:47:33 +03:00
fe2abc86c6 Publish version 5.1 2020-08-06 15:30:36 +03:00
e1d92ef2c7 src/main/java: Add Javadoc tags to FixJpgJpgThumbnails.java
I'm not sure how these are used by anything other than Javadoc, but
it seems useful.
2020-08-06 15:13:33 +03:00
3e3c544cfa Rename to cgspace-java-helpers
Now this includes the curation tasks as well as some helper scripts
for general DSpace tasks.
2020-08-06 15:06:42 +03:00
db9881faf6 README.md: Add note about FixJpgJpgThumbnails 2020-08-06 15:05:58 +03:00
fa5fb60b5b README.md: Version 2020-08-06 14:53:04 +03:00
44fb9a9f4d pom.xml: Bump version to 5.1-SNAPSHOT 2020-08-06 14:51:11 +03:00
b790d5e4db src/main/java: Minimum working version of FixJpgJpgThumbnails
It goes through each item and checks the THUMBNAIL bundle to find
any bitstreams ending in ".jpg.jpg", which indicates that they are
a thumbnail of a thumbnail. For each match it checks the ORIGINAL
bundle for a bitstream with the same name (minus ".jpg") and then
moves it to the THUMBNAIL bundle and deletes the original as well
as the "JpgJpg" thumbnail.
2020-08-06 13:53:58 +03:00
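
The matching step described above can be illustrated on plain file names. This is only a sketch: it works on lists of strings rather than DSpace bundles and bitstreams, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: find originals that a ".jpg.jpg" thumbnail was made from.
class JpgJpgMatchSketch {
    static List<String> findReplaceableOriginals(List<String> thumbnailNames,
                                                 List<String> originalNames) {
        List<String> matches = new ArrayList<>();
        for (String thumbnailName : thumbnailNames) {
            // Only thumbnails of thumbnails are of interest.
            if (!thumbnailName.endsWith(".jpg.jpg")) {
                continue;
            }
            // Strip the extra ".jpg" to get the expected name in ORIGINAL.
            String expectedOriginal =
                    thumbnailName.substring(0, thumbnailName.length() - ".jpg".length());
            if (originalNames.contains(expectedOriginal)) {
                matches.add(expectedOriginal);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> thumbnails = Arrays.asList("cover.jpg.jpg", "report.pdf.jpg");
        List<String> originals = Arrays.asList("cover.jpg", "report.pdf");
        // prints [cover.jpg]
        System.out.println(findReplaceableOriginals(thumbnails, originals));
    }
}
```

In the real task, each match would then be moved to the THUMBNAIL bundle and the ".jpg.jpg" bitstream removed, as the commit message describes.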
08e7546a87 RemovePNGThumbnailsForPDFs to FixJpgJpgThumbnails
I want to use this to fix occurrences of ".jpg.jpg" thumbnails that
are caused by users uploading manually created JPG thumbnails to
the ORIGINAL bundle, which causes DSpace to create another one in
the THUMBNAIL bundle.
2020-08-06 12:58:37 +03:00
ff076ecf50 Import RemovePNGThumbnailsForPDFs.java
Written by Andrea Schweer under the BSD license. I will use this
as a base to do other thumbnail-related tasks.

See: https://github.com/UoW-IRRs/DSpace-Scripts
2020-08-06 12:51:52 +03:00
7a5dd1c094 Use 5.0-SNAPSHOT for DSpace 5 version
The DSpace 6 version is in another branch. I decided that I will
use the major from the compatible DSpace version to make it easier
to manage versioning schemes.
2020-08-05 12:42:32 +03:00
4 changed files with 55 additions and 64 deletions


@ -1,10 +1,10 @@
# CGSpace Java Helpers [![Build Status](https://travis-ci.org/ilri/cgspace-java-helpers.svg?branch=dspace6)](https://travis-ci.org/ilri/cgspace-java-helpers) # CGSpace Java Helpers [![Build Status](https://travis-ci.org/ilri/cgspace-java-helpers.svg?branch=dspace5)](https://travis-ci.org/ilri/dspace-curation-tasks)
DSpace curation tasks and other Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutional repository: DSpace curation tasks and other Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata - **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **FixJpgJpgThumbnails**: Fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals - **FixJpgJpgThumbnails**: Fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
Tested on DSpace 6.3. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System). Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
## Build and Install ## Build and Install
@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency> <dependency>
<groupId>io.github.ilri.cgspace</groupId> <groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId> <artifactId>cgspace-java-helpers</artifactId>
<version>6.0</version> <version>5.3</version>
</dependency> </dependency>
``` ```
@ -31,31 +31,33 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory: Copy the resulting jar to the DSpace `lib` directory:
``` ```
$ cp target/cgspace-java-helpers-6.0.jar ~/dspace/lib/ $ cp target/cgspace-java-helpers-5.3.jar ~/dspace/lib
``` ```
## Configuration ## Configuration
Add the curation task to DSpace's `config/modules/curate.cfg`: Add the curation task to DSpace's `config/modules/curate.cfg`:
``` ```
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger plugin.named.org.dspace.curate.CurationTask = \
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force ...
io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger \
io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
``` ```
And then add the following variables to your `local.cfg` or some other [configuration file that is included](https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference#ConfigurationReference-IncludingotherPropertyFiles): And then add a configuration file for the task in `config/modules/countrycodetagger.cfg`:
``` ```
# name of the field containing ISO 3166-1 country names # name of the field containing ISO 3166-1 country names
countrycodetagger.iso3166.field = cg.coverage.country iso3166.field = cg.coverage.country
# name of the field containing ISO 3166-1 Alpha2 country codes # name of the field containing ISO 3166-1 Alpha2 country codes
countrycodetagger.iso3166-alpha2.field = cg.coverage.iso3166-alpha2 iso3166-alpha2.field = cg.coverage.iso3166-alpha2
# only add country codes if an item doesn't have any (default false) # only add country codes if an item doesn't have any (default false)
#countrycodetagger.forceupdate = false #forceupdate = false
``` ```
*Note*: DSpace's curation system supports "profiles" where you can use the same task with different options, for example above I have a normal country code tagger task and a "force" variant. The "force" variant is the same task, but it looks for configuration variables using the `countrycodetagger.force` instead. To use the "force" variant you simply need to add these new variables with the `forceupdate` parameter overridden to the same configuration file where you put the other variables. The "force" profile clears all existing country codes and updates everything. *Note*: DSpace's curation system supports "profiles" where you can use the same task with different options, for example above I have a normal country code tagger and a "force" variant. To use the "force" variant you create a new configuration file with the overridden options in `config/modules/countrycodetagger.force.cfg`. The "force" profile clears all existing country codes and updates everything.
## Invocation ## Invocation
Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`: Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:


@ -6,7 +6,7 @@
<groupId>io.github.ilri.cgspace</groupId> <groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId> <artifactId>cgspace-java-helpers</artifactId>
<version>6.0</version> <version>5.3</version>
<name>cgspace-java-helpers</name> <name>cgspace-java-helpers</name>
<url>https://github.com/ilri/cgspace-java-helpers</url> <url>https://github.com/ilri/cgspace-java-helpers</url>
@ -42,12 +42,12 @@
<dependency> <dependency>
<groupId>com.google.code.gson</groupId> <groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId> <artifactId>gson</artifactId>
<version>2.6.1</version> <version>2.2.1</version>
</dependency> </dependency>
<dependency> <dependency>
<groupId>org.dspace</groupId> <groupId>org.dspace</groupId>
<artifactId>dspace-api</artifactId> <artifactId>dspace-api</artifactId>
<version>6.3</version> <version>5.8</version>
<scope>provided</scope> <scope>provided</scope>
</dependency> </dependency>
</dependencies> </dependencies>


@ -23,7 +23,7 @@ import org.apache.log4j.Logger;
import org.dspace.authorize.AuthorizeException; import org.dspace.authorize.AuthorizeException;
import org.dspace.content.DSpaceObject; import org.dspace.content.DSpaceObject;
import org.dspace.content.Item; import org.dspace.content.Item;
import org.dspace.content.MetadataValue; import org.dspace.content.Metadatum;
import org.dspace.core.Constants; import org.dspace.core.Constants;
import org.dspace.curate.AbstractCurationTask; import org.dspace.curate.AbstractCurationTask;
import org.dspace.curate.Curator; import org.dspace.curate.Curator;
@ -81,11 +81,7 @@ public class CountryCodeTagger extends AbstractCurationTask
Item item = (Item)dso; Item item = (Item)dso;
try { alpha2Result = performAlpha2(item, config);
alpha2Result = performAlpha2(item, config);
} catch (SQLException throwables) {
throwables.printStackTrace();
}
setResult(alpha2Result.getResult()); setResult(alpha2Result.getResult());
report(alpha2Result.getResult()); report(alpha2Result.getResult());
@ -94,14 +90,15 @@ public class CountryCodeTagger extends AbstractCurationTask
return alpha2Result.getStatus(); return alpha2Result.getStatus();
} }
public CountryCodeTaggerResult performAlpha2(Item item, CountryCodeTaggerConfig config) throws IOException, SQLException { public CountryCodeTaggerResult performAlpha2(Item item, CountryCodeTaggerConfig config) throws IOException
{
CountryCodeTaggerResult alpha2Result = new CountryCodeTaggerResult(); CountryCodeTaggerResult alpha2Result = new CountryCodeTaggerResult();
String itemHandle = item.getHandle(); String itemHandle = item.getHandle();
List<MetadataValue> itemCountries = itemService.getMetadataByMetadataString(item, config.iso3166Field); Metadatum[] itemCountries = item.getMetadataByMetadataString(config.iso3166Field);
// skip items that don't have country metadata // skip items that don't have country metadata
if (itemCountries.size() == 0) { if (itemCountries.length == 0) {
alpha2Result.setResult(itemHandle + ": no countries, skipping."); alpha2Result.setResult(itemHandle + ": no countries, skipping.");
alpha2Result.setStatus(Curator.CURATE_SKIP); alpha2Result.setStatus(Curator.CURATE_SKIP);
} else { } else {
@ -120,25 +117,25 @@ public class CountryCodeTagger extends AbstractCurationTask
String[] iso3166Alpha2FieldParts = config.iso3166Alpha2Field.split("\\."); String[] iso3166Alpha2FieldParts = config.iso3166Alpha2Field.split("\\.");
if (config.forceupdate) { if (config.forceupdate) {
itemService.clearMetadata(Curator.curationContext(), item, iso3166Alpha2FieldParts[0], iso3166Alpha2FieldParts[1], iso3166Alpha2FieldParts[2], Item.ANY); item.clearMetadata(iso3166Alpha2FieldParts[0], iso3166Alpha2FieldParts[1], iso3166Alpha2FieldParts[2], Item.ANY);
} }
// check the item's country codes, if any // check the item's country codes, if any
List<MetadataValue> itemAlpha2CountryCodes = itemService.getMetadataByMetadataString(item, config.iso3166Alpha2Field); Metadatum[] itemAlpha2CountryCodes = item.getMetadataByMetadataString(config.iso3166Alpha2Field);
if (itemAlpha2CountryCodes.size() == 0) { if (itemAlpha2CountryCodes.length == 0) {
List<String> newAlpha2Codes = new ArrayList<String>(); List<String> newAlpha2Codes = new ArrayList<String>();
for (MetadataValue itemCountry : itemCountries) { for (Metadatum itemCountry : itemCountries) {
//check ISO 3166-1 countries //check ISO 3166-1 countries
for (CountriesVocabulary.Country country : isocodesCountriesJson.countries) { for (CountriesVocabulary.Country country : isocodesCountriesJson.countries) {
if (itemCountry.getValue().equalsIgnoreCase(country.getName()) || itemCountry.getValue().equalsIgnoreCase(country.get_official_name()) || itemCountry.getValue().equalsIgnoreCase(country.get_common_name())) { if (itemCountry.value.equalsIgnoreCase(country.getName()) || itemCountry.value.equalsIgnoreCase(country.get_official_name()) || itemCountry.value.equalsIgnoreCase(country.get_common_name())) {
newAlpha2Codes.add(country.getAlpha_2()); newAlpha2Codes.add(country.getAlpha_2());
} }
} }
//check CGSpace countries //check CGSpace countries
for (CountriesVocabulary.Country country : cgspaceCountriesJson.countries) { for (CountriesVocabulary.Country country : cgspaceCountriesJson.countries) {
if (itemCountry.getValue().equalsIgnoreCase(country.getCgspace_name())) { if (itemCountry.value.equalsIgnoreCase(country.getCgspace_name())) {
newAlpha2Codes.add(country.getAlpha_2()); newAlpha2Codes.add(country.getAlpha_2());
} }
} }
@ -146,8 +143,9 @@ public class CountryCodeTagger extends AbstractCurationTask
if (newAlpha2Codes.size() > 0) { if (newAlpha2Codes.size() > 0) {
try { try {
itemService.addMetadata(Curator.curationContext(), item, iso3166Alpha2FieldParts[0], iso3166Alpha2FieldParts[1], iso3166Alpha2FieldParts[2], "en_US", newAlpha2Codes); // add metadata values (converting the List<String> to an array)
itemService.update(Curator.curationContext(), item); item.addMetadata(iso3166Alpha2FieldParts[0], iso3166Alpha2FieldParts[1], iso3166Alpha2FieldParts[2], "en_US", newAlpha2Codes.toArray(new String[0]));
item.update();
} catch (SQLException | AuthorizeException sqle) { } catch (SQLException | AuthorizeException sqle) {
config.log.debug(sqle.getMessage()); config.log.debug(sqle.getMessage());
alpha2Result.setResult(itemHandle + ": error"); alpha2Result.setResult(itemHandle + ": error");


@ -5,28 +5,18 @@ import org.dspace.authorize.AuthorizeException;
import org.dspace.content.*; import org.dspace.content.*;
import org.dspace.core.Constants; import org.dspace.core.Constants;
import org.dspace.core.Context; import org.dspace.core.Context;
import org.dspace.content.factory.ContentServiceFactory; import org.dspace.handle.HandleManager;
import org.dspace.content.service.ItemService;
import org.dspace.handle.factory.HandleServiceFactory;
import org.dspace.handle.service.HandleService;
import org.dspace.content.service.BundleService;
import java.io.IOException; import java.io.IOException;
import java.sql.SQLException; import java.sql.SQLException;
import java.util.Iterator;
import java.util.List;
/** /**
* @author Andrea Schweer schweer@waikato.ac.nz for the LCoNZ Institutional Research Repositories * @author Andrea Schweer schweer@waikato.ac.nz for the LCoNZ Institutional Research Repositories
* @author Alan Orth for the International Livestock Research Institute * @author Alan Orth for the International Livestock Research Institute
* @version 6.0 * @version 5.3
* @since 5.1 * @since 5.1
*/ */
public class FixJpgJpgThumbnails { public class FixJpgJpgThumbnails {
//note: static members belong to the class itself, not any one instance
public static ItemService itemService = ContentServiceFactory.getInstance().getItemService();
public static HandleService handleService = HandleServiceFactory.getInstance().getHandleService();
public static BundleService bundleService = ContentServiceFactory.getInstance().getBundleService();
public static void main(String[] args) { public static void main(String[] args) {
String parentHandle = null; String parentHandle = null;
@ -40,25 +30,25 @@ public class FixJpgJpgThumbnails {
context.turnOffAuthorisationSystem(); context.turnOffAuthorisationSystem();
if (StringUtils.isBlank(parentHandle)) { if (StringUtils.isBlank(parentHandle)) {
process(context, itemService.findAll(context)); process(context, Item.findAll(context));
} else { } else {
DSpaceObject parent = handleService.resolveToObject(context, parentHandle); DSpaceObject parent = HandleManager.resolveToObject(context, parentHandle);
if (parent != null) { if (parent != null) {
switch (parent.getType()) { switch (parent.getType()) {
case Constants.COLLECTION: case Constants.COLLECTION:
process(context, itemService.findByCollection(context, (Collection) parent)); process(context, ((Collection) parent).getAllItems()); // getAllItems because we want to work on non-archived ones as well
break; break;
case Constants.COMMUNITY: case Constants.COMMUNITY:
List<Collection> collections = ((Community) parent).getCollections(); Collection[] collections = ((Community) parent).getCollections();
for (Collection collection : collections) { for (Collection collection : collections) {
process(context, itemService.findAllByCollection(context, collection)); process(context, collection.getAllItems()); // getAllItems because we want to work on non-archived ones as well
} }
break; break;
case Constants.SITE: case Constants.SITE:
process(context, itemService.findAll(context)); process(context, Item.findAll(context));
break; break;
case Constants.ITEM: case Constants.ITEM:
processItem(context, (Item) parent); processItem((Item) parent);
context.commit(); context.commit();
break; break;
} }
@ -73,40 +63,41 @@ public class FixJpgJpgThumbnails {
} }
} }
private static void process(Context context, Iterator<Item> items) throws SQLException, IOException, AuthorizeException { private static void process(Context context, ItemIterator items) throws SQLException, IOException, AuthorizeException {
while (items.hasNext()) { while (items.hasNext()) {
Item item = items.next(); Item item = items.next();
processItem(context, item); processItem(item);
itemService.update(context, item); context.commit();
item.decache();
} }
} }
private static void processItem(Context context, Item item) throws SQLException, AuthorizeException, IOException { private static void processItem(Item item) throws SQLException, AuthorizeException, IOException {
// Some bitstreams like Infographics are large JPGs and put in the ORIGINAL bundle on purpose so we shouldn't // Some bitstreams like Infographics are large JPGs and put in the ORIGINAL bundle on purpose so we shouldn't
// swap them. // swap them.
List<MetadataValue> itemTypes = itemService.getMetadataByMetadataString(item, "dc.type"); Metadatum[] itemTypes = item.getMetadataByMetadataString("dc.type");
boolean itemHasInfographic = false; Boolean itemHasInfographic = false;
for (MetadataValue itemType: itemTypes) { for (Metadatum itemType: itemTypes) {
if (itemType.getValue().equals("Infographic")) { if (itemType.value.equals("Infographic")) {
itemHasInfographic = true; itemHasInfographic = true;
} }
} }
List<Bundle> thumbnailBundles = item.getBundles("THUMBNAIL"); Bundle[] thumbnailBundles = item.getBundles("THUMBNAIL");
for (Bundle thumbnailBundle : thumbnailBundles) { for (Bundle thumbnailBundle : thumbnailBundles) {
List<Bitstream> thumbnailBundleBitstreams = thumbnailBundle.getBitstreams(); Bitstream[] thumbnailBundleBitstreams = thumbnailBundle.getBitstreams();
for (Bitstream thumbnailBitstream : thumbnailBundleBitstreams) { for (Bitstream thumbnailBitstream : thumbnailBundleBitstreams) {
String thumbnailName = thumbnailBitstream.getName(); String thumbnailName = thumbnailBitstream.getName();
if (thumbnailName.toLowerCase().contains(".jpg.jpg")) { if (thumbnailName.toLowerCase().contains(".jpg.jpg")) {
List<Bundle> originalBundles = item.getBundles("ORIGINAL"); Bundle[] originalBundles = item.getBundles("ORIGINAL");
for (Bundle originalBundle : originalBundles) { for (Bundle originalBundle : originalBundles) {
List<Bitstream> originalBundleBitstreams = originalBundle.getBitstreams(); Bitstream[] originalBundleBitstreams = originalBundle.getBitstreams();
for (Bitstream originalBitstream : originalBundleBitstreams) { for (Bitstream originalBitstream : originalBundleBitstreams) {
String originalName = originalBitstream.getName(); String originalName = originalBitstream.getName();
long originalBitstreamBytes = originalBitstream.getSize(); Long originalBitstreamBytes = originalBitstream.getSize();
/* /*
- check if the original file name is the same as the thumbnail name minus the extra ".jpg" - check if the original file name is the same as the thumbnail name minus the extra ".jpg"
@ -125,7 +116,7 @@ public class FixJpgJpgThumbnails {
System.out.println(item.getHandle() + ": replacing " + thumbnailName + " with " + originalName); System.out.println(item.getHandle() + ": replacing " + thumbnailName + " with " + originalName);
//add the original bitstream to the THUMBNAIL bundle //add the original bitstream to the THUMBNAIL bundle
bundleService.addBitstream(context, thumbnailBundle, originalBitstream); thumbnailBundle.addBitstream(originalBitstream);
//remove the original bitstream from the ORIGINAL bundle //remove the original bitstream from the ORIGINAL bundle
originalBundle.removeBitstream(originalBitstream); originalBundle.removeBitstream(originalBitstream);
//remove the JpgJpg bitstream from the THUMBNAIL bundle //remove the JpgJpg bitstream from the THUMBNAIL bundle