8 Commits
v5.1 ... v5.3

Author SHA1 Message Date
fce81c6003 Version 5.3 2020-08-07 09:57:19 +03:00
26d3cbd778 src/main/java: Tune FixJpgJpgThumbnails a bit
Make sure we don't modify thumbnails if the item is an Infographic
because the JPG in the ORIGINAL bundle might actually be the "real"
file, in which case the THUMBNAIL bundle would have a legitimate
".jpg.jpg" file.

Also, limit the criteria for replacement to original bitstreams
that are less than 100KiB. In my tests I found that we had 4,022
items with ".jpg.jpg" thumbnails, and the average file size of the
originals in those items was 98KiB. Without considering the large
infographics, which are several megabytes apiece, the average of
the remaining 3,765 originals was ~20KiB so 100KiB should be very
safe.
2020-08-07 09:50:03 +03:00
fdc910f93b README.md: Update versions 2020-08-06 16:23:17 +03:00
e0d514e797 pom.xml: Move version to 5.3-SNAPSHOT 2020-08-06 16:17:05 +03:00
fd893d8c4e pom.xml: Release version 5.2 2020-08-06 16:16:13 +03:00
2263ac27e8 src/main/java: Handle more corner cases in FixJpgJpgThumbnails.java
We should make sure we are catching .JPG and .jpg. Also, we should
check for Generated Thumbnails as well as IM Thumbnail.
2020-08-06 16:13:51 +03:00
cf7012d698 pom.xml: Change version to 5.2-SNAPSHOT 2020-08-06 16:13:27 +03:00
7edc60e6ca README.md: Use badge for dspace5 branch 2020-08-06 15:47:33 +03:00
3 changed files with 34 additions and 10 deletions
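
The replacement criteria described in the two FixJpgJpgThumbnails commit messages above can be sketched as a single predicate. This is a minimal, hedged illustration and not the code from the repository: the class `ThumbnailCriteria` and the methods `stripTrailingJpg` and `shouldReplace` are hypothetical names, and plain strings and numbers stand in for DSpace's `Item`, `Bundle`, and `Bitstream` objects so that the block runs without DSpace or Commons Lang on the classpath.

```java
/**
 * Illustrative sketch only (hypothetical class and method names): mirrors the
 * checks described in the commit messages using plain Java types instead of
 * DSpace's Item/Bundle/Bitstream.
 */
final class ThumbnailCriteria {
    // Byte limit used in the diff's comparison (originalBitstreamBytes < 100000).
    static final long MAX_ORIGINAL_BYTES = 100_000L;

    /** Remove one trailing ".jpg", ignoring case; a stand-in for StringUtils.removeEndIgnoreCase. */
    static String stripTrailingJpg(String name) {
        String suffix = ".jpg";
        int start = name.length() - suffix.length();
        // regionMatches returns false when start is negative, so short names are safe.
        if (name.regionMatches(true, start, suffix, 0, suffix.length())) {
            return name.substring(0, start);
        }
        return name;
    }

    /** True only when every criterion from the commit messages holds. */
    static boolean shouldReplace(String thumbnailName, String thumbnailDescription,
                                 String originalName, long originalBytes,
                                 boolean itemHasInfographic) {
        // Catch ".jpg.jpg", ".JPG.jpg", and other case variants.
        boolean doubleExtension = thumbnailName.toLowerCase().contains(".jpg.jpg");
        // The original's name must equal the thumbnail's name minus the extra ".jpg".
        boolean namesMatch = originalName.equalsIgnoreCase(stripTrailingJpg(thumbnailName));
        // Only automatically generated thumbnails are candidates.
        boolean generated = "Generated Thumbnail".equals(thumbnailDescription)
                || "IM Thumbnail".equals(thumbnailDescription);
        return doubleExtension && namesMatch && generated
                && !itemHasInfographic && originalBytes < MAX_ORIGINAL_BYTES;
    }

    public static void main(String[] args) {
        // Small original with an auto-generated ".JPG.jpg" thumbnail.
        System.out.println(shouldReplace("cover.JPG.jpg", "IM Thumbnail",
                "cover.JPG", 20_480L, false)); // true
        // Multi-megabyte Infographic: the JPG is the "real" file, so leave it alone.
        System.out.println(shouldReplace("poster.jpg.jpg", "Generated Thumbnail",
                "poster.jpg", 3_500_000L, true)); // false
    }
}
```

The limit in this sketch is the literal 100000 bytes that the diff compares against, which is 100 kB (about 97.7 KiB) rather than exactly 100 KiB (102,400 bytes).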

@@ -1,4 +1,4 @@
-# CGSpace Java Helpers [![Build Status](https://travis-ci.org/ilri/cgspace-java-helpers.svg?branch=master)](https://travis-ci.org/ilri/dspace-curation-tasks)
+# CGSpace Java Helpers [![Build Status](https://travis-ci.org/ilri/cgspace-java-helpers.svg?branch=dspace5)](https://travis-ci.org/ilri/dspace-curation-tasks)
 DSpace curation tasks and other Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
 - **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
@@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
 <dependency>
     <groupId>io.github.ilri.cgspace</groupId>
     <artifactId>cgspace-java-helpers</artifactId>
-    <version>5.1</version>
+    <version>5.3</version>
 </dependency>
 ```
@@ -31,7 +31,7 @@ $ mvn package
 Copy the resulting jar to the DSpace `lib` directory:
 ```
-$ cp target/cgspace-java-helpers-5.1.jar ~/dspace/lib
+$ cp target/cgspace-java-helpers-5.3.jar ~/dspace/lib
 ```
 ## Configuration

@@ -6,7 +6,7 @@
 <groupId>io.github.ilri.cgspace</groupId>
 <artifactId>cgspace-java-helpers</artifactId>
-<version>5.1</version>
+<version>5.3</version>
 <name>cgspace-java-helpers</name>
 <url>https://github.com/ilri/cgspace-java-helpers</url>

@@ -13,8 +13,8 @@ import java.sql.SQLException;
 /**
  * @author Andrea Schweer schweer@waikato.ac.nz for the LCoNZ Institutional Research Repositories
  * @author Alan Orth for the International Livestock Research Institute
- * @version 5.1-SNAPSHOT
- * @since 5.1-SNAPSHOT
+ * @version 5.3
+ * @since 5.1
  */
 public class FixJpgJpgThumbnails {
@@ -73,22 +73,46 @@ public class FixJpgJpgThumbnails {
     }
     private static void processItem(Item item) throws SQLException, AuthorizeException, IOException {
+        // Some bitstreams like Infographics are large JPGs and put in the ORIGINAL bundle on purpose so we shouldn't
+        // swap them.
+        Metadatum[] itemTypes = item.getMetadataByMetadataString("dc.type");
+        Boolean itemHasInfographic = false;
+        for (Metadatum itemType: itemTypes) {
+            if (itemType.value.equals("Infographic")) {
+                itemHasInfographic = true;
+            }
+        }
         Bundle[] thumbnailBundles = item.getBundles("THUMBNAIL");
         for (Bundle thumbnailBundle : thumbnailBundles) {
             Bitstream[] thumbnailBundleBitstreams = thumbnailBundle.getBitstreams();
             for (Bitstream thumbnailBitstream : thumbnailBundleBitstreams) {
                 String thumbnailName = thumbnailBitstream.getName();
-                if (thumbnailName.contains(".jpg.jpg")) {
+                if (thumbnailName.toLowerCase().contains(".jpg.jpg")) {
                     Bundle[] originalBundles = item.getBundles("ORIGINAL");
                     for (Bundle originalBundle : originalBundles) {
                         Bitstream[] originalBundleBitstreams = originalBundle.getBitstreams();
-                        for(Bitstream originalBitstream : originalBundleBitstreams) {
+                        for (Bitstream originalBitstream : originalBundleBitstreams) {
                             String originalName = originalBitstream.getName();
-                            //check if the original file name is the same as the thumbnail name minus the extra ".jpg"
-                            if (originalName.equals(StringUtils.removeEndIgnoreCase(thumbnailName, ".jpg")) && "Generated Thumbnail".equals(thumbnailBitstream.getDescription())) {
+                            Long originalBitstreamBytes = originalBitstream.getSize();
+                            /*
+                            - check if the original file name is the same as the thumbnail name minus the extra ".jpg"
+                            - check if the thumbnail description indicates it was automatically generated
+                            - check if the item has dc.type Infographic (JPG could be the "real" item!)
+                            - check if the original bitstream is less than ~100KiB
+                            - Note: in my tests there were 4022 items with ".jpg.jpg" thumbnails totaling 394549249
+                            bytes for an average of about 98KiB so ~100KiB seems like a good cut off
+                            */
+                            if (
+                                originalName.equalsIgnoreCase(StringUtils.removeEndIgnoreCase(thumbnailName, ".jpg"))
+                                && ("Generated Thumbnail".equals(thumbnailBitstream.getDescription()) || "IM Thumbnail".equals(thumbnailBitstream.getDescription()))
+                                && !itemHasInfographic
+                                && originalBitstreamBytes < 100000
+                            ) {
                                 System.out.println(item.getHandle() + ": replacing " + thumbnailName + " with " + originalName);
                                 //add the original bitstream to the THUMBNAIL bundle