Mirror of https://github.com/ilri/cgspace-java-helpers.git (synced 2025-11-04 06:39:09 +01:00)

Compare commits: 11 Commits

| Author | SHA1 | Date |
|---|---|---|
| | d5cf51c464 | |
| | 98c7cfb3a5 | |
| | 58365cdfda | |
| | 7190b751e1 | |
| | 34acc351a5 | |
| | ec293b3b28 | |
| | 31cd979b61 | |
| | fce81c6003 | |
| | 26d3cbd778 | |
| | fdc910f93b | |
| | e0d514e797 | |

CHANGELOG.md (Normal file, 19 changes)

@@ -0,0 +1,19 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [5.3] - 2020-08-07
### Changed
- Make sure `FixJpgJpgThumbnails` only replaces thumbnails where the original is less than ~100KiB
- Make sure `FixJpgJpgThumbnails` only replaces thumbnails if the item type is not `Infographic` (because the JPG in the ORIGINAL bundle is the "real" file and it's OK that the thumbnail is ".jpg.jpg")
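
The arithmetic behind the ~100KiB cut-off can be checked with a short sketch. The item count and byte total are the figures quoted in the `FixJpgJpgThumbnails` source comment in this change set, and `100000` is the literal limit used in its size check. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of cgspace-java-helpers.
final class ThumbnailSizeStats {
    // Figures quoted in the FixJpgJpgThumbnails source comment.
    static final long ITEM_COUNT = 4022L;
    static final long TOTAL_BYTES = 394_549_249L;
    // Literal limit used in the size check, in bytes.
    static final long CUTOFF_BYTES = 100_000L;

    /** Mean size in bytes of the originals counted in the source comment. */
    static double averageBytes() {
        return (double) TOTAL_BYTES / ITEM_COUNT;
    }

    /** Converts a byte count to KiB (1 KiB = 1024 bytes). */
    static double toKiB(double bytes) {
        return bytes / 1024.0;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "average: %.0f bytes (%.1f KiB)",
                averageBytes(), toKiB(averageBytes())));
        System.out.println(String.format(Locale.ROOT, "cut-off: %d bytes (%.1f KiB)",
                CUTOFF_BYTES, toKiB(CUTOFF_BYTES)));
    }
}
```

The average works out to roughly 98,098 bytes (about 95.8 KiB), and 100000 bytes is about 97.7 KiB, so the "98KiB" and "~100KiB" figures appear to be decimal kilobytes rather than strict KiB. The cut-off still sits just above the average either way.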

## [5.2] - 2020-08-06
### Changed
- Make `FixJpgJpgThumbnails` helper check for files named "JPG" as well as "jpg" (case insensitive)
- Make `FixJpgJpgThumbnails` helper replace thumbnails with description `IM Thumbnail` as well as `Generated Thumbnail`

## [5.1] - 2020-08-06
### Added
- Add `FixJpgJpgThumbnails` helper to replace ".jpg.jpg" thumbnails with their originals

README.md (38 changes)

@@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
 <dependency>
   <groupId>io.github.ilri.cgspace</groupId>
   <artifactId>cgspace-java-helpers</artifactId>
-  <version>5.1</version>
+  <version>5.4-SNAPSHOT</version>
 </dependency>
 ```

@@ -31,42 +31,14 @@ $ mvn package
 Copy the resulting jar to the DSpace `lib` directory:

 ```
-$ cp target/cgspace-java-helpers-5.1.jar ~/dspace/lib
+$ cp target/cgspace-java-helpers-5.4-SNAPSHOT.jar ~/dspace/lib
 ```

 ## Configuration
-Add the curation task to DSpace's `config/modules/curate.cfg`:
+Please refer to the appropriate README.md file:

-```
-plugin.named.org.dspace.curate.CurationTask = \
-...
-    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger \
-    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
-```
-
-And then add a configuration file for the task in `config/modules/countrycodetagger.cfg`:
-
-```
-# name of the field containing ISO 3166-1 country names
-iso3166.field = cg.coverage.country
-
-# name of the field containing ISO 3166-1 Alpha2 country codes
-iso3166-alpha2.field = cg.coverage.iso3166-alpha2
-
-# only add country codes if an item doesn't have any (default false)
-#forceupdate = false
-```
-
-*Note*: DSpace's curation system supports "profiles" where you can use the same task with different options, for example above I have a normal country code tagger and a "force" variant. To use the "force" variant you create a new configuration file with the overridden options in `config/modules/countrycodetagger.force.cfg`. The "force" profile clears all existing country codes and updates everything.
-
-## Invocation
-Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:
-
-```
-$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object
-```
-
-*Note*: it is very important to set the cache limit (`-l`) and the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.
+- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace5/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
+- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace5/src/main/java/io/github/ilri/cgspace/scripts/README.md)

 ## Notes
 This project was initially created according to the [Maven Getting Started Guide](https://maven.apache.org/guides/getting-started/):

pom.xml (2 changes)

@@ -6,7 +6,7 @@

   <groupId>io.github.ilri.cgspace</groupId>
   <artifactId>cgspace-java-helpers</artifactId>
-  <version>5.2</version>
+  <version>5.4-SNAPSHOT</version>

   <name>cgspace-java-helpers</name>
   <url>https://github.com/ilri/cgspace-java-helpers</url>

@@ -35,6 +35,11 @@ import java.sql.SQLException;
 import java.util.ArrayList;
 import java.util.List;

+/**
+ * @author Alan Orth for the International Livestock Research Institute
+ * @version 5.1
+ * @since 1.0
+*/
 public class CountryCodeTagger extends AbstractCurationTask
 {
     public class CountryCodeTaggerConfig {

src/main/java/io/github/ilri/cgspace/ctasks/README.md (Normal file, 74 changes)

@@ -0,0 +1,74 @@
# Curation Tasks
DSpace curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:

- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata

Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).

## Build and Install

### Integrate into DSpace Build
To use these curation tasks in a DSpace project add the following dependency to `dspace/modules/additions/pom.xml`:

```
<dependency>
  <groupId>io.github.ilri.cgspace</groupId>
  <artifactId>cgspace-java-helpers</artifactId>
  <version>5.3</version>
</dependency>
```

The jar will be copied to all DSpace applications.

### Manual Build and Install
To build the standalone jar:

```
$ mvn package
```

Copy the resulting jar to the DSpace `lib` directory:

```
$ cp target/cgspace-java-helpers-5.3.jar ~/dspace/lib
```

## Configuration
Add the curation task to DSpace's `config/modules/curate.cfg`:

```
plugin.named.org.dspace.curate.CurationTask = \
...
    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger \
    io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
```

And then add a configuration file for the task in `config/modules/countrycodetagger.cfg`:

```
# name of the field containing ISO 3166-1 country names
iso3166.field = cg.coverage.country

# name of the field containing ISO 3166-1 Alpha2 country codes
iso3166-alpha2.field = cg.coverage.iso3166-alpha2

# only add country codes if an item doesn't have any (default false)
#forceupdate = false
```

*Note*: DSpace's curation system supports "profiles" where you can use the same task with different options, for example above I have a normal country code tagger and a "force" variant. To use the "force" variant you create a new configuration file with the overridden options in `config/modules/countrycodetagger.force.cfg`. The "force" profile clears all existing country codes and updates everything.
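
The difference between the normal and "force" profiles can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the `CountryCodeTagger` source: the class and method names are hypothetical, country names are looked up in the JDK's own English display names (which may not match the ISO 3166-1 names used in `cg.coverage.country`), and the `forceUpdate` flag stands in for the `forceupdate` option described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is not the CountryCodeTagger source.
final class CountryCodeSketch {
    // English display name -> ISO 3166-1 Alpha2 code, built from the JDK's country table.
    private static final Map<String, String> NAME_TO_ALPHA2 = new HashMap<>();

    static {
        for (String code : Locale.getISOCountries()) {
            Locale region = new Locale.Builder().setRegion(code).build();
            NAME_TO_ALPHA2.put(region.getDisplayCountry(Locale.ENGLISH), code);
        }
    }

    /**
     * Returns the Alpha2 codes an item would carry after tagging.
     * Without forceUpdate, items that already have codes are left untouched;
     * with forceUpdate, existing codes are discarded and recomputed from the names.
     */
    static List<String> tag(List<String> countryNames, List<String> existingCodes, boolean forceUpdate) {
        if (!forceUpdate && !existingCodes.isEmpty()) {
            return new ArrayList<>(existingCodes);
        }
        List<String> codes = new ArrayList<>();
        for (String name : countryNames) {
            String code = NAME_TO_ALPHA2.get(name);
            // Skip names with no match and avoid duplicate codes.
            if (code != null && !codes.contains(code)) {
                codes.add(code);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(tag(List.of("Kenya", "Ethiopia"), List.of(), false));
        System.out.println(tag(List.of("Kenya"), List.of("ET"), false));
        System.out.println(tag(List.of("Kenya"), List.of("ET"), true));
    }
}
```

In this sketch the second call keeps the existing `ET` code, while the third call, which mimics the "force" profile, replaces it with `KE`.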

## Invocation
Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:

```
$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object
```

*Note*: it is very important to set the cache limit (`-l`) and the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.

## TODO

- Make sure this doesn't work on items in the workflow
- Check for existence of metadata field before trying to add metadata
- Add tests

@@ -13,8 +13,8 @@ import java.sql.SQLException;
 /**
  * @author Andrea Schweer schweer@waikato.ac.nz for the LCoNZ Institutional Research Repositories
  * @author Alan Orth for the International Livestock Research Institute
- * @version 5.1-SNAPSHOT
- * @since 5.1-SNAPSHOT
+ * @version 5.4
+ * @since 5.1
  */
 public class FixJpgJpgThumbnails {

@@ -73,6 +73,17 @@ public class FixJpgJpgThumbnails {
 	}

 	private static void processItem(Item item) throws SQLException, AuthorizeException, IOException {
+		// Some bitstreams like Infographics are large JPGs and put in the ORIGINAL bundle on purpose so we shouldn't
+		// swap them.
+		Metadatum[] itemTypes = item.getMetadataByMetadataString("dc.type");
+		boolean itemHasInfographic = false;
+		for (Metadatum itemType: itemTypes) {
+			if (itemType.value.equals("Infographic")) {
+				itemHasInfographic = true;
+				break;
+			}
+		}
+
 		Bundle[] thumbnailBundles = item.getBundles("THUMBNAIL");
 		for (Bundle thumbnailBundle : thumbnailBundles) {
 			Bitstream[] thumbnailBundleBitstreams = thumbnailBundle.getBitstreams();
@@ -84,11 +95,25 @@ public class FixJpgJpgThumbnails {
 					for (Bundle originalBundle : originalBundles) {
 						Bitstream[] originalBundleBitstreams = originalBundle.getBitstreams();

-						for(Bitstream originalBitstream : originalBundleBitstreams) {
+						for (Bitstream originalBitstream : originalBundleBitstreams) {
 							String originalName = originalBitstream.getName();

-							//check if the original file name is the same as the thumbnail name minus the extra ".jpg"
-							if (originalName.equalsIgnoreCase(StringUtils.removeEndIgnoreCase(thumbnailName, ".jpg")) && ("Generated Thumbnail".equals(thumbnailBitstream.getDescription()) || "IM Thumbnail".equals(thumbnailBitstream.getDescription()))) {
+							long originalBitstreamBytes = originalBitstream.getSize();
+
+							/*
+							- check if the original file name is the same as the thumbnail name minus the extra ".jpg"
+							- check if the thumbnail description indicates it was automatically generated
+							- check if the item has dc.type Infographic (JPG could be the "real" item!)
+							- check if the original bitstream is less than ~100KiB
+							    - Note: in my tests there were 4022 items with ".jpg.jpg" thumbnails totaling 394549249
+							      bytes for an average of about 98KiB so ~100KiB seems like a good cut off
+							*/
+							if (
+									originalName.equalsIgnoreCase(StringUtils.removeEndIgnoreCase(thumbnailName, ".jpg"))
+									&& ("Generated Thumbnail".equals(thumbnailBitstream.getDescription()) || "IM Thumbnail".equals(thumbnailBitstream.getDescription()))
+									&& !itemHasInfographic
+									&& originalBitstreamBytes < 100000
+							) {
 								System.out.println(item.getHandle() + ": replacing " + thumbnailName + " with " + originalName);

								//add the original bitstream to the THUMBNAIL bundle
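
The four-part condition in the hunk above can be restated as a dependency-free predicate, which makes the rule easier to exercise outside DSpace. This is a sketch under stated assumptions rather than the project's code: the class and method names are hypothetical, `endsWithIgnoreCase` stands in for the Apache Commons Lang call used in the source, and the explicit ".jpg.jpg" suffix test is an assumption, because the diff does not show how the script selects candidate thumbnails.

```java
// Hypothetical, dependency-free restatement of the condition in processItem.
final class ThumbnailReplacementRule {
    // Literal byte limit used in the source's size check.
    static final long MAX_ORIGINAL_BYTES = 100_000L;

    /** Case-insensitive suffix test using only java.lang.String. */
    static boolean endsWithIgnoreCase(String value, String suffix) {
        return value.regionMatches(true, value.length() - suffix.length(), suffix, 0, suffix.length());
    }

    static boolean shouldReplace(String originalName, String thumbnailName, String thumbnailDescription,
                                 boolean itemHasInfographic, long originalBytes) {
        // Assumption: only ".jpg.jpg" thumbnails are candidates (selection is not shown in the diff).
        if (!endsWithIgnoreCase(thumbnailName, ".jpg.jpg")) {
            return false;
        }
        // Thumbnail name minus the extra ".jpg" must equal the original's name, ignoring case.
        String expectedOriginal = thumbnailName.substring(0, thumbnailName.length() - ".jpg".length());
        boolean namesMatch = originalName.equalsIgnoreCase(expectedOriginal);
        // Only automatically generated thumbnails are replaced.
        boolean generated = "Generated Thumbnail".equals(thumbnailDescription)
                || "IM Thumbnail".equals(thumbnailDescription);
        return namesMatch && generated && !itemHasInfographic && originalBytes < MAX_ORIGINAL_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(shouldReplace("photo.jpg", "photo.jpg.jpg", "Generated Thumbnail", false, 50_000L));
        System.out.println(shouldReplace("poster.jpg", "poster.jpg.jpg", "IM Thumbnail", true, 50_000L));
    }
}
```

Keeping the rule in one pure function like this would also make it straightforward to cover with unit tests, since it needs no DSpace `Context` or database.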

src/main/java/io/github/ilri/cgspace/scripts/README.md (Normal file, 41 changes)

@@ -0,0 +1,41 @@
# Scripts
Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:

- **FixJpgJpgThumbnails**: Fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals

Tested on DSpace 5.8. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).

## Build and Install

### Integrate into DSpace Build
To use these curation tasks in a DSpace project add the following dependency to `dspace/modules/additions/pom.xml`:

```
<dependency>
  <groupId>io.github.ilri.cgspace</groupId>
  <artifactId>cgspace-java-helpers</artifactId>
  <version>5.3</version>
</dependency>
```

The jar will be copied to all DSpace applications.

### Manual Build and Install
To build the standalone jar:

```
$ mvn package
```

Copy the resulting jar to the DSpace `lib` directory:

```
$ cp target/cgspace-java-helpers-5.3.jar ~/dspace/lib
```

### Invocation
The script only takes one argument, which is a community, collection, or item:

```
$ dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/83389
```