Compare commits

No commits in common. "692a62b454d47b94b40bdc8ba6a1568cdc7c5777" and "ee6518035e04d7e8bc78ccd2615420dbe3226ed1" have entirely different histories.

12 changed files with 34 additions and 146 deletions

@ -4,17 +4,6 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [7.6.1.2] - 2024-04-25
### Changed
- Remove reporting from curation tasks since "results" are enough
## [7.6.1.1] - 2024-04-23
### Added
- New `NormalizeDOIs` curation task
### Updated
- Update dependencies in `pom.xml`
## [7.6.1] - 2024-01-02
### Changed
- Pin gson dependency to 2.9.0 to avoid dependency convergence issues with DSpace

@ -4,7 +4,6 @@ DSpace curation tasks and other Java-based helpers used on the [CGSpace](https:/
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **FixJpgJpgThumbnails**: fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
- **FixLowQualityThumbnails**: remove low-quality thumbnails when PDF bitstreams are present
- **NormalizeDOIs**: normalize DOIs by stripping whitespace, lowercasing, and converting to https://doi.org/ format
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC7x/Curation+System).
@ -17,7 +16,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.2-SNAPSHOT</version>
<version>7.6.1-SNAPSHOT</version>
</dependency>
```
@ -33,17 +32,18 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```console
$ cp target/cgspace-java-helpers-7.6.1.2-SNAPSHOT.jar ~/dspace/lib/
$ cp target/cgspace-java-helpers-7.6.1-SNAPSHOT.jar ~/dspace/lib/
```
## Configuration
Please refer to the appropriate README.md file:
- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace7/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace7/src/main/java/io/github/ilri/cgspace/scripts/README.md)
- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace6/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace6/src/main/java/io/github/ilri/cgspace/scripts/README.md)
## TODO
- Add a curation task to normalize DOIs to "https://doi.org" format
- Migrate from maven-deploy-plugin to nexus-staging-maven-plugin, see: https://central.sonatype.org/publish/publish-maven/#nexus-staging-maven-plugin-for-deployment-and-release
- Stop using oss-parent, see: https://central.sonatype.org/publish/publish-maven/#create-a-ticket-with-sonatype

pom.xml

@ -6,7 +6,7 @@
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.2-SNAPSHOT</version>
<version>7.6.1-SNAPSHOT</version>
<name>cgspace-java-helpers</name>
<url>https://github.com/ilri/cgspace-java-helpers</url>
@ -14,7 +14,7 @@
<licenses>
<license>
<name>GPL-3.0-only</name>
<url>https://spdx.org/licenses/GPL-3.0-only.html</url>
<url>https://spdx.org/licenses/GPL-3.0-or-later.html</url>
</license>
</licenses>
@ -28,7 +28,8 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.release>11</maven.compiler.release>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
@ -47,8 +48,8 @@
<scm>
<connection>scm:git:git://github.com/ilri/cgspace-java-helpers.git</connection>
<developerConnection>scm:git:ssh://github.com:ilri/cgspace-java-helpers.git</developerConnection>
<url>https://github.com/ilri/cgspace-java-helpers</url>
<developerConnection>scm:git:ssh://github.com:nanosai/cgspace-java-helpers.git</developerConnection>
<url>http://github.com/ilri/cgspace-java-helpers</url>
</scm>
<distributionManagement>
@ -77,15 +78,15 @@
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.13.0</version>
<version>3.12.1</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.2.5</version>
<version>3.2.3</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.4.1</version>
<version>3.3.0</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>

@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-only
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package io.github.ilri.cgspace.ctasks;

@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-only
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package io.github.ilri.cgspace.ctasks;

@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-only
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package io.github.ilri.cgspace.ctasks;
@ -26,13 +26,6 @@ import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
/*
* Add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata.
*
* @author Alan Orth for the International Livestock Research Institute
* @version 7.6.1.2
* @since 5.1
*/
public class CountryCodeTagger extends AbstractCurationTask {
public class CountryCodeTaggerConfig {
private final String isocodesJsonPath = "/io/github/ilri/cgspace/ctasks/iso_3166-1.json";
@ -84,6 +77,7 @@ public class CountryCodeTagger extends AbstractCurationTask {
}
setResult(alpha2Result.getResult());
report(alpha2Result.getResult());
}
return alpha2Result.getStatus();
@ -92,13 +86,14 @@ public class CountryCodeTagger extends AbstractCurationTask {
public CountryCodeTaggerResult performAlpha2(Item item, CountryCodeTaggerConfig config)
throws IOException, SQLException {
CountryCodeTaggerResult alpha2Result = new CountryCodeTaggerResult();
String itemHandle = item.getHandle();
List<MetadataValue> itemCountries =
itemService.getMetadataByMetadataString(item, config.iso3166Field);
// skip items that don't have country metadata
if (itemCountries.isEmpty()) {
alpha2Result.setResult("No countries, skipping.");
alpha2Result.setResult(itemHandle + ": no countries, skipping.");
alpha2Result.setStatus(Curator.CURATE_SKIP);
} else {
Gson gson = new Gson();
@ -177,20 +172,21 @@ public class CountryCodeTagger extends AbstractCurationTask {
itemService.update(Curator.curationContext(), item);
} catch (SQLException | AuthorizeException sqle) {
config.log.debug(sqle.getMessage());
alpha2Result.setResult("Error");
alpha2Result.setResult(itemHandle + ": error");
alpha2Result.setStatus(Curator.CURATE_ERROR);
}
alpha2Result.setResult(
"Added "
itemHandle
+ ": added "
+ newAlpha2Codes.size()
+ " alpha2 country code(s)");
} else {
alpha2Result.setResult("No matching countries found");
alpha2Result.setResult(itemHandle + ": no matching countries found");
}
alpha2Result.setStatus(Curator.CURATE_SUCCESS);
} else {
alpha2Result.setResult("Item already has country codes, skipping unless forced");
alpha2Result.setResult(itemHandle + ": item has country codes, skipping");
alpha2Result.setStatus(Curator.CURATE_SKIP);
}
}
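The hunks above show only fragments of `performAlpha2`, but the skip/success branching they touch reads as a small decision procedure. The following is a minimal standalone sketch of that flow, not the task's actual implementation. The `Status` enum, the `Result` class, and the hard-coded `ALPHA2_BY_NAME` map are hypothetical stand-ins for `Curator.CURATE_*`, `CountryCodeTaggerResult`, and the ISO 3166-1 data the real task loads from `iso_3166-1.json` with Gson. It assumes case-insensitive name matching and de-duplication of codes, neither of which is visible in this diff, and it assumes "force" mode simply proceeds when codes already exist (how the real task treats the existing values is not shown here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CountryCodeTaggerSketch {

    // Hypothetical stand-ins for Curator.CURATE_SUCCESS and Curator.CURATE_SKIP
    enum Status { SUCCESS, SKIP }

    // Hypothetical stand-in for CountryCodeTaggerResult: a status plus a result message
    static final class Result {
        final Status status;
        final String message;

        Result(Status status, String message) {
            this.status = status;
            this.message = message;
        }
    }

    // Hypothetical lookup table; the real task reads ISO 3166-1 data from iso_3166-1.json
    static final Map<String, String> ALPHA2_BY_NAME = Map.of(
            "kenya", "KE",
            "ethiopia", "ET",
            "india", "IN");

    static Result performAlpha2(List<String> countries, List<String> existingCodes, boolean force) {
        // Skip items that don't have country metadata
        if (countries.isEmpty()) {
            return new Result(Status.SKIP, "No countries, skipping.");
        }
        // Skip items that already have alpha2 codes unless running in "force" mode (assumed semantics)
        if (!existingCodes.isEmpty() && !force) {
            return new Result(Status.SKIP, "Item already has country codes, skipping unless forced");
        }
        // Collect the distinct alpha2 codes for countries found in the lookup table
        List<String> newAlpha2Codes = new ArrayList<>();
        for (String country : countries) {
            String code = ALPHA2_BY_NAME.get(country.strip().toLowerCase(Locale.ROOT));
            if (code != null && !newAlpha2Codes.contains(code)) {
                newAlpha2Codes.add(code);
            }
        }
        // Both outcomes are successes, matching the status set after the if/else in the diff
        if (newAlpha2Codes.isEmpty()) {
            return new Result(Status.SUCCESS, "No matching countries found");
        }
        return new Result(Status.SUCCESS, "Added " + newAlpha2Codes.size() + " alpha2 country code(s)");
    }

    public static void main(String[] args) {
        System.out.println(performAlpha2(List.of(), List.of(), false).message);
        System.out.println(performAlpha2(List.of("Kenya"), List.of("KE"), false).message);
        System.out.println(performAlpha2(List.of("Atlantis"), List.of(), false).message);
        System.out.println(performAlpha2(List.of("Kenya", "KENYA", "Ethiopia"), List.of(), false).message);
    }
}
```

The messages use the wording without the item-handle prefix; the handle-prefixed variant in the same hunks differs only in how the string is built.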

@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-only
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package io.github.ilri.cgspace.ctasks;

@ -1,96 +0,0 @@
/*
* Copyright (C) 2024 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;
import org.dspace.content.DSpaceObject;
import org.dspace.content.Item;
import org.dspace.content.MetadataValue;
import org.dspace.core.Constants;
import org.dspace.curate.AbstractCurationTask;
import org.dspace.curate.Curator;
import org.dspace.curate.Suspendable;
import java.io.IOException;
import java.util.List;
/**
* Attempt to normalize DOIs by stripping whitespace, lower casing, and
* converting to <code>https://doi.org</code> format. The reason is that DOIs are case
* insensitive and must be unique, which we can only guarantee if they are
* normalized to the same format.
*
* See: <a href="https://www.crossref.org/documentation/member-setup/constructing-your-dois/">https://www.crossref.org/documentation/member-setup/constructing-your-dois/</a>
*
* TODO: set curation to failed if invalid DOI submitted (and configure to reject in workflow)
* TODO: allow operation on communities and collections (currently only works on items)
*
* @author Alan Orth for the International Livestock Research Institute
* @version 7.6.1.2
* @since 7.6.1.1
*/
@Suspendable
public class NormalizeDOIs extends AbstractCurationTask {
@Override
public int perform(DSpaceObject dso) throws IOException {
if (dso.getType() == Constants.ITEM) {
Item item = (Item) dso;
String result;
// Keep track of whether we change metadata, and how many
boolean metadataChanged = false;
int count = 0;
// Hard coding the metadata field for now since I can't figure out how to read the taskProperty
List<MetadataValue> itemDOIs = itemService.getMetadataByMetadataString(item, "cg.identifier.doi");
// skip items that don't have DOIs
if (itemDOIs.isEmpty()) {
setResult("No DOIs, skipping");
return Curator.CURATE_SKIP;
} else {
for (MetadataValue itemDOI : itemDOIs) {
String newDOI = getNormalizedDOI(itemDOI);
// Check if the normalized DOI is different than the original
if (!newDOI.equals(itemDOI.getValue())) {
itemDOI.setValue(newDOI);
metadataChanged = true;
count++;
}
}
}
if (metadataChanged) {
result = "Normalized " + count + " DOI(s)";
} else {
result = "All DOIs already normalized";
}
setResult(result);
return Curator.CURATE_SUCCESS;
} else {
setResult("Object skipped");
return Curator.CURATE_SKIP;
}
}
private static String getNormalizedDOI(MetadataValue itemDOI) {
// 1. Convert to lowercase
String newDOI = itemDOI.getValue().toLowerCase();
// 2. Strip leading and trailing whitespace
newDOI = newDOI.strip();
// 3. Convert to HTTPS
newDOI = newDOI.replace("http://", "https://");
// 4. Prefer doi.org to dx.doi.org
newDOI = newDOI.replace("dx.doi.org", "doi.org");
// 5. Replace values like doi: 10.11648/j.jps.20140201.14
newDOI = newDOI.replaceAll("^doi: 10\\.", "https://doi.org/10.");
// 6. Replace values like 10.3390/foods12010115
newDOI = newDOI.replaceAll("^10\\.", "https://doi.org/10.");
return newDOI;
}
}
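The deleted `getNormalizedDOI` method is a fixed sequence of string rewrites. A minimal standalone sketch of the same six steps on plain `String` values (rather than DSpace `MetadataValue` objects) makes their combined effect easy to check outside DSpace. `DoiNormalizerSketch` and `normalize` are hypothetical names. The one intentional deviation is `toLowerCase(Locale.ROOT)`, which avoids locale-dependent case mapping; the original calls `toLowerCase()` with the JVM's default locale. `String.strip()` requires Java 11 or later.

```java
import java.util.List;
import java.util.Locale;

public class DoiNormalizerSketch {

    // Hypothetical helper mirroring the six steps of getNormalizedDOI on a plain String
    static String normalize(String doi) {
        // 1. Convert to lowercase (DOIs are case insensitive); Locale.ROOT is a deliberate deviation
        String newDOI = doi.toLowerCase(Locale.ROOT);
        // 2. Strip leading and trailing whitespace
        newDOI = newDOI.strip();
        // 3. Convert to HTTPS
        newDOI = newDOI.replace("http://", "https://");
        // 4. Prefer doi.org to dx.doi.org
        newDOI = newDOI.replace("dx.doi.org", "doi.org");
        // 5. Values like "doi: 10.11648/j.jps.20140201.14"
        newDOI = newDOI.replaceAll("^doi: 10\\.", "https://doi.org/10.");
        // 6. Bare values like "10.3390/foods12010115"
        newDOI = newDOI.replaceAll("^10\\.", "https://doi.org/10.");
        return newDOI;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
                "  10.3390/FOODS12010115 ",
                "doi: 10.11648/j.jps.20140201.14",
                "http://dx.doi.org/10.1000/XYZ123",
                "https://doi.org/10.1000/xyz123");
        int count = 0;
        for (String input : inputs) {
            String normalized = normalize(input);
            // Count only values that actually change, as perform() does
            if (!normalized.equals(input)) {
                count++;
            }
            System.out.println("'" + input + "' -> " + normalized);
        }
        System.out.println("Normalized " + count + " DOI(s)");
    }
}
```

Because step 5 matches the literal prefix `doi: 10.` (with a space), a value such as `doi:10.1000/xyz` passes through both the original steps and this sketch unchanged.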

@ -2,7 +2,6 @@
DSpace curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **NormalizeDOIs**: normalize DOIs by stripping whitespace, lowercasing, and converting to https://doi.org/ format
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
@ -15,7 +14,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.2-SNAPSHOT</version>
<version>7.6.1-SNAPSHOT</version>
</dependency>
```
@ -31,16 +30,15 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```
$ cp target/cgspace-java-helpers-7.6.1.2-SNAPSHOT.jar ~/dspace/lib/
$ cp target/cgspace-java-helpers-7.6.1-SNAPSHOT.jar ~/dspace/lib/
```
## Configuration
Add the curation task(s) to DSpace's `config/modules/curate.cfg`:
Add the curation task to DSpace's `config/modules/curate.cfg`:
```
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.NormalizeDOIs = normalizedois
```
And then add the following variables to your `local.cfg` or some other [configuration file that is included](https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference#ConfigurationReference-IncludingotherPropertyFiles):
@ -62,7 +60,7 @@ countrycodetagger.iso3166-alpha2.field = cg.coverage.iso3166-alpha2
Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:
```
$ ~/dspace/bin/dspace curate -e eperson@repo.org -t countrycodetagger -i 10568/3 -r - -s object
$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -s object
```
*Note*: it is very important to set the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.

@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-only
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package io.github.ilri.cgspace.scripts;

@ -1,7 +1,7 @@
/*
* Copyright (C) 2022 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-only
* SPDX-License-Identifier: GPL-3.0-or-later
*/
package io.github.ilri.cgspace.scripts;

@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.2-SNAPSHOT</version>
<version>7.6.1-SNAPSHOT</version>
</dependency>
```
@ -31,7 +31,7 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```console
$ cp target/cgspace-java-helpers-7.6.1.2-SNAPSHOT.jar ~/dspace/lib/
$ cp target/cgspace-java-helpers-7.6.1-SNAPSHOT.jar ~/dspace/lib/
```
## Invocation