Compare commits

No commits in common. "692a62b454d47b94b40bdc8ba6a1568cdc7c5777" and "ee6518035e04d7e8bc78ccd2615420dbe3226ed1" have entirely different histories.

12 changed files with 34 additions and 146 deletions

@ -4,17 +4,6 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [7.6.1.2] - 2024-04-25
### Changed
- Remove reporting from curation tasks since "results" are enough
## [7.6.1.1] - 2024-04-23
### Added
- New `NormalizeDOIs` curation task
### Updated
- Update dependencies in `pom.xml`
## [7.6.1] - 2024-01-02 ## [7.6.1] - 2024-01-02
### Changed ### Changed
- Pin gson dependency to 2.9.0 to avoid dependency convergence issues with DSpace - Pin gson dependency to 2.9.0 to avoid dependency convergence issues with DSpace

@ -4,7 +4,6 @@ DSpace curation tasks and other Java-based helpers used on the [CGSpace](https:/
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata - **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **FixJpgJpgThumbnails**: fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals - **FixJpgJpgThumbnails**: fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
- **FixLowQualityThumbnails**: remove low-quality thumbnails when PDF bitstreams are present - **FixLowQualityThumbnails**: remove low-quality thumbnails when PDF bitstreams are present
- **NormalizeDOIs**: normalize DOIs by stripping whitespace, lowercasing, and converting to https://doi.org/ format
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC7x/Curation+System). Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC7x/Curation+System).
@ -17,7 +16,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency> <dependency>
<groupId>io.github.ilri.cgspace</groupId> <groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId> <artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.2-SNAPSHOT</version> <version>7.6.1-SNAPSHOT</version>
</dependency> </dependency>
``` ```
@ -33,17 +32,18 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory: Copy the resulting jar to the DSpace `lib` directory:
```console ```console
$ cp target/cgspace-java-helpers-7.6.1.2-SNAPSHOT.jar ~/dspace/lib/ $ cp target/cgspace-java-helpers-7.6.1-SNAPSHOT.jar ~/dspace/lib/
``` ```
## Configuration ## Configuration
Please refer to the appropriate README.md file: Please refer to the appropriate README.md file:
- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace7/src/main/java/io/github/ilri/cgspace/ctasks/README.md) - Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace6/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace7/src/main/java/io/github/ilri/cgspace/scripts/README.md) - Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace6/src/main/java/io/github/ilri/cgspace/scripts/README.md)
## TODO ## TODO
- Add a curation task to normalize DOIs to "https://doi.org" format
- Migrate from maven-deploy-plugin to nexus-staging-maven-plugin, see: https://central.sonatype.org/publish/publish-maven/#nexus-staging-maven-plugin-for-deployment-and-release - Migrate from maven-deploy-plugin to nexus-staging-maven-plugin, see: https://central.sonatype.org/publish/publish-maven/#nexus-staging-maven-plugin-for-deployment-and-release
- Stop using oss-parent, see: https://central.sonatype.org/publish/publish-maven/#create-a-ticket-with-sonatype - Stop using oss-parent, see: https://central.sonatype.org/publish/publish-maven/#create-a-ticket-with-sonatype

pom.xml

@ -6,7 +6,7 @@
<groupId>io.github.ilri.cgspace</groupId> <groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId> <artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.2-SNAPSHOT</version> <version>7.6.1-SNAPSHOT</version>
<name>cgspace-java-helpers</name> <name>cgspace-java-helpers</name>
<url>https://github.com/ilri/cgspace-java-helpers</url> <url>https://github.com/ilri/cgspace-java-helpers</url>
@ -14,7 +14,7 @@
<licenses> <licenses>
<license> <license>
<name>GPL-3.0-only</name> <name>GPL-3.0-only</name>
<url>https://spdx.org/licenses/GPL-3.0-only.html</url> <url>https://spdx.org/licenses/GPL-3.0-or-later.html</url>
</license> </license>
</licenses> </licenses>
@ -28,7 +28,8 @@
<properties> <properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.release>11</maven.compiler.release> <maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties> </properties>
<dependencies> <dependencies>
@ -47,8 +48,8 @@
<scm> <scm>
<connection>scm:git:git://github.com/ilri/cgspace-java-helpers.git</connection> <connection>scm:git:git://github.com/ilri/cgspace-java-helpers.git</connection>
<developerConnection>scm:git:ssh://github.com:ilri/cgspace-java-helpers.git</developerConnection> <developerConnection>scm:git:ssh://github.com:nanosai/cgspace-java-helpers.git</developerConnection>
<url>https://github.com/ilri/cgspace-java-helpers</url> <url>http://github.com/ilri/cgspace-java-helpers</url>
</scm> </scm>
<distributionManagement> <distributionManagement>
@ -77,15 +78,15 @@
</plugin> </plugin>
<plugin> <plugin>
<artifactId>maven-compiler-plugin</artifactId> <artifactId>maven-compiler-plugin</artifactId>
<version>3.13.0</version> <version>3.12.1</version>
</plugin> </plugin>
<plugin> <plugin>
<artifactId>maven-surefire-plugin</artifactId> <artifactId>maven-surefire-plugin</artifactId>
<version>3.2.5</version> <version>3.2.3</version>
</plugin> </plugin>
<plugin> <plugin>
<artifactId>maven-jar-plugin</artifactId> <artifactId>maven-jar-plugin</artifactId>
<version>3.4.1</version> <version>3.3.0</version>
</plugin> </plugin>
<plugin> <plugin>
<artifactId>maven-install-plugin</artifactId> <artifactId>maven-install-plugin</artifactId>

@ -1,7 +1,7 @@
/* /*
* Copyright (C) 2020 Alan Orth * Copyright (C) 2020 Alan Orth
* *
* SPDX-License-Identifier: GPL-3.0-only * SPDX-License-Identifier: GPL-3.0-or-later
*/ */
package io.github.ilri.cgspace.ctasks; package io.github.ilri.cgspace.ctasks;

@ -1,7 +1,7 @@
/* /*
* Copyright (C) 2020 Alan Orth * Copyright (C) 2020 Alan Orth
* *
* SPDX-License-Identifier: GPL-3.0-only * SPDX-License-Identifier: GPL-3.0-or-later
*/ */
package io.github.ilri.cgspace.ctasks; package io.github.ilri.cgspace.ctasks;

@ -1,7 +1,7 @@
/* /*
* Copyright (C) 2020 Alan Orth * Copyright (C) 2020 Alan Orth
* *
* SPDX-License-Identifier: GPL-3.0-only * SPDX-License-Identifier: GPL-3.0-or-later
*/ */
package io.github.ilri.cgspace.ctasks; package io.github.ilri.cgspace.ctasks;
@ -26,13 +26,6 @@ import java.util.ArrayList;
import java.util.List; import java.util.List;
import java.util.Objects; import java.util.Objects;
/*
* Add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata.
*
* @author Alan Orth for the International Livestock Research Institute
* @version 7.6.1.2
* @since 5.1
*/
public class CountryCodeTagger extends AbstractCurationTask { public class CountryCodeTagger extends AbstractCurationTask {
public class CountryCodeTaggerConfig { public class CountryCodeTaggerConfig {
private final String isocodesJsonPath = "/io/github/ilri/cgspace/ctasks/iso_3166-1.json"; private final String isocodesJsonPath = "/io/github/ilri/cgspace/ctasks/iso_3166-1.json";
@ -84,6 +77,7 @@ public class CountryCodeTagger extends AbstractCurationTask {
} }
setResult(alpha2Result.getResult()); setResult(alpha2Result.getResult());
report(alpha2Result.getResult());
} }
return alpha2Result.getStatus(); return alpha2Result.getStatus();
@ -92,13 +86,14 @@ public class CountryCodeTagger extends AbstractCurationTask {
public CountryCodeTaggerResult performAlpha2(Item item, CountryCodeTaggerConfig config) public CountryCodeTaggerResult performAlpha2(Item item, CountryCodeTaggerConfig config)
throws IOException, SQLException { throws IOException, SQLException {
CountryCodeTaggerResult alpha2Result = new CountryCodeTaggerResult(); CountryCodeTaggerResult alpha2Result = new CountryCodeTaggerResult();
String itemHandle = item.getHandle();
List<MetadataValue> itemCountries = List<MetadataValue> itemCountries =
itemService.getMetadataByMetadataString(item, config.iso3166Field); itemService.getMetadataByMetadataString(item, config.iso3166Field);
// skip items that don't have country metadata // skip items that don't have country metadata
if (itemCountries.isEmpty()) { if (itemCountries.isEmpty()) {
alpha2Result.setResult("No countries, skipping."); alpha2Result.setResult(itemHandle + ": no countries, skipping.");
alpha2Result.setStatus(Curator.CURATE_SKIP); alpha2Result.setStatus(Curator.CURATE_SKIP);
} else { } else {
Gson gson = new Gson(); Gson gson = new Gson();
@ -177,20 +172,21 @@ public class CountryCodeTagger extends AbstractCurationTask {
itemService.update(Curator.curationContext(), item); itemService.update(Curator.curationContext(), item);
} catch (SQLException | AuthorizeException sqle) { } catch (SQLException | AuthorizeException sqle) {
config.log.debug(sqle.getMessage()); config.log.debug(sqle.getMessage());
alpha2Result.setResult("Error"); alpha2Result.setResult(itemHandle + ": error");
alpha2Result.setStatus(Curator.CURATE_ERROR); alpha2Result.setStatus(Curator.CURATE_ERROR);
} }
alpha2Result.setResult( alpha2Result.setResult(
"Added " itemHandle
+ ": added "
+ newAlpha2Codes.size() + newAlpha2Codes.size()
+ " alpha2 country code(s)"); + " alpha2 country code(s)");
} else { } else {
alpha2Result.setResult("No matching countries found"); alpha2Result.setResult(itemHandle + ": no matching countries found");
} }
alpha2Result.setStatus(Curator.CURATE_SUCCESS); alpha2Result.setStatus(Curator.CURATE_SUCCESS);
} else { } else {
alpha2Result.setResult("Item already has country codes, skipping unless forced"); alpha2Result.setResult(itemHandle + ": item has country codes, skipping");
alpha2Result.setStatus(Curator.CURATE_SKIP); alpha2Result.setStatus(Curator.CURATE_SKIP);
} }
} }

@ -1,7 +1,7 @@
/* /*
* Copyright (C) 2020 Alan Orth * Copyright (C) 2020 Alan Orth
* *
* SPDX-License-Identifier: GPL-3.0-only * SPDX-License-Identifier: GPL-3.0-or-later
*/ */
package io.github.ilri.cgspace.ctasks; package io.github.ilri.cgspace.ctasks;

@ -1,96 +0,0 @@
/*
* Copyright (C) 2024 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;
import org.dspace.content.DSpaceObject;
import org.dspace.content.Item;
import org.dspace.content.MetadataValue;
import org.dspace.core.Constants;
import org.dspace.curate.AbstractCurationTask;
import org.dspace.curate.Curator;
import org.dspace.curate.Suspendable;
import java.io.IOException;
import java.util.List;
/**
* Attempt to normalize DOIs by stripping whitespace, lower casing, and
* converting to <code>https://doi.org</code> format. The reason is that DOIs are case
* insensitive and must be unique, which we can only guarantee if they are
* normalized to the same format.
*
* See: <a href="https://www.crossref.org/documentation/member-setup/constructing-your-dois/">https://www.crossref.org/documentation/member-setup/constructing-your-dois/</a>
*
* TODO: set curation to failed if invalid DOI submitted (and configure to reject in workflow)
* TODO: allow operation on communities and collections (currently only works on items)
*
* @author Alan Orth for the International Livestock Research Institute
* @version 7.6.1.2
* @since 7.6.1.1
*/
@Suspendable
public class NormalizeDOIs extends AbstractCurationTask {
@Override
public int perform(DSpaceObject dso) throws IOException {
if (dso.getType() == Constants.ITEM) {
Item item = (Item) dso;
String result;
// Keep track of whether we change metadata, and how many
boolean metadataChanged = false;
int count = 0;
// Hard coding the metadata field for now since I can't figure out how to read the taskProperty
List<MetadataValue> itemDOIs = itemService.getMetadataByMetadataString(item, "cg.identifier.doi");
// skip items that don't have DOIs
if (itemDOIs.isEmpty()) {
setResult("No DOIs, skipping");
return Curator.CURATE_SKIP;
} else {
for (MetadataValue itemDOI : itemDOIs) {
String newDOI = getNormalizedDOI(itemDOI);
// Check if the normalized DOI is different than the original
if (!newDOI.equals(itemDOI.getValue())) {
itemDOI.setValue(newDOI);
metadataChanged = true;
count++;
}
}
}
if (metadataChanged) {
result = "Normalized " + count + " DOI(s)";
} else {
result = "All DOIs already normalized";
}
setResult(result);
return Curator.CURATE_SUCCESS;
} else {
setResult("Object skipped");
return Curator.CURATE_SKIP;
}
}
private static String getNormalizedDOI(MetadataValue itemDOI) {
// 1. Convert to lowercase
String newDOI = itemDOI.getValue().toLowerCase();
// 2. Strip leading and trailing whitespace
newDOI = newDOI.strip();
// 3. Convert to HTTPS
newDOI = newDOI.replace("http://", "https://");
// 4. Prefer doi.org to dx.doi.org
newDOI = newDOI.replace("dx.doi.org", "doi.org");
// 5. Replace values like doi: 10.11648/j.jps.20140201.14
newDOI = newDOI.replaceAll("^doi: 10\\.", "https://doi.org/10.");
// 6. Replace values like 10.3390/foods12010115
newDOI = newDOI.replaceAll("^10\\.", "https://doi.org/10.");
return newDOI;
}
}
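
The normalization steps in `getNormalizedDOI` above can be tried outside DSpace. The following is a minimal sketch, assuming plain `String` input in place of `MetadataValue`; the class name `DoiNormalizerSketch` and the use of `Locale.ROOT` are illustrative choices and are not part of the original task. `String.strip()` requires Java 11 or later.

```java
import java.util.Locale;

// Hypothetical standalone sketch of the DOI normalization steps; not the project's class.
final class DoiNormalizerSketch {
    private DoiNormalizerSketch() {}

    static String normalize(String value) {
        // 1. Convert to lowercase (Locale.ROOT is an assumption made here so the
        //    result does not depend on the JVM's default locale)
        String doi = value.toLowerCase(Locale.ROOT);
        // 2. Strip leading and trailing whitespace
        doi = doi.strip();
        // 3. Convert to HTTPS
        doi = doi.replace("http://", "https://");
        // 4. Prefer doi.org to dx.doi.org
        doi = doi.replace("dx.doi.org", "doi.org");
        // 5. Replace values like "doi: 10.11648/j.jps.20140201.14"
        doi = doi.replaceAll("^doi: 10\\.", "https://doi.org/10.");
        // 6. Replace bare values like "10.3390/foods12010115"
        doi = doi.replaceAll("^10\\.", "https://doi.org/10.");
        return doi;
    }

    public static void main(String[] args) {
        System.out.println(normalize(" 10.3390/foods12010115 "));
        System.out.println(normalize("http://dx.doi.org/10.1016/J.X"));
        System.out.println(normalize("doi: 10.11648/j.jps.20140201.14"));
    }
}
```

Because steps 5 and 6 are anchored with `^`, whitespace has to be stripped first; a value that starts with a space would otherwise be left without the `https://doi.org/` prefix.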

@ -2,7 +2,6 @@
DSpace curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository: DSpace curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata - **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **NormalizeDOIs**: normalize DOIs by stripping whitespace, lowercasing, and converting to https://doi.org/ format
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System). Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
@ -15,7 +14,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency> <dependency>
<groupId>io.github.ilri.cgspace</groupId> <groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId> <artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.2-SNAPSHOT</version> <version>7.6.1-SNAPSHOT</version>
</dependency> </dependency>
``` ```
@ -31,16 +30,15 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory: Copy the resulting jar to the DSpace `lib` directory:
``` ```
$ cp target/cgspace-java-helpers-7.6.1.2-SNAPSHOT.jar ~/dspace/lib/ $ cp target/cgspace-java-helpers-7.6.1-SNAPSHOT.jar ~/dspace/lib/
``` ```
## Configuration ## Configuration
Add the curation task(s) to DSpace's `config/modules/curate.cfg`: Add the curation task to DSpace's `config/modules/curate.cfg`:
``` ```
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.NormalizeDOIs = normalizedois
``` ```
And then add the following variables to your `local.cfg` or some other [configuration file that is included](https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference#ConfigurationReference-IncludingotherPropertyFiles): And then add the following variables to your `local.cfg` or some other [configuration file that is included](https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference#ConfigurationReference-IncludingotherPropertyFiles):
@ -62,7 +60,7 @@ countrycodetagger.iso3166-alpha2.field = cg.coverage.iso3166-alpha2
Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`: Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:
``` ```
$ ~/dspace/bin/dspace curate -e eperson@repo.org -t countrycodetagger -i 10568/3 -r - -s object $ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -s object
``` ```
*Note*: it is very important to set the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items. *Note*: it is very important to set the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.

@ -1,7 +1,7 @@
/* /*
* Copyright (C) 2020 Alan Orth * Copyright (C) 2020 Alan Orth
* *
* SPDX-License-Identifier: GPL-3.0-only * SPDX-License-Identifier: GPL-3.0-or-later
*/ */
package io.github.ilri.cgspace.scripts; package io.github.ilri.cgspace.scripts;

@ -1,7 +1,7 @@
/* /*
* Copyright (C) 2022 Alan Orth * Copyright (C) 2022 Alan Orth
* *
* SPDX-License-Identifier: GPL-3.0-only * SPDX-License-Identifier: GPL-3.0-or-later
*/ */
package io.github.ilri.cgspace.scripts; package io.github.ilri.cgspace.scripts;

@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency> <dependency>
<groupId>io.github.ilri.cgspace</groupId> <groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId> <artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.2-SNAPSHOT</version> <version>7.6.1-SNAPSHOT</version>
</dependency> </dependency>
``` ```
@ -31,7 +31,7 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory: Copy the resulting jar to the DSpace `lib` directory:
```console ```console
$ cp target/cgspace-java-helpers-7.6.1.2-SNAPSHOT.jar ~/dspace/lib/ $ cp target/cgspace-java-helpers-7.6.1-SNAPSHOT.jar ~/dspace/lib/
``` ```
## Invocation ## Invocation