11 Commits

Author SHA1 Message Date
9050caf37f Version 7.6.1.1
Unsure of the versioning, but something tells me I should follow
the upstream DSpace versioning to keep things simple.
2024-04-23 13:11:12 +03:00
639148dc19 src/main/java: minor update to ctasks README.md 2024-04-23 13:08:52 +03:00
369f81d181 README.md: minor updates 2024-04-23 13:08:34 +03:00
7a91305742 Add new NormalizeDOIs curation task 2024-04-23 13:07:55 +03:00
b15dd50c16 pom.xml: upgrade all maven plugins to latest 2024-04-23 08:10:05 +03:00
0c35e81362 pom.xml: compile for Java 11
New as of JDK 9:

> The --release option ensures that the code is compiled following the rules of the programming language of the specified release, and that generated classes target the release as well as the public API of that release. This means that, unlike the -source and -target options, the compiler will detect and generate an error when using APIs that don't exist in previous releases of Java SE.

Also, as of DSpace 7 the minimum JDK is 11 anyway.

See: https://maven.apache.org/plugins/maven-compiler-plugin/examples/set-compiler-release.html
2024-04-23 08:05:59 +03:00
2fb8d274c9 pom.xml: fix developer connection scm link
Not sure what this is used for, but the link is wrong.
2024-04-23 07:58:27 +03:00
169b063e9a pom.xml: use https for GitHub link 2024-04-23 07:55:53 +03:00
0cb533b2c4 Fix license headers
I meant to use GPL-3.0-only.
2024-04-22 16:59:12 +03:00
ee6518035e Bump version to 7.6.1 2024-01-02 20:34:14 +03:00
14051984f3 pom.xml: downgrade gson to v2.9.0
Downgrade gson to avoid dependency convergence issues in DSpace.
2024-01-02 20:28:19 +03:00
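Commit 0c35e81362 replaces `-source`/`-target` with `--release`. The hypothetical class below is not part of this repository. It makes the difference concrete with `String.strip()`, an API added in Java 11 that `NormalizeDOIs` relies on. The sketch assumes a JDK 11 or newer toolchain. `javac --release 11` accepts it, and `javac --release 8` should reject the `strip()` call at compile time. By contrast, `-source 8 -target 8` on a newer JDK can compile it, and the problem then only surfaces as a `NoSuchMethodError` on a Java 8 runtime.

```java
/**
 * Hypothetical demo class (not part of cgspace-java-helpers) showing why
 * compiling with --release matters: String.strip() only exists since Java 11.
 */
public class ReleaseFlagDemo {

    // Same call NormalizeDOIs makes in getNormalizedDOI(). strip() removes
    // leading/trailing Unicode whitespace (as defined by Character.isWhitespace),
    // whereas the older trim() only removes characters <= U+0020.
    static String clean(String raw) {
        return raw.strip();
    }

    public static void main(String[] args) {
        // U+2003 EM SPACE is Unicode whitespace but is above U+0020
        String emSpaced = "\u200310.1000/xyz123\u2003";
        System.out.println(clean(emSpaced));                  // prints 10.1000/xyz123
        System.out.println(emSpaced.trim().equals(emSpaced)); // prints true: trim() leaves U+2003 in place
    }
}
```

In the `pom.xml`, the `maven.compiler.release` property sets the maven-compiler-plugin's `release` parameter, and the plugin passes it to javac as `--release`.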
12 changed files with 133 additions and 24 deletions


@@ -4,6 +4,17 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [7.6.1.1] - 2024-04-23
### Added
- New `NormalizeDOIs` curation task
### Updated
- Update dependencies in `pom.xml`
## [7.6.1] - 2024-01-02
### Changed
- Pin gson dependency to 2.9.0 to avoid dependency convergence issues with DSpace
## [7.6] - 2024-01-02
### Updated
- `iso_3166-1.json` from iso-codes 4.13.0-SNAPSHOT, which [adds common names for Iran, Laos, and Syria](https://salsa.debian.org/iso-codes-team/iso-codes/-/merge_requests/32)


@@ -4,6 +4,7 @@ DSpace curation tasks and other Java-based helpers used on the [CGSpace](https:/
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **FixJpgJpgThumbnails**: fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
- **FixLowQualityThumbnails**: remove low-quality thumbnails when PDF bitstreams are present
- **NormalizeDOIs**: normalize DOIs by stripping whitespace, lowercasing, and converting to https://doi.org/ format
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC7x/Curation+System).
@@ -16,7 +17,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
- <version>7.6-SNAPSHOT</version>
+ <version>7.6.1.1-SNAPSHOT</version>
</dependency>
```
@@ -32,7 +33,7 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```console
- $ cp target/cgspace-java-helpers-7.6-SNAPSHOT.jar ~/dspace/lib/
+ $ cp target/cgspace-java-helpers-7.6.1.1-SNAPSHOT.jar ~/dspace/lib/
```
## Configuration
@@ -43,7 +44,6 @@ Please refer to the appropriate README.md file:
## TODO
- Add a curation task to normalize DOIs to "https://doi.org" format
- Migrate from maven-deploy-plugin to nexus-staging-maven-plugin, see: https://central.sonatype.org/publish/publish-maven/#nexus-staging-maven-plugin-for-deployment-and-release
- Stop using oss-parent, see: https://central.sonatype.org/publish/publish-maven/#create-a-ticket-with-sonatype

pom.xml

@@ -6,7 +6,7 @@
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
- <version>7.6-SNAPSHOT</version>
+ <version>7.6.1.1-SNAPSHOT</version>
<name>cgspace-java-helpers</name>
<url>https://github.com/ilri/cgspace-java-helpers</url>
@@ -14,7 +14,7 @@
<licenses>
<license>
<name>GPL-3.0-only</name>
- <url>https://spdx.org/licenses/GPL-3.0-or-later.html</url>
+ <url>https://spdx.org/licenses/GPL-3.0-only.html</url>
</license>
</licenses>
@@ -28,15 +28,14 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
- <maven.compiler.source>1.8</maven.compiler.source>
- <maven.compiler.target>1.8</maven.compiler.target>
+ <maven.compiler.release>11</maven.compiler.release>
</properties>
<dependencies>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
- <version>2.10.1</version>
+ <version>2.9.0</version>
</dependency>
<dependency>
<groupId>org.dspace</groupId>
@@ -48,8 +47,8 @@
<scm>
<connection>scm:git:git://github.com/ilri/cgspace-java-helpers.git</connection>
- <developerConnection>scm:git:ssh://github.com:nanosai/cgspace-java-helpers.git</developerConnection>
- <url>http://github.com/ilri/cgspace-java-helpers</url>
+ <developerConnection>scm:git:ssh://github.com:ilri/cgspace-java-helpers.git</developerConnection>
+ <url>https://github.com/ilri/cgspace-java-helpers</url>
</scm>
<distributionManagement>
@@ -78,15 +77,15 @@
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
- <version>3.12.1</version>
+ <version>3.13.0</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
- <version>3.2.3</version>
+ <version>3.2.5</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
- <version>3.3.0</version>
+ <version>3.4.1</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>


@@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
- * SPDX-License-Identifier: GPL-3.0-or-later
+ * SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;


@@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
- * SPDX-License-Identifier: GPL-3.0-or-later
+ * SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;


@@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
- * SPDX-License-Identifier: GPL-3.0-or-later
+ * SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;


@@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
- * SPDX-License-Identifier: GPL-3.0-or-later
+ * SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;


@@ -0,0 +1,97 @@
/*
* Copyright (C) 2024 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;
import org.dspace.content.DSpaceObject;
import org.dspace.content.Item;
import org.dspace.content.MetadataValue;
import org.dspace.core.Constants;
import org.dspace.curate.AbstractCurationTask;
import org.dspace.curate.Curator;
import org.dspace.curate.Suspendable;
import java.io.IOException;
import java.util.List;
/**
* Attempt to normalize DOIs by stripping whitespace, lower casing, and
* converting to <code>https://doi.org</code> format. The reason is that DOIs are case
* insensitive and must be unique, which we can only guarantee if they are
* normalized to the same format.
*
* See: <a href="https://www.crossref.org/documentation/member-setup/constructing-your-dois/">https://www.crossref.org/documentation/member-setup/constructing-your-dois/</a>
*
* TODO: set curation to failed if invalid DOI submitted (and configure to reject in workflow)
* TODO: allow operation on communities and collections (currently only works on items)
*
* @author Alan Orth for the International Livestock Research Institute
* @version 7.6.1.1
* @since 7.6.1.1
*/
@Suspendable
public class NormalizeDOIs extends AbstractCurationTask {
@Override
public int perform(DSpaceObject dso) throws IOException {
if (dso.getType() == Constants.ITEM) {
Item item = (Item) dso;
String result;
// Keep track of whether we change metadata, and how many
boolean metadataChanged = false;
int count = 0;
// Hard coding the metadata field for now since I can't figure out how to read the taskProperty
List<MetadataValue> itemDOIs = itemService.getMetadataByMetadataString(item, "cg.identifier.doi");
// skip items that don't have DOIs
if (itemDOIs.isEmpty()) {
setResult("No DOIs, skipping");
return Curator.CURATE_SKIP;
} else {
for (MetadataValue itemDOI : itemDOIs) {
String newDOI = getNormalizedDOI(itemDOI);
// Check if the normalized DOI is different than the original
if (!newDOI.equals(itemDOI.getValue())) {
itemDOI.setValue(newDOI);
metadataChanged = true;
count++;
}
}
}
if (metadataChanged) {
result = "Normalized " + count + " DOI(s)";
} else {
result = "All DOIs already normalized";
}
report(result);
setResult(result);
return Curator.CURATE_SUCCESS;
} else {
setResult("Object skipped");
return Curator.CURATE_SKIP;
}
}
private static String getNormalizedDOI(MetadataValue itemDOI) {
// 1. Convert to lowercase
String newDOI = itemDOI.getValue().toLowerCase();
// 2. Strip leading and trailing whitespace
newDOI = newDOI.strip();
// 3. Convert to HTTPS
newDOI = newDOI.replace("http://", "https://");
// 4. Prefer doi.org to dx.doi.org
newDOI = newDOI.replace("dx.doi.org", "doi.org");
// 5. Replace values like doi: 10.11648/j.jps.20140201.14
newDOI = newDOI.replaceAll("^doi: 10\\.", "https://doi.org/10.");
// 6. Replace values like 10.3390/foods12010115
newDOI = newDOI.replaceAll("^10\\.", "https://doi.org/10.");
return newDOI;
}
}
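To see what the six steps in `getNormalizedDOI()` do to typical values, the same string operations can be replayed outside DSpace. The class below is a hypothetical sketch, not the task itself. It copies the steps onto a plain `String` instead of a `MetadataValue` so that it runs without DSpace on the classpath. Apart from the two examples quoted in the source comments, the sample inputs are made up.

```java
import java.util.List;

/**
 * Hypothetical, DSpace-free sketch of the six steps in
 * NormalizeDOIs.getNormalizedDOI(), operating on a plain String.
 */
public class DoiNormalizationSketch {

    static String normalize(String value) {
        // 1. Lowercase (DOIs are case-insensitive); default JVM locale, as in the original
        String doi = value.toLowerCase();
        // 2. Strip leading and trailing Unicode whitespace
        doi = doi.strip();
        // 3. Upgrade http:// to https:// anywhere in the value
        doi = doi.replace("http://", "https://");
        // 4. Prefer doi.org over dx.doi.org
        doi = doi.replace("dx.doi.org", "doi.org");
        // 5. "doi: 10.x" (exactly one space after the colon) becomes a resolver URL
        doi = doi.replaceAll("^doi: 10\\.", "https://doi.org/10.");
        // 6. A bare "10.x" becomes a resolver URL
        doi = doi.replaceAll("^10\\.", "https://doi.org/10.");
        return doi;
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "  10.3390/FOODS12010115 ",
                "doi: 10.11648/j.jps.20140201.14",
                "http://dx.doi.org/10.1016/j.example.2020.01.001",
                "https://doi.org/10.1371/journal.pone.0000001",
                "doi:10.1234/abc");
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + normalize(s));
        }
    }
}
```

Running it shows bare `10.` prefixes, `doi: ` prefixes and `http://dx.doi.org/` URLs all converging on the `https://doi.org/10.` form. By contrast, `doi:10.1234/abc` (no space after the colon) passes through unchanged, because step 5 expects exactly one space. Also note that `toLowerCase()` without arguments uses the JVM's default locale. `toLowerCase(Locale.ROOT)` is the usual locale-neutral alternative if that matters for a deployment.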


@@ -2,6 +2,7 @@
DSpace curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **NormalizeDOIs**: normalize DOIs by stripping whitespace, lowercasing, and converting to https://doi.org/ format
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
@@ -14,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
- <version>7.6-SNAPSHOT</version>
+ <version>7.6.1.1-SNAPSHOT</version>
</dependency>
```
@@ -30,15 +31,16 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```
- $ cp target/cgspace-java-helpers-7.6-SNAPSHOT.jar ~/dspace/lib/
+ $ cp target/cgspace-java-helpers-7.6.1.1-SNAPSHOT.jar ~/dspace/lib/
```
## Configuration
- Add the curation task to DSpace's `config/modules/curate.cfg`:
+ Add the curation task(s) to DSpace's `config/modules/curate.cfg`:
```
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.NormalizeDOIs = normalizedois
```
And then add the following variables to your `local.cfg` or some other [configuration file that is included](https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference#ConfigurationReference-IncludingotherPropertyFiles):


@@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
- * SPDX-License-Identifier: GPL-3.0-or-later
+ * SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.scripts;


@@ -1,7 +1,7 @@
/*
* Copyright (C) 2022 Alan Orth
*
- * SPDX-License-Identifier: GPL-3.0-or-later
+ * SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.scripts;


@@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
- <version>7.6-SNAPSHOT</version>
+ <version>7.6.1.1-SNAPSHOT</version>
</dependency>
```
@@ -31,7 +31,7 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```console
- $ cp target/cgspace-java-helpers-7.6-SNAPSHOT.jar ~/dspace/lib/
+ $ cp target/cgspace-java-helpers-7.6.1.1-SNAPSHOT.jar ~/dspace/lib/
```
## Invocation