Compare commits

38 Commits

Author SHA1 Message Date
Alan Orth 12a606ac61
pom.xml: bump version to 7.6.1.3-SNAPSHOT 2024-05-14 12:47:47 +03:00
Alan Orth 692a62b454
src/main/java: update curation tasks README.md
Add eperson ID to curation invocation. DSpace 7 requires this.
2024-04-29 09:33:39 +03:00
Alan Orth d4ca92066a
Version 7.6.1.2 2024-04-25 12:58:07 +03:00
Alan Orth 5ad8c556e9
src/main/java: simplify curation task results
We don't need to print the Handle because some items can be in the
workflow still so this will be null, but also because DSpace will
already show the Handle in the log before printing the result.
2024-04-25 12:53:15 +03:00
Alan Orth 77425c13bf
src/main/java: remove report() from curation tasks
Results are a single-line status that shows the result of the task,
but reports are like a running log of changes to the item and have
more complicated use cases and configuration requirements.

For now I will disable reports since I'm not using them.
2024-04-25 12:51:30 +03:00
Alan Orth 5e0a456fb5
README.md: fix links 2024-04-23 14:28:27 +03:00
Alan Orth 9050caf37f
Version 7.6.1.1
Unsure of the versioning, but something tells me I should follow
the upstream DSpace versioning to keep things simple.
2024-04-23 13:11:12 +03:00
Alan Orth 639148dc19
src/main/java: minor update to ctasks README.md 2024-04-23 13:08:52 +03:00
Alan Orth 369f81d181
README.md: minor updates 2024-04-23 13:08:34 +03:00
Alan Orth 7a91305742
Add new NormalizeDOIs curation task 2024-04-23 13:07:55 +03:00
Alan Orth b15dd50c16
pom.xml: upgrade all maven plugins to latest 2024-04-23 08:10:05 +03:00
Alan Orth 0c35e81362
pom.xml: compile for Java 11
New as of JDK 9:

> The --release option ensures that the code is compiled following the rules of the programming language of the specified release, and that generated classes target the release as well as the public API of that release. This means that, unlike the -source and -target options, the compiler will detect and generate an error when using APIs that don't exist in previous releases of Java SE.

Also, as of DSpace 7 the minimum JDK is 11 anyway.

See: https://maven.apache.org/plugins/maven-compiler-plugin/examples/set-compiler-release.html
2024-04-23 08:05:59 +03:00
Alan Orth 2fb8d274c9
pom.xml: fix developer connection scm link
Not sure what this is used for, but the link is wrong.
2024-04-23 07:58:27 +03:00
Alan Orth 169b063e9a
pom.xml: use https for GitHub link 2024-04-23 07:55:53 +03:00
Alan Orth 0cb533b2c4
Fix license headers
I meant to use GPL-3.0-only.
2024-04-22 16:59:12 +03:00
Alan Orth ee6518035e
Bump version to 7.6.1 2024-01-02 20:34:14 +03:00
Alan Orth 14051984f3
pom.xml: downgrade gson to v2.9.0
Downgrade gson to avoid dependency convergence issues in DSpace.
2024-01-02 20:28:19 +03:00
Alan Orth 9faf657c59
Bump version to 7.6-SNAPSHOT 2024-01-02 19:54:46 +03:00
Alan Orth 7fb78c2722
src/main/java: minor refactoring
Suggested by IntelliJ.
2024-01-02 19:34:51 +03:00
Alan Orth 6ef9f521bf
src/main/resources: fix trailing comma in JSON 2024-01-02 18:03:52 +03:00
Alan Orth 1a345de36a
pom.xml: fix missing Handle jar
It seems Handle jars are not published on Maven Central so we get
this error while packaging:

    [ERROR] Failed to execute goal on project cgspace-java-helpers: Could not resolve dependencies for project io.github.ilri.cgspace:cgspace-java-helpers:jar:7.6-SNAPSHOT: net.handle:handle:jar:9.3.0 was not found in https://repo.maven.apache.org/maven2 during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]

This is probably related to DSpace 7.x using a vanilla Handle jar
instead of the customized one.
2024-01-02 16:56:43 +03:00
Alan Orth eb66ccbd0d
.idea/misc.xml: use Java 17
Latest IDEA configuration after updating settings in the IDE.
2023-12-28 10:49:42 +03:00
Alan Orth 62138540ae
.github/workflows/maven.yml: update setup actions 2023-12-28 10:37:55 +03:00
Alan Orth c0d0e40321
.github/workflows/maven.yml: use Java 17 2023-12-28 10:36:14 +03:00
Alan Orth f2a637f0a8
.github/workflows/maven.yml: dspace7 branch 2023-12-28 10:35:32 +03:00
Alan Orth 6e38a2f7e1
pom.xml: update dependencies
Package builds. Haven't tested releasing.
2023-12-28 10:33:44 +03:00
Alan Orth f9d7e5f6a2
src/main/java: minor refactor
Use isEmpty() instead of checking size.
2023-12-28 10:26:11 +03:00
Alan Orth 9e965afdb7
src/main/java: change getSize() to getSizeBytes()
Apparently this changed in DSpace 7. Untested, but it compiles now.
2023-12-28 10:18:40 +03:00
Alan Orth 408a0e1c19
src/main/java: update log4j usage
Untested, but compiles.
2023-12-28 10:17:24 +03:00
Alan Orth ea9f669e9c
pom.xml: use dspace-api 7.6.1 2023-12-28 10:16:16 +03:00
Alan Orth 546101bc92
CHANGELOG.md: Add notes about new common names 2023-02-26 21:16:46 +03:00
Alan Orth 0a7cf7bf59
Import iso-codes snapshot
After my merge request to Debian's iso-codes package was merged we
now no longer need to maintain local overrides for Iran, Laos, and
Syria, as those are officially in iso-codes.

See: https://salsa.debian.org/iso-codes-team/iso-codes/-/merge_requests/32
2023-02-26 21:13:44 +03:00
Alan Orth 8c0a8fbcd1
Bump version to 6.2-SNAPSHOT
I can't figure out how to get non-snapshot releases on Central.
2023-02-21 10:59:54 +03:00
Alan Orth c05a2e4f96
Version 6.2 2023-02-20 20:37:40 +03:00
Alan Orth cf2af393c0
CHANGELOG.md: add note about iso-codes 4.12.0 2022-11-07 12:23:07 +03:00
Alan Orth 1f6ba4af67
src: import iso-codes 4.12.0
This updates the name for TR from "Turkey" to "Türkiye".

See: https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4120-2022-11-06
2022-11-07 12:21:39 +03:00
Alan Orth 5ceaebaeae
README.md: add more TODO 2022-10-31 11:49:39 +03:00
Alan Orth f3dcc6e261
pom.xml: bump version to 6.2-SNAPSHOT 2022-10-31 11:47:13 +03:00
16 changed files with 218 additions and 93 deletions

.github/workflows/maven.yml

@ -5,9 +5,9 @@ name: Build
on:
push:
branches: [ dspace6 ]
branches: [ dspace7 ]
pull_request:
branches: [ dspace6 ]
branches: [ dspace7 ]
jobs:
build:
@ -15,11 +15,11 @@ jobs:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- name: Set up JDK 8
uses: actions/setup-java@v3
- uses: actions/checkout@v4
- name: Set up JDK 17
uses: actions/setup-java@v4
with:
java-version: 8
java-version: 17
distribution: 'temurin'
cache: 'maven'
- name: Build with Maven

.idea/misc.xml

@ -1,11 +1,13 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ExternalStorageConfigurationManager" enabled="true" />
<component name="MavenProjectsManager">
<option name="originalFiles">
<list>
<option value="$PROJECT_DIR$/pom.xml" />
</list>
</option>
<option name="workspaceImportForciblyTurnedOn" value="true" />
</component>
<component name="ProjectRootManager" version="2" languageLevel="JDK_11" project-jdk-name="11" project-jdk-type="JavaSDK" />
<component name="ProjectRootManager" version="2" languageLevel="JDK_11" project-jdk-name="17" project-jdk-type="JavaSDK" />
</project>

CHANGELOG.md

@ -4,6 +4,30 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [7.6.1.2] - 2024-04-25
### Changed
- Remove reporting from curation tasks since "results" are enough
## [7.6.1.1] - 2024-04-23
### Added
- New `NormalizeDOIs` curation task
### Updated
- Update dependencies in `pom.xml`
## [7.6.1] - 2024-01-02
### Changed
- Pin gson dependency to 2.9.0 to avoid dependency convergence issues with DSpace
## [7.6] - 2024-01-02
### Updated
- `iso_3166-1.json` from iso-codes 4.13.0-SNAPSHOT, which [adds common names for Iran, Laos, and Syria](https://salsa.debian.org/iso-codes-team/iso-codes/-/merge_requests/32)
- DSpace 7.6 compatibility
## [6.2] - 2023-02-20
### Updated
- `iso_3166-1.json` from iso-codes 4.12.0, which updates the name for TR to "Türkiye"
## [6.1] - 2022-10-31
### Updated
- Update dependencies in `pom.xml`

README.md

@ -4,8 +4,9 @@ DSpace curation tasks and other Java-based helpers used on the [CGSpace](https:/
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **FixJpgJpgThumbnails**: fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
- **FixLowQualityThumbnails**: remove low-quality thumbnails when PDF bitstreams are present
- **NormalizeDOIs**: normalize DOIs by stripping whitespace, lowercasing, and converting to https://doi.org/ format
Tested on DSpace 6.3. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC6x/Curation+System).
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC7x/Curation+System).
## Build and Install
@ -16,7 +17,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>6.1-SNAPSHOT</version>
<version>7.6.1.2-SNAPSHOT</version>
</dependency>
```
@ -32,18 +33,19 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```console
$ cp target/cgspace-java-helpers-6.1-SNAPSHOT.jar ~/dspace/lib/
$ cp target/cgspace-java-helpers-7.6.1.2-SNAPSHOT.jar ~/dspace/lib/
```
## Configuration
Please refer to the appropriate README.md file:
- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace6/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace6/src/main/java/io/github/ilri/cgspace/scripts/README.md)
- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace7/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace7/src/main/java/io/github/ilri/cgspace/scripts/README.md)
## TODO
- Add a curation task to normalize DOIs to "https://doi.org" format
- Migrate from maven-deploy-plugin to nexus-staging-maven-plugin, see: https://central.sonatype.org/publish/publish-maven/#nexus-staging-maven-plugin-for-deployment-and-release
- Stop using oss-parent, see: https://central.sonatype.org/publish/publish-maven/#create-a-ticket-with-sonatype
## Notes
This project was initially created according to the [Maven Getting Started Guide](https://maven.apache.org/guides/getting-started/):

pom.xml

@ -6,7 +6,7 @@
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>6.1</version>
<version>7.6.1.3-SNAPSHOT</version>
<name>cgspace-java-helpers</name>
<url>https://github.com/ilri/cgspace-java-helpers</url>
@ -14,7 +14,7 @@
<licenses>
<license>
<name>GPL-3.0-only</name>
<url>https://spdx.org/licenses/GPL-3.0-or-later.html</url>
<url>https://spdx.org/licenses/GPL-3.0-only.html</url>
</license>
</licenses>
@ -28,28 +28,27 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<maven.compiler.release>11</maven.compiler.release>
</properties>
<dependencies>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.9.1</version>
<version>2.9.0</version>
</dependency>
<dependency>
<groupId>org.dspace</groupId>
<artifactId>dspace-api</artifactId>
<version>6.3</version>
<version>7.6.1</version>
<scope>provided</scope>
</dependency>
</dependencies>
<scm>
<connection>scm:git:git://github.com/ilri/cgspace-java-helpers.git</connection>
<developerConnection>scm:git:ssh://github.com:nanosai/cgspace-java-helpers.git</developerConnection>
<url>http://github.com/ilri/cgspace-java-helpers</url>
<developerConnection>scm:git:ssh://github.com:ilri/cgspace-java-helpers.git</developerConnection>
<url>https://github.com/ilri/cgspace-java-helpers</url>
</scm>
<distributionManagement>
@ -69,32 +68,32 @@
<!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.2.0</version>
<version>3.3.2</version>
</plugin>
<!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.3.0</version>
<version>3.3.1</version>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.10.1</version>
<version>3.13.0</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.0.0-M7</version>
<version>3.2.5</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.3.0</version>
<version>3.4.1</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>3.0.1</version>
<version>3.1.1</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>3.0.0</version>
<version>3.1.1</version>
</plugin>
<!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
<plugin>
@ -103,9 +102,22 @@
</plugin>
<plugin>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>3.4.1</version>
<version>3.5.0</version>
</plugin>
</plugins>
</pluginManagement>
</build>
<repositories>
<!-- Check Maven Central first (before other repos below) -->
<repository>
<id>maven-central</id>
<url>https://repo.maven.apache.org/maven2</url>
</repository>
<!-- For Handle Server -->
<repository>
<id>handle.net</id>
<url>https://handle.net/maven</url>
</repository>
</repositories>
</project>

View File

@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-or-later
* SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;

src/main/java/io/github/ilri/cgspace/ctasks/CountriesVocabulary.java

@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-or-later
* SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;
@ -10,14 +10,14 @@ import javax.annotation.Nullable;
public class CountriesVocabulary {
class Country {
private String name; // required
private String common_name; // optional
private String official_name; // optional
private String cgspace_name; // optional
private String numeric; // required Hmmmm need to cast this...
private String alpha_2; // required
private String alpha_3; // required
static class Country {
private final String name; // required
private final String common_name; // optional
private final String official_name; // optional
private final String cgspace_name; // optional
private final String numeric; // required Hmmmm need to cast this...
private final String alpha_2; // required
private final String alpha_3; // required
public Country(
String name,

src/main/java/io/github/ilri/cgspace/ctasks/CountryCodeTagger.java

@ -1,14 +1,15 @@
/*
* Copyright (C) 2020 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-or-later
* SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;
import com.google.gson.Gson;
import org.apache.log4j.Logger;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.dspace.authorize.AuthorizeException;
import org.dspace.content.DSpaceObject;
import org.dspace.content.Item;
@ -23,7 +24,15 @@ import java.io.InputStreamReader;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
/*
* Add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata.
*
* @author Alan Orth for the International Livestock Research Institute
* @version 7.6.1.2
* @since 5.1
*/
public class CountryCodeTagger extends AbstractCurationTask {
public class CountryCodeTaggerConfig {
private final String isocodesJsonPath = "/io/github/ilri/cgspace/ctasks/iso_3166-1.json";
@ -33,10 +42,10 @@ public class CountryCodeTagger extends AbstractCurationTask {
private final String iso3166Alpha2Field = taskProperty("iso3166-alpha2.field");
private final boolean forceupdate = taskBooleanProperty("forceupdate", false);
private Logger log = Logger.getLogger(CountryCodeTagger.class);
private final Logger log = LogManager.getLogger();
}
public class CountryCodeTaggerResult {
public static class CountryCodeTaggerResult {
private int status = Curator.CURATE_UNSET;
private String result = null;
@ -75,7 +84,6 @@ public class CountryCodeTagger extends AbstractCurationTask {
}
setResult(alpha2Result.getResult());
report(alpha2Result.getResult());
}
return alpha2Result.getStatus();
@ -84,14 +92,13 @@ public class CountryCodeTagger extends AbstractCurationTask {
public CountryCodeTaggerResult performAlpha2(Item item, CountryCodeTaggerConfig config)
throws IOException, SQLException {
CountryCodeTaggerResult alpha2Result = new CountryCodeTaggerResult();
String itemHandle = item.getHandle();
List<MetadataValue> itemCountries =
itemService.getMetadataByMetadataString(item, config.iso3166Field);
// skip items that don't have country metadata
if (itemCountries.size() == 0) {
alpha2Result.setResult(itemHandle + ": no countries, skipping.");
if (itemCountries.isEmpty()) {
alpha2Result.setResult("No countries, skipping.");
alpha2Result.setStatus(Curator.CURATE_SKIP);
} else {
Gson gson = new Gson();
@ -101,7 +108,7 @@ public class CountryCodeTagger extends AbstractCurationTask {
BufferedReader reader =
new BufferedReader(
new InputStreamReader(
this.getClass().getResourceAsStream(config.isocodesJsonPath)));
Objects.requireNonNull(this.getClass().getResourceAsStream(config.isocodesJsonPath))));
ISO3166CountriesVocabulary isocodesCountriesJson =
gson.fromJson(reader, ISO3166CountriesVocabulary.class);
reader.close();
@ -109,8 +116,8 @@ public class CountryCodeTagger extends AbstractCurationTask {
reader =
new BufferedReader(
new InputStreamReader(
this.getClass()
.getResourceAsStream(config.cgspaceCountriesJsonPath)));
Objects.requireNonNull(this.getClass()
.getResourceAsStream(config.cgspaceCountriesJsonPath))));
CGSpaceCountriesVocabulary cgspaceCountriesJson =
gson.fromJson(reader, CGSpaceCountriesVocabulary.class);
reader.close();
@ -133,7 +140,7 @@ public class CountryCodeTagger extends AbstractCurationTask {
List<MetadataValue> itemAlpha2CountryCodes =
itemService.getMetadataByMetadataString(item, config.iso3166Alpha2Field);
if (itemAlpha2CountryCodes.size() == 0) {
if (itemAlpha2CountryCodes.isEmpty()) {
List<String> newAlpha2Codes = new ArrayList<String>();
for (MetadataValue itemCountry : itemCountries) {
// check ISO 3166-1 countries
@ -157,7 +164,7 @@ public class CountryCodeTagger extends AbstractCurationTask {
}
}
if (newAlpha2Codes.size() > 0) {
if (!newAlpha2Codes.isEmpty()) {
try {
itemService.addMetadata(
Curator.curationContext(),
@ -170,21 +177,20 @@ public class CountryCodeTagger extends AbstractCurationTask {
itemService.update(Curator.curationContext(), item);
} catch (SQLException | AuthorizeException sqle) {
config.log.debug(sqle.getMessage());
alpha2Result.setResult(itemHandle + ": error");
alpha2Result.setResult("Error");
alpha2Result.setStatus(Curator.CURATE_ERROR);
}
alpha2Result.setResult(
itemHandle
+ ": added "
"Added "
+ newAlpha2Codes.size()
+ " alpha2 country code(s)");
} else {
alpha2Result.setResult(itemHandle + ": no matching countries found");
alpha2Result.setResult("No matching countries found");
}
alpha2Result.setStatus(Curator.CURATE_SUCCESS);
} else {
alpha2Result.setResult(itemHandle + ": item has country codes, skipping");
alpha2Result.setResult("Item already has country codes, skipping unless forced");
alpha2Result.setStatus(Curator.CURATE_SKIP);
}
}
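
For illustration, the resource-loading pattern in the hunks above (wrapping `getResourceAsStream` in `Objects.requireNonNull`) could be sketched in isolation as follows. `ResourceReaderSketch` and `readResource` are hypothetical names that are not part of this change set, and the try-with-resources form is only a suggestion: the committed code closes the reader manually with `reader.close()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical helper, not part of cgspace-java-helpers.
final class ResourceReaderSketch {
    private ResourceReaderSketch() {}

    // Read a classpath resource as UTF-8 text, failing fast with a
    // descriptive message if the resource does not exist.
    static String readResource(Class<?> anchor, String path) throws IOException {
        // getResourceAsStream returns null for a missing resource;
        // requireNonNull turns that into an immediate, labeled failure.
        InputStream in = Objects.requireNonNull(
                anchor.getResourceAsStream(path),
                "classpath resource not found: " + path);

        // try-with-resources closes the reader even if reading throws.
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }
}
```

With this shape a missing JSON vocabulary file surfaces as a `NullPointerException` that names the path, rather than as an unlabeled failure inside `InputStreamReader`.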

View File

@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-or-later
* SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;

src/main/java/io/github/ilri/cgspace/ctasks/NormalizeDOIs.java

@ -0,0 +1,96 @@
/*
* Copyright (C) 2024 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.ctasks;
import org.dspace.content.DSpaceObject;
import org.dspace.content.Item;
import org.dspace.content.MetadataValue;
import org.dspace.core.Constants;
import org.dspace.curate.AbstractCurationTask;
import org.dspace.curate.Curator;
import org.dspace.curate.Suspendable;
import java.io.IOException;
import java.util.List;
/**
* Attempt to normalize DOIs by stripping whitespace, lower casing, and
* converting to <code>https://doi.org</code> format. The reason is that DOIs are case
* insensitive and must be unique, which we can only guarantee if they are
* normalized to the same format.
*
* See: <a href="https://www.crossref.org/documentation/member-setup/constructing-your-dois/">https://www.crossref.org/documentation/member-setup/constructing-your-dois/</a>
*
* TODO: set curation to failed if invalid DOI submitted (and configure to reject in workflow)
* TODO: allow operation on communities and collections (currently only works on items)
*
* @author Alan Orth for the International Livestock Research Institute
* @version 7.6.1.2
* @since 7.6.1.1
*/
@Suspendable
public class NormalizeDOIs extends AbstractCurationTask {
@Override
public int perform(DSpaceObject dso) throws IOException {
if (dso.getType() == Constants.ITEM) {
Item item = (Item) dso;
String result;
// Keep track of whether we change metadata, and how many
boolean metadataChanged = false;
int count = 0;
// Hard coding the metadata field for now since I can't figure out how to read the taskProperty
List<MetadataValue> itemDOIs = itemService.getMetadataByMetadataString(item, "cg.identifier.doi");
// skip items that don't have DOIs
if (itemDOIs.isEmpty()) {
setResult("No DOIs, skipping");
return Curator.CURATE_SKIP;
} else {
for (MetadataValue itemDOI : itemDOIs) {
String newDOI = getNormalizedDOI(itemDOI);
// Check if the normalized DOI is different than the original
if (!newDOI.equals(itemDOI.getValue())) {
itemDOI.setValue(newDOI);
metadataChanged = true;
count++;
}
}
}
if (metadataChanged) {
result = "Normalized " + count + " DOI(s)";
} else {
result = "All DOIs already normalized";
}
setResult(result);
return Curator.CURATE_SUCCESS;
} else {
setResult("Object skipped");
return Curator.CURATE_SKIP;
}
}
private static String getNormalizedDOI(MetadataValue itemDOI) {
// 1. Convert to lowercase
String newDOI = itemDOI.getValue().toLowerCase();
// 2. Strip leading and trailing whitespace
newDOI = newDOI.strip();
// 3. Convert to HTTPS
newDOI = newDOI.replace("http://", "https://");
// 4. Prefer doi.org to dx.doi.org
newDOI = newDOI.replace("dx.doi.org", "doi.org");
// 5. Replace values like doi: 10.11648/j.jps.20140201.14
newDOI = newDOI.replaceAll("^doi: 10\\.", "https://doi.org/10.");
// 6. Replace values like 10.3390/foods12010115
newDOI = newDOI.replaceAll("^10\\.", "https://doi.org/10.");
return newDOI;
}
}
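
A minimal standalone sketch of the six normalization steps in `getNormalizedDOI`, operating on a plain `String` so that it can be exercised without DSpace on the classpath. `DoiNormalizerSketch` and `normalize` are hypothetical names, and `Locale.ROOT` is an assumption added here so that lowercasing does not depend on the JVM's default locale; the committed code calls `toLowerCase()` without an argument.

```java
import java.util.Locale;

// Hypothetical standalone version of the normalization steps; not part of the commit.
final class DoiNormalizerSketch {
    private DoiNormalizerSketch() {}

    static String normalize(String doi) {
        // 1. Convert to lowercase (Locale.ROOT is an assumption, see lead-in)
        String result = doi.toLowerCase(Locale.ROOT);
        // 2. Strip leading and trailing whitespace (String.strip needs Java 11+)
        result = result.strip();
        // 3. Convert to HTTPS
        result = result.replace("http://", "https://");
        // 4. Prefer doi.org to dx.doi.org
        result = result.replace("dx.doi.org", "doi.org");
        // 5. Replace values like "doi: 10.11648/j.jps.20140201.14"
        result = result.replaceAll("^doi: 10\\.", "https://doi.org/10.");
        // 6. Replace bare values like "10.3390/foods12010115"
        result = result.replaceAll("^10\\.", "https://doi.org/10.");
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("doi: 10.11648/j.jps.20140201.14"));
        System.out.println(normalize("10.3390/foods12010115"));
    }
}
```

Under these assumptions, `normalize("doi: 10.11648/j.jps.20140201.14")` yields `https://doi.org/10.11648/j.jps.20140201.14`, and a value that is already in `https://doi.org/` form is returned unchanged.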

src/main/java/io/github/ilri/cgspace/ctasks/README.md

@ -2,8 +2,9 @@
DSpace curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **NormalizeDOIs**: normalize DOIs by stripping whitespace, lowercasing, and converting to https://doi.org/ format
Tested on DSpace 6.3. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
## Build and Install
@ -14,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>6.1-SNAPSHOT</version>
<version>7.6.1.2-SNAPSHOT</version>
</dependency>
```
@ -30,15 +31,16 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```
$ cp target/cgspace-java-helpers-6.1-SNAPSHOT.jar ~/dspace/lib/
$ cp target/cgspace-java-helpers-7.6.1.2-SNAPSHOT.jar ~/dspace/lib/
```
## Configuration
Add the curation task to DSpace's `config/modules/curate.cfg`:
Add the curation task(s) to DSpace's `config/modules/curate.cfg`:
```
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.NormalizeDOIs = normalizedois
```
And then add the following variables to your `local.cfg` or some other [configuration file that is included](https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference#ConfigurationReference-IncludingotherPropertyFiles):
@ -60,7 +62,7 @@ countrycodetagger.iso3166-alpha2.field = cg.coverage.iso3166-alpha2
Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:
```
$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -s object
$ ~/dspace/bin/dspace curate -e eperson@repo.org -t countrycodetagger -i 10568/3 -r - -s object
```
*Note*: it is very important to set the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.

src/main/java/io/github/ilri/cgspace/scripts/FixJpgJpgThumbnails.java

@ -1,7 +1,7 @@
/*
* Copyright (C) 2020 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-or-later
* SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.scripts;
@ -138,7 +138,7 @@ public class FixJpgJpgThumbnails {
for (Bitstream originalBitstream : originalBundleBitstreams) {
String originalName = originalBitstream.getName();
long originalBitstreamBytes = originalBitstream.getSize();
long originalBitstreamBytes = originalBitstream.getSizeBytes();
/*
- check if the original file name is the same as the thumbnail name minus the extra ".jpg"

View File

@ -1,7 +1,7 @@
/*
* Copyright (C) 2022 Alan Orth
*
* SPDX-License-Identifier: GPL-3.0-or-later
* SPDX-License-Identifier: GPL-3.0-only
*/
package io.github.ilri.cgspace.scripts;

src/main/java/io/github/ilri/cgspace/scripts/README.md

@ -4,7 +4,7 @@ Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutiona
- **FixJpgJpgThumbnails**: fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
- **FixLowQualityThumbnails**: remove low-quality thumbnails when PDF bitstreams are present
Tested on DSpace 6.3. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC6x/Curation+System).
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC6x/Curation+System).
## Build and Install
@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>6.1-SNAPSHOT</version>
<version>7.6.1.2-SNAPSHOT</version>
</dependency>
```
@ -31,7 +31,7 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```console
$ cp target/cgspace-java-helpers-6.1-SNAPSHOT.jar ~/dspace/lib/
$ cp target/cgspace-java-helpers-7.6.1.2-SNAPSHOT.jar ~/dspace/lib/
```
## Invocation

View File

@ -16,14 +16,6 @@
"name": "Congo, The Democratic Republic of the",
"numeric": "180"
},
{
"alpha_2": "IR",
"alpha_3": "IRN",
"name": "Iran, Islamic Republic of",
"cgspace_name": "Iran",
"numeric": "364",
"official_name": "Islamic Republic of Iran"
},
{
"alpha_2": "KP",
"alpha_3": "PRK",
@ -33,13 +25,6 @@
"numeric": "408",
"official_name": "Democratic People's Republic of Korea"
},
{
"alpha_2": "LA",
"alpha_3": "LAO",
"name": "Lao People's Democratic Republic",
"cgspace_name": "Laos",
"numeric": "418"
},
{
"alpha_2": "FM",
"alpha_3": "FSM",
@ -54,13 +39,6 @@
"name": "Russian Federation",
"cgspace_name": "Russia",
"numeric": "643"
},
{
"alpha_2": "SY",
"alpha_3": "SYR",
"name": "Syrian Arab Republic",
"cgspace_name": "Syria",
"numeric": "760"
}
]
}

src/main/resources/io/github/ilri/cgspace/ctasks/iso_3166-1.json

@ -821,6 +821,7 @@
{
"alpha_2": "IR",
"alpha_3": "IRN",
"common_name": "Iran",
"flag": "🇮🇷",
"name": "Iran, Islamic Republic of",
"numeric": "364",
@ -953,6 +954,7 @@
{
"alpha_2": "LA",
"alpha_3": "LAO",
"common_name": "Laos",
"flag": "🇱🇦",
"name": "Lao People's Democratic Republic",
"numeric": "418"
@ -1653,6 +1655,7 @@
{
"alpha_2": "SY",
"alpha_3": "SYR",
"common_name": "Syria",
"flag": "🇸🇾",
"name": "Syrian Arab Republic",
"numeric": "760"
@ -1746,9 +1749,9 @@
"alpha_2": "TR",
"alpha_3": "TUR",
"flag": "🇹🇷",
"name": "Turkey",
"name": "Türkiye",
"numeric": "792",
"official_name": "Republic of Turkey"
"official_name": "Republic of Türkiye"
},
{
"alpha_2": "TV",