Compare commits

27 Commits

Author SHA1 Message Date
13c6612c7f
Update gson to version used by dspace-api 2025-04-12 20:19:50 +03:00
813517c789
README.md: bump tested version 2025-02-13 09:51:09 +03:00
5f9490e4e5
Use dspace-api 7.6.3 2025-02-12 15:16:45 +03:00
9a46416331
Use gson 2.10.1
Prevent dependency convergence.
2025-01-28 16:19:50 +03:00
2be5c62d92
CHANGELOG.md: add changes 2025-01-27 16:03:52 +03:00
2bd7d5e679
src/main: update DSDOC links 2025-01-27 16:03:12 +03:00
70cf68b8bc
Update tested on versions 2025-01-27 16:01:39 +03:00
4f81e1e17e
pom.xml: use gson >= 2.10
This is used by dspace-api 7.6.2+.
2025-01-27 13:24:46 +03:00
5113a91257
pom.xml: use dspace-api 7.6.2 2025-01-27 13:24:17 +03:00
3c36452891
Update "tested on" versions. 2024-06-26 16:45:11 +03:00
3a860dabe4
Update install instructions 2024-06-26 16:42:30 +03:00
5f44c9ea8a
README.md: remove TODO about migrating to nexus-staging-maven-plugin 2024-06-26 16:40:47 +03:00
32a14c0ea5
pom.xml: replace maven-deploy-plugin
The nexus-staging-maven-plugin replaces maven-deploy-plugin. I am
not sure if my configuration is correct yet.

See: https://github.com/sonatype/nexus-maven-plugins/tree/main/staging/maven-plugin
2024-06-26 16:29:35 +03:00
13d3dfb885
pom.xml: add more information
Add description and developers section to satisfy requirements.

See: https://central.sonatype.org/publish/requirements/
2024-06-26 16:10:36 +03:00
1e7df1ce46
Remove use of oss-parent
This is boilerplate that came from setting up the project and has
been deprecated for several years.
2024-06-26 16:04:14 +03:00
443e5576ab
Bump version to 7.6.1.4-SNAPSHOT 2024-06-26 15:02:07 +03:00
8531992412
Version 7.6.1.3 2024-06-26 15:00:25 +03:00
27016f5f77
CHANGELOG.md: add unreleased notes 2024-06-26 14:12:37 +03:00
3a583c4f86
src/main/java: more DOI normalization
Normalize %2f to /.
2024-06-26 12:46:08 +03:00
28668f76c9
src/main: remove numbered comments in NormalizeDOIs 2024-06-25 11:55:36 +03:00
e0153fd38a
src/main: add more DOI formats to NormalizeDOIs
I saw some DOIs like "www.doi.org" in our repository recently.
2024-06-25 11:42:37 +03:00
12a606ac61
pom.xml: bump version to 7.6.1.3-SNAPSHOT 2024-05-14 12:47:47 +03:00
692a62b454
src/main/java: update curation tasks README.md
Add eperson ID to curation invocation. DSpace 7 requires this.
2024-04-29 09:33:39 +03:00
d4ca92066a
Version 7.6.1.2 2024-04-25 12:58:07 +03:00
5ad8c556e9
src/main/java: simplify curation task results
We don't need to print the Handle because some items can be in the
workflow still so this will be null, but also because DSpace will
already show the Handle in the log before printing the result.
2024-04-25 12:53:15 +03:00
77425c13bf
src/main/java: remove report() from curation tasks
Results are a single-line status that shows the result of the task,
but reports are like a running log of changes to the item and have
more complicated use cases and configuration requirements.

For now I will disable reports since I'm not using them.
2024-04-25 12:51:30 +03:00
5e0a456fb5
README.md: fix links 2024-04-23 14:28:27 +03:00
7 changed files with 79 additions and 49 deletions

@@ -4,6 +4,19 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## Unreleased
### Updated
- Update dspace-api dependency to 7.6.3
- Update gson dependency to 2.11.0 to match dspace-api
## [7.6.1.3] - 2024-06-26
### Updated
- Add more formats to `NormalizeDOIs` curation task
## [7.6.1.2] - 2024-04-25
### Changed
- Remove reporting from curation tasks since "results" are enough
## [7.6.1.1] - 2024-04-23
### Added
- New `NormalizeDOIs` curation task

@@ -6,7 +6,7 @@ DSpace curation tasks and other Java-based helpers used on the [CGSpace](https:/
- **FixLowQualityThumbnails**: remove low-quality thumbnails when PDF bitstreams are present
- **NormalizeDOIs**: normalize DOIs by stripping whitespace, lowercasing, and converting to https://doi.org/ format
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC7x/Curation+System).
Tested on DSpace 7.6.3. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC7x/Curation+System).
## Build and Install
@@ -17,7 +17,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.1-SNAPSHOT</version>
<version>7.6.1.4-SNAPSHOT</version>
</dependency>
```
@@ -33,19 +33,14 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```console
$ cp target/cgspace-java-helpers-7.6.1.1-SNAPSHOT.jar ~/dspace/lib/
$ cp target/cgspace-java-helpers-7.6.1.4-SNAPSHOT.jar ~/dspace/lib/
```
## Configuration
Please refer to the appropriate README.md file:
- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace6/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace6/src/main/java/io/github/ilri/cgspace/scripts/README.md)
## TODO
- Migrate from maven-deploy-plugin to nexus-staging-maven-plugin, see: https://central.sonatype.org/publish/publish-maven/#nexus-staging-maven-plugin-for-deployment-and-release
- Stop using oss-parent, see: https://central.sonatype.org/publish/publish-maven/#create-a-ticket-with-sonatype
- Curation Tasks: [src/main/java/io/github/ilri/cgspace/ctasks/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace7/src/main/java/io/github/ilri/cgspace/ctasks/README.md)
- Scripts: [src/main/java/io/github/ilri/cgspace/scripts/README.md](https://github.com/ilri/cgspace-java-helpers/blob/dspace7/src/main/java/io/github/ilri/cgspace/scripts/README.md)
## Notes
This project was initially created according to the [Maven Getting Started Guide](https://maven.apache.org/guides/getting-started/):

pom.xml
@@ -6,10 +6,19 @@
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.1-SNAPSHOT</version>
<version>7.6.1.4-SNAPSHOT</version>
<name>cgspace-java-helpers</name>
<url>https://github.com/ilri/cgspace-java-helpers</url>
<description>Curation tasks and helper scripts for the CGSpace institutional repository</description>
<developers>
<developer>
<name>Alan Orth</name>
<email>maven@mjanja.mozmail.com</email>
<organizationUrl>https://mjanja.ch</organizationUrl>
</developer>
</developers>
<licenses>
<license>
@@ -18,14 +27,6 @@
</license>
</licenses>
<!-- brings the sonatype snapshot repository and signing requirement on board -->
<parent>
<groupId>org.sonatype.oss</groupId>
<artifactId>oss-parent</artifactId>
<version>9</version>
<relativePath />
</parent>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.release>11</maven.compiler.release>
@@ -35,12 +36,19 @@
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.9.0</version>
<version>2.11.0</version>
<!-- Ignore gson's dependency on error_prone_annotations because it causes dependency convergence with something pulled in by dspace-api -->
<exclusions>
<exclusion>
<groupId>com.google.errorprone</groupId>
<artifactId>error_prone_annotations</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.dspace</groupId>
<artifactId>dspace-api</artifactId>
<version>7.6.1</version>
<version>7.6.3</version>
<scope>provided</scope>
</dependency>
</dependencies>
@@ -91,10 +99,6 @@
<artifactId>maven-install-plugin</artifactId>
<version>3.1.1</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>3.1.1</version>
</plugin>
<!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
<plugin>
<artifactId>maven-site-plugin</artifactId>
@@ -104,6 +108,17 @@
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>3.5.0</version>
</plugin>
<plugin>
<groupId>org.sonatype.plugins</groupId>
<artifactId>nexus-staging-maven-plugin</artifactId>
<version>1.7.0</version>
<extensions>true</extensions>
<configuration>
<serverId>ossrh</serverId>
<nexusUrl>https://oss.sonatype.org/</nexusUrl>
<autoReleaseAfterClose>true</autoReleaseAfterClose>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>

@@ -26,6 +26,13 @@ import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
/*
* Add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata.
*
* @author Alan Orth for the International Livestock Research Institute
* @version 7.6.1.2
* @since 5.1
*/
public class CountryCodeTagger extends AbstractCurationTask {
public class CountryCodeTaggerConfig {
private final String isocodesJsonPath = "/io/github/ilri/cgspace/ctasks/iso_3166-1.json";
@@ -77,7 +84,6 @@ public class CountryCodeTagger extends AbstractCurationTask {
}
setResult(alpha2Result.getResult());
report(alpha2Result.getResult());
}
return alpha2Result.getStatus();
@@ -86,14 +92,13 @@ public class CountryCodeTagger extends AbstractCurationTask {
public CountryCodeTaggerResult performAlpha2(Item item, CountryCodeTaggerConfig config)
throws IOException, SQLException {
CountryCodeTaggerResult alpha2Result = new CountryCodeTaggerResult();
String itemHandle = item.getHandle();
List<MetadataValue> itemCountries =
itemService.getMetadataByMetadataString(item, config.iso3166Field);
// skip items that don't have country metadata
if (itemCountries.isEmpty()) {
alpha2Result.setResult(itemHandle + ": no countries, skipping.");
alpha2Result.setResult("No countries, skipping.");
alpha2Result.setStatus(Curator.CURATE_SKIP);
} else {
Gson gson = new Gson();
@@ -172,21 +177,20 @@ public class CountryCodeTagger extends AbstractCurationTask {
itemService.update(Curator.curationContext(), item);
} catch (SQLException | AuthorizeException sqle) {
config.log.debug(sqle.getMessage());
alpha2Result.setResult(itemHandle + ": error");
alpha2Result.setResult("Error");
alpha2Result.setStatus(Curator.CURATE_ERROR);
}
alpha2Result.setResult(
itemHandle
+ ": added "
"Added "
+ newAlpha2Codes.size()
+ " alpha2 country code(s)");
} else {
alpha2Result.setResult(itemHandle + ": no matching countries found");
alpha2Result.setResult("No matching countries found");
}
alpha2Result.setStatus(Curator.CURATE_SUCCESS);
} else {
alpha2Result.setResult(itemHandle + ": item has country codes, skipping");
alpha2Result.setResult("Item already has country codes, skipping unless forced");
alpha2Result.setStatus(Curator.CURATE_SKIP);
}
}

@@ -29,7 +29,7 @@ import java.util.List;
* TODO: allow operation on communities and collections (currently only works on items)
*
* @author Alan Orth for the International Livestock Research Institute
* @version 7.6.1.1
* @version 7.6.1.3
* @since 7.6.1.1
*/
@Suspendable
@@ -68,7 +68,6 @@ public class NormalizeDOIs extends AbstractCurationTask {
} else {
result = "All DOIs already normalized";
}
report(result);
setResult(result);
return Curator.CURATE_SUCCESS;
@@ -79,17 +78,21 @@ public class NormalizeDOIs extends AbstractCurationTask {
}
private static String getNormalizedDOI(MetadataValue itemDOI) {
// 1. Convert to lowercase
// Convert to lowercase
String newDOI = itemDOI.getValue().toLowerCase();
// 2. Strip leading and trailing whitespace
// Strip leading and trailing whitespace
newDOI = newDOI.strip();
// 3. Convert to HTTPS
// Convert to HTTPS
newDOI = newDOI.replace("http://", "https://");
// 4. Prefer doi.org to dx.doi.org
// Prefer doi.org to dx.doi.org
newDOI = newDOI.replace("dx.doi.org", "doi.org");
// 5. Replace values like doi: 10.11648/j.jps.20140201.14
// Prefer doi.org to www.doi.org
newDOI = newDOI.replace("www.doi.org", "doi.org");
// Fix URL encoded slashes (%2f)
newDOI = newDOI.replace("%2f", "/");
// Replace values like doi: 10.11648/j.jps.20140201.14
newDOI = newDOI.replaceAll("^doi: 10\\.", "https://doi.org/10.");
// 6. Replace values like 10.3390/foods12010115
// Replace values like 10.3390/foods12010115
newDOI = newDOI.replaceAll("^10\\.", "https://doi.org/10.");
return newDOI;
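
The normalization steps in this hunk can be tried outside DSpace with a minimal sketch like the one below. `DoiNormalizerSketch` and `normalize` are hypothetical names, and the method takes a plain `String` instead of the `MetadataValue` used by the real curation task (an assumption made so the example runs without dspace-api on the classpath). The replacement rules and their order are copied from the diff. `String.strip()` needs Java 11 or later, which matches the `maven.compiler.release` in pom.xml.

```java
import java.util.List;

// Standalone sketch of the rules in getNormalizedDOI. The class and method
// names are hypothetical; the real task operates on a DSpace MetadataValue.
final class DoiNormalizerSketch {

    static String normalize(String rawDoi) {
        // Convert to lowercase
        String doi = rawDoi.toLowerCase();
        // Strip leading and trailing whitespace
        doi = doi.strip();
        // Convert to HTTPS
        doi = doi.replace("http://", "https://");
        // Prefer doi.org to dx.doi.org
        doi = doi.replace("dx.doi.org", "doi.org");
        // Prefer doi.org to www.doi.org
        doi = doi.replace("www.doi.org", "doi.org");
        // Fix URL encoded slashes (already lowercased, so %2F is covered too)
        doi = doi.replace("%2f", "/");
        // Replace values like "doi: 10.11648/j.jps.20140201.14"
        doi = doi.replaceAll("^doi: 10\\.", "https://doi.org/10.");
        // Replace bare values like "10.3390/foods12010115"
        doi = doi.replaceAll("^10\\.", "https://doi.org/10.");
        return doi;
    }

    public static void main(String[] args) {
        // Sample inputs chosen for illustration only
        List<String> samples = List.of(
                " HTTP://DX.DOI.ORG/10.1000%2FABC ",
                "https://www.doi.org/10.3390/foods12010115",
                "doi: 10.11648/j.jps.20140201.14",
                "10.3390/foods12010115");
        for (String sample : samples) {
            System.out.println("[" + sample + "] -> " + normalize(sample));
        }
    }
}
```

One property worth knowing when reading these rules: a value such as `www.doi.org/10.3390/foods12010115` with no scheme comes out as `doi.org/10.3390/foods12010115`, because none of the steps adds `https://` in that case.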

@@ -4,7 +4,7 @@ DSpace curation tasks used on the [CGSpace](https://cgspace.cgiar.org) instituti
- **CountryCodeTagger**: add ISO 3166-1 Alpha2 country codes to items based on their existing country metadata
- **NormalizeDOIs**: normalize DOIs by stripping whitespace, lowercasing, and converting to https://doi.org/ format
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
Tested on DSpace 7.6.3. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC7x/Curation+System).
## Build and Install
@@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.1-SNAPSHOT</version>
<version>7.6.1.4-SNAPSHOT</version>
</dependency>
```
@@ -31,7 +31,7 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```
$ cp target/cgspace-java-helpers-7.6.1.1-SNAPSHOT.jar ~/dspace/lib/
$ cp target/cgspace-java-helpers-7.6.1.4-SNAPSHOT.jar ~/dspace/lib/
```
## Configuration
@@ -43,7 +43,7 @@ plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.Coun
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.NormalizeDOIs = normalizedois
```
And then add the following variables to your `local.cfg` or some other [configuration file that is included](https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference#ConfigurationReference-IncludingotherPropertyFiles):
And then add the following variables to your `local.cfg` or some other [configuration file that is included](https://wiki.lyrasis.org/display/DSDOC7x/Configuration+Reference#ConfigurationReference-IncludingotherPropertyFiles):
```
# name of the field containing ISO 3166-1 country names
@@ -62,7 +62,7 @@ countrycodetagger.iso3166-alpha2.field = cg.coverage.iso3166-alpha2
Once the jar is installed and you have added appropriate configuration in `~/dspace/config/modules`:
```
$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -s object
$ ~/dspace/bin/dspace curate -e eperson@repo.org -t countrycodetagger -i 10568/3 -r - -s object
```
*Note*: it is very important to set the database transaction scope to something sensible (`object`) if you're curating a community or collection with more than a few hundred items.

@@ -4,7 +4,7 @@ Java-based helpers used on the [CGSpace](https://cgspace.cgiar.org) institutiona
- **FixJpgJpgThumbnails**: fix low-quality ".jpg.jpg" thumbnails by replacing them with their originals
- **FixLowQualityThumbnails**: remove low-quality thumbnails when PDF bitstreams are present
Tested on DSpace 7.6. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC6x/Curation+System).
Tested on DSpace 7.6.3. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC7x/Curation+System).
## Build and Install
@@ -15,7 +15,7 @@ To use these curation tasks in a DSpace project add the following dependency to
<dependency>
<groupId>io.github.ilri.cgspace</groupId>
<artifactId>cgspace-java-helpers</artifactId>
<version>7.6.1.1-SNAPSHOT</version>
<version>7.6.1.4-SNAPSHOT</version>
</dependency>
```
@@ -31,7 +31,7 @@ $ mvn package
Copy the resulting jar to the DSpace `lib` directory:
```console
$ cp target/cgspace-java-helpers-7.6.1.1-SNAPSHOT.jar ~/dspace/lib/
$ cp target/cgspace-java-helpers-7.6.1.4-SNAPSHOT.jar ~/dspace/lib/
```
## Invocation