# DSpace Curation Tasks [![Build Status](https://travis-ci.org/ilri/dspace-curation-tasks.svg?branch=master)](https://travis-ci.org/ilri/dspace-curation-tasks)
Metadata curation tasks used on the [CGSpace](https://cgspace.cgiar.org) institutional repository:
- **CountryCodeTagger**: adds ISO 3166-1 alpha-2 country codes to items based on their existing country metadata

Tested on DSpace 6.3. Read more about the [DSpace curation system](https://wiki.lyrasis.org/display/DSDOC5x/Curation+System).
## Build and Install
### Integrate into DSpace Build
To use these curation tasks in a DSpace project, add the following dependency to `dspace/modules/additions/pom.xml`:
```
<dependency>
  <groupId>io.github.ilri.cgspace</groupId>
  <artifactId>dspace-curation-tasks</artifactId>
  <version>6.0-SNAPSHOT</version>
</dependency>
```
The jar will be copied to all DSpace applications.
### Manual Build and Install
To build the standalone jar:
```
$ mvn package
```
Copy the resulting jar to the DSpace `lib` directory:
```
$ cp target/dspace-curation-tasks-6.0-SNAPSHOT.jar ~/dspace/lib/dspace-curation-tasks-6.0-SNAPSHOT.jar
```
## Configuration
Add the curation task to DSpace's `config/modules/curate.cfg`:
```
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger
plugin.named.org.dspace.curate.CurationTask = io.github.ilri.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
```
Then add the following variables to your `local.cfg` or another [configuration file that is included](https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference#ConfigurationReference-IncludingotherPropertyFiles):
```
# name of the field containing ISO 3166-1 country names
countrycodetagger.iso3166.field = cg.coverage.country
# name of the field containing ISO 3166-1 alpha-2 country codes
countrycodetagger.iso3166-alpha2.field = cg.coverage.iso3166-alpha2
# when false, only add country codes to items that don't have any (default: false)
#countrycodetagger.forceupdate = false
```
*Note*: DSpace's curation system supports "profiles", which let you run the same task with different options. The configuration above defines the normal country code tagger task and a "force" variant. The "force" variant is the same task, but it reads its configuration variables using the `countrycodetagger.force` prefix instead. To use it, add these variables again under that prefix, with the `forceupdate` parameter overridden, to the same configuration file where you put the other variables. The "force" profile clears all existing country codes and updates everything.
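As a rough sketch of the "force" profile described in the note: the snippet below assumes the profile's variables reuse the same names under the `countrycodetagger.force` prefix and that `forceupdate` is overridden to `true`. The file name `countrycodetagger-force.cfg` is made up for illustration. It writes the settings to a standalone file so they can be reviewed before being appended to the real configuration.

```shell
# Hypothetical file name; used here so nothing touches the live config.
snippet=countrycodetagger-force.cfg

# Assumed key names: the normal variables under the "countrycodetagger.force" prefix.
cat > "$snippet" <<'EOF'
# "force" profile: same fields as the normal task
countrycodetagger.force.iso3166.field = cg.coverage.country
countrycodetagger.force.iso3166-alpha2.field = cg.coverage.iso3166-alpha2
# clear existing country codes and re-tag everything
countrycodetagger.force.forceupdate = true
EOF

# After reviewing, append it to the file that holds the other variables, e.g.:
#   cat countrycodetagger-force.cfg >> ~/dspace/config/local.cfg
```

Keeping the profile in its own file first makes it easy to diff against the normal settings before committing to a configuration change.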
## Invocation
Once the jar is installed and you have added the appropriate configuration under `~/dspace/config`, run the task with the `curate` command:
```
$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object
```
*Note*: it is very important to set the cache limit (`-l`) and the database transaction scope (`-s`) to something sensible, such as `object`, if you're curating a community or collection with more than a few hundred items.
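To run the "force" variant with the same safeguards, a small wrapper function can keep the flags in one place. This is only a sketch: `curate_force` and the `DSPACE_BIN` variable are hypothetical names, not part of DSpace, and the task name `countrycodetagger.force` assumes the plugin configuration from the Configuration section.

```shell
# Path to the DSpace launcher; override DSPACE_BIN if yours lives elsewhere.
DSPACE_BIN="${DSPACE_BIN:-$HOME/dspace/bin/dspace}"

# curate_force HANDLE: run the "force" profile on a community, collection or item.
curate_force() {
    # -t task name, -i handle to curate, -r - report to the console,
    # -l 500 cache limit, -s object database transaction scope
    "$DSPACE_BIN" curate -t countrycodetagger.force -i "$1" -r - -l 500 -s object
}

# Example (not run here): curate_force 10568/3
```

Setting `DSPACE_BIN=echo` before calling the function prints the command instead of running it, which is a cheap way to check the arguments first.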
## Notes
This project was initially created according to the [Maven Getting Started Guide](https://maven.apache.org/guides/getting-started/):
```console
$ mvn -B archetype:generate -DgroupId=io.github.ilri.cgspace -DartifactId=dspace-curation-tasks -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4
```
## TODO
- Make sure the task does not run on items that are still in the workflow
- Check for existence of metadata field before trying to add metadata
- Add tests
## License
This work is licensed under the [GPLv3](https://www.gnu.org/licenses/gpl-3.0.en.html).

This repository contains data from the [Debian iso-codes project](https://salsa.debian.org/iso-codes-team/iso-codes), which is licensed under the [GNU Lesser General Public License v2.1](https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/COPYING).