It goes through each item and checks the THUMBNAIL bundle for any
bitstreams ending in ".jpg.jpg", which indicates that they are
thumbnails of thumbnails. For each match it checks the ORIGINAL
bundle for a bitstream with the same name (minus the final ".jpg"),
moves that bitstream to the THUMBNAIL bundle (removing it from the
ORIGINAL bundle), and deletes the ".jpg.jpg" thumbnail.
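
The name-matching rule can be sketched in plain Java. This is a hypothetical, dependency-free illustration: it works on lists of bitstream names rather than on DSpace Bundle and Bitstream objects, and the class and method names are made up for the example. It only finds the candidates; moving and deleting bitstreams is left to the real task.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JpgJpgMatcher {
    // Hypothetical helper: returns the ORIGINAL bitstream names that have a
    // matching ".jpg.jpg" thumbnail, i.e. the ones to move to THUMBNAIL.
    public static List<String> findMisplacedThumbnails(List<String> thumbnailNames,
                                                       List<String> originalNames) {
        Set<String> originals = new HashSet<>(originalNames);
        List<String> matches = new ArrayList<>();
        for (String thumb : thumbnailNames) {
            if (thumb.endsWith(".jpg.jpg")) {
                // Strip the trailing ".jpg" that was appended when a thumbnail
                // was generated from the uploaded JPG.
                String originalName = thumb.substring(0, thumb.length() - ".jpg".length());
                if (originals.contains(originalName)) {
                    matches.add(originalName);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> thumbs = List.of("cover.jpg.jpg", "report.pdf.jpg");
        List<String> originals = List.of("cover.jpg", "report.pdf");
        System.out.println(findMisplacedThumbnails(thumbs, originals)); // [cover.jpg]
    }
}
```

Only "cover.jpg.jpg" qualifies; "report.pdf.jpg" is a normal generated thumbnail and is left alone.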
I want to use this to fix ".jpg.jpg" thumbnails that are caused by
users uploading manually created JPG thumbnails to the ORIGINAL
bundle, which causes DSpace to generate a thumbnail of that JPG in
the THUMBNAIL bundle.
The DSpace 6 version is in another branch. I decided to use the
major version of the compatible DSpace release as this project's
major version, to make the versioning scheme easier to manage.
Any time I run `mvn deploy` it uploads a snapshot to OSSRH with the
version "1.0-SNAPSHOT" plus a timestamp. I still haven't figured out
how to "promote a release".
It's much easier to get your package verified on Central if it uses
a GitHub groupId. Otherwise you need to use DNS verification! This
changes the groupId:
- from: org.cgiar.cgspace.ctask
- to: io.github.ilri.cgspace
The package name changed as well.
See: https://central.sonatype.org/pages/producers.html
We can collect the codes we want to add in a List of Strings and
apply them all later in a single addMetadata call, then update the
item with a single item.update() call. This removes duplicated code
and is more efficient.
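
A rough sketch of the batching idea follows. It assumes a hypothetical ItemStub class that only records how often addMetadata and update are called; the real DSpace item and metadata methods have version-specific signatures, and the metadata field and name-to-code mapping used here are also made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BatchedTagging {
    // Hypothetical stand-in for a DSpace item: it counts calls so the sketch
    // can show that all codes are written at once, not once per code.
    static class ItemStub {
        final List<String> values = new ArrayList<>();
        int addMetadataCalls = 0;
        int updateCalls = 0;

        void addMetadata(String schema, String element, String qualifier,
                         String lang, List<String> newValues) {
            addMetadataCalls++;
            values.addAll(newValues);
        }

        void update() {
            updateCalls++;
        }
    }

    // Collect every code first, then apply them in one call.
    static void tagCountryCodes(ItemStub item, List<String> countryNames,
                                Map<String, String> nameToCode) {
        List<String> codes = new ArrayList<>();
        for (String name : countryNames) {
            String code = nameToCode.get(name);
            // Skip unknown countries and avoid adding the same code twice.
            if (code != null && !codes.contains(code)) {
                codes.add(code);
            }
        }
        if (!codes.isEmpty()) {
            // Hypothetical field name; one metadata write and one update.
            item.addMetadata("example", "coverage", "countrycode", null, codes);
            item.update();
        }
    }

    public static void main(String[] args) {
        ItemStub item = new ItemStub();
        Map<String, String> mapping = Map.of("Kenya", "KE", "Ethiopia", "ET");
        tagCountryCodes(item, List.of("Kenya", "Ethiopia", "Atlantis"), mapping);
        System.out.println(item.values + " " + item.addMetadataCalls + " " + item.updateCalls);
        // [KE, ET] 1 1
    }
}
```

However many countries an item has, the item is written and updated once, which is where the efficiency gain comes from.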
Note that when testing this on a collection with thousands of items
I realized that it is really important to both limit the context
cache size (-l) and set the database transaction scope to per object
(-s object), or else the task will crash with Java heap errors. For
example:
$ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object
See: https://wiki.lyrasis.org/display/DSPACE/Curation+Task+Cookbook
Originally I wasn't sure whether to parse each existing code, check
it against the mapping, and possibly correct it, but it's easier to
just skip items that already have codes unless we're in "force" mode.
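
The resulting rule is small enough to state as code. This is a hypothetical sketch (the Action enum and decide method are not from the task itself): existing codes are never parsed or corrected, only skipped or, in force mode, replaced wholesale.

```java
import java.util.List;

public class TagDecision {
    // Hypothetical outcome of inspecting an item before tagging it.
    enum Action { SKIP, TAG, REPLACE }

    // Items that already have country codes are left alone unless the
    // task runs in "force" mode, in which case they are cleared and retagged.
    static Action decide(List<String> existingCodes, boolean force) {
        if (existingCodes.isEmpty()) {
            return Action.TAG;
        }
        return force ? Action.REPLACE : Action.SKIP;
    }

    public static void main(String[] args) {
        System.out.println(decide(List.of(), false));     // TAG
        System.out.println(decide(List.of("KE"), false)); // SKIP
        System.out.println(decide(List.of("KE"), true));  // REPLACE
    }
}
```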
The DSpace curation system has task properties that can be used to
create "profiles" of sorts. For example, if you set a custom task
name in curate.cfg:
plugin.named.org.dspace.curate.CurationTask = \
org.cgiar.cgspace.ctasks.CountryCodeTagger = countrycodetagger \
org.cgiar.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force
... then DSpace will look for countrycodetagger.cfg by default, and
countrycodetagger.force.cfg for the second task. We can set different
properties in each one, for example "force=true", and then read the
value in the task with taskProperty() and behave accordingly.
I will use this to force all country tags to be cleared and updated,
whereas by default we only tag items that have no country tags yet.
See: https://wiki.lyrasis.org/display/DSDOC5x/Curation+System
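
The profile mechanism can be imitated outside DSpace with java.util.Properties. This sketch is only an analogy: the configFileName and isForce helpers and the in-memory config text are hypothetical, and inside a real curation task the value would come from taskProperty("force") instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class TaskProfiles {
    // Mirrors the naming convention: the task name "countrycodetagger.force"
    // has its properties in "countrycodetagger.force.cfg".
    static String configFileName(String taskName) {
        return taskName + ".cfg";
    }

    // Parse a profile's config text and read its "force" property;
    // a missing property is treated as false.
    static boolean isForce(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            // Reading from a String should not fail, but load() declares it.
            throw new UncheckedIOException(e);
        }
        return Boolean.parseBoolean(props.getProperty("force", "false"));
    }

    public static void main(String[] args) {
        System.out.println(configFileName("countrycodetagger.force")); // countrycodetagger.force.cfg
        System.out.println(isForce("force=true"));                     // true
        System.out.println(isForce(""));                               // false
    }
}
```

Defaulting to false means the plain countrycodetagger profile needs no force setting at all.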
We can't use the same class to map both the ISO 3166-1 and CGSpace
country vocabularies because our Gson is old and lacks support for
the "alternate" value in its @SerializedName annotation (added in
Gson 2.5). So it's better to create multiple classes that extend a
base class instead of writing a custom deserializer. Each subclass
then uses its own @SerializedName.
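
A dependency-free sketch of that class layout is below. To keep it runnable without Gson, the @SerializedName annotations are shown only as comments on the fields where the real classes would carry them; the class names and sample values are hypothetical.

```java
public class CountryClasses {
    // Fields that both JSON formats share under the same key.
    abstract static class Country {
        // Real class: @SerializedName("alpha_2")
        String alpha2;

        abstract String getDisplayName();
    }

    // ISO 3166-1 entries keep the display name under "name".
    static class IsoCountry extends Country {
        // Real class: @SerializedName("name")
        String name;

        @Override
        String getDisplayName() {
            return name;
        }
    }

    // CGSpace entries keep the display name under "cgspace_name".
    static class CgspaceCountry extends Country {
        // Real class: @SerializedName("cgspace_name")
        String cgspaceName;

        @Override
        String getDisplayName() {
            return cgspaceName;
        }
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        IsoCountry iso = new IsoCountry();
        iso.alpha2 = "XX";
        iso.name = "Example, Republic of";
        CgspaceCountry cg = new CgspaceCountry();
        cg.alpha2 = "XX";
        cg.cgspaceName = "Example";
        // Same code, different display names depending on the source file.
        System.out.println(iso.alpha2 + " " + iso.getDisplayName()); // XX Example, Republic of
        System.out.println(cg.alpha2 + " " + cg.getDisplayName());   // XX Example
    }
}
```

Code that only needs the display name can work with the base type, while each JSON file is deserialized into its own subclass.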
That's the same Gson version that DSpace 5.8 uses, so we should use
it here as well so that we don't forget about that constraint.
Unfortunately this means that we can't use alternate names in
@SerializedName. We will need to create different classes to map
our different JSON files instead of simply matching different
elements on the fly.
Based on Peter's preferred display values for these countries. We
will still use their ISO 3166-1 country codes, so we include the
corresponding entries from the iso-codes iso_3166-1.json list.
I will use the same format as the ISO 3166-1 JSON to make parsing
easier. I will add a new "cgspace_name" key to indicate our custom
name, though the codes will map to the standard ISO 3166-1 codes.