Commit Graph

55 Commits

Author SHA1 Message Date
Alan Orth 0a7cf7bf59
Import iso-codes snapshot
Now that my merge request to Debian's iso-codes package has been
merged, we no longer need to maintain local overrides for Iran, Laos,
and Syria, as those are officially in iso-codes.

See: https://salsa.debian.org/iso-codes-team/iso-codes/-/merge_requests/32
2023-02-26 21:13:44 +03:00
Alan Orth 8c0a8fbcd1
Bump version to 6.2-SNAPSHOT
I can't figure out how to get non-snapshot releases on Central.
2023-02-21 10:59:54 +03:00
Alan Orth c05a2e4f96
Version 6.2 2023-02-20 20:37:40 +03:00
Alan Orth 1f6ba4af67
src: import iso-codes 4.12.0
This updates the name for TR from "Turkey" to "Türkiye".

See: https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4120-2022-11-06
2022-11-07 12:21:39 +03:00
Alan Orth dfaa234a90
src/main/resources: sync cgspace-countries.json with iso-codes
Not sure this is needed, but we copy the JSON object from iso-codes
so we should keep it in sync when there are changes to countries we
override.
2022-10-14 20:49:23 +03:00
Alan Orth f46e81b8cd
src/main/resources: import iso-codes 4.11.0
This release is already a bit old, but it has two changes:

- South Korea
- North Korea
2022-10-14 20:47:26 +03:00
Alan Orth dbd8721579
src: add better status messages to FixLowQualityThumbnails 2022-10-07 15:33:13 +03:00
Alan Orth 80a336f94d
src: fix context commit in scripts
I was wondering why the same bitstreams appeared to be getting deleted
on every single run. It turns out that single item mode was the only
mode in which we committed the context. If the argument was a site,
community, or collection we were updating the item but not actually
committing the changes!
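
A minimal sketch of the fix described above. It does not use the DSpace
API: FakeContext, TargetType, and process are hypothetical names chosen
for illustration. Only the commit placement reflects the message, namely
that committing once after the switch covers every mode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database context; NOT the DSpace Context class.
class FakeContext {
    final List<String> pending = new ArrayList<>();
    final List<String> committed = new ArrayList<>();

    // Stage a change without persisting it.
    void update(String item) {
        pending.add(item);
    }

    // Persist everything that has been staged so far.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }
}

class CommitSketch {
    enum TargetType { SITE, COMMUNITY, COLLECTION, ITEM }

    // Process the target, then commit once after the switch so that no
    // branch can forget to persist its changes.
    static void process(FakeContext context, TargetType type, List<String> items) {
        switch (type) {
            case SITE:
            case COMMUNITY:
            case COLLECTION:
                for (String item : items) {
                    context.update(item);
                }
                break;
            case ITEM:
                context.update(items.get(0));
                break;
        }
        context.commit();
    }
}
```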
2022-10-07 14:49:58 +03:00
Alan Orth 5ebf4930cf
src: re-organize switch statements in scripts
It makes more sense to me to start from the top level of the
hierarchy.
2022-10-07 13:11:03 +03:00
Alan Orth b396fba043
src: format Java files with google-java-format
Using AOSP format so we get four spaces instead of two.
2022-10-06 14:27:51 +03:00
Alan Orth 38a9cc5188
src: organize imports in VS Code 2022-10-06 14:26:44 +03:00
Alan Orth 16db38967b
src: handle null descriptions in FixJpgJpgThumbnails 2022-10-06 14:17:41 +03:00
Alan Orth 2604dc3cce
src: skip Infographics and Maps in FixJpgJpgThumbnails
Instead of checking whether they exist and then skipping them just
at the moment when we want to swap the bitstreams let's bail early
when we know an item is an Infographic or a Map.
2022-10-06 14:15:58 +03:00
Alan Orth f0754ab419
src: fix npe on null description
In FixLowQualityThumbnails we need to make sure that bitstream
descriptions are not null or empty before trying to evaluate them.
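
A minimal sketch of such a guard, not the actual code from
FixLowQualityThumbnails. The class and method names are hypothetical, and
the case-insensitive comparison against "thumbnail" is an assumption.

```java
class DescriptionCheck {
    // Returns true only when the description is non-null and not empty,
    // so callers can safely evaluate it afterwards.
    static boolean hasDescription(String description) {
        return description != null && !description.isEmpty();
    }

    // Example of a guarded comparison. Matching case-insensitively is an
    // assumption made for this sketch.
    static boolean isThumbnailDescription(String description) {
        return hasDescription(description) && description.equalsIgnoreCase("thumbnail");
    }
}
```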
2022-10-05 21:00:14 +03:00
Alan Orth 6772145bec
src: fix SPDX license header
Use GPL-3.0-or-later instead of GPL-3.0-only. I had specified this
in pom.xml already.
2022-10-05 16:53:00 +03:00
Alan Orth 095f843067
src: add SPDX license headers 2022-10-05 15:48:57 +03:00
Alan Orth 922e3892a7
Update README.md files 2022-10-05 15:24:08 +03:00
Alan Orth 6b648c2c85
src: add FixLowQualityThumbnails.java
This adds another script to detect and remove more low-quality
thumbnails. For example:

- If an item has an "IM Thumbnail" and a "Generated Thumbnail" in the
  THUMBNAIL bundle, remove the "Generated Thumbnail"
- If an item has a PDF bitstream and a JPEG bitstream with a name or
  description "thumbnail" in the ORIGINAL bundle, remove the
  "thumbnail" bitstream in the ORIGINAL bundle and try to remove the
  "thumbnail.jpg" bitstream in the THUMBNAIL bundle

The idea is that we should *always* prefer thumbnails generated by
ImageMagick from PDFs in the ORIGINAL bundle and should remove any
other manually uploaded thumbnails.
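
The first rule above could be sketched roughly as follows. This is an
illustration only: it works on plain description strings instead of DSpace
bitstreams, and ThumbnailRuleSketch and descriptionsToRemove are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ThumbnailRuleSketch {
    // Given the descriptions of the bitstreams in a THUMBNAIL bundle, return
    // the descriptions that should be removed. A "Generated Thumbnail" is
    // only removed when an "IM Thumbnail" is present to replace it.
    static List<String> descriptionsToRemove(List<String> thumbnailDescriptions) {
        List<String> toRemove = new ArrayList<>();
        if (thumbnailDescriptions.contains("IM Thumbnail")
                && thumbnailDescriptions.contains("Generated Thumbnail")) {
            toRemove.add("Generated Thumbnail");
        }
        return toRemove;
    }
}
```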
2022-10-05 15:07:56 +03:00
Alan Orth 3aa1503163
src: bump version of FixJpgJpgThumbnails.java 2022-10-04 21:13:24 +03:00
Alan Orth 26597e2f8f
Use dcterms.type in FixJpgJpgThumbnails script
We are now using dcterms.type instead of dc.type.
2022-10-04 16:16:43 +03:00
Alan Orth 2e779efb14
src/main/java: Adjust curation README
DSpace 6 doesn't have the `-l` option to limit the cache size.
2020-08-10 20:04:46 +03:00
Alan Orth 735e759033
Adjust READMEs again... 2020-08-10 17:16:14 +03:00
Alan Orth 271a9ce970
Adjust README.md files 2020-08-10 15:55:11 +03:00
Alan Orth 4bc7971ecb
src/main/java: Remove debug comment 2020-08-07 22:55:35 +03:00
Alan Orth da1ecad238
src/main/java: DSpace 6 port of FixJpgJpgThumbnails.java
Need to use the new DSpace 6 service model in most places. Not sure
why addBitstream is no longer public, but removeBitstream is...
2020-08-07 22:45:07 +03:00
Alan Orth f3ab89f7a1
CountryCodeTagger.java: Port to DSpace 6
We need to use the new DSpace 6 service API. Also, the way we read
task properties changes because of the configuration changes.

See: https://wiki.lyrasis.org/display/DSDOC6x/Curation+System
See: https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference
2020-08-05 12:28:37 +03:00
Alan Orth 7251b85436
cgspace-countries.json: Remove Palestine
It's the same in the ISO 3166-1 list.
2020-08-04 14:52:36 +03:00
Alan Orth dcb0532be2
Change groupId to prepare for upload to Central
It's much easier to get your package verified on Central if it uses
a GitHub groupId. Otherwise you need to use DNS verification! This
changes the groupId:

- from: org.cgiar.cgspace.ctask
- to: io.github.ilri.cgspace

The package name changed as well.

See: https://central.sonatype.org/pages/producers.html
2020-08-02 23:48:13 +03:00
Alan Orth ca7deaac8f
CountryCodeTagger.java: Remove unused variable
Some of the other curation tasks use an array of results.
2020-08-02 22:03:10 +03:00
Alan Orth e158e4bc98
CountryCodeTagger.java: Refactor adding of alpha2 codes
We can append the codes we will add to a List of Strings and then
actually apply them later in one addMetadata call, and update the
item with one item.update() call. This reduces identical code and
is more efficient.

Note that when testing this on a collection with thousands of items
I realized that it is really important to both limit the cache size
and set the database transaction scope to be per object/item, or
else you will crash due to Java heap issues. For example:

    $ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object

See: https://wiki.lyrasis.org/display/DSPACE/Curation+Task+Cookbook
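
A rough sketch of the collect-then-apply idea, assuming a simple name to
alpha2 lookup map. Alpha2Collector and collectAlpha2 are hypothetical
names. The single addMetadata and item.update() calls mentioned above
would happen in the caller and are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class Alpha2Collector {
    // Collect alpha2 codes for every country name that has a mapping.
    // The caller can then apply the whole list in one metadata call
    // followed by one update, instead of once per country.
    static List<String> collectAlpha2(List<String> countryNames, Map<String, String> nameToAlpha2) {
        List<String> codes = new ArrayList<>();
        for (String name : countryNames) {
            String code = nameToAlpha2.get(name);
            if (code != null) {
                codes.add(code);
            }
        }
        return codes;
    }
}
```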
2020-08-02 18:33:32 +03:00
Alan Orth 1c866bdf64
src/main/java: Remove unnecessary comments and prints 2020-08-02 18:32:04 +03:00
Alan Orth e5d45e62be
src/main/java: Refactor CountryCodeTagger.java
It is now much more modular and can be cleanly extended to handle
ISO 3166-1 Alpha3, numeric, etc...
2020-08-02 15:51:18 +03:00
Alan Orth 6228f337e9
src/main/java: Skip items that have country codes
Originally I wasn't sure if I was going to try to parse each code,
check them against the mapping, and possibly correct them, but it's
easier to just skip items with codes unless we're in "force" mode.
2020-08-01 23:14:19 +03:00
Alan Orth 4b553676dd
src/main/java: Implement task "profiles"
The DSpace curation system has task properties that can be used to
create "profiles" of sorts. For example, if you set a custom task
name in curate.cfg:

    plugin.named.org.dspace.curate.CurationTask = \
        org.cgiar.cgspace.ctasks.CountryCodeTagger = countrycodetagger, \
        org.cgiar.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force

... then DSpace will look for countrycodetagger.cfg by default, and
countrycodetagger.force.cfg for the second task. We can set different
properties in each one, for example "force=true", and then operate
accordingly in the task when we check the value using taskProperty().

I will use this to force all country tags to be cleared and updated,
where by default we only tag if there are no existing country tags.

See: https://wiki.lyrasis.org/display/DSDOC5x/Curation+System
2020-08-01 23:04:35 +03:00
Alan Orth d4cd5bfd61
src/main/java: Optimize imports 2020-08-01 23:03:51 +03:00
Alan Orth cf73935ea9
src/main/java: Use tokenized alpha2 field parts 2020-08-01 21:02:58 +03:00
Alan Orth 409eb3bd02
src/main/java: Refactor vocabularies classes
We can't use the same class to map ISO 3166-1 and CGSpace country
vocabularies because our Gson is old and lacks the support for the
"alternate" value in its annotations (added in Gson 2.5). So it's
better to create multiple classes that extend the base one instead
of creating a custom deserializer. Each extended class then uses
its own SerializedName.
2020-08-01 20:53:59 +03:00
Alan Orth 98d3d56d78
src/main/java: Fix comment 2020-08-01 20:31:31 +03:00
Alan Orth fdcd1811a2
src/main/resources: Adjust CGSpace country list
Based on Peter's preferred display values for these countries. We
will still use their ISO 3166-1 country codes so we include their
appropriate data from the iso-codes iso_3166-1.json list.
2020-08-01 11:50:55 +03:00
Alan Orth 4a6edba467
src/main/java: Add cgspace_name to Countries class
We will eventually use this to read CGSpace-specific mappings to
ISO 3166-1 values.
2020-08-01 11:49:22 +03:00
Alan Orth b3a993d5bd
src/main/java: Fix comment alignment 2020-08-01 11:46:13 +03:00
Alan Orth 0f2081db51
src/main/java: Correctly map common_name and official_name
I forgot to fix these so that they map exactly to the ISO 3166-1
JSON so that GSON can deserialize them automatically.
2020-08-01 11:44:54 +03:00
Alan Orth 91a4367f38
src/main/java: Add comment 2020-08-01 11:01:27 +03:00
Alan Orth 8c23277382
src/main/resources: Start collecting CGSpace countries
I will use the same format as the ISO 3166-1 JSON to make parsing
easier. I will add a new "cgspace_name" key to indicate our custom
name, though the codes will map to the standard ISO 3166-1 codes.
2020-08-01 09:31:26 +03:00
Alan Orth 6477b923b6
Add working tagging of ISO 3166-1 countries
If an item has country metadata (cg.coverage.country) and no alpha
codes, we check for name matches in ISO 3166-1 and add alpha_2 codes.
The name matching checks for a case-insensitive match on either an
ISO 3166-1 name, official name, or common name.
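
The matching described above might look roughly like this. This is a
sketch, not the committed code: CountryMatchSketch, Country, and
findAlpha2 are hypothetical names, and the fields simply mirror the three
name variants mentioned in the message.

```java
import java.util.List;

class CountryMatchSketch {
    // Hypothetical holder for one ISO 3166-1 entry.
    static class Country {
        final String alpha2;
        final String name;
        final String officialName;
        final String commonName;

        Country(String alpha2, String name, String officialName, String commonName) {
            this.alpha2 = alpha2;
            this.name = name;
            this.officialName = officialName;
            this.commonName = commonName;
        }
    }

    // Null-safe, case-insensitive comparison; entries may lack some names.
    static boolean matches(String candidate, String value) {
        return candidate != null && candidate.equalsIgnoreCase(value);
    }

    // Return the alpha2 code of the first country whose name, official name,
    // or common name matches the value, or null when nothing matches.
    static String findAlpha2(String value, List<Country> countries) {
        for (Country country : countries) {
            if (matches(country.name, value)
                    || matches(country.officialName, value)
                    || matches(country.commonName, value)) {
                return country.alpha2;
            }
        }
        return null;
    }
}
```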
2020-08-01 00:05:21 +03:00
Alan Orth 6995d7a864
Match alpha_2 and alpha_3 JSON elements with class
For GSON to automatically map these to our class we need to make
sure they use the same name.
2020-08-01 00:02:27 +03:00
Alan Orth edd08c859a
CountryCodeTagger.java: Remove FileReader import
We are using an InputStream now.
2020-07-31 23:37:06 +03:00
Alan Orth 94ceabb732
Close BufferedReader after we use it 2020-07-31 22:26:50 +03:00
Alan Orth 9089ffb66f
Add TODO about using try-with-resources
This would automatically close the BufferedReader after we are done
with it, but it also means that the JSON object we create is lost
when we exit the try() scope...

See: https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html
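
One common way around the scoping concern is to declare the result before
the try block and assign it inside. The sketch below assumes the content
is read into a plain String. ReaderScopeSketch and readAll are
hypothetical names, and no JSON parsing is shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.stream.Collectors;

class ReaderScopeSketch {
    // Read everything from the source. The reader is closed automatically
    // when the try block exits, while the result survives because it is
    // declared outside the try scope.
    static String readAll(Reader source) {
        String content;
        try (BufferedReader reader = new BufferedReader(source)) {
            content = reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            // close() can throw IOException; rethrow it unchecked here.
            throw new UncheckedIOException(e);
        }
        return content;
    }
}
```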
2020-07-31 22:26:33 +03:00
Alan Orth af708933b2
Use BufferedReader for iso-codes JSON 2020-07-31 22:25:09 +03:00