cgspace-java-helpers

mirror of https://github.com/ilri/cgspace-java-helpers.git synced 2025-08-20 03:35:53 +02:00

Author	SHA1	Message	Date
Alan Orth	2604dc3cce	src: skip Infographics and Maps in FixJpgJpgThumbnails Instead of checking whether they exist and then skipping them just at the moment when we want to swap the bitstreams let's bail early when we know an item is an Infographic or a Map.	2022-10-06 14:15:58 +03:00
Alan Orth	f0754ab419	src: fix npe on null description In FixLowQualityThumbnails we need to make sure that bitstream descriptions are not null or empty before trying to evaluate them.	2022-10-05 21:00:14 +03:00
Alan Orth	6772145bec	src: fix SPDX license header Use GPL-3.0-or-later instead of GPL-3.0-only. I had specified this in pom.xml already.	2022-10-05 16:53:00 +03:00
Alan Orth	095f843067	src: add SPDX license headers	2022-10-05 15:48:57 +03:00
Alan Orth	922e3892a7	Update README.md files	2022-10-05 15:24:08 +03:00
Alan Orth	6b648c2c85	src: add FixLowQualityThumbnails.java This adds another script to detect and remove more low-quality thumbnails. For example: - If an item has an "IM Thumbnail" and a "Generated Thumbnail" in the THUMBNAIL bundle, remove the "Generated Thumbnail" - If an item has a PDF bitstream and a JPEG bitstream with a name or description "thumbnail" in the ORIGINAL bundle, remove the "thumbnail" bitstream in the ORIGINAL bundle and try to remove the "thumbnail.jpg" bitstream in the THUMBNAIL bundle The idea is that we should always prefer thumbnails generated by ImageMagick from PDFs in the ORIGINAL bundle and should remove any other manually uploaded thumbnails.	2022-10-05 15:07:56 +03:00
Alan Orth	3aa1503163	src: bump version of FixJpgJpgThumbnails.java	2022-10-04 21:13:24 +03:00
Alan Orth	26597e2f8f	Use dcterms.type in FixJpgJpgThumbnails script We are now using dcterms.type instead of dc.type.	2022-10-04 16:16:43 +03:00
Alan Orth	2e779efb14	src/main/java: Adjust curation README DSpace 6 doesn't have the `-l` option to limit the cache size.	2020-08-10 20:04:46 +03:00
Alan Orth	735e759033	Adjust READMEs again...	2020-08-10 17:16:14 +03:00
Alan Orth	271a9ce970	Adjust README.md files	2020-08-10 15:55:11 +03:00
Alan Orth	4bc7971ecb	src/main/java: Remove debug comment	2020-08-07 22:55:35 +03:00
Alan Orth	da1ecad238	src/main/java: DSpace 6 port of FixJpgJpgThumbnails.java Need to use the new DSpace 6 service model in most places. Not sure why addBitstream is no longer public, but removeBitstream is...	2020-08-07 22:45:07 +03:00
Alan Orth	f3ab89f7a1	CountryCodeTagger.java: Port to DSpace 6 We need to use the new DSpace 6 service API. Also, the way we read task properties changes because of the configuration changes. See: https://wiki.lyrasis.org/display/DSDOC6x/Curation+System See: https://wiki.lyrasis.org/display/DSDOC6x/Configuration+Reference	2020-08-05 12:28:37 +03:00
Alan Orth	7251b85436	cgspace-countries.json: Remove Palestine It's the same in the ISO 3166-1 list.	2020-08-04 14:52:36 +03:00
Alan Orth	dcb0532be2	Change groupId to prepare for upload to Central It's much easier to get your package verified on Central if it uses a GitHub groupId. Otherwise you need to use DNS verification! This changes the groupId: - from: org.cgiar.cgspace.ctask - to: io.github.ilri.cgspace Also the package changed as well. See: https://central.sonatype.org/pages/producers.html	2020-08-02 23:48:13 +03:00
Alan Orth	ca7deaac8f	CountryCodeTagger.java: Remove unused variable Some of the other curation tasks use an array of results.	2020-08-02 22:03:10 +03:00
Alan Orth	e158e4bc98	CountryCodeTagger.java: Refactor adding of alpha2 codes We can append the codes we will add to a List of Strings and then actually apply them later in one addMetadata call, and update the item with one item.update() call. This reduces identical code and is more efficient. Note that when testing this on a collection with thousands of items I realized that it is really important to limit both the cache size as well as set the database transaction model to be per object/item or else you will crash due to Java heap issues. For example: $ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object See: https://wiki.lyrasis.org/display/DSPACE/Curation+Task+Cookbook	2020-08-02 18:33:32 +03:00
Alan Orth	1c866bdf64	src/main/java: Remove unnecessary comments and prints	2020-08-02 18:32:04 +03:00
Alan Orth	e5d45e62be	src/main/java: Refactor CountryCodeTagger.java Now is much more modular and can easily, cleanly be extended to do ISO 3166-1 Alpha3, numeric, etc...	2020-08-02 15:51:18 +03:00
Alan Orth	6228f337e9	src/main/java: Skip items that have country codes Originally I wasn't sure if I was going to try to parse each code, check them against the mapping, and possibly correct them, but it's easier to just skip items with codes unless we're in "force" mode.	2020-08-01 23:14:19 +03:00
Alan Orth	4b553676dd	src/main/java: Implement task "profiles" The DSpace curation system has task properties that can be used to create "profiles" of sorts. For example, if you set a custom task name in curate.cfg: plugin.named.org.dspace.curate.CurationTask = \ org.cgiar.cgspace.ctasks.CountryCodeTagger = countrycodetagger \ org.cgiar.cgspace.ctasks.CountryCodeTagger = countrycodetagger.force ... then DSpace will look for countrycodetagger.cfg by default, and countrycodetagger.force.cfg for the second task. We can set different properties in each one, for example "force=true", and then operate accordingly in the task when we check the value using taskProperty(). I will use this to force all country tags to be cleared and updated, where by default we only tag if there are no existing country tags. See: https://wiki.lyrasis.org/display/DSDOC5x/Curation+System	2020-08-01 23:04:35 +03:00
Alan Orth	d4cd5bfd61	src/main/java: Optimize imports	2020-08-01 23:03:51 +03:00
Alan Orth	cf73935ea9	src/main/java: Use tokenized alpha2 field parts	2020-08-01 21:02:58 +03:00
Alan Orth	409eb3bd02	src/main/java: Refactor vocabularies classes We can't use the same class to map ISO 3166-1 and CGSpace country vocabularies because our Gson is old and lacks the support for the "alternate" value in its annotations (added in Gson 2.5). So it's better to create multiple classes that extend the base one instead of creating a custom deserializer. Each extended class then uses its own SerializedName.	2020-08-01 20:53:59 +03:00
Alan Orth	98d3d56d78	src/main/java: Fix comment	2020-08-01 20:31:31 +03:00
Alan Orth	fdcd1811a2	src/main/resources: Adjust CGSpace country list Based on Peter's preferred display values for these countries. We will still use their ISO 3166-1 country codes so we include their appropriate data from the iso-codes iso_3166-1.json list.	2020-08-01 11:50:55 +03:00
Alan Orth	4a6edba467	src/main/java: Add cgspace_name to Countries class We will eventually use this to read CGSpace-specific mappings to ISO 3166-1 values.	2020-08-01 11:49:22 +03:00
Alan Orth	b3a993d5bd	src/main/java: Fix comment alignment	2020-08-01 11:46:13 +03:00
Alan Orth	0f2081db51	src/main/java: Correctly map common_name and official_name I forgot to fix these so that they map exactly to the ISO 3166-1 JSON so that GSON can deserialize them automatically.	2020-08-01 11:44:54 +03:00
Alan Orth	91a4367f38	src/main/java: Add comment	2020-08-01 11:01:27 +03:00
Alan Orth	8c23277382	src/main/resources: Start collecting CGSpace countries I will use the same format as the ISO 3166-1 JSON to make parsing easier. I will add a new "cgspace_name" key to indicate our custom name, though the codes will map to the standard ISO 3166-1 codes.	2020-08-01 09:31:26 +03:00
Alan Orth	6477b923b6	Add working tagging of ISO 3166-1 countries If an item has country metadata (cg.coverage.country) and no alpha codes we check for name matches in ISO 3166 and add alpha_2 codes. The name matching checks for a case-insensitive match on either an ISO 3166-1 name, official name, or common name.	2020-08-01 00:05:21 +03:00
Alan Orth	6995d7a864	Match alpha_2 and alpha_3 JSON elements with class For GSON to automatically map these to our class we need to make sure they use the same name.	2020-08-01 00:02:27 +03:00
Alan Orth	edd08c859a	CountryCodeTagger.java: Remove FileReader import We are using an InputStream now.	2020-07-31 23:37:06 +03:00
Alan Orth	94ceabb732	Close BufferedReader after we use it	2020-07-31 22:26:50 +03:00
Alan Orth	9089ffb66f	Add TODO about using try-with-resource This would automatically close the BufferedReader after we are done with it, but it also means that the JSON object we create is lost when we exit the try() scope... See: https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html	2020-07-31 22:26:33 +03:00
Alan Orth	af708933b2	Use BufferedReader for iso-codes JSON	2020-07-31 22:25:09 +03:00
Alan Orth	d11bd00fa9	Use country vocabs from package resources Import a local copy of iso_3166-1.json from iso-codes version 4.5.0 so we don't need to load it from the system. See: https://salsa.debian.org/iso-codes-team/iso-codes	2020-07-31 22:18:32 +03:00
Alan Orth	4cf0626385	Update comments	2020-07-31 22:00:41 +03:00
Alan Orth	f62b50f5a1	Use the @SerializedName annotation for ISO 3166-1 Our Java class needs to match the input JSON structure exactly, but we can't use "3166-1" as a variable name so we tell GSON to use the name "3166-1" when deserializing to countries.	2020-07-31 21:52:48 +03:00
Alan Orth	968bd354fe	Optimize imports	2020-07-31 21:42:41 +03:00
Alan Orth	89f1734a9a	Initial commit	2020-07-31 21:40:15 +03:00

43 Commits
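
Commit 9089ffb66f notes a concern with try-with-resources: the reader is closed automatically, but anything declared inside the `try` block goes out of scope when the block exits. One common way around this, sketched below, is to declare the result variable before the `try` and assign it inside. This is only an illustration and not the repository's code: the class name `ResourceReaderSketch` and the helper `readAll` are hypothetical, and a plain `String` stands in for the parsed JSON object that the real task would presumably build with Gson.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
public class ResourceReaderSketch {

    // Hypothetical helper: reads the whole stream into a String.
    static String readAll(InputStream in) throws IOException {
        // Declared outside the try block so the value is still in scope
        // after the reader has been closed.
        String contents;

        // The BufferedReader (and the stream it wraps) is closed automatically
        // when this block exits, whether normally or via an exception.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            contents = reader.lines().collect(Collectors.joining("\n"));
        }

        return contents;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a JSON file loaded from package resources.
        InputStream in = new ByteArrayInputStream(
                "{\"3166-1\": []}".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in)); // prints {"3166-1": []}
    }
}
```

In a real task the assignment inside the `try` block could be the deserialization call itself, so that only the resulting object, and not the reader, outlives the block.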