Commit Graph

78 Commits

Author SHA1 Message Date
Alan Orth d5cf51c464
README.md: Use correct version
I can't figure out how to publish releases on Maven central so let's
stick to SNAPSHOT releases for now.
2020-08-10 15:58:22 +03:00
Alan Orth 98c7cfb3a5
README.md: Make README links shorter 2020-08-10 15:38:04 +03:00
Alan Orth 58365cdfda
Adjust README.md files
We need to try to keep the main README.md clean and move specific
configuration instructions to each separate component.
2020-08-10 15:30:32 +03:00
Alan Orth 7190b751e1
Minor edits to FixJpgJpgThumbnails.java
Use primitive types instead of Java generics when we don't need to
do anything special, and break from the loop once our condition is
set.
2020-08-07 22:18:32 +03:00
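A minimal sketch of the pattern this commit describes, assuming "primitive types" refers to a plain `boolean` flag and that the loop scans bitstream names. The class and method names are hypothetical and not taken from FixJpgJpgThumbnails.java.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only, not the project's actual code.
class FirstMatchSketch {
    // Returns true if any name ends in ".jpg.jpg" (case-insensitive).
    static boolean hasJpgJpg(List<String> names) {
        boolean found = false; // primitive flag rather than a boxed Boolean
        for (String name : names) {
            if (name.toLowerCase(Locale.ROOT).endsWith(".jpg.jpg")) {
                found = true;
                break; // the condition is set, so stop scanning
            }
        }
        return found;
    }
}
```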
Alan Orth 34acc351a5
src/main/java: Add Javadoc stuff to CountryCodeTagger.java 2020-08-07 12:27:44 +03:00
Alan Orth ec293b3b28
Add CHANGELOG.md 2020-08-07 12:25:48 +03:00
Alan Orth 31cd979b61
pom.xml: Move to next development snapshot
Version 5.4-SNAPSHOT
2020-08-07 09:58:00 +03:00
Alan Orth fce81c6003
Version 5.3 2020-08-07 09:57:19 +03:00
Alan Orth 26d3cbd778
src/main/java: Tune FixJpgJpgThumbnails a bit
Make sure we don't modify thumbnails if the item is an Infographic
because the JPG in the ORIGINAL bundle might actually be the "real"
file, in which case the THUMBNAIL bundle would have a legitimate
".jpg.jpg" file.

Also, limit the criteria for replacement to original bitstreams
that are less than 100KiB. In my tests I found that we had 4,022
items with ".jpg.jpg" thumbnails, and the average file size of the
originals in those items was 98KiB. Without considering the large
infographics, which are several megabytes apiece, the average of
the remaining 3,765 originals was ~20KiB so 100KiB should be very
safe.
2020-08-07 09:50:03 +03:00
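The two criteria in this commit could be expressed as a single check, roughly as below. This is a sketch under assumptions: the item type is taken to be available as a plain string, the size in bytes, and the names `ReplacementCriteriaSketch` and `mayReplace` are hypothetical.

```java
// Hypothetical illustration only, not the project's actual code.
class ReplacementCriteriaSketch {
    // 100 KiB threshold from the commit message, expressed in bytes.
    static final long MAX_ORIGINAL_BYTES = 100L * 1024L;

    // Returns true only if the original may be treated as a manually
    // uploaded thumbnail: the item is not an Infographic and the
    // original bitstream is smaller than 100 KiB.
    static boolean mayReplace(String itemType, long originalSizeBytes) {
        if ("Infographic".equalsIgnoreCase(itemType)) {
            return false; // the JPG may be the "real" file
        }
        return originalSizeBytes < MAX_ORIGINAL_BYTES;
    }
}
```

With these assumptions a 20 KiB original on a non-infographic item qualifies, while anything at or above 100 KiB is left alone.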
Alan Orth fdc910f93b
README.md: Update versions 2020-08-06 16:23:17 +03:00
Alan Orth e0d514e797
pom.xml: Move version to 5.3-SNAPSHOT 2020-08-06 16:17:05 +03:00
Alan Orth fd893d8c4e
pom.xml: Release version 5.2 2020-08-06 16:16:13 +03:00
Alan Orth 2263ac27e8
src/main/java: Handle more corner cases in FixJpgJpgThumbnails.java
We should make sure we are catching .JPG and .jpg. Also, we should
check for Generated Thumbnails as well as IM Thumbnail.
2020-08-06 16:13:51 +03:00
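One way to cover both corner cases is sketched below. It assumes the thumbnail descriptions are the literal strings "IM Thumbnail" and "Generated Thumbnail"; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical illustration only, not the project's actual code.
class ThumbnailMatchSketch {
    // Matches ".jpg.jpg", ".JPG.jpg", ".JPG.JPG" and so on.
    static boolean isJpgJpgName(String name) {
        return name != null
                && name.toLowerCase(Locale.ROOT).endsWith(".jpg.jpg");
    }

    // Assumed description strings for automatically generated thumbnails.
    static boolean isGeneratedDescription(String description) {
        return "IM Thumbnail".equals(description)
                || "Generated Thumbnail".equals(description);
    }
}
```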
Alan Orth cf7012d698
pom.xml: Change version to 5.2-SNAPSHOT 2020-08-06 16:13:27 +03:00
Alan Orth 7edc60e6ca
README.md: Use badge for dspace5 branch 2020-08-06 15:47:33 +03:00
Alan Orth fe2abc86c6
Publish version 5.1 2020-08-06 15:30:36 +03:00
Alan Orth e1d92ef2c7
src/main/java: Add Javadoc tags to FixJpgJpgThumbnails.java
I'm not sure how these are used by anything other than Javadoc, but
it seems useful.
2020-08-06 15:13:33 +03:00
Alan Orth 3e3c544cfa
Rename to cgspace-java-helpers
Now this includes the curation tasks as well as some helper scripts
for general DSpace tasks.
2020-08-06 15:06:42 +03:00
Alan Orth db9881faf6
README.md: Add note about FixJpgJpgThumbnails 2020-08-06 15:05:58 +03:00
Alan Orth fa5fb60b5b
README.md: Version 2020-08-06 14:53:04 +03:00
Alan Orth 44fb9a9f4d
pom.xml: Bump version to 5.1-SNAPSHOT 2020-08-06 14:51:11 +03:00
Alan Orth b790d5e4db
src/main/java: Minimum working version of FixJpgJpgThumbnails
It goes through each item and checks the THUMBNAIL bundle to find
any bitstreams ending in ".jpg.jpg", which indicates that they are
a thumbnail of a thumbnail. For each match it checks the ORIGINAL
bundle for a bitstream with the same name (minus ".jpg") and then
moves it to the THUMBNAIL bundle and deletes the original as well
as the "JpgJpg" thumbnail.
2020-08-06 13:53:58 +03:00
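The name matching described in this commit can be sketched without the DSpace API by working on plain lists of bitstream names. This is an illustration under that assumption; `JpgJpgPlanSketch` and `originalsToPromote` are hypothetical names, and the actual moving and deleting of bitstreams is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only, not the project's actual code.
class JpgJpgPlanSketch {
    // For every THUMBNAIL name ending in ".jpg.jpg", strip the trailing
    // ".jpg" and report the ORIGINAL bitstream with that name, if any.
    static List<String> originalsToPromote(List<String> thumbnailNames,
                                           List<String> originalNames) {
        List<String> result = new ArrayList<>();
        for (String thumbnail : thumbnailNames) {
            if (!thumbnail.endsWith(".jpg.jpg")) {
                continue; // an ordinary thumbnail, nothing to fix
            }
            String expected =
                    thumbnail.substring(0, thumbnail.length() - ".jpg".length());
            if (originalNames.contains(expected)) {
                result.add(expected);
            }
        }
        return result;
    }
}
```

A caller would then move each returned original into the THUMBNAIL bundle and delete both the old original and the ".jpg.jpg" thumbnail, as the commit message describes.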
Alan Orth 08e7546a87
Rename RemovePNGThumbnailsForPDFs to FixJpgJpgThumbnails
I want to use this to fix occurrences of ".jpg.jpg" thumbnails that
are caused by users uploading manually created JPG thumbnails to
the ORIGINAL bundle, which causes DSpace to create another one in
the THUMBNAIL bundle.
2020-08-06 12:58:37 +03:00
Alan Orth ff076ecf50
Import RemovePNGThumbnailsForPDFs.java
Written by Andrea Schweer under the BSD license. I will use this
as a base to do other thumbnail-related tasks.

See: https://github.com/UoW-IRRs/DSpace-Scripts
2020-08-06 12:51:52 +03:00
Alan Orth 7a5dd1c094
Use 5.0-SNAPSHOT for DSpace 5 version
The DSpace 6 version is in another branch. I decided that I will
use the major from the compatible DSpace version to make it easier
to manage versioning schemes.
2020-08-05 12:42:32 +03:00
Alan Orth 96e4ed6614
Add .idea
Apparently we should track *some* of .idea?
2020-08-04 15:34:31 +03:00
Alan Orth c1f209ef4f
.gitignore: Add target and others 2020-08-04 15:32:34 +03:00
Alan Orth 83602486c0
Use GitHub's JetBrains gitignore
See: https://github.com/github/gitignore/blob/master/Global/JetBrains.gitignore
2020-08-04 15:31:47 +03:00
Alan Orth 28238440a4
Remove IntelliJ IDEA stuff 2020-08-04 15:30:47 +03:00
Alan Orth 7251b85436
cgspace-countries.json: Remove Palestine
It's the same in the ISO 3166-1 list.
2020-08-04 14:52:36 +03:00
Alan Orth a2616460bf
README.md: Use badge from ILRI repository 2020-08-03 14:47:10 +03:00
Alan Orth 26f08e5903
README.md: Update 2020-08-03 14:43:38 +03:00
Alan Orth 50a4f68b9d
pom.xml: Add bits for deploying to OSSRH
Any time I run `mvn deploy` it will upload a snapshot to OSSRH with
the version "1.0-SNAPSHOT" and some timestamp. I still haven't
figured out how to "promote a release".
2020-08-03 14:32:54 +03:00
Alan Orth 03bfacf5d3
README.md: Add TravisCI badge 2020-08-03 14:32:31 +03:00
Alan Orth df4d9b313e
Add TravisCI support 2020-08-03 14:29:17 +03:00
Alan Orth 3a6e407765
README.md: Remove TODO about integrating with DSpace
I have now published the code on https://oss.sonatype.org/ via the
Sonatype OSSRH (OSS Repository Hosting) project. Now it is possible
to use it from DSpace's build system by adding it as a dependency
in the dspace/modules/additions/pom.xml.

See: https://issues.sonatype.org/browse/OSSRH-59650
See: https://central.sonatype.org/pages/ossrh-guide.html
2020-08-03 14:20:15 +03:00
Alan Orth af990c2670
README.md: Update mvn note 2020-08-02 23:52:12 +03:00
Alan Orth dcb0532be2
Change groupId to prepare for upload to Central
It's much easier to get your package verified on Central if it uses
a GitHub groupId. Otherwise you need to use DNS verification! This
changes the groupId:

- from: org.cgiar.cgspace.ctask
- to: io.github.ilri.cgspace

The package changed as well.

See: https://central.sonatype.org/pages/producers.html
2020-08-02 23:48:13 +03:00
Alan Orth 497ce719c2
README.md: Adjust intro text 2020-08-02 23:20:29 +03:00
Alan Orth 74caed79fa
pom.xml: Use ILRI GitHub 2020-08-02 23:19:32 +03:00
Alan Orth 820e09a08f
pom.xml: Add link to GitHub project 2020-08-02 23:06:50 +03:00
Alan Orth 3a805f9bf2
README.md: Add more documentation and notes 2020-08-02 22:55:23 +03:00
Alan Orth ca7deaac8f
CountryCodeTagger.java: Remove unused variable
Some of the other curation tasks use an array of results.
2020-08-02 22:03:10 +03:00
Alan Orth e158e4bc98
CountryCodeTagger.java: Refactor adding of alpha2 codes
We can append the codes we will add to a List of Strings and then
actually apply them later in one addMetadata call, and update the
item with one item.update() call. This reduces identical code and
is more efficient.

Note that when testing this on a collection with thousands of items
I realized that it is really important to limit both the cache size
as well as set the database transaction model to be per object/item
or else you will crash due to Java heap issues. For example:

    $ ~/dspace/bin/dspace curate -t countrycodetagger -i 10568/3 -r - -l 500 -s object

See: https://wiki.lyrasis.org/display/DSPACE/Curation+Task+Cookbook
2020-08-02 18:33:32 +03:00
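The "collect first, apply once" refactoring in this commit might look roughly like the sketch below. It avoids the DSpace API on purpose: the country names and the name-to-code mapping are passed in as plain collections, and `BatchCodesSketch` and `collectAlpha2` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only, not the project's actual code.
class BatchCodesSketch {
    // Gathers the alpha2 code for every recognised country name.
    // Names without a mapping are skipped.
    static List<String> collectAlpha2(List<String> countryNames,
                                      Map<String, String> nameToAlpha2) {
        List<String> codes = new ArrayList<>();
        for (String name : countryNames) {
            String code = nameToAlpha2.get(name);
            if (code != null) {
                codes.add(code);
            }
        }
        // The caller would add all codes in one addMetadata call and
        // then call item.update() once, instead of once per code.
        return codes;
    }
}
```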
Alan Orth 1c866bdf64
src/main/java: Remove unnecessary comments and prints 2020-08-02 18:32:04 +03:00
Alan Orth 28b4707426
README.md: Add TODOs 2020-08-02 15:53:37 +03:00
Alan Orth cc35c45a05
Remove tests
They were automatically generated by Maven and I haven't created
proper ones yet.
2020-08-02 15:52:43 +03:00
Alan Orth e5d45e62be
src/main/java: Refactor CountryCodeTagger.java
It is now much more modular and can easily and cleanly be extended to do
ISO 3166-1 Alpha3, numeric, etc.
2020-08-02 15:51:18 +03:00
Alan Orth a6d3653c9e
README.md: Remove profile todo 2020-08-01 23:39:09 +03:00
Alan Orth 6228f337e9
src/main/java: Skip items that have country codes
Originally I wasn't sure if I was going to try to parse each code,
check them against the mapping, and possibly correct them, but it's
easier to just skip items with codes unless we're in "force" mode.
2020-08-01 23:14:19 +03:00
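The skip rule in this commit reduces to a small predicate. The sketch below assumes the task knows how many country codes an item already has and whether "force" mode is on; the names are hypothetical.

```java
// Hypothetical illustration only, not the project's actual code.
class SkipExistingSketch {
    // Tag the item when it has no codes yet, or always in "force" mode.
    static boolean shouldTag(int existingCodeCount, boolean force) {
        return force || existingCodeCount == 0;
    }
}
```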