CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

May, 2023

2023-05-03

  • Alliance’s TIP team emailed me to ask about issues authenticating on CGSpace
    • It seems their password expired, which is annoying
  • I continued looking at the CGSpace subjects for the FAO / AGROVOC exercise that I started last week
    • Many of our subjects would match if they added a “-”, like “high yielding varieties”, or used singular…
    • Also I found at least two spelling mistakes, for example “decison support systems”, which would match if it was spelled correctly
  • Work on cleaning, proofing, and uploading twenty-seven records for IFPRI to CGSpace
  • I notice there are a few dozen locks from the dspaceWeb pool that are five days old on CGSpace so I killed them
$ psql < locks-age.sql | grep " days " | awk -F"|" '{print $10}' | sort -u | xargs kill
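Since the one-liner above pipes straight into kill, it may be worth previewing the PIDs first. A minimal sketch, assuming (as the awk field in the command implies) that the tenth pipe-delimited column of the locks-age.sql output is the backend PID. The function name old_lock_pids and the digits-only filter are my own additions, not part of the original command:

```shell
# Print the unique PIDs of locks whose age is reported in days.
# Assumption: column 10 of the pipe-delimited psql output is the PID,
# as in the one-liner above. old_lock_pids is a hypothetical helper.
old_lock_pids() {
    grep " days " \
        | awk -F"|" '{ gsub(/[ \t]+/, "", $10); if ($10 ~ /^[0-9]+$/) print $10 }' \
        | sort -u
}

# Preview first, then kill only if the list looks right:
#   psql < locks-age.sql | old_lock_pids
#   psql < locks-age.sql | old_lock_pids | xargs kill
```

The digits-only check skips any row whose tenth column is empty or not a number, so a header or malformed line never reaches kill.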

2023-05-04

  • Sync DSpace Test with CGSpace
  • I replaced one item’s thumbnail with a WebP version and XMLUI displays it fine
  • I spent some time checking the CMYK issue with Arch’s ImageMagick 7 and the Docker container and I think ImageMagick 7 just handles CMYK wrong…
    • libvips does it correctly automatically and looks closer to the PDF
  • Meeting about CG Core types

2023-05-10

  • Write a script to find the metadata_field_id values associated with the non-AGROVOC subjects I am working on for Sara
    • This is useful because we want to know who to contact for a definition
    • The script was:
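# For each subject in the first CSV column (header row skipped), look up the
# metadata_field_id values whose lowercased text_value matches it exactly
while read -r subject; do
    metadata_field_id=$(psql -h localhost -U postgres -d dspacetest -qtAX <<SQL
        SELECT DISTINCT(metadata_field_id) FROM metadatavalue WHERE LOWER(text_value)='$subject'
SQL
)
    # Unquoted on purpose: word splitting collapses newlines to spaces,
    # and sed then turns each space into "||"
    metadata_field_id=$(echo $metadata_field_id | sed 's/[[:space:]]/||/g')

    echo "$subject,$metadata_field_id"
done < <(csvcut -c 1 ~/Downloads/2023-04-26\ CGIAR\ non-AGROVOC\ subjects.csv | sed 1d)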
  • I also realized that Bernard Bett didn’t have any items on CGSpace tagged with his ORCID identifier, so I tagged 230!
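In the script above, a subject used in several fields comes back from psql as one ID per line. The unquoted echo collapses those newlines to spaces, and sed then turns each space into "||". A small sketch of an equivalent join using paste, in case the word-splitting trick feels too implicit. The helper name join_ids is hypothetical and the sample IDs are made up:

```shell
# Join newline-separated values from stdin with "||".
# join_ids is a hypothetical helper, not part of the original script.
join_ids() {
    # paste joins the lines with a single "|"; sed doubles each one
    paste -sd '|' - | sed 's/|/||/g'
}

# Example with made-up field IDs:
printf '%s\n' 57 119 187 | join_ids
# prints: 57||119||187
```

This relies on psql -qtAX returning a single unaligned column, so the values themselves contain no "|" characters.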

2023-05-11

  • CG Core meeting
  • Finalize looking at the CGSpace non-AGROVOC subjects for FAO
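A rough way to surface the near-misses noted on 2023-05-03 (subjects that differ from a vocabulary term only by a hyphen or by case) is to compare normalized keys rather than raw strings. This is a sketch under my own assumptions, not the method used for the exercise: match_key is a hypothetical helper, the example strings are illustrative, and it does not handle singular/plural differences or misspellings:

```shell
# Build a comparison key: lowercase, hyphens treated as spaces,
# runs of spaces squeezed, leading/trailing space trimmed.
# match_key is a hypothetical helper for illustration only.
match_key() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/-/ /g' -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}

# Two spellings that differ only by a hyphen and case get the same key:
match_key "High-yielding varieties"
match_key "high yielding varieties"
```

Both calls print the same key, so a join on the key column would pair the two spellings for manual review.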

2023-05-12

  • Export the Alliance community to do some country/region fixes
    • I also sent Maria and Francesca the export because they want to add more regions and subregions
  • Export the entire CGSpace to check for missing Initiative collection mappings
    • I also added missing regions

2023-05-16

2023-05-17

  • Re-sync CGSpace to DSpace 7 Test
  • I came up with a naive patch to use WebP instead of JPEG in the DSpace ImageMagick filter, and it works, but doesn’t replace existing JPEGs… hmmm
    • Also, it does PDF to WebP to WebP haha

2023-05-18

  • I created a pull request to improve some minor documentation, typo, and logic issues in the DSpace ImageMagick thumbnail filters
  • I realized that there is a quick win to the generation loss issue with ImageMagickThumbnailFilter
    • We can use ImageMagick’s internal MIFF instead of JPEG when writing the intermediate image
    • According to the libvips author, PNG is very slow!
    • I re-ran my generation-loss.sh script using MIFF and found essentially the same results as with PNG, which scores about 1.1 points higher than the JPEG intermediate on the ssimulacra2 (v2.1) scale
    • Also, according to my tests with the cosmo rusage.com utility, I see that MIFF is indeed much faster than PNG
    • I updated my pull request to add this quick win
  • Weekly CG Core types meeting
    • Low attendance so I just kept working on the spreadsheet
    • We are at the stage of voting on definitions
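The MIFF quick win above can be sketched as a two-step ImageMagick pipeline in which only the final write is lossy. This is an illustration, not the DSpace filter's actual code: it assumes ImageMagick's convert is on the PATH, and the function names, the 600x600 size, and the file names are all hypothetical. Set DRY_RUN=1 to print the commands instead of running them:

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# PDF first page -> MIFF intermediate -> final JPEG thumbnail.
# Names and the 600x600 size are illustrative, not DSpace's values.
thumbnail_via_miff() {
    pdf=$1
    out=$2
    tmp="${out%.*}.miff"
    # Step 1: rasterize the first page into ImageMagick's native MIFF format
    run convert "${pdf}[0]" "$tmp"
    # Step 2: scale the intermediate down and write the final thumbnail
    run convert "$tmp" -thumbnail 600x600 "$out"
    run rm -f "$tmp"
}

DRY_RUN=1 thumbnail_via_miff paper.pdf paper.jpg
```

Writing the intermediate as MIFF means the image is JPEG-compressed only once, at the final step, which is where the generation-loss improvement would come from.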

2023-05-19

  • I ported a few of the minor ImageMagick Thumbnail Filter improvements to our 6_x-prod branch

2023-05-20

  • I deployed the latest thumbnail changes on CGSpace, ran all updates, and rebooted it
  • I exported CGSpace to check for missing Initiative mappings
  • Then I started a harvest on AReS