---
title: "November, 2022"
date: 2022-11-01T09:11:36+03:00
author: "Alan Orth"
categories: ["Notes"]
---
## 2022-11-01
- Last night I re-synced DSpace 7 Test from CGSpace
- I also updated all my local `7_x-dev` branches on the latest upstreams
- I spent some time updating the authorizations in Alliance collections
- I want to make sure they use groups instead of individuals where possible!
- I reverted the Cocoon autosave change because it is a nuisance that Peter can't upload CSVs from the web interface, and the issue it addressed is a very low severity security issue
<!--more-->
- I ran FixLowQualityThumbnails from cgspace-java-helpers on some large collections on CGSpace and ended up fixing 194 items!
- I did some minor checking and uploaded twenty-four IFPRI outputs for the Initiatives to DSpace Test
- Tim merged my [pull request to override the ImageMagick PDF density in DSpace 7](https://github.com/DSpace/DSpace/pull/8553)
- I ported it to DSpace 6.x and submitted a pull request: https://github.com/DSpace/DSpace/pull/8560
## 2022-11-02
- I joined the FAO-CGIAR AGROVOC results sharing meeting
- From June to October, 2022 we suggested 39 new keywords; 27 were added to AGROVOC, 4 were rejected, and 9 are still under discussion
- While doing a duplicate check on IFPRI's batch upload I found one duplicate uploaded by IWMI earlier this year
- I will update the metadata of that item and map it to the correct Initiative collection
## 2022-11-03
- I added countries to the twenty-three IFPRI items in OpenRefine based on their titles and abstracts (using the Jython trick I learned a few months ago), then added regions using csv-metadata-quality, and uploaded them to CGSpace
- I exported a list of collections from CGSpace so I can run the thumbnail fixes on each, as we seem to have issues when doing it on (some) large communities like the CRP community:
```console
localhost/dspace= ☘ \COPY (SELECT ds6_collection2collectionhandle(uuid) AS collection FROM collection) to /tmp/collections.txt
COPY 1268
```
- Then I started a test run on DSpace Test:
```console
$ while read -r collection; do chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails "$collection" | tee -a /tmp/FixLowQualityThumbnails.log; done < /tmp/collections.txt
```
- I'll be curious to check the log after it's all done.
- After a few hours I see:
```console
$ grep -c 'Action: remove' /tmp/FixLowQualityThumbnails.log
626
```
- Not bad, considering that last week I did a more manual selection of collections and deleted ~200
- I will replicate this on CGSpace soon, and also try the FixJpgJpgThumbnails tool
- I see that the CIAT Library is still up, so I should really grab all the PDFs before they shut that old server down
- Export a list of items with PDFs linked there:
```console
localhost/dspacetest= ☘ \COPY (SELECT dspace_object_id,text_value FROM metadatavalue WHERE metadata_field_id=219 AND text_value LIKE '%ciat-library%') to /tmp/ciat-library-items.csv;
COPY 4621
```
- After stripping the page numbers off I see there are only about 2,700 unique files, and we have to filter the dead JSPUI ones...
```console
$ csvcut -c url 2022-11-03-CIAT-Library-items.csv | sed 1d | grep -v jspui | sort -u | wc -l
2752
```
- I'm not sure how we'll handle the duplicates, because many items are book chapters (or similar) that share a single PDF
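The page-number stripping and de-duplication described above could look roughly like the sketch below. The URLs are made up, and the `#page=N` suffix is only an assumption about how the page numbers appear in the links; the `grep -v jspui | sort -u` part mirrors the command above. The last step lists PDFs that more than one row points at, which may help size the shared-PDF problem:

```shell
# Hypothetical sample URLs; the real list came from the CSV export above.
# The "#page=N" suffix format is an assumption, not taken from the real data.
urls='http://ciat-library.example.org/docs/a.pdf#page=12
http://ciat-library.example.org/docs/a.pdf#page=40
http://ciat-library.example.org/docs/b.pdf
http://ciat-library.example.org/jspui/handle/123'

# Strip the page suffix, drop the dead JSPUI links, keep unique files only
unique=$(printf '%s\n' "$urls" | sed 's/#page=[0-9]*$//' | grep -v jspui | sort -u)
count=$(printf '%s\n' "$unique" | wc -l | tr -d ' ')

# PDFs referenced by more than one row (e.g. chapters sharing one file)
shared=$(printf '%s\n' "$urls" | sed 's/#page=[0-9]*$//' | grep -v jspui | sort | uniq -d)

printf 'unique files: %s\n' "$count"
printf 'shared PDFs:\n%s\n' "$shared"
```

With the sample data, `a.pdf` is the only file reported as shared, since two rows pointed at different pages of it.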
## 2022-11-04
- I decided to check for old pre-ImageMagick thumbnails on CGSpace by finding any bitstreams with the description "Generated Thumbnail":
```console
localhost/dspacetest= ☘ \COPY (SELECT ds6_bitstream2itemhandle(dspace_object_id) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND text_value='Generated Thumbnail') to /tmp/old-thumbnails.txt;
COPY 1147
$ grep -v '\\N' /tmp/old-thumbnails.txt > /tmp/old-thumbnail-handles.txt
$ wc -l /tmp/old-thumbnail-handles.txt
987 /tmp/old-thumbnail-handles.txt
```
- A bunch of these have `\N` for some reason when I use the `ds6_bitstream2itemhandle` function to get their handles so I had to exclude those...
- I forced the media-filter for these items on CGSpace:
```console
$ while read -r handle; do JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -i "$handle" -f -v; done < /tmp/old-thumbnail-handles.txt
```
- Upload some batch records via CSV for Peter
- Update the about page on CGSpace with new text from Peter
- Add a few more ORCID identifiers and names to my growing file `2022-09-22-add-orcids.csv`
- I tagged fifty-four new authors using this list
- I deleted and mapped one duplicate item for Maria Garruccio
- I updated the CG Core website from Bootstrap v4.6 to v5.2
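Going back to the `\N` rows in the old-thumbnail export earlier in this entry: `\N` is how PostgreSQL's `COPY` text format writes a NULL, so those are rows where the handle lookup returned nothing. A minimal sketch of the exclusion step on made-up handles, using a fixed-string, whole-line match so that only lines consisting of exactly `\N` are dropped (the temporary file stands in for `/tmp/old-thumbnails.txt`):

```shell
# Hypothetical sample standing in for the COPY output; handles are made up
tmp=$(mktemp)
printf '%s\n' '10568/1' '\N' '10568/2' '\N' > "$tmp"

# -F: treat the pattern as a literal string, -x: match whole lines only,
# -v: invert, i.e. keep everything that is not a NULL marker
handles=$(grep -vxF '\N' "$tmp")

printf '%s\n' "$handles"
rm -f "$tmp"
```

This behaves the same as the `grep -v '\\N'` used above for this data, but avoids having to think about backslash escaping in the pattern.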
## 2022-11-07
- I did a harvest on AReS last night but it seems that MELSpace's sitemap is broken again because we have 10,000 fewer records
- I filed [an issue](https://github.com/ecrmnn/iso-3166-1/issues/10) on the iso-3166-1 npm package to update the name of Turkey to Türkiye
- I also filed [an issue](https://github.com/flyingcircusio/pycountry/issues/148) and [a pull request](https://github.com/flyingcircusio/pycountry/pull/149) on the pycountry package
- I also filed [an issue](https://github.com/konstantinstadler/country_converter/issues/121) and [a pull request](https://github.com/konstantinstadler/country_converter/pull/122) on the country-converter package
- I also changed one item on CGSpace that had been submitted since the name was changed
- I also imported the new iso-codes 4.12.0 into cgspace-java-helpers
- I also updated it in the DSpace `input-forms.xml`
- I also forked the iso-3166-1 package from npm and updated Swaziland, Macedonia, and Turkey in my fork
- I submitted a [pull request](https://github.com/ecrmnn/iso-3166-1/pull/11) to update this upstream
- Since I was making all these pull requests I also made [one on country-converter for the UN M.49 region "South-eastern Asia"](https://github.com/konstantinstadler/country_converter/pull/123)
- Port the [ImageMagick PDF cropbox fix](https://github.com/DSpace/DSpace/pull/8550) to DSpace 6.x
- I deployed it on CGSpace, ran all updates, and rebooted the host
- I ran the filter-media script on one large collection where many of these PDFs with cropbox issues exist:
```console
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -v -f -i 10568/78 >& /tmp/filter-media-cropbox.log
```
- But looking at the items it processed, I'm not sure it's working as expected
<!-- vim: set sw=2 ts=2: -->