diff --git a/content/posts/2022-11.md b/content/posts/2022-11.md
index 7f1525d77..90964e961 100644
--- a/content/posts/2022-11.md
+++ b/content/posts/2022-11.md
@@ -20,4 +20,102 @@ categories: ["Notes"]
 - Tim merged my [pull request to override the ImageMagick PDF density in DSpace 7](https://github.com/DSpace/DSpace/pull/8553)
   - I ported it to DSpace 6.x and submitted a pull request: https://github.com/DSpace/DSpace/pull/8560
+## 2022-11-02
+
+- I joined the FAO–CGIAR AGROVOC results sharing meeting
+  - From June to October 2022 we suggested 39 new keywords; 27 were added to AGROVOC, 4 were rejected, and 9 are still under discussion
+- While doing a duplicate check on IFPRI's batch upload I found one duplicate that IWMI uploaded earlier this year
+  - I will update the metadata of that item and map it to the correct Initiative collection
+
+## 2022-11-03
+
+- I added countries to the twenty-three IFPRI items in OpenRefine based on their titles and abstracts (using the Jython trick I learned a few months ago), then added regions using csv-metadata-quality, and uploaded them to CGSpace
+- I exported a list of collections from CGSpace so I can run the thumbnail fixes on each, as we seem to have issues when doing it on (some) large communities like the CRP community:
+
+```console
+localhost/dspace= ☘ \COPY (SELECT ds6_collection2collectionhandle(uuid) AS collection FROM collection) to /tmp/collections.txt
+COPY 1268
+```
+
+- Then I started a test run on DSpace Test:
+
+```console
+$ while read -r collection; do chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails $collection | tee -a /tmp/FixLowQualityThumbnails.log; done < /tmp/collections.txt
+```
+
+- I'll be curious to check the log after it's all done.
+  - After a few hours I see:
+
+```console
+$ grep -c 'Action: remove' /tmp/FixLowQualityThumbnails.log
+626
+```
+
+- Not bad, because last week I did a more manual selection of collections and deleted ~200
+  - I will replicate this on CGSpace soon, and also try the FixJpgJpgThumbnails tool
+- I see that the CIAT Library is still up, so I should really grab all the PDFs before they shut that old server down
+  - Export a list of items with PDFs linked there:
+
+```console
+localhost/dspacetest= ☘ \COPY (SELECT dspace_object_id,text_value FROM metadatavalue WHERE metadata_field_id=219 AND text_value LIKE '%ciat-library%') to /tmp/ciat-library-items.csv;
+COPY 4621
+```
+
+- After stripping the page numbers off I see there are only about 2,700 unique files, and we have to filter the dead JSPUI ones...
+
+```console
+$ csvcut -c url 2022-11-03-CIAT-Library-items.csv | sed 1d | grep -v jspui | sort -u | wc -l
+2752
+```
+
+- I'm not sure how we'll handle the duplicates because many items are book chapters (or similar parts of a larger work) that share a single PDF
+
+## 2022-11-04
+
+- I decided to check for old pre-ImageMagick thumbnails on CGSpace by finding any bitstreams with the description "Generated Thumbnail":
+
+```console
+localhost/dspacetest= ☘ \COPY (SELECT ds6_bitstream2itemhandle(dspace_object_id) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND text_value='Generated Thumbnail') to /tmp/old-thumbnails.txt;
+COPY 1147
+$ grep -v '\\N' /tmp/old-thumbnails.txt > /tmp/old-thumbnail-handles.txt
+$ wc -l /tmp/old-thumbnail-handles.txt
+987 /tmp/old-thumbnail-handles.txt
+```
+
+- A bunch of these have `\N` for some reason when I use the `ds6_bitstream2itemhandle` function to get their handles, so I had to exclude those...
+  - I forced the media-filter for these items on CGSpace:
+
+```console
+$ while read -r handle; do JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -i $handle -f -v; done < /tmp/old-thumbnail-handles.txt
+```
+
+- Upload some batch records via CSV for Peter
+- Update the about page on CGSpace with new text from Peter
+- Add a few more ORCID identifiers and names to my growing file `2022-09-22-add-orcids.csv`
+  - I tagged fifty-four new authors using this list
+- I deleted and mapped one duplicate item for Maria Garruccio
+- I updated the CG Core website from Bootstrap v4.6 to v5.2
+
+## 2022-11-07
+
+- I did a harvest on AReS last night but it seems that MELSpace's sitemap is broken again because we have 10,000 fewer records
+- I filed [an issue](https://github.com/ecrmnn/iso-3166-1/issues/10) on the iso-3166-1 npm package to update the name of Turkey to Türkiye
+  - I also filed [an issue](https://github.com/flyingcircusio/pycountry/issues/148) and [a pull request](https://github.com/flyingcircusio/pycountry/pull/149) on the pycountry package
+  - I also filed [an issue](https://github.com/konstantinstadler/country_converter/issues/121) and [a pull request](https://github.com/konstantinstadler/country_converter/pull/122) on the country-converter package
+  - I also changed one item on CGSpace that had been submitted since the name was changed
+  - I also imported the new iso-codes 4.12.0 into cgspace-java-helpers
+  - I also updated it in the DSpace `input-forms.xml`
+  - I also forked the iso-3166-1 package from npm and updated Swaziland, Macedonia, and Turkey in my fork
+  - I submitted a [pull request](https://github.com/ecrmnn/iso-3166-1/pull/11) to update this upstream
+- Since I was making all these pull requests I also made [one on country-converter for the UN M.49 region "South-eastern Asia"](https://github.com/konstantinstadler/country_converter/pull/123)
+- Port the [ImageMagick PDF cropbox fix](https://github.com/DSpace/DSpace/pull/8550) to DSpace 6.x
+  - I deployed it on CGSpace, ran all updates, and rebooted the host
+  - I ran the filter-media script on one large collection where many of these PDFs with cropbox issues exist:
+
+```console
+$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -v -f -i 10568/78 >& /tmp/filter-media-cropbox.log
+```
+
+- But looking at the items it processed, I'm not sure it's working as expected
+
diff --git a/docs/2022-11/index.html b/docs/2022-11/index.html
index 75329383e..f664e17a1 100644
--- a/docs/2022-11/index.html
+++ b/docs/2022-11/index.html
@@ -24,7 +24,7 @@ I reverted the Cocoon autosave change because it was more of a nuissance that Pe
-
+
@@ -54,9 +54,9 @@ I reverted the Cocoon autosave change because it was more of a nuissance that Pe
   "@type": "BlogPosting",
   "headline": "November, 2022",
   "url": "https://alanorth.github.io/cgspace-notes/2022-11/",
-  "wordCount": "134",
+  "wordCount": "863",
   "datePublished": "2022-11-01T09:11:36+03:00",
-  "dateModified": "2022-11-01T09:11:36+03:00",
+  "dateModified": "2022-11-01T22:12:24+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
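A note on the 2022-11-03 CIAT Library count: the log shows the deduplication pipeline but not the step that strips the page numbers. Below is a minimal sketch of one way to do both in a single pass. It assumes the page numbers are URL fragments of the form `#page=N` and that the input is one URL per line. Both are assumptions, as the notes do not show the data. The function name `count_unique_pdfs` is hypothetical.

```shell
# Hypothetical helper: count unique PDF URLs read from stdin.
# Assumption: page numbers appear as a trailing "#page=N" fragment.
count_unique_pdfs() {
    sed 's/#page=[0-9]*$//' |      # drop the assumed page fragment
        grep -v -e 'jspui' -e '^$' |  # skip dead JSPUI links and blank lines
        sort -u |                     # collapse items that share one PDF
        wc -l |
        tr -d ' '                     # some wc implementations pad the count
}
```

It could be fed from the same export used in the notes, for example `csvcut -c url 2022-11-03-CIAT-Library-items.csv | sed 1d | count_unique_pdfs`.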
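A note on the `\N` rows in the 2022-11-04 export: `\N` is how PostgreSQL's text-format `COPY` writes NULL by default, so those are rows where the handle lookup returned NULL. The sketch below is a stricter alternative to `grep -v '\\N'` that keeps only lines that look like handles. It assumes one handle per line and a `prefix/suffix` shape with a numeric suffix, which is my assumption about the data and not something the notes state. The name `valid_handles` is hypothetical.

```shell
# Hypothetical helper: keep only lines shaped like a handle (e.g. 10568/78).
# This drops the \N markers that COPY writes for NULL, and blank lines.
# Assumption: the prefix is digits and dots, the suffix is digits.
valid_handles() {
    grep -E '^[0-9.]+/[0-9]+$'
}
```

Usage would be `valid_handles < /tmp/old-thumbnails.txt > /tmp/old-thumbnail-handles.txt`, after which the `while read -r handle` loop from the notes can run unchanged.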