diff --git a/content/posts/2022-11.md b/content/posts/2022-11.md
index 7f1525d77..90964e961 100644
--- a/content/posts/2022-11.md
+++ b/content/posts/2022-11.md
@@ -20,4 +20,102 @@ categories: ["Notes"]
- Tim merged my [pull request to override the ImageMagick PDF density in DSpace 7](https://github.com/DSpace/DSpace/pull/8553)
- I ported it to DSpace 6.x and submitted a pull request: https://github.com/DSpace/DSpace/pull/8560
+## 2022-11-02
+
+- I joined the FAO–CGIAR AGROVOC results sharing meeting
+ - From June to October 2022 we suggested 39 new keywords; 27 were added to AGROVOC, 4 were rejected, and 9 are still under discussion
+- I did a duplicate check on IFPRI's batch upload and found one item that IWMI had already uploaded earlier this year
+ - I will update the metadata of that item and map it to the correct Initiative collection
+
+## 2022-11-03
+
+- I added countries to the twenty-three IFPRI items in OpenRefine based on their titles and abstracts (using the Jython trick I learned a few months ago), then added regions using csv-metadata-quality, and uploaded them to CGSpace
+- I exported a list of collections from CGSpace so I can run the thumbnail fixes on each one, since running them on some large communities, like the CRP community, seems to cause issues:
+
+```console
+localhost/dspace= ☘ \COPY (SELECT ds6_collection2collectionhandle(uuid) AS collection FROM collection) to /tmp/collections.txt
+COPY 1268
+```
+
+- Then I started a test run on DSpace Test:
+
+```console
+$ while read -r collection; do chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails "$collection" | tee -a /tmp/FixLowQualityThumbnails.log; done < /tmp/collections.txt
+```
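The loop above pipes each run through `tee -a`, so unless `pipefail` is set, the exit status of each iteration is `tee`'s rather than the Java tool's, and a collection that fails can go unnoticed. Below is a minimal sketch of a bash helper that runs a command once per handle, appends all output to a log, and prints the handles whose command exited non-zero. The function name `run_per_handle` is hypothetical. It assumes one handle per line in the list file and appends the handle as the command's last argument.

```bash
# Hypothetical helper: run a command once per handle listed in a file.
# Assumes one handle per line (e.g. 10568/78); the handle is appended as
# the command's last argument.
run_per_handle() {
    local list_file=$1 log_file=$2
    shift 2
    local handle failed=0
    # "|| [[ -n $handle ]]" also processes a last line without a newline
    while IFS= read -r handle || [[ -n $handle ]]; do
        # Skip blank lines
        [[ -z $handle ]] && continue
        # Redirecting to the log (instead of piping to tee) keeps the
        # command's own exit status; stdin comes from /dev/null so the
        # command cannot consume the rest of the handle list.
        if ! "$@" "$handle" </dev/null >>"$log_file" 2>&1; then
            printf 'FAILED: %s\n' "$handle"
            failed=$((failed + 1))
        fi
    done <"$list_file"
    # Return non-zero if any handle failed
    (( failed == 0 ))
}
```

For example: `run_per_handle /tmp/collections.txt /tmp/FixLowQualityThumbnails.log chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails`.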
+
+- I'll be curious to check the log after it's all done.
+ - After a few hours I see:
+
+```console
+$ grep -c 'Action: remove' /tmp/FixLowQualityThumbnails.log
+626
+```
+
+- Not bad, considering that last week I did a more manual selection of collections and deleted ~200
+ - I will replicate this on CGSpace soon, and also try the FixJpgJpgThumbnails tool
+- I see that the CIAT Library is still up, so I should really grab all the PDFs before they shut that old server down
+ - Export a list of items with PDFs linked there:
+
+```console
+localhost/dspacetest= ☘ \COPY (SELECT dspace_object_id,text_value FROM metadatavalue WHERE metadata_field_id=219 AND text_value LIKE '%ciat-library%') to /tmp/ciat-library-items.csv;
+COPY 4621
+```
+
+- After stripping the page numbers and filtering out the dead JSPUI links, I see there are only about 2,700 unique files:
+
+```console
+$ csvcut -c url 2022-11-03-CIAT-Library-items.csv | sed 1d | grep -v jspui | sort -u | wc -l
+2752
+```
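A minimal sketch of that normalization step as a bash function. How the page numbers were actually stripped is not recorded here, so the trailing `#page=N` fragment format and the function name `normalize_ciat_urls` are assumptions.

```bash
# Hypothetical helper: normalize CIAT Library PDF URLs so that links to
# different pages of the same PDF count as one file.
# Assumes page numbers are a trailing "#page=N" fragment and that dead
# links contain "jspui". Reads one URL per line on stdin.
normalize_ciat_urls() {
    sed -E 's/#page=[0-9]+$//' | grep -v 'jspui' | sort -u
}
```

It would slot into the pipeline above as `csvcut -c url 2022-11-03-CIAT-Library-items.csv | sed 1d | normalize_ciat_urls | wc -l`.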
+
+- I'm not sure how we'll handle the duplicates, because many items are book chapters or similar that share a single PDF
+
+## 2022-11-04
+
+- I decided to check for old pre-ImageMagick thumbnails on CGSpace by finding any bitstreams with the description "Generated Thumbnail":
+
+```console
+localhost/dspacetest= ☘ \COPY (SELECT ds6_bitstream2itemhandle(dspace_object_id) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND text_value='Generated Thumbnail') to /tmp/old-thumbnails.txt;
+COPY 1147
+$ grep -v '\\N' /tmp/old-thumbnails.txt > /tmp/old-thumbnail-handles.txt
+$ wc -l /tmp/old-thumbnail-handles.txt
+987 /tmp/old-thumbnail-handles.txt
+```
+
+- A bunch of these come back as `\N` (NULL) for some reason when I use the `ds6_bitstream2itemhandle` function to get their handles, so I had to exclude those...
+ - I forced the media-filter for these items on CGSpace:
+
+```console
+$ while read -r handle; do JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -i "$handle" -f -v; done < /tmp/old-thumbnail-handles.txt
+```
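`COPY` text format writes SQL NULL as `\N`, which is why those lines appear in the export. A minimal sketch of a hypothetical `clean_handle_list` function that drops them with an exact fixed-string line match (rather than a regex) and also de-duplicates the handles, on the assumption that an item with several "Generated Thumbnail" bitstreams would otherwise be processed more than once:

```bash
# Hypothetical helper: turn raw \COPY output into a clean handle list.
# -x -F: drop lines that are exactly the literal \N NULL marker.
# sort -u: each item handle appears once, even if the item has several
# matching bitstreams.
clean_handle_list() {
    grep -vxF '\N' "$@" | sort -u
}
```

For example: `clean_handle_list /tmp/old-thumbnails.txt > /tmp/old-thumbnail-handles.txt`.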
+
+- Upload some batch records via CSV for Peter
+- Update the about page on CGSpace with new text from Peter
+- Add a few more ORCID identifiers and names to my growing file `2022-09-22-add-orcids.csv`
+ - I tagged fifty-four new authors using this list
+- I deleted and mapped one duplicate item for Maria Garruccio
+- I updated the CG Core website from Bootstrap v4.6 to v5.2
+
+## 2022-11-07
+
+- I ran a harvest on AReS last night, but it seems that MELSpace's sitemap is broken again because we have 10,000 fewer records
+- I filed [an issue](https://github.com/ecrmnn/iso-3166-1/issues/10) on the iso-3166-1 npm package to update the name of Turkey to Türkiye
+ - I also filed [an issue](https://github.com/flyingcircusio/pycountry/issues/148) and [a pull request](https://github.com/flyingcircusio/pycountry/pull/149) on the pycountry package
+ - I also filed [an issue](https://github.com/konstantinstadler/country_converter/issues/121) and [a pull request](https://github.com/konstantinstadler/country_converter/pull/122) on the country-converter package
+ - I fixed the name on one CGSpace item that had been submitted since the change
+ - I imported the new iso-codes 4.12.0 into cgspace-java-helpers
+ - I updated the name in the DSpace `input-forms.xml`
+ - I forked the iso-3166-1 package from npm and updated Swaziland, Macedonia, and Turkey in my fork
+ - I submitted a [pull request](https://github.com/ecrmnn/iso-3166-1/pull/11) to update this upstream
+- Since I was making all these pull requests I also made [one on country-converter for the UN M.49 region "South-eastern Asia"](https://github.com/konstantinstadler/country_converter/pull/123)
+- Port the [ImageMagick PDF cropbox fix](https://github.com/DSpace/DSpace/pull/8550) to DSpace 6.x
+ - I deployed it on CGSpace, ran all updates, and rebooted the host
+ - I ran the filter-media script on one large collection where many of these PDFs with cropbox issues exist:
+
+```console
+$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -v -f -i 10568/78 >& /tmp/filter-media-cropbox.log
+```
+
+- But looking at the items it processed, I'm not sure it's working as expected
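A minimal sketch for spotting metadata that still uses the old country names mentioned above. The mapping covers only those three renames (using the current ISO 3166-1 short names) and is illustrative, not taken from pycountry, iso-codes, or the iso-3166-1 package. The function name `flag_outdated_countries` is hypothetical, and it needs bash 4 or later for the associative array. It reads one country name per line and prints the outdated ones with their replacements.

```bash
# Illustrative mapping of old names to current ISO 3166-1 short names
# (only the renames discussed above; not an authoritative list).
declare -A country_renames=(
    [Turkey]="Türkiye"
    [Swaziland]="Eswatini"
    [Macedonia]="North Macedonia"
)

# Hypothetical helper: read one country name per line on stdin and print
# "old -> new" for each name that has been renamed.
flag_outdated_countries() {
    local name
    while IFS= read -r name; do
        # Empty keys are invalid for associative arrays, so skip blanks
        [[ -z $name ]] && continue
        if [[ -n ${country_renames[$name]+set} ]]; then
            printf '%s -> %s\n' "$name" "${country_renames[$name]}"
        fi
    done
}
```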
+
diff --git a/docs/2022-11/index.html b/docs/2022-11/index.html
index 75329383e..f664e17a1 100644
--- a/docs/2022-11/index.html
+++ b/docs/2022-11/index.html
@@ -24,7 +24,7 @@ I reverted the Cocoon autosave change because it was more of a nuissance that Pe
-
+
@@ -54,9 +54,9 @@ I reverted the Cocoon autosave change because it was more of a nuissance that Pe
"@type": "BlogPosting",
"headline": "November, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-11/",
- "wordCount": "134",
+ "wordCount": "863",
"datePublished": "2022-11-01T09:11:36+03:00",
- "dateModified": "2022-11-01T09:11:36+03:00",
+ "dateModified": "2022-11-01T22:12:24+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -153,6 +153,118 @@ I reverted the Cocoon autosave change because it was more of a nuissance that Pe
+
+2022-11-02
+
+- I joined the FAO–CGIAR AGROVOC results sharing meeting
+
+- From June to October 2022 we suggested 39 new keywords; 27 were added to AGROVOC, 4 were rejected, and 9 are still under discussion
+
+
+- I did a duplicate check on IFPRI’s batch upload and found one item that IWMI had already uploaded earlier this year
+
+- I will update the metadata of that item and map it to the correct Initiative collection
+
+
+
+2022-11-03
+
+- I added countries to the twenty-three IFPRI items in OpenRefine based on their titles and abstracts (using the Jython trick I learned a few months ago), then added regions using csv-metadata-quality, and uploaded them to CGSpace
+- I exported a list of collections from CGSpace so I can run the thumbnail fixes on each one, since running them on some large communities, like the CRP community, seems to cause issues:
+
+localhost/dspace= ☘ \COPY (SELECT ds6_collection2collectionhandle(uuid) AS collection FROM collection) to /tmp/collections.txt
+COPY 1268
+
+- Then I started a test run on DSpace Test:
+
+$ while read -r collection; do chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails $collection | tee -a /tmp/FixLowQualityThumbnails.log; done < /tmp/collections.txt
+
+- I’ll be curious to check the log after it’s all done.
+
+- After a few hours I see:
+
+
+
+$ grep -c 'Action: remove' /tmp/FixLowQualityThumbnails.log
+626
+
+- Not bad, considering that last week I did a more manual selection of collections and deleted ~200
+
+- I will replicate this on CGSpace soon, and also try the FixJpgJpgThumbnails tool
+
+
+- I see that the CIAT Library is still up, so I should really grab all the PDFs before they shut that old server down
+
+- Export a list of items with PDFs linked there:
+
+
+
+localhost/dspacetest= ☘ \COPY (SELECT dspace_object_id,text_value FROM metadatavalue WHERE metadata_field_id=219 AND text_value LIKE '%ciat-library%') to /tmp/ciat-library-items.csv;
+COPY 4621
+
+- After stripping the page numbers and filtering out the dead JSPUI links, I see there are only about 2,700 unique files:
+
+$ csvcut -c url 2022-11-03-CIAT-Library-items.csv | sed 1d | grep -v jspui | sort -u | wc -l
+2752
+
+- I’m not sure how we’ll handle the duplicates, because many items are book chapters or similar that share a single PDF
+
+2022-11-04
+
+- I decided to check for old pre-ImageMagick thumbnails on CGSpace by finding any bitstreams with the description “Generated Thumbnail”:
+
+localhost/dspacetest= ☘ \COPY (SELECT ds6_bitstream2itemhandle(dspace_object_id) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND text_value='Generated Thumbnail') to /tmp/old-thumbnails.txt;
+COPY 1147
+$ grep -v '\\N' /tmp/old-thumbnails.txt > /tmp/old-thumbnail-handles.txt
+$ wc -l /tmp/old-thumbnail-handles.txt
+987 /tmp/old-thumbnail-handles.txt
+
+- A bunch of these come back as \N (NULL) for some reason when I use the ds6_bitstream2itemhandle function to get their handles, so I had to exclude those…
+
+- I forced the media-filter for these items on CGSpace:
+
+
+
+$ while read -r handle; do JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -i $handle -f -v; done < /tmp/old-thumbnail-handles.txt
+
+- Upload some batch records via CSV for Peter
+- Update the about page on CGSpace with new text from Peter
+- Add a few more ORCID identifiers and names to my growing file 2022-09-22-add-orcids.csv
+
+- I tagged fifty-four new authors using this list
+
+
+- I deleted and mapped one duplicate item for Maria Garruccio
+- I updated the CG Core website from Bootstrap v4.6 to v5.2
+
+2022-11-07
+
+- I ran a harvest on AReS last night, but it seems that MELSpace’s sitemap is broken again because we have 10,000 fewer records
+- I filed an issue on the iso-3166-1 npm package to update the name of Turkey to Türkiye
+
+- I also filed an issue and a pull request on the pycountry package
+- I also filed an issue and a pull request on the country-converter package
+- I fixed the name on one CGSpace item that had been submitted since the change
+- I imported the new iso-codes 4.12.0 into cgspace-java-helpers
+- I updated the name in the DSpace input-forms.xml
+- I forked the iso-3166-1 package from npm and updated Swaziland, Macedonia, and Turkey in my fork
+
+
+
+
+- Since I was making all these pull requests I also made one on country-converter for the UN M.49 region “South-eastern Asia”
+- Port the ImageMagick PDF cropbox fix to DSpace 6.x
+
+- I deployed it on CGSpace, ran all updates, and rebooted the host
+- I ran the filter-media script on one large collection where many of these PDFs with cropbox issues exist:
+
+
+
+$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -v -f -i 10568/78 >& /tmp/filter-media-cropbox.log
+
+- But looking at the items it processed, I’m not sure it’s working as expected
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 162591efa..5f8fb221d 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 1ec939aba..7c02837a3 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 365e955ed..ba7d503f8 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 97ed1e61b..c4a8cd60d 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index bef27a070..29b3ce18e 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index ecc322453..15274e250 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index b824f3895..cf91f1183 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index 078cf0001..77bed3fd0 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index adbd0c930..fa8037be2 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index ac83dd868..4f8f56616 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 43fd1b961..7354d0c90 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index fbd9d482d..a76c90bdc 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 415ad0ea5..41e866f99 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index e95a93abe..8daa48921 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index ec4d8e163..35ebcdb57 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 6045fc7cb..b96b14d48 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index 70ff7e6f9..c2028ee32 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index e3ee327dc..3e6815854 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 6d9b4e74b..fd020fe03 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index f1ca06303..4f19318a6 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 946a1fede..9eb9a6641 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 55803b2a0..9b02cb17a 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index dd2012879..57df92c29 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 144c066c3..0e3dc3ec5 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index b2079558f..0991494e5 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 5c875b495..759a02d97 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 97c4c6bcc..7a6516f7a 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2022-11-01T09:11:36+03:00
+ 2022-11-01T22:12:24+03:00
https://alanorth.github.io/cgspace-notes/
- 2022-11-01T09:11:36+03:00
+ 2022-11-01T22:12:24+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-11-01T09:11:36+03:00
+ 2022-11-01T22:12:24+03:00
https://alanorth.github.io/cgspace-notes/2022-11/
- 2022-11-01T09:11:36+03:00
+ 2022-11-01T22:12:24+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2022-11-01T09:11:36+03:00
+ 2022-11-01T22:12:24+03:00
https://alanorth.github.io/cgspace-notes/2022-10/
2022-10-31T16:59:47+03:00