From 510dd965ea602338837956b1063ff594dec58f8c Mon Sep 17 00:00:00 2001 From: Alan Orth Date: Fri, 7 Oct 2022 21:29:35 +0300 Subject: [PATCH] Add notes --- content/posts/2022-10.md | 83 +++++++++++++++++++++ docs/2022-10/index.html | 97 ++++++++++++++++++++++++- docs/categories/index.html | 2 +- docs/categories/notes/index.html | 2 +- docs/categories/notes/page/2/index.html | 2 +- docs/categories/notes/page/3/index.html | 2 +- docs/categories/notes/page/4/index.html | 2 +- docs/categories/notes/page/5/index.html | 2 +- docs/categories/notes/page/6/index.html | 2 +- docs/categories/notes/page/7/index.html | 2 +- docs/index.html | 2 +- docs/page/2/index.html | 2 +- docs/page/3/index.html | 2 +- docs/page/4/index.html | 2 +- docs/page/5/index.html | 2 +- docs/page/6/index.html | 2 +- docs/page/7/index.html | 2 +- docs/page/8/index.html | 2 +- docs/page/9/index.html | 2 +- docs/posts/index.html | 2 +- docs/posts/page/2/index.html | 2 +- docs/posts/page/3/index.html | 2 +- docs/posts/page/4/index.html | 2 +- docs/posts/page/5/index.html | 2 +- docs/posts/page/6/index.html | 2 +- docs/posts/page/7/index.html | 2 +- docs/posts/page/8/index.html | 2 +- docs/posts/page/9/index.html | 2 +- docs/sitemap.xml | 10 +-- 29 files changed, 208 insertions(+), 34 deletions(-) diff --git a/content/posts/2022-10.md b/content/posts/2022-10.md index 7fc9beeed..86080f84a 100644 --- a/content/posts/2022-10.md +++ b/content/posts/2022-10.md @@ -368,5 +368,88 @@ Java stacktrace: java.lang.ClassCastException: org.apache.cocoon.servlet.multipa - I updated the [cgspace-java-helpers](https://github.com/ilri/cgspace-java-helpers) to include a new `FixLowQualityThumbnails` script to detect the low-quality thumbnails I found above - Add missing ORCID identifier for an Alliance author +- I've been running the `dspace cleanup -v` script every few weeks or months on CGSpace and assuming it finished successfully because I didn't get an error on the stdout/stderr, but today I noticed that the script keeps 
saying it is deleting the same bitstreams + - I looked in dspace.log and found the error I used to see a lot: + +```console +Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle" + Detail: Key (uuid)=(99b76ee4-15c6-458c-a940-866148bc7dee) is still referenced from table "bundle". +``` + +- If I mark the primary bitstream as null manually the cleanup script continues until it finds a few more + - I ended up with a long list of UUIDs to fix before the script would complete: + +```console +$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('b76d41c0-0a02-4f53-bfde-a840ccfff903','1981efaa-eadb-46cd-9d7b-12d7a8cff4c4','97a8b1fa-3c12-4122-9c7b-fc2a3eaf570d','99b76ee4-15c6-458c-a940-866148bc7dee','f330fc22-a787-46e2-b8d0-64cc3e166124','592f4a0d-1ed5-4663-be0e-958c0d3e653b','e73b3178-8f29-42bc-bfd1-1a454903343c','e3a5f592-ac23-4934-a2b2-26735fac0c4f','73f4ff6c-6679-44e8-8cbd-9f28a1df6927','11c9a75c-17a6-4966-a4e8-a473010eb34c','155faf93-92c5-4c17-866e-1db50b1f9687','8e073e9e-ab54-4d99-971a-66de073d51e3','76ddd62c-6499-4a8c-beea-3fc8c60200d8','2850fcc9-f450-430a-9317-c42def74e813','8fef3198-2aea-4bd8-aeab-bf5fccb46e42','9e3c3528-e20f-4da3-a0bd-ae9b8515b770')" +``` + +## 2022-10-06 + +- I finished running the cleanup script on CGSpace and the before and after on the number of bitstreams is interesting: + +```console +$ find /home/cgspace.cgiar.org/assetstore -type f | wc -l +181094 +$ find /home/cgspace.cgiar.org/assetstore -type f | wc -l +178329 +``` + +- So that cleaned up ~2,700 bitstreams! 
+- Interesting, someone on the DSpace Slack mentioned this as being a known issue with discussion, reproducers, and a pull request: https://github.com/DSpace/DSpace/issues/7348 +- I am having an issue with the new FixLowQualityThumbnails script on some communities like 10568/117865 and 10568/97114 + - For some reason it doesn't descend into the collections + - Also, my old FixJpgJpgThumbnails doesn't either... weird + - I might have to resort to getting a list of collections and doing it that way: + +```console +$ psql -h localhost -U postgres -d dspacetest -c 'SELECT ds6_collection2collectionhandle(uuid) FROM collection WHERE uuid in (SELECT uuid FROM collection);' | + sed 1,2d | + tac | + sed 1,3d > /tmp/collections +``` + +- Strange, I don't think doing it by collections is actually working because it says it's replacing the bitstreams, but it doesn't actually do it + - I don't have time to figure out what's happening, because I see "update_item" in dspace.log when the script says it's doing it, but it doesn't do it + - I might just extract a list of items that have .jpg.jpg thumbnails from the database and run the script through item mode + - There might be a problem with the context commit logic...? 
+- I exported a list of items that have .jpg.jpg thumbnails on CGSpace: + +```console +$ psql -h localhost -p 5432 -U postgres -d dspacetest -c "SELECT ds6_bitstream2itemhandle(dspace_object_id) FROM metadatavalue WHERE text_value ~ '.*\.(jpg|jpeg|JPG|JPEG)\.(jpg|jpeg|JPG|JPEG)' AND dspace_object_id IS NOT NULL;" | + sed 1,2d | + tac | + sed 1,3d | + grep -v '␀' | + sort -u | + sed 's/ //' > /tmp/jpgjpg-handles.txt +``` + +- I restarted DSpace Test because it had high load since yesterday and I don't know why +- Run `check-duplicates.py` on the 1642 MARLO Innovations to try to include matches from the OICRs we uploaded last month + - Then I processed those matches like I did with the OICRs themselves last month, and then cleaned them one last time with csv-metadata-quality, created a SAF bundle, and uploaded them to CGSpace + - BTW this bumps CGSpace over 100,000 items... + - Then I did the same for the 749 MARLO MELIAs and imported them to CGSpace +- Meeting about CG Core types with Abenet, Marie-Angelique, Sara, Margarita, and Valentina +- I made some minor logic changes to the FixJpgJpgThumbnails script in cgspace-java-helpers + - Now it checks to make sure the bitstream description is not empty or null, and also excludes Maps (in addition to Infographics) since those are likely to be JPEG files in the ORIGINAL bundle on purpose + +## 2022-10-07 + +- I did the matching and cleaning on the 512 MARLO Policies and uploaded them to CGSpace +- I sent a list of the IDs and Handles for all four groups of MARLO items to Jose so he can do the redirects on their server: + +```console +$ wc -l /tmp/*mappings.csv + 1643 /tmp/crp-innovation-mappings.csv + 750 /tmp/crp-melia-mappings.csv + 683 /tmp/crp-oicr-mappings.csv + 513 /tmp/crp-policy-mappings.csv + 3589 total +``` + +- I fixed the mysterious issue with my cgspace-java-helpers scripts not working on communities and collections + - It was because the code wasn't committing the context! 
+ - I ran both `FixJpgJpgThumbnails` and `FixLowQualityThumbnails` on a dozen or so large collections on CGSpace and processed about 1,200 low-quality thumbnails +- I did a complete re-sync of CGSpace to DSpace Test diff --git a/docs/2022-10/index.html b/docs/2022-10/index.html index 0d05649b0..48973e595 100644 --- a/docs/2022-10/index.html +++ b/docs/2022-10/index.html @@ -20,7 +20,7 @@ I filed an issue to ask about Java 11+ support - + @@ -46,9 +46,9 @@ I filed an issue to ask about Java 11+ support "@type": "BlogPosting", "headline": "October, 2022", "url": "https://alanorth.github.io/cgspace-notes/2022-10/", - "wordCount": "1009", + "wordCount": "1689", "datePublished": "2022-10-01T19:45:36+03:00", - "dateModified": "2022-10-03T16:26:30+03:00", + "dateModified": "2022-10-05T17:22:42+03:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -502,6 +502,97 @@ I filed an issue to ask about Java 11+ support +
diff --git a/docs/categories/index.html b/docs/categories/index.html index 1a9698804..4d459c5a4 100644 --- a/docs/categories/index.html +++ b/docs/categories/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html index fefe91bff..d914112ca 100644 --- a/docs/categories/notes/index.html +++ b/docs/categories/notes/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html index 94c278346..2810ee3c1 100644 --- a/docs/categories/notes/page/2/index.html +++ b/docs/categories/notes/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html index b033b112a..05a29261a 100644 --- a/docs/categories/notes/page/3/index.html +++ b/docs/categories/notes/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html index 2ddb85133..414ceb917 100644 --- a/docs/categories/notes/page/4/index.html +++ b/docs/categories/notes/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html index 85e982c96..1cad29a9b 100644 --- a/docs/categories/notes/page/5/index.html +++ b/docs/categories/notes/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html index 72a9a9944..c38783370 100644 --- a/docs/categories/notes/page/6/index.html +++ b/docs/categories/notes/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html index 7c18ccbf6..03306385b 100644 --- a/docs/categories/notes/page/7/index.html +++ b/docs/categories/notes/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/index.html b/docs/index.html index 80e5b6cec..ddb5deba8 100644 --- a/docs/index.html +++ b/docs/index.html @@ -10,7 
+10,7 @@ - + diff --git a/docs/page/2/index.html b/docs/page/2/index.html index 4de6d14cd..4c37c5b7e 100644 --- a/docs/page/2/index.html +++ b/docs/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/3/index.html b/docs/page/3/index.html index bcfd7915c..d6f5870c5 100644 --- a/docs/page/3/index.html +++ b/docs/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/4/index.html b/docs/page/4/index.html index 1eb121d64..3a62a213c 100644 --- a/docs/page/4/index.html +++ b/docs/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/5/index.html b/docs/page/5/index.html index 3d2d5caf5..dcb2cdf3f 100644 --- a/docs/page/5/index.html +++ b/docs/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/6/index.html b/docs/page/6/index.html index 59bfffd55..c8327368e 100644 --- a/docs/page/6/index.html +++ b/docs/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/7/index.html b/docs/page/7/index.html index defd46e71..4b6a88009 100644 --- a/docs/page/7/index.html +++ b/docs/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/8/index.html b/docs/page/8/index.html index db71dc9b6..877dd7761 100644 --- a/docs/page/8/index.html +++ b/docs/page/8/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/9/index.html b/docs/page/9/index.html index 4111c026b..e0c61b10e 100644 --- a/docs/page/9/index.html +++ b/docs/page/9/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/index.html b/docs/posts/index.html index 6094b7f8c..66ff076f6 100644 --- a/docs/posts/index.html +++ b/docs/posts/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html index b8593f318..43eb6fc3a 100644 --- a/docs/posts/page/2/index.html +++ b/docs/posts/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html index 09fed137c..9c91978e5 100644 --- a/docs/posts/page/3/index.html +++ b/docs/posts/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git 
a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html index be375cc6d..1d40e7fc7 100644 --- a/docs/posts/page/4/index.html +++ b/docs/posts/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html index 2e788408b..39efe04db 100644 --- a/docs/posts/page/5/index.html +++ b/docs/posts/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html index edd574382..6c6be9fa8 100644 --- a/docs/posts/page/6/index.html +++ b/docs/posts/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html index 0613230a9..ea07661bf 100644 --- a/docs/posts/page/7/index.html +++ b/docs/posts/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html index 89870cb7a..717fa3edc 100644 --- a/docs/posts/page/8/index.html +++ b/docs/posts/page/8/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html index 373fb59c7..6ebd7b9f7 100644 --- a/docs/posts/page/9/index.html +++ b/docs/posts/page/9/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 5f7a1698d..3e99f24d7 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -3,19 +3,19 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml"> https://alanorth.github.io/cgspace-notes/categories/ - 2022-10-03T16:26:30+03:00 + 2022-10-05T17:22:42+03:00 https://alanorth.github.io/cgspace-notes/ - 2022-10-03T16:26:30+03:00 + 2022-10-05T17:22:42+03:00 https://alanorth.github.io/cgspace-notes/categories/notes/ - 2022-10-03T16:26:30+03:00 + 2022-10-05T17:22:42+03:00 https://alanorth.github.io/cgspace-notes/2022-10/ - 2022-10-03T16:26:30+03:00 + 2022-10-05T17:22:42+03:00 https://alanorth.github.io/cgspace-notes/posts/ - 2022-10-03T16:26:30+03:00 + 2022-10-05T17:22:42+03:00 https://alanorth.github.io/cgspace-notes/2022-09/ 2022-09-30T17:29:50+03:00