diff --git a/content/posts/2022-12.md b/content/posts/2022-12.md
index d4ecba60d..fe66f2248 100644
--- a/content/posts/2022-12.md
+++ b/content/posts/2022-12.md
@@ -155,4 +155,66 @@ $ curl -v -X POST --data "user=aorth@omg.com&password=myPassword" "https://dspac
 - [Items submitted to CGSpace without Initiative](https://github.com/CodeObia/MEL/issues/11083)
 - PRMS planning meeting before tomorrow's meeting with researchers and submitters
 
+## 2022-12-13
+
+- I made some minor changes to csv-metadata-quality
+  - I switched to using the SPDX license data as a JSON directly from SPDX, instead of via the now-deprecated spdx-license-list package on pypi
+- I exported the Initiatives collection to tag missing regions
+- I submitted an issue to MEL GitHub:
+  - [Set the description of bitstreams in the THUMBNAIL bundle to "IM Thumbnail" when submitting to CGSpace](https://github.com/CodeObia/MEL/issues/11084)
+- Submit a pull request to [fix the Handle link in the Citizen Lab test URLs for Iran](https://github.com/citizenlab/test-lists/pull/1199)
+  - I had originally submitted this in 2018, but it seems someone updated the URL in 2020... hmmm
+- I normalized the `text_lang` values on CGSpace again:
+
+```console
+dspace=# SELECT DISTINCT text_lang, count(text_lang) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) GROUP BY text_lang ORDER BY count DESC;
+ text_lang |  count  
+-----------+---------
+ en_US     | 3050302
+ en        |     618
+           |     605
+ fr        |       2
+ vi        |       2
+ es        |       1
+           |       0
+(7 rows)
+
+dspace=# BEGIN;
+BEGIN
+dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN ('en', '', NULL);
+UPDATE 1223
+dspace=# COMMIT;
+COMMIT
+```
+
+- I wrote an initial version of a script to map CGSpace items to Initiative collections based on their `cg.contributor.initiative` metadata
+  - I am still considering if I want to add a mode to *un-map* items that are mapped to collections, but do not have the corresponding metadata tag
+
+## 2022-12-14
+
+- Lots of work on PRMS related metadata issues with CGSpace
+  - We noticed that PRMS uses `cg.identifier.dataurl` for the FAIR score, but not `cg.identifier.url`
+  - We don't use these consistently for datasets in CGSpace so I decided to move them to the dataurl field, but we will also ask the PRMS team to consider the normal URL field, as there are commonly other external resources related to the knowledge product there
+- I updated the `move-metadata-values.py` script to use the latest best practices from my other scripts and some of the helper functions from `util.py`
+  - Then I exported a list of text values pointing to Dataverse instances from `cg.identifier.url`:
+
+```console
+localhost/dspace= ☘ \COPY (SELECT text_value FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=219 AND (text_value LIKE '%persistentId%' OR text_value LIKE '%20.500.11766.1/%')) to /tmp/data.txt;
+COPY 61
+```
+
+- Then I moved them to `cg.identifier.dataurl` on CGSpace:
+
+```console
+$ ./ilri/move-metadata-values.py -i /tmp/data.txt -db dspace -u dspace -p 'dom@in34sniper' -f cg.identifier.url -t cg.identifier.dataurl
+```
+
+- I still need to add a note to the CGSpace submission form to inform submitters about the correct field for dataset URLs
+- I finalized work on my new `fix-initiative-mappings.py` script
+  - It has two modes:
+    1. Check item metadata to see which Initiatives are tagged and then map the item if it is not yet mapped to the corresponding Initiative collection
+    2. Check item collections to see which Initiatives are mapped and then unmap the item if the corresponding Initiative metadata is missing
+  - The second one is disabled by default until I can get more feedback from Abenet, Michael, and others
+- After I applied a handful of collection mappings I started a harvest on AReS
+
diff --git a/docs/2022-12/index.html b/docs/2022-12/index.html
index 54af93fb6..3887719c5 100644
--- a/docs/2022-12/index.html
+++ b/docs/2022-12/index.html
@@ -20,7 +20,7 @@ Replace “East Asia” with “Eastern Asia” region on CGSpac
-
+
@@ -46,9 +46,9 @@ Replace “East Asia” with “Eastern Asia” region on CGSpac
 "@type": "BlogPosting",
 "headline": "December, 2022",
 "url": "https://alanorth.github.io/cgspace-notes/2022-12/",
- "wordCount": "993",
+ "wordCount": "1486",
 "datePublished": "2022-12-01T08:52:36+03:00",
- "dateModified": "2022-12-08T18:59:57+02:00",
+ "dateModified": "2022-12-12T18:17:33+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -286,6 +286,86 @@ Replace “East Asia” with “Eastern Asia” region on CGSpac
  • PRMS planning meeting before tomorrow’s meeting with researchers and submitters

    2022-12-13

    dspace=# SELECT DISTINCT text_lang, count(text_lang) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) GROUP BY text_lang ORDER BY count DESC;
    + text_lang |  count  
    +-----------+---------
    + en_US     | 3050302
    + en        |     618
    +           |     605
    + fr        |       2
    + vi        |       2
    + es        |       1
    +           |       0
    +(7 rows)
    +
    +dspace=# BEGIN;
    +BEGIN
    +dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN ('en', '', NULL);
    +UPDATE 1223
    +dspace=# COMMIT;
    +COMMIT
    +
    +
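One detail of the `UPDATE` above is worth a quick check: in standard SQL a `NULL` inside an `IN (...)` list never matches rows whose value is `NULL`, which fits the reported `UPDATE 1223` (618 + 605, the `en` and empty-string rows). Likewise, `count(text_lang)` skips `NULL`s, so a count of 0 for the last group does not show how many `NULL` rows exist. The sketch below is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, with a made-up one-column table and made-up rows.

```python
import sqlite3


def make_table() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for metadatavalue (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadatavalue (text_lang TEXT)")
    rows = [("en_US",), ("en",), ("",), (None,), (None,)]
    conn.executemany("INSERT INTO metadatavalue VALUES (?)", rows)
    return conn


def update_with_in_list(conn: sqlite3.Connection) -> int:
    """Same predicate shape as in the notes: NULL placed inside the IN list."""
    cur = conn.execute(
        "UPDATE metadatavalue SET text_lang='en_US' "
        "WHERE text_lang IN ('en', '', NULL)"
    )
    return cur.rowcount


def update_with_is_null(conn: sqlite3.Connection) -> int:
    """Variant that tests for NULL explicitly with IS NULL."""
    cur = conn.execute(
        "UPDATE metadatavalue SET text_lang='en_US' "
        "WHERE text_lang IN ('en', '') OR text_lang IS NULL"
    )
    return cur.rowcount


# Run each variant against a fresh copy of the same data.
in_list_rows = update_with_in_list(make_table())
is_null_rows = update_with_is_null(make_table())
print(in_list_rows, is_null_rows)
```

If the `NULL` rows are meant to be normalized as well, the explicit `IS NULL` form is the one that reaches them. Whether that is desirable on CGSpace is a separate decision that the notes do not settle.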

    2022-12-14

    localhost/dspace= ☘ \COPY (SELECT text_value FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=219 AND (text_value LIKE '%persistentId%' OR text_value LIKE '%20.500.11766.1/%')) to /tmp/data.txt;
    +COPY 61
    +
    +
    $ ./ilri/move-metadata-values.py -i /tmp/data.txt -db dspace -u dspace -p 'dom@in34sniper' -f cg.identifier.url -t cg.identifier.dataurl
    +
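The notes do not show how `move-metadata-values.py` works internally, so the following is only a rough sketch of the general idea: re-point rows from one metadata field to another for a given list of text values. It assumes a heavily simplified `metadatavalue` table in an in-memory `sqlite3` database rather than the real DSpace PostgreSQL schema. The target field ID, the example URLs, and the `move_values` helper are all hypothetical; only the source field ID 219 comes from the `\COPY` query above.

```python
import sqlite3

SOURCE_FIELD_ID = 219   # cg.identifier.url, as used in the \COPY query above
TARGET_FIELD_ID = 9999  # hypothetical ID standing in for cg.identifier.dataurl


def move_values(conn, values, source_id, target_id):
    """Re-point rows matching each value from source_id to target_id.

    Returns the number of rows moved. Hypothetical helper, not the real script.
    """
    moved = 0
    for value in values:
        cur = conn.execute(
            "UPDATE metadatavalue SET metadata_field_id = ? "
            "WHERE metadata_field_id = ? AND text_value = ?",
            (target_id, source_id, value),
        )
        moved += cur.rowcount
    conn.commit()
    return moved


# Simplified stand-in table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue (metadata_field_id INTEGER, text_value TEXT)"
)
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?)",
    [
        (219, "https://example.org/dataset.xhtml?persistentId=doi:10.0000/EXAMPLE"),
        (219, "https://example.org/some-report"),
    ],
)

# Values to move, playing the role of the lines in /tmp/data.txt.
to_move = ["https://example.org/dataset.xhtml?persistentId=doi:10.0000/EXAMPLE"]
moved = move_values(conn, to_move, SOURCE_FIELD_ID, TARGET_FIELD_ID)
remaining = conn.execute(
    "SELECT count(*) FROM metadatavalue WHERE metadata_field_id = ?",
    (SOURCE_FIELD_ID,),
).fetchone()[0]
print(moved, remaining)
```

Matching on both the source field and the exact text value keeps the move narrow: URLs in `cg.identifier.url` that were not exported stay where they are.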
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 8d2f96939..32156c80a 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index c790743e5..406ea99e0 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index ed037bfed..887f5088b 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index fc82e505d..ec152f3ca 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index e4eac4f83..f00198cf4 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 5cb1050fb..d9e205388 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index fc16ae5c4..62d1eb256 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index bdb62adec..3290acdf8 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 786ba1d96..e48ad34d4 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index f0bc12897..8bd670b15 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 80fb53ac0..d1c80cc22 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 7f8f56b69..428177a1f 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 64b910e71..6d3743ef3 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index ccb822643..14abe2d44 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index e7af76ecf..81c9fb02c 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 74039a8df..36bdf2042 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index 830e7b865..7bb92a70b 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index ad0946aef..fb60343cd 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index a8e3220fd..877853a93 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 6c5648915..f8fcb8086 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index aac663684..5b8b0dd6e 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 525325e0d..d769f9b8f 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 7c31cc803..68a060447 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 8389e9d22..2230cbf1f 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index f7d16caeb..0881c0970 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 119dd8944..9a8653b9e 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index b90588b1c..ebb5fe1e2 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
 xmlns:xhtml="http://www.w3.org/1999/xhtml">
 https://alanorth.github.io/cgspace-notes/categories/
- 2022-12-08T18:59:57+02:00
+ 2022-12-12T18:17:33+03:00
 https://alanorth.github.io/cgspace-notes/
- 2022-12-08T18:59:57+02:00
+ 2022-12-12T18:17:33+03:00
 https://alanorth.github.io/cgspace-notes/2022-12/
- 2022-12-08T18:59:57+02:00
+ 2022-12-12T18:17:33+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-12-08T18:59:57+02:00
+ 2022-12-12T18:17:33+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2022-12-08T18:59:57+02:00
+ 2022-12-12T18:17:33+03:00
 https://alanorth.github.io/cgspace-notes/2022-11/
 2022-12-03T10:46:29+03:00