Add notes for 2022-12-14

Alan Orth 2022-12-14 22:14:03 +03:00
parent 9c1e60426a
commit aaec17b94d
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
29 changed files with 176 additions and 34 deletions

@ -155,4 +155,66 @@ $ curl -v -X POST --data "user=aorth@omg.com&password=myPassword" "https://dspac
- [Items submitted to CGSpace without Initiative](https://github.com/CodeObia/MEL/issues/11083)
- PRMS planning meeting before tomorrow's meeting with researchers and submitters
## 2022-12-13
- I made some minor changes to csv-metadata-quality
- I switched to using the SPDX license data as JSON directly from SPDX, instead of via the now-deprecated spdx-license-list package on PyPI
- I exported the Initiatives collection to tag missing regions
- I submitted an issue to MEL GitHub:
- [Set the description of bitstreams in the THUMBNAIL bundle to "IM Thumbnail" when submitting to CGSpace](https://github.com/CodeObia/MEL/issues/11084)
- I submitted a pull request to [fix the Handle link in the Citizen Lab test URLs for Iran](https://github.com/citizenlab/test-lists/pull/1199)
- I had originally submitted this in 2018, but it seems someone updated the URL in 2020... hmmm
- I normalized the `text_lang` values on CGSpace again:
```console
dspace=# SELECT DISTINCT text_lang, count(text_lang) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) GROUP BY text_lang ORDER BY count DESC;
 text_lang |  count
-----------+---------
 en_US     | 3050302
 en        |     618
           |     605
 fr        |       2
 vi        |       2
 es        |       1
           |       0
(7 rows)

dspace=# BEGIN;
BEGIN
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN ('en', '', NULL);
UPDATE 1223
dspace=# COMMIT;
COMMIT
```
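- Note that a `NULL` inside an `IN (...)` list never matches rows where the column is `NULL`, because the comparison evaluates to unknown rather than true. That is consistent with `UPDATE 1223` above, which equals 618 + 605. The following is a minimal sketch of that behavior, assuming an in-memory SQLite database and a one-column toy table as stand-ins (CGSpace itself runs PostgreSQL, and the real `metadatavalue` table has more columns):

```python
import sqlite3

# Toy stand-in for the metadatavalue table (assumption: one column is enough here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadatavalue (text_lang TEXT)")
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?)",
    [("en_US",), ("en",), ("",), (None,)],
)

# NULL inside IN (...) compares as unknown, so the NULL row is not updated.
updated = conn.execute(
    "UPDATE metadatavalue SET text_lang='en_US' "
    "WHERE text_lang IN ('en', '', NULL)"
).rowcount

# An explicit IS NULL test is needed to catch the remaining row.
updated_nulls = conn.execute(
    "UPDATE metadatavalue SET text_lang='en_US' WHERE text_lang IS NULL"
).rowcount

print(updated, updated_nulls)
```

- If the `NULL` values should be normalized as well, a separate `text_lang IS NULL` condition would be needed.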
- I wrote an initial version of a script to map CGSpace items to Initiative collections based on their `cg.contributor.initiative` metadata
- I am still considering whether to add a mode to *un-map* items that are mapped to collections but do not have the corresponding metadata tag
## 2022-12-14
- Lots of work on PRMS-related metadata issues with CGSpace
- We noticed that PRMS uses `cg.identifier.dataurl` for the FAIR score, but not `cg.identifier.url`
- We don't use these fields consistently for datasets in CGSpace, so I decided to move the dataset URLs to the `cg.identifier.dataurl` field, but we will also ask the PRMS team to consider the normal URL field, as it commonly holds other external resources related to the knowledge product
- I updated the `move-metadata-values.py` script to use the latest best practices from my other scripts and some of the helper functions from `util.py`
- Then I exported a list of text values pointing to Dataverse instances from `cg.identifier.url`:
```console
localhost/dspace= ☘ \COPY (SELECT text_value FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=219 AND (text_value LIKE '%persistentId%' OR text_value LIKE '%20.500.11766.1/%')) to /tmp/data.txt;
COPY 61
```
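- The two `LIKE` patterns in the query amount to a case-sensitive substring test on each URL. A minimal sketch of the same filter in Python, where the function name `looks_like_dataset_url` and the sample URLs are hypothetical and only for illustration:

```python
# Substrings taken from the LIKE patterns in the query above.
DATASET_URL_MARKERS = ("persistentId", "20.500.11766.1/")


def looks_like_dataset_url(url: str) -> bool:
    """Return True if the URL contains one of the marker substrings."""
    return any(marker in url for marker in DATASET_URL_MARKERS)


# Made-up example values, not real CGSpace metadata.
sample_urls = [
    "https://dataverse.example.org/dataset.xhtml?persistentId=doi:10.0000/EXAMPLE",
    "https://hdl.handle.net/20.500.11766.1/EXAMPLE",
    "https://example.org/report.pdf",
]

matches = [url for url in sample_urls if looks_like_dataset_url(url)]
print(len(matches))
```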
- Then I moved them to `cg.identifier.dataurl` on CGSpace:
```console
$ ./ilri/move-metadata-values.py -i /tmp/data.txt -db dspace -u dspace -p 'dom@in34sniper' -f cg.identifier.url -t cg.identifier.dataurl
```
- I still need to add a note to the CGSpace submission form to inform submitters about the correct field for dataset URLs
- I finalized work on my new `fix-initiative-mappings.py` script
- It has two modes:
1. Check item metadata to see which Initiatives are tagged and then map the item if it is not yet mapped to the corresponding Initiative collection
2. Check item collections to see which Initiatives are mapped and then unmap the item if the corresponding Initiative metadata is missing
- The second one is disabled by default until I can get more feedback from Abenet, Michael, and others
- After I applied a handful of collection mappings I started a harvest on AReS
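- The two modes described above come down to two set differences between the Initiatives an item is tagged with and the Initiative collections it is mapped to. A minimal sketch of that logic, which is not the actual `fix-initiative-mappings.py` code; the function name `plan_mappings`, the `unmap` flag, and the sample Initiative names are hypothetical:

```python
def plan_mappings(
    tagged: set[str], mapped: set[str], unmap: bool = False
) -> tuple[list[str], list[str]]:
    """Plan collection changes for one item.

    tagged: Initiatives found in the item's metadata.
    mapped: Initiatives whose collections the item is currently mapped to.
    unmap:  enable the second mode (off by default, as in the notes).
    """
    # Mode 1: tagged in metadata but not yet mapped to the collection.
    to_map = sorted(tagged - mapped)
    # Mode 2: mapped to a collection but missing the metadata tag.
    to_unmap = sorted(mapped - tagged) if unmap else []
    return to_map, to_unmap


# Made-up example item.
tagged = {"Initiative A", "Initiative B"}
mapped = {"Initiative B", "Initiative C"}

print(plan_mappings(tagged, mapped))
print(plan_mappings(tagged, mapped, unmap=True))
```

- Keeping the second mode behind an explicit flag means a run with default settings can only add mappings, never remove them.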
<!-- vim: set sw=2 ts=2: -->

@ -20,7 +20,7 @@ Replace &ldquo;East Asia&rdquo; with &ldquo;Eastern Asia&rdquo; region on CGSpac
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-12/" />
<meta property="article:published_time" content="2022-12-01T08:52:36+03:00" />
<meta property="article:modified_time" content="2022-12-08T18:59:57+02:00" />
<meta property="article:modified_time" content="2022-12-12T18:17:33+03:00" />
@ -46,9 +46,9 @@ Replace &ldquo;East Asia&rdquo; with &ldquo;Eastern Asia&rdquo; region on CGSpac
"@type": "BlogPosting",
"headline": "December, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-12/",
"wordCount": "993",
"wordCount": "1486",
"datePublished": "2022-12-01T08:52:36+03:00",
"dateModified": "2022-12-08T18:59:57+02:00",
"dateModified": "2022-12-12T18:17:33+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -286,6 +286,86 @@ Replace &ldquo;East Asia&rdquo; with &ldquo;Eastern Asia&rdquo; region on CGSpac
</li>
<li>PRMS planning meeting before tomorrow&rsquo;s meeting with researchers and submitters</li>
</ul>
<h2 id="2022-12-13">2022-12-13</h2>
<ul>
<li>I made some minor changes to csv-metadata-quality
<ul>
<li>I switched to using the SPDX license data as JSON directly from SPDX, instead of via the now-deprecated spdx-license-list package on PyPI</li>
</ul>
</li>
<li>I exported the Initiatives collection to tag missing regions</li>
<li>I submitted an issue to MEL GitHub:
<ul>
<li><a href="https://github.com/CodeObia/MEL/issues/11084">Set the description of bitstreams in the THUMBNAIL bundle to &ldquo;IM Thumbnail&rdquo; when submitting to CGSpace</a></li>
</ul>
</li>
<li>I submitted a pull request to <a href="https://github.com/citizenlab/test-lists/pull/1199">fix the Handle link in the Citizen Lab test URLs for Iran</a>
<ul>
<li>I had originally submitted this in 2018, but it seems someone updated the URL in 2020&hellip; hmmm</li>
</ul>
</li>
<li>I normalized the <code>text_lang</code> values on CGSpace again:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>dspace=# SELECT DISTINCT text_lang, count(text_lang) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) GROUP BY text_lang ORDER BY count DESC;
</span></span><span style="display:flex;"><span> text_lang | count
</span></span><span style="display:flex;"><span>-----------+---------
</span></span><span style="display:flex;"><span> en_US | 3050302
</span></span><span style="display:flex;"><span> en | 618
</span></span><span style="display:flex;"><span> | 605
</span></span><span style="display:flex;"><span> fr | 2
</span></span><span style="display:flex;"><span> vi | 2
</span></span><span style="display:flex;"><span> es | 1
</span></span><span style="display:flex;"><span> | 0
</span></span><span style="display:flex;"><span>(7 rows)
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>dspace=# BEGIN;
</span></span><span style="display:flex;"><span>BEGIN
</span></span><span style="display:flex;"><span>dspace=# UPDATE metadatavalue SET text_lang=&#39;en_US&#39; WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN (&#39;en&#39;, &#39;&#39;, NULL);
</span></span><span style="display:flex;"><span>UPDATE 1223
</span></span><span style="display:flex;"><span>dspace=# COMMIT;
</span></span><span style="display:flex;"><span>COMMIT
</span></span></code></pre></div><ul>
<li>I wrote an initial version of a script to map CGSpace items to Initiative collections based on their <code>cg.contributor.initiative</code> metadata
<ul>
<li>I am still considering whether to add a mode to <em>un-map</em> items that are mapped to collections but do not have the corresponding metadata tag</li>
</ul>
</li>
</ul>
<h2 id="2022-12-14">2022-12-14</h2>
<ul>
<li>Lots of work on PRMS-related metadata issues with CGSpace
<ul>
<li>We noticed that PRMS uses <code>cg.identifier.dataurl</code> for the FAIR score, but not <code>cg.identifier.url</code></li>
<li>We don&rsquo;t use these fields consistently for datasets in CGSpace, so I decided to move the dataset URLs to the <code>cg.identifier.dataurl</code> field, but we will also ask the PRMS team to consider the normal URL field, as it commonly holds other external resources related to the knowledge product</li>
</ul>
</li>
<li>I updated the <code>move-metadata-values.py</code> script to use the latest best practices from my other scripts and some of the helper functions from <code>util.py</code>
<ul>
<li>Then I exported a list of text values pointing to Dataverse instances from <code>cg.identifier.url</code>:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace= ☘ \COPY (SELECT text_value FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=219 AND (text_value LIKE &#39;%persistentId%&#39; OR text_value LIKE &#39;%20.500.11766.1/%&#39;)) to /tmp/data.txt;
</span></span><span style="display:flex;"><span>COPY 61
</span></span></code></pre></div><ul>
<li>Then I moved them to <code>cg.identifier.dataurl</code> on CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/move-metadata-values.py -i /tmp/data.txt -db dspace -u dspace -p <span style="color:#e6db74">&#39;dom@in34sniper&#39;</span> -f cg.identifier.url -t cg.identifier.dataurl
</span></span></code></pre></div><ul>
<li>I still need to add a note to the CGSpace submission form to inform submitters about the correct field for dataset URLs</li>
<li>I finalized work on my new <code>fix-initiative-mappings.py</code> script
<ul>
<li>It has two modes:
<ol>
<li>Check item metadata to see which Initiatives are tagged and then map the item if it is not yet mapped to the corresponding Initiative collection</li>
<li>Check item collections to see which Initiatives are mapped and then unmap the item if the corresponding Initiative metadata is missing</li>
</ol>
</li>
<li>The second one is disabled by default until I can get more feedback from Abenet, Michael, and others</li>
</ul>
</li>
<li>After I applied a handful of collection mappings I started a harvest on AReS</li>
</ul>
<!-- raw HTML omitted -->

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2022-12-08T18:59:57+02:00" />
<meta property="og:updated_time" content="2022-12-12T18:17:33+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-12-08T18:59:57+02:00" />
<meta property="og:updated_time" content="2022-12-12T18:17:33+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-12-08T18:59:57+02:00" />
<meta property="og:updated_time" content="2022-12-12T18:17:33+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-12-08T18:59:57+02:00" />
<meta property="og:updated_time" content="2022-12-12T18:17:33+03:00" />

View File

@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2022-12-08T18:59:57+02:00</lastmod>
<lastmod>2022-12-12T18:17:33+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2022-12-08T18:59:57+02:00</lastmod>
<lastmod>2022-12-12T18:17:33+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-12/</loc>
<lastmod>2022-12-08T18:59:57+02:00</lastmod>
<lastmod>2022-12-12T18:17:33+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2022-12-08T18:59:57+02:00</lastmod>
<lastmod>2022-12-12T18:17:33+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2022-12-08T18:59:57+02:00</lastmod>
<lastmod>2022-12-12T18:17:33+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-11/</loc>
<lastmod>2022-12-03T10:46:29+03:00</lastmod>