Add notes for 2022-07-08

This commit is contained in:
2022-07-08 15:49:45 +03:00
parent 19715c3295
commit 11ce30438c
29 changed files with 154 additions and 36 deletions

View File

@ -19,7 +19,7 @@ Also, the trgm functions I’ve used before are case insensitive, but Levens
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-07/" />
<meta property="article:published_time" content="2022-07-02T14:07:36+03:00" />
<meta property="article:modified_time" content="2022-07-04T22:10:02+03:00" />
<meta property="article:modified_time" content="2022-07-07T10:02:04+03:00" />
@ -44,9 +44,9 @@ Also, the trgm functions I&rsquo;ve used before are case insensitive, but Levens
"@type": "BlogPosting",
"headline": "July, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-07/",
"wordCount": "739",
"wordCount": "1095",
"datePublished": "2022-07-02T14:07:36+03:00",
"dateModified": "2022-07-04T22:10:02+03:00",
"dateModified": "2022-07-07T10:02:04+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -246,7 +246,76 @@ Also, the trgm functions I&rsquo;ve used before are case insensitive, but Levens
</span></span></code></pre></div><ul>
<li>I will also have to remove &ldquo;Academicians&rdquo; from input-forms.xml</li>
</ul>
<!-- raw HTML omitted -->
<h2 id="2022-07-07">2022-07-07</h2>
<ul>
<li>Finalize lists of non-AGROVOC subjects in CGSpace that I started last week
<ul>
<li>I used the <a href="https://wiki.lyrasis.org/display/DSPACE/Helper+SQL+functions+for+DSpace+6">SQL helper functions</a> to find the collections where each term was used:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace= ☘ SELECT DISTINCT(ds6_item2collectionhandle(dspace_object_id)) AS collection, COUNT(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND LOWER(text_value) = &#39;water demand&#39; GROUP BY collection ORDER BY count DESC LIMIT 5;
</span></span><span style="display:flex;"><span> collection │ count
</span></span><span style="display:flex;"><span>─────────────┼───────
</span></span><span style="display:flex;"><span> 10568/36178 │ 56
</span></span><span style="display:flex;"><span> 10568/36185 │ 46
</span></span><span style="display:flex;"><span> 10568/36181 │ 35
</span></span><span style="display:flex;"><span> 10568/36188 │ 28
</span></span><span style="display:flex;"><span> 10568/36179 │ 21
</span></span><span style="display:flex;"><span>(5 rows)
</span></span></code></pre></div><ul>
<li>For now I only did terms from my list that had 100 or more occurrences in CGSpace
<ul>
<li>This leaves us with thirty-six terms that I will send to Sara Jani and Elizabeth Arnaud for evaluating possible inclusion to AGROVOC</li>
</ul>
</li>
<li>Write to some submitters from CIAT, Bioversity, and CCAFS to ask if they are still uploading new items with their legacy subject fields on CGSpace
<ul>
<li>We want to remove them from the submission form to create space for new fields</li>
</ul>
</li>
<li>Update one term I noticed people using that was close to AGROVOC:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>dspace=# UPDATE metadatavalue SET text_value=&#39;development policies&#39; WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value=&#39;development policy&#39;;
</span></span><span style="display:flex;"><span>UPDATE 108
</span></span></code></pre></div><ul>
<li>After contacting some editors I removed some old metadata fields from the submission form and browse indexes:
<ul>
<li>Bioversity subject (<code>cg.subject.bioversity</code>)</li>
<li>CCAFS phase 1 project tag (<code>cg.identifier.ccafsproject</code>)</li>
<li>CIAT project tag (<code>cg.identifier.ciatproject</code>)</li>
<li>CIAT subject (<code>cg.subject.ciat</code>)</li>
</ul>
</li>
<li>Work on cleaning and proofing forty-six AfricaRice items for CGSpace
<ul>
<li>Last week we identified some duplicates so I removed those</li>
<li>The data is of mediocre quality</li>
<li>I&rsquo;ve been fixing citations (nitpick), adding licenses, adding volume/issue/extent, fixing DOIs, and adding some AGROVOC subjects</li>
<li>I even found titles that have typos, looking something like OCR errors&hellip;</li>
</ul>
</li>
</ul>
<h2 id="2022-07-08">2022-07-08</h2>
<ul>
<li>Finalize the cleaning and proofing of AfricaRice records
<ul>
<li>I found two suspicious items that claim to have been published but I can&rsquo;t find in the respective journals, so I removed those</li>
<li>I uploaded the forty-four items to <a href="https://dspacetest.cgiar.org/handle/10568/119135">DSpace Test</a></li>
</ul>
</li>
<li>Margarita from CCAFS said they are no longer using the CCAFS subject or CCAFS phase 2 project tag
<ul>
<li>I removed these from the input-form.xml and Discovery facets:
<ul>
<li>cg.identifier.ccafsprojectpii</li>
<li>cg.subject.cifor</li>
</ul>
</li>
<li>For now we will keep them in the search filters</li>
</ul>
</li>
</ul>