Add notes for 2019-03-10

Alan Orth 2019-03-10 19:34:34 +02:00
parent 5bb64b0e7d
commit d1501620cb
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 53 additions and 8 deletions


@@ -120,5 +120,25 @@ UPDATE 44
- I ran the corrections on CGSpace and DSpace Test
## 2019-03-10
- Working on tagging IITA's items with their new research theme (`cg.identifier.iitatheme`) based on their existing IITA subjects (see [notes from 2019-02]({{< relref "2018-02.md" >}}))
- I exported the entire IITA community from CGSpace and then used `csvcut` to extract only the needed fields:
```
$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv > /tmp/iita.csv
```
- After importing to OpenRefine I realized that tagging items based on their subjects is tricky because of the row/record mode of OpenRefine when you split the multi-value cells as well as the fact that some items might need to be tagged twice (thus needing a `||`)
- I think it might actually be easier to filter by IITA subject, then by IITA theme (if needed), and then do transformations with some conditional values in GREL expressions like:
```
if(isBlank(value), 'PLANT PRODUCTION & HEALTH', value + '||PLANT PRODUCTION & HEALTH')
```
- Then it's more annoying because there are four IITA subject columns...
- In total this would add research themes to 1,755 items
- I want to double check one last time with Bosede that they would like to do this, because I also see that this will tag a few hundred items from the 1970s and 1980s
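- As a rough sketch of the same set-or-append logic outside OpenRefine, something like the following Python could work. The subject-to-theme mapping is a made-up placeholder (not IITA's real mapping), the function names are hypothetical, and skipping a theme that is already present is an addition that the GREL expression above does not do:

```python
SEPARATOR = "||"  # multi-value separator, as in the GREL expression above

# The four IITA subject columns extracted with csvcut
SUBJECT_COLUMNS = [
    "cg.subject.iita",
    "cg.subject.iita[]",
    "cg.subject.iita[en]",
    "cg.subject.iita[en_US]",
]
THEME_COLUMN = "cg.identifier.iitatheme"

# Hypothetical mapping for illustration only; the real one must come from IITA
SUBJECT_TO_THEME = {
    "PLANT HEALTH": "PLANT PRODUCTION & HEALTH",
    "PLANT PRODUCTION": "PLANT PRODUCTION & HEALTH",
}


def append_theme(value, theme):
    """Set the theme if the cell is blank, otherwise append it with '||'."""
    if not value or not value.strip():
        return theme
    if theme in value.split(SEPARATOR):
        # Assumption: avoid tagging the same theme twice
        return value
    return value + SEPARATOR + theme


def tag_row(row):
    """Return a copy of a CSV row (a dict) with themes derived from its subjects."""
    value = row.get(THEME_COLUMN) or ""
    for column in SUBJECT_COLUMNS:
        # A subject cell may itself hold several values separated by '||'
        for subject in (row.get(column) or "").split(SEPARATOR):
            theme = SUBJECT_TO_THEME.get(subject.strip())
            if theme:
                value = append_theme(value, theme)
    tagged = dict(row)
    tagged[THEME_COLUMN] = value
    return tagged
```

- Rows read with `csv.DictReader` can be passed straight to `tag_row`, which loops over all four subject columns in one pass instead of repeating the transformation per column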
<!-- vim: set sw=2 ts=2: -->


@@ -25,7 +25,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-03/" />
<meta property="article:published_time" content="2019-03-01T12:16:30&#43;01:00"/>
<meta property="article:modified_time" content="2019-03-08T14:41:01&#43;02:00"/>
<meta property="article:modified_time" content="2019-03-09T23:01:50&#43;02:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="March, 2019"/>
@@ -55,9 +55,9 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
"@type": "BlogPosting",
"headline": "March, 2019",
"url": "https://alanorth.github.io/cgspace-notes/2019-03/",
"wordCount": "991",
"wordCount": "1180",
"datePublished": "2019-03-01T12:16:30&#43;01:00",
"dateModified": "2019-03-08T14:41:01&#43;02:00",
"dateModified": "2019-03-09T23:01:50&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -275,6 +275,31 @@ UPDATE 44
<li>I ran the corrections on CGSpace and DSpace Test</li>
</ul>
<h2 id="2019-03-10">2019-03-10</h2>
<ul>
<li>Working on tagging IITA&rsquo;s items with their new research theme (<code>cg.identifier.iitatheme</code>) based on their existing IITA subjects (see <a href="/cgspace-notes/2018-02/">notes from 2019-02</a>)</li>
<li>I exported the entire IITA community from CGSpace and then used <code>csvcut</code> to extract only the needed fields:</li>
</ul>
<pre><code>$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv &gt; /tmp/iita.csv
</code></pre>
<ul>
<li><p>After importing to OpenRefine I realized that tagging items based on their subjects is tricky because of the row/record mode of OpenRefine when you split the multi-value cells as well as the fact that some items might need to be tagged twice (thus needing a <code>||</code>)</p></li>
<li><p>I think it might actually be easier to filter by IITA subject, then by IITA theme (if needed), and then do transformations with some conditional values in GREL expressions like:</p></li>
</ul>
<pre><code>if(isBlank(value), 'PLANT PRODUCTION &amp; HEALTH', value + '||PLANT PRODUCTION &amp; HEALTH')
</code></pre>
<ul>
<li>Then it&rsquo;s more annoying because there are four IITA subject columns&hellip;</li>
<li>In total this would add research themes to 1,755 items</li>
<li>I want to double check one last time with Bosede that they would like to do this, because I also see that this will tag a few hundred items from the 1970s and 1980s</li>
</ul>
<!-- vim: set sw=2 ts=2: -->


@@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-03/</loc>
<lastmod>2019-03-08T14:41:01+02:00</lastmod>
<lastmod>2019-03-09T23:01:50+02:00</lastmod>
</url>
<url>
@@ -214,7 +214,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-03-08T14:41:01+02:00</lastmod>
<lastmod>2019-03-09T23:01:50+02:00</lastmod>
<priority>0</priority>
</url>
@@ -225,7 +225,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-03-08T14:41:01+02:00</lastmod>
<lastmod>2019-03-09T23:01:50+02:00</lastmod>
<priority>0</priority>
</url>
@@ -237,13 +237,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-03-08T14:41:01+02:00</lastmod>
<lastmod>2019-03-09T23:01:50+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-03-08T14:41:01+02:00</lastmod>
<lastmod>2019-03-09T23:01:50+02:00</lastmod>
<priority>0</priority>
</url>