mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-22 14:45:03 +01:00
Add notes for 2019-03-10
parent
5bb64b0e7d
commit
d1501620cb
@@ -120,5 +120,25 @@ UPDATE 44
- I ran the corrections on CGSpace and DSpace Test

## 2019-03-10

- Working on tagging IITA's items with their new research theme (`cg.identifier.iitatheme`) based on their existing IITA subjects (see [notes from 2019-02]({{< relref "2018-02.md" >}}))
- I exported the entire IITA community from CGSpace and then used `csvcut` to extract only the needed fields:

```
$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv > /tmp/iita.csv
```
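
For readers without csvkit, the column extraction above can be approximated in plain Python. This is only a sketch: the function name `extract_columns` is hypothetical and not part of the notes, and it skips any requested column that is missing from the export rather than reporting it.

```python
import csv


def extract_columns(src_path, dst_path, wanted):
    """Copy only the `wanted` columns from src_path to dst_path.

    `extract_columns` is a hypothetical helper for illustration only.
    Requested columns that are absent from the input header are skipped.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        header = reader.fieldnames or []
        # Keep the caller's column order, dropping names not in the export.
        present = [name for name in wanted if name in header]
        # extrasaction="ignore" drops every column we did not ask for.
        writer = csv.DictWriter(dst, fieldnames=present, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

Calling it with the export path, `/tmp/iita.csv`, and the five column names from the `csvcut` command should produce a comparable file, assuming the export is UTF-8 encoded.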

- After importing to OpenRefine I realized that tagging items based on their subjects is tricky because of the row/record mode of OpenRefine when you split the multi-value cells as well as the fact that some items might need to be tagged twice (thus needing a `||`)

- I think it might actually be easier to filter by IITA subject, then by IITA theme (if needed), and then do transformations with some conditional values in GREL expressions like:

```
if(isBlank(value), 'PLANT PRODUCTION & HEALTH', value + '||PLANT PRODUCTION & HEALTH')
```
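
The same conditional can be sketched in Python to make the intent explicit: set the theme when the cell is empty, otherwise append it after the `||` separator. The function name `add_theme` is hypothetical, and the duplicate check is an added assumption that the GREL expression above does not perform.

```python
SEPARATOR = "||"  # multi-value separator used in the GREL expression


def add_theme(value, theme):
    """Return the cell value with `theme` set or appended.

    `add_theme` is a hypothetical helper for illustration only.
    """
    # Treat None and the empty string as a blank cell.
    if value is None or value == "":
        return theme
    # Assumption not in the GREL version: skip themes already present.
    if theme in value.split(SEPARATOR):
        return value
    return value + SEPARATOR + theme
```

For example, `add_theme("", "PLANT PRODUCTION & HEALTH")` returns the theme on its own, while a cell that already holds another theme gets the new one appended after `||`.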

- Then it's more annoying because there are four IITA subject columns...
- In total this would add research themes to 1,755 items
- I want to double check one last time with Bosede that they would like to do this, because I also see that this will tag a few hundred items from the 1970s and 1980s

<!-- vim: set sw=2 ts=2: -->
@@ -25,7 +25,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-03/" />
<meta property="article:published_time" content="2019-03-01T12:16:30+01:00"/>
<meta property="article:modified_time" content="2019-03-08T14:41:01+02:00"/>
<meta property="article:modified_time" content="2019-03-09T23:01:50+02:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="March, 2019"/>
@@ -55,9 +55,9 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
"@type": "BlogPosting",
"headline": "March, 2019",
"url": "https://alanorth.github.io/cgspace-notes/2019-03/",
"wordCount": "991",
"wordCount": "1180",
"datePublished": "2019-03-01T12:16:30+01:00",
"dateModified": "2019-03-08T14:41:01+02:00",
"dateModified": "2019-03-09T23:01:50+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -275,6 +275,31 @@ UPDATE 44
<li>I ran the corrections on CGSpace and DSpace Test</li>
</ul>

<h2 id="2019-03-10">2019-03-10</h2>

<ul>
<li>Working on tagging IITA’s items with their new research theme (<code>cg.identifier.iitatheme</code>) based on their existing IITA subjects (see <a href="/cgspace-notes/2018-02/">notes from 2019-02</a>)</li>
<li>I exported the entire IITA community from CGSpace and then used <code>csvcut</code> to extract only the needed fields:</li>
</ul>

<pre><code>$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv > /tmp/iita.csv
</code></pre>

<ul>
<li><p>After importing to OpenRefine I realized that tagging items based on their subjects is tricky because of the row/record mode of OpenRefine when you split the multi-value cells as well as the fact that some items might need to be tagged twice (thus needing a <code>||</code>)</p></li>

<li><p>I think it might actually be easier to filter by IITA subject, then by IITA theme (if needed), and then do transformations with some conditional values in GREL expressions like:</p></li>
</ul>

<pre><code>if(isBlank(value), 'PLANT PRODUCTION & HEALTH', value + '||PLANT PRODUCTION & HEALTH')
</code></pre>

<ul>
<li>Then it’s more annoying because there are four IITA subject columns…</li>
<li>In total this would add research themes to 1,755 items</li>
<li>I want to double check one last time with Bosede that they would like to do this, because I also see that this will tag a few hundred items from the 1970s and 1980s</li>
</ul>

<!-- vim: set sw=2 ts=2: -->
@@ -4,7 +4,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-03/</loc>
<lastmod>2019-03-08T14:41:01+02:00</lastmod>
<lastmod>2019-03-09T23:01:50+02:00</lastmod>
</url>

<url>
@@ -214,7 +214,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-03-08T14:41:01+02:00</lastmod>
<lastmod>2019-03-09T23:01:50+02:00</lastmod>
<priority>0</priority>
</url>
@@ -225,7 +225,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-03-08T14:41:01+02:00</lastmod>
<lastmod>2019-03-09T23:01:50+02:00</lastmod>
<priority>0</priority>
</url>
@@ -237,13 +237,13 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-03-08T14:41:01+02:00</lastmod>
<lastmod>2019-03-09T23:01:50+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-03-08T14:41:01+02:00</lastmod>
<lastmod>2019-03-09T23:01:50+02:00</lastmod>
<priority>0</priority>
</url>