diff --git a/content/posts/2019-03.md b/content/posts/2019-03.md
index 28738375b..da0b805f9 100644
--- a/content/posts/2019-03.md
+++ b/content/posts/2019-03.md
@@ -120,5 +120,25 @@ UPDATE 44
 - I ran the corrections on CGSpace and DSpace Test
+## 2019-03-10
+
+- Working on tagging IITA's items with their new research theme (`cg.identifier.iitatheme`) based on their existing IITA subjects (see [notes from 2019-02]({{< relref "2018-02.md" >}}))
+- I exported the entire IITA community from CGSpace and then used `csvcut` to extract only the needed fields:
+
+```
+$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv > /tmp/iita.csv
+```
+
+- After importing to OpenRefine I realized that tagging items based on their subjects is tricky because of the row/record mode of OpenRefine when you split the multi-value cells as well as the fact that some items might need to be tagged twice (thus needing a `||`)
+
+- I think it might actually be easier to filter by IITA subject, then by IITA theme (if needed), and then do transformations with some conditional values in GREL expressions like:
+
+```
+if(isBlank(value), 'PLANT PRODUCTION & HEALTH', value + '||PLANT PRODUCTION & HEALTH')
+```
+
+- Then it's more annoying because there are four IITA subject columns...
+- In total this would add research themes to 1,755 items
+- I want to double check one last time with Bosede that they would like to do this, because I also see that this will tag a few hundred items from the 1970s and 1980s
diff --git a/docs/2019-03/index.html b/docs/2019-03/index.html
index f29f45b9e..2588e02fc 100644
--- a/docs/2019-03/index.html
+++ b/docs/2019-03/index.html
@@ -25,7 +25,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
-
+
@@ -55,9 +55,9 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
   "@type": "BlogPosting",
   "headline": "March, 2019",
   "url": "https://alanorth.github.io/cgspace-notes/2019-03/",
-  "wordCount": "991",
+  "wordCount": "1180",
   "datePublished": "2019-03-01T12:16:30+01:00",
-  "dateModified": "2019-03-08T14:41:01+02:00",
+  "dateModified": "2019-03-09T23:01:50+02:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -275,6 +275,31 @@ UPDATE 44
 I ran the corrections on CGSpace and DSpace Test
+2019-03-10
+
+
+
+$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv > /tmp/iita.csv
+
+
+
+
+if(isBlank(value), 'PLANT PRODUCTION & HEALTH', value + '||PLANT PRODUCTION & HEALTH')
+
+
+
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 49f05995c..150218d01 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2019-03/
-2019-03-08T14:41:01+02:00
+2019-03-09T23:01:50+02:00
@@ -214,7 +214,7 @@
 https://alanorth.github.io/cgspace-notes/
-2019-03-08T14:41:01+02:00
+2019-03-09T23:01:50+02:00
 0
@@ -225,7 +225,7 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2019-03-08T14:41:01+02:00
+2019-03-09T23:01:50+02:00
 0
@@ -237,13 +237,13 @@
 https://alanorth.github.io/cgspace-notes/posts/
-2019-03-08T14:41:01+02:00
+2019-03-09T23:01:50+02:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2019-03-08T14:41:01+02:00
+2019-03-09T23:01:50+02:00
 0
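
The conditional GREL transformation in the 2019-03-10 notes (set the theme if the cell is blank, otherwise append it after a `||` separator) can be sketched outside OpenRefine as well. This is only a minimal Python illustration of that logic, not the workflow actually used: the function name `add_theme` and the placeholder value `EXISTING THEME` are hypothetical, and treating only `None` and the empty string as blank is an assumption about how `isBlank` behaves.

```python
from typing import Optional

# Multi-value separator, as used in the GREL expression in the notes.
SEPARATOR = "||"


def add_theme(value: Optional[str], theme: str) -> str:
    """Rough equivalent of: if(isBlank(value), theme, value + '||' + theme).

    Assumption: "blank" here means None or an empty string.
    """
    if value is None or value == "":
        return theme
    return value + SEPARATOR + theme


if __name__ == "__main__":
    theme = "PLANT PRODUCTION & HEALTH"
    # Blank cell: the theme is set on its own.
    print(add_theme("", theme))
    # Cell that already has a value (hypothetical placeholder): the theme is appended.
    print(add_theme("EXISTING THEME", theme))
```

Like the GREL expression, this does not check whether the theme is already present, so running it twice over the same cell would duplicate the value.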