diff --git a/content/posts/2019-03.md b/content/posts/2019-03.md
index 28738375b..da0b805f9 100644
--- a/content/posts/2019-03.md
+++ b/content/posts/2019-03.md
@@ -120,5 +120,25 @@ UPDATE 44
 - I ran the corrections on CGSpace and DSpace Test
+## 2019-03-10
+
+- Working on tagging IITA's items with their new research theme (`cg.identifier.iitatheme`) based on their existing IITA subjects (see [notes from 2019-02]({{< relref "2018-02.md" >}}))
+- I exported the entire IITA community from CGSpace and then used `csvcut` to extract only the needed fields:
+
+```
+$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv > /tmp/iita.csv
+```
+
+- After importing to OpenRefine I realized that tagging items based on their subjects is tricky because of the row/record mode of OpenRefine when you split the multi-value cells as well as the fact that some items might need to be tagged twice (thus needing a `||`)
+
+- I think it might actually be easier to filter by IITA subject, then by IITA theme (if needed), and then do transformations with some conditional values in GREL expressions like:
+
+```
+if(isBlank(value), 'PLANT PRODUCTION & HEALTH', value + '||PLANT PRODUCTION & HEALTH')
+```
+
+- Then it's more annoying because there are four IITA subject columns...
+- In total this would add research themes to 1,755 items
+- I want to double check one last time with Bosede that they would like to do this, because I also see that this will tag a few hundred items from the 1970s and 1980s
diff --git a/docs/2019-03/index.html b/docs/2019-03/index.html
index f29f45b9e..2588e02fc 100644
--- a/docs/2019-03/index.html
+++ b/docs/2019-03/index.html
@@ -25,7 +25,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
-
+
@@ -55,9 +55,9 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
   "@type": "BlogPosting",
   "headline": "March, 2019",
   "url": "https://alanorth.github.io/cgspace-notes/2019-03/",
-  "wordCount": "991",
+  "wordCount": "1180",
   "datePublished": "2019-03-01T12:16:30+01:00",
-  "dateModified": "2019-03-08T14:41:01+02:00",
+  "dateModified": "2019-03-09T23:01:50+02:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -275,6 +275,31 @@ UPDATE 44
+Working on tagging IITA's items with their new research theme (cg.identifier.iitatheme) based on their existing IITA subjects (see notes from 2019-02)
+I exported the entire IITA community from CGSpace and then used csvcut to extract only the needed fields:
+$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv > /tmp/iita.csv
+
+After importing to OpenRefine I realized that tagging items based on their subjects is tricky because of the row/record mode of OpenRefine when you split the multi-value cells as well as the fact that some items might need to be tagged twice (thus needing a ||)
+I think it might actually be easier to filter by IITA subject, then by IITA theme (if needed), and then do transformations with some conditional values in GREL expressions like:
+if(isBlank(value), 'PLANT PRODUCTION & HEALTH', value + '||PLANT PRODUCTION & HEALTH')
+
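
For reference, the set-or-append conditional in the GREL expression from the notes can be sketched in Python. This is only an illustration of the logic, not something that was run against CGSpace: `add_theme` and `SEPARATOR` are hypothetical names, and treating `None` and the empty string as blank is an assumption about how `isBlank` behaves.

```python
# Hypothetical helper mirroring the GREL expression from the notes:
#   if(isBlank(value), THEME, value + '||' + THEME)
SEPARATOR = "||"  # multi-value separator used in the CSV export


def add_theme(value, theme):
    """Return the theme alone for a blank cell, otherwise append it."""
    # Assumption: "blank" means a missing cell (None) or an empty string.
    if value is None or value == "":
        return theme
    return value + SEPARATOR + theme


if __name__ == "__main__":
    theme = "PLANT PRODUCTION & HEALTH"
    print(add_theme("", theme))
    print(add_theme("SOME EXISTING THEME", theme))
```

Like the GREL expression, this does not check whether the theme is already present in the cell, so applying it twice to the same cell would append the theme twice.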