Add notes for 2019-03-10

2019-03-10 19:34:34 +02:00
parent 5bb64b0e7d
commit d1501620cb
3 changed files with 53 additions and 8 deletions

- I ran the corrections on CGSpace and DSpace Test
## 2019-03-10
- Working on tagging IITA's items with their new research theme (`cg.identifier.iitatheme`) based on their existing IITA subjects (see [notes from 2019-02]({{< relref "2019-02.md" >}}))
- I exported the entire IITA community from CGSpace and then used `csvcut` to extract only the needed fields:
```
$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv > /tmp/iita.csv
```
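For reference, the same column selection can be sketched with Python's standard `csv` module. This is a minimal sketch, assuming the export's header row uses exactly the column names passed to `csvcut` above. The function name `cut_columns` is hypothetical, and unlike a strict selection it writes an empty field for any column missing from the export rather than treating it as an error.

```python
import csv
import io

# Same columns, in the same order, as the csvcut command above
COLUMNS = [
    "id",
    "cg.subject.iita",
    "cg.subject.iita[]",
    "cg.subject.iita[en]",
    "cg.subject.iita[en_US]",
]


def cut_columns(src, dst, columns=COLUMNS):
    """Copy only `columns` from the CSV in `src` to `dst`.

    Columns absent from the export are written as empty fields; columns
    not listed in `columns` are silently dropped.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real community export
    export = io.StringIO("id,dc.title,cg.subject.iita\n123,Some title,CASSAVA\n")
    out = io.StringIO()
    cut_columns(export, out)
    print(out.getvalue())
```

With real files, open both with `newline=""` (as the `csv` module documentation recommends) and pass the file objects instead of the `StringIO` buffers.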
- After importing into OpenRefine, I realized that tagging items based on their subjects is tricky for two reasons: OpenRefine's row/record mode gets in the way once you split the multi-value cells, and some items might need more than one theme (thus needing a `||` separator)
- I think it might actually be easier to filter by IITA subject, then by IITA theme (if needed), and then apply transformations using conditional GREL expressions like:
```
if(isBlank(value), 'PLANT PRODUCTION & HEALTH', value + '||PLANT PRODUCTION & HEALTH')
```
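The logic of that GREL expression can be mirrored in Python to make the two cases explicit: a blank cell gets the theme, a non-blank cell gets `||` plus the theme appended. This is a sketch only, and the function name `append_theme` is hypothetical. It treats `None` and the empty string as blank, like GREL's `isBlank`. It also adds one behavior that the GREL above does not have: it skips the append when the theme is already present, so running it twice does not create duplicates.

```python
def append_theme(value, theme, sep="||"):
    """Set `theme` on a blank cell, otherwise append it with `sep`.

    Mirrors: if(isBlank(value), theme, value + '||' + theme)
    """
    if value is None or value == "":
        return theme
    # Not in the GREL above: avoid adding the same theme twice
    if theme in value.split(sep):
        return value
    return value + sep + theme


if __name__ == "__main__":
    print(append_theme("", "PLANT PRODUCTION & HEALTH"))
    print(append_theme("SOME THEME", "PLANT PRODUCTION & HEALTH"))
```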
- This is more tedious because there are four IITA subject columns to check (`cg.subject.iita`, `cg.subject.iita[]`, `cg.subject.iita[en]`, and `cg.subject.iita[en_US]`)
- In total this would add research themes to 1,755 items
- I want to double-check one last time with Bosede that they would like to do this, because it will also tag a few hundred items from the 1970s and 1980s
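To sidestep OpenRefine's row/record handling entirely, the whole tagging step could also be sketched as a script over the exported CSV. It reads all four subject columns, splits their `||`-separated values, maps subjects to themes, merges them with any existing `cg.identifier.iitatheme` values, and counts how many items would gain a theme. This is only an illustration: `SUBJECT_TO_THEME` is a **hypothetical** mapping, not the one agreed with IITA. The helper names are also made up, and the sketch assumes the theme column (if present) uses the same `||` separator.

```python
import csv
import io

SUBJECT_COLUMNS = [
    "cg.subject.iita",
    "cg.subject.iita[]",
    "cg.subject.iita[en]",
    "cg.subject.iita[en_US]",
]
THEME_COLUMN = "cg.identifier.iitatheme"

# HYPOTHETICAL subject -> theme mapping for illustration only; the real
# mapping is whatever was agreed with IITA
SUBJECT_TO_THEME = {
    "PLANT DISEASES": "PLANT PRODUCTION & HEALTH",
    "PESTS OF PLANTS": "PLANT PRODUCTION & HEALTH",
}


def split_values(cell):
    """Split a DSpace multi-value cell on `||`, dropping empty values."""
    return [v.strip() for v in (cell or "").split("||") if v.strip()]


def themes_for(row, mapping):
    """Unique themes implied by the subjects in all four subject columns."""
    themes = []
    for column in SUBJECT_COLUMNS:
        for subject in split_values(row.get(column)):
            theme = mapping.get(subject)
            if theme and theme not in themes:
                themes.append(theme)
    return themes


def tag_rows(src, mapping=SUBJECT_TO_THEME):
    """Add themes to every row; return the rows and how many gained a theme."""
    rows, tagged = [], 0
    for row in csv.DictReader(src):
        existing = split_values(row.get(THEME_COLUMN))
        new = [t for t in themes_for(row, mapping) if t not in existing]
        if new:
            tagged += 1
        # Existing themes first, then new ones, joined with the `||` separator
        row[THEME_COLUMN] = "||".join(existing + new)
        rows.append(row)
    return rows, tagged


if __name__ == "__main__":
    export = io.StringIO(
        "id,cg.subject.iita,cg.subject.iita[en_US]\n"
        "1,PLANT DISEASES,PESTS OF PLANTS\n"
        "2,,\n"
        "3,CASSAVA||PLANT DISEASES,\n"
    )
    rows, tagged = tag_rows(export)
    print(f"{tagged} of {len(rows)} items would gain a research theme")
```

Counting before writing anything back gives a number to sanity-check (and to share when confirming with IITA) before running the real update.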
<!-- vim: set sw=2 ts=2: -->