mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2019-03-10
@@ -120,5 +120,25 @@ UPDATE 44

- I ran the corrections on CGSpace and DSpace Test

## 2019-03-10
- Working on tagging IITA's items with their new research theme (`cg.identifier.iitatheme`) based on their existing IITA subjects (see [notes from 2019-02]({{< relref "2019-02.md" >}}))
- I exported the entire IITA community from CGSpace and then used `csvcut` to extract only the needed fields:

```
$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv > /tmp/iita.csv
```
- After importing to OpenRefine I realized that tagging items based on their subjects is tricky, both because of OpenRefine's row/record mode when you split the multi-value cells and because some items might need to be tagged twice (thus needing a `||`)

- I think it might actually be easier to filter by IITA subject, then by IITA theme (if needed), and then do transformations with some conditional values in GREL expressions like:

```
if(isBlank(value), 'PLANT PRODUCTION & HEALTH', value + '||PLANT PRODUCTION & HEALTH')
```
- Then it's more annoying because there are four IITA subject columns...
- In total this would add research themes to 1,755 items
- I want to double check one last time with Bosede that they would like to do this, because I also see that this will tag a few hundred items from the 1970s and 1980s
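- A rough sketch of the same conditional logic in Python, applied across all four subject columns from the `csvcut` command (the subject-to-theme mapping, including the `PLANT HEALTH` subject value, is hypothetical; treating `cg.identifier.iitatheme` as a single CSV column is an assumption; and the duplicate check goes beyond what the GREL expression above does):

```python
# The four IITA subject columns, as listed in the csvcut command
SUBJECT_COLUMNS = [
    "cg.subject.iita",
    "cg.subject.iita[]",
    "cg.subject.iita[en]",
    "cg.subject.iita[en_US]",
]

# Assumption: the theme lives in one column with exactly this name
THEME_COLUMN = "cg.identifier.iitatheme"

# Hypothetical mapping: the real subject-to-theme rules are not in these notes
SUBJECT_TO_THEME = {"PLANT HEALTH": "PLANT PRODUCTION & HEALTH"}


def add_theme(existing, theme):
    """Set the theme if the cell is blank, otherwise append it with '||'."""
    existing = (existing or "").strip()
    if not existing:
        return theme
    if theme in existing.split("||"):
        # Already tagged: skip, so re-running does not add duplicates
        return existing
    return existing + "||" + theme


def tag_row(row):
    """Tag one CSV row (a dict) based on the values in any subject column."""
    themes = row.get(THEME_COLUMN, "")
    for column in SUBJECT_COLUMNS:
        # Multi-value cells use '||' as the separator
        for subject in (row.get(column) or "").split("||"):
            theme = SUBJECT_TO_THEME.get(subject.strip())
            if theme:
                themes = add_theme(themes, theme)
    row[THEME_COLUMN] = themes
    return row


# Example with a made-up row
sample = {
    "id": "1",
    "cg.subject.iita[en_US]": "PLANT HEALTH||MAIZE",
    "cg.identifier.iitatheme": "",
}
print(tag_row(sample)["cg.identifier.iitatheme"])
```

- Rows could be fed to `tag_row` from `csv.DictReader` and written back with `csv.DictWriter`, which would avoid splitting multi-value cells into OpenRefine records at all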
<!-- vim: set sw=2 ts=2: -->