Update

2025-01-27 05:49:12 +01:00 · 2018-03-08 17:32:38 +02:00
parent 0bd871a13a
commit 79c025af88
3 changed files with 137 additions and 8 deletions
--- a/content/post/2018-03.md
+++ b/content/post/2018-03.md
@@ -56,3 +56,65 @@ UPDATE 1
 - Help Sisay proof 200 IITA records on DSpace Test
 - Finally import Udana's 24 items to [IWMI Journal Articles](https://cgspace.cgiar.org/handle/10568/36185) on CGSpace
 - Skype with James Stapleton to discuss CGSpace, ILRI website, CKM staff issues, etc
+
+## 2018-03-08
+
+- Looking at a CSV dump of the CIAT community, I see people have entered lots of inconsistent text language values for their metadata
+- This bloats the CSV with a separate column for every variant, for example `dc.title`, `dc.title[]`, `dc.title[en]`, `dc.title[eng]`, `dc.title[en_US]` and so on!
+- I think I can fix, or at least normalize, them in the database:
+
+```
+dspace=# select distinct text_lang from metadatavalue where resource_type_id=2;
+ text_lang 
+-----------
+ 
+ ethnob
+ en
+ spa
+ EN
+ En
+ en_
+ en_US
+ E.
+ 
+ EN_US
+ en_U
+ eng
+ fr
+ es_ES
+ es
+(16 rows)
+
+dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('en','EN','En','en_','EN_US','en_U','eng');
+UPDATE 122227
+dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
+ text_lang
+-----------
+
+ ethnob
+ en_US
+ spa
+ E.
+
+ fr
+ es_ES
+ es
+(9 rows)
+```
+
+- In other news, I was playing with adding ORCID identifiers to a dump of CIAT's community via CSV in OpenRefine
+- Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the `cg.creator.id` field
+- For example, this GREL expression in a custom text facet matches all items whose `dc.contributor.author[en_US]` contains any of several name variations of a certain author (`or()` is how you do a logical OR in OpenRefine):
+
+```
+or(value.contains('Ceballos, Hern'), value.contains('Hernández Ceballos'))
+```
+
+- Then you can flag or star the matching items and use a conditional to either set the value directly (when the cell is blank) or append it to the existing value:
+
+```
+if(isBlank(value), "Hernan Ceballos: 0000-0002-8744-7918", value + "||Hernan Ceballos: 0000-0002-8744-7918")
+```
+
+- One thing that bothers me is that this won't honor author order, because the identifier is always appended after whatever is already in the cell
+- It might be better to do batches of these in PostgreSQL with a script that takes the `place` column of an author into account when setting the `cg.creator.id`
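
Note on the last point: the place-aware idea can be sketched in a few lines of Python before committing to a PostgreSQL script. This is only an illustration under stated assumptions, not the script the notes have in mind. It assumes the authors of one item are available as `(place, name)` pairs, and the function names are hypothetical. The name fragments, the ORCID label, and the `||` separator are taken from the GREL examples above.

```python
# Name fragments from the GREL facet above, mapped to the ORCID label used there.
# Substring matching mirrors value.contains(...) in the facet.
DEFAULT_ORCIDS = {
    "Ceballos, Hern": "Hernan Ceballos: 0000-0002-8744-7918",
    "Hernández Ceballos": "Hernan Ceballos: 0000-0002-8744-7918",
}


def lookup_orcid(author, mapping=DEFAULT_ORCIDS):
    """Return the ORCID label for an author name, or None if no fragment matches."""
    for fragment, label in mapping.items():
        if fragment in author:
            return label
    return None


def creator_ids_in_author_order(authors, mapping=DEFAULT_ORCIDS):
    """Build the cg.creator.id values for one item, following author order.

    `authors` is a list of (place, name) pairs (assumed input shape).
    Authors without a known ORCID are skipped; duplicate labels are dropped.
    """
    labels = []
    for _place, name in sorted(authors, key=lambda pair: pair[0]):
        label = lookup_orcid(name, mapping)
        if label is not None and label not in labels:
            labels.append(label)
    return labels


def to_csv_cell(labels):
    """Join multiple values with the || separator used in the CSV."""
    return "||".join(labels)
```

Sorting by `place` before looking up identifiers means the resulting values follow the author order regardless of the order in which rows are read, which is the property the append-only GREL conditional cannot guarantee.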