diff --git a/content/post/2018-03.md b/content/post/2018-03.md
index 04eec8575..b23da319c 100644
--- a/content/post/2018-03.md
+++ b/content/post/2018-03.md
@@ -102,6 +102,15 @@ dspacetest=# select distinct text_lang from metadatavalue where resource_type_id
 (9 rows)
 ```
 
+- On second inspection it looks like `dc.description.provenance` fields use the text_lang "en" so that's probably why there are over 100,000 fields changed...
+- If I skip that, there are about 2,000, which seems more reasonably like the amount of fields users have edited manually, or fucked up during CSV import, etc:
+
+```
+dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
+UPDATE 2309
+```
+
+- I will apply this on CGSpace right now
 - In other news, I was playing with adding ORCID identifiers to a dump of CIAT's community via CSV in OpenRefine
 - Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the `cg.creator.id` field
 - For example, a GREL expression in a custom text facet to get all items with `dc.contributor.author[en_US]` of a certain author with several name variations (this is how you use a logical OR in OpenRefine):
diff --git a/docs/2018-03/index.html b/docs/2018-03/index.html
index 0ebe541ad..ee6628f1e 100644
--- a/docs/2018-03/index.html
+++ b/docs/2018-03/index.html
@@ -20,7 +20,7 @@ Export a CSV of the IITA community metadata for Martin Mueller
-
+
@@ -51,9 +51,9 @@ Export a CSV of the IITA community metadata for Martin Mueller
 "@type": "BlogPosting",
 "headline": "March, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-03/",
-"wordCount": "709",
+"wordCount": "780",
 "datePublished": "2018-03-02T16:07:54+02:00",
-"dateModified": "2018-03-08T21:20:39+02:00",
+"dateModified": "2018-03-08T21:29:37+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -229,6 +229,16 @@ dspacetest=# select distinct text_lang from metadatavalue where resource_type_id
dc.description.provenance fields use the text_lang “en” so that’s probably why there are over 100,000 fields changed…
+dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
+UPDATE 2309
+
+
+
cg.creator.id field
dc.contributor.author[en_US] of a certain author with several name variations (this is how you use a logical OR in OpenRefine):