Update notes

2025-01-27 05:49:12 +01:00 · 2018-03-08 22:47:12 +02:00
parent 8c2e314038
commit 2e5c0e3fed
3 changed files with 27 additions and 8 deletions
--- a/content/post/2018-03.md
+++ b/content/post/2018-03.md
@@ -102,6 +102,15 @@ dspacetest=# select distinct text_lang from metadatavalue where resource_type_id
 (9 rows)
 ```

+- On second inspection it looks like `dc.description.provenance` fields use the `text_lang` "en", so that's probably why there are over 100,000 fields changed...
+- If I skip the "en" value, there are about 2,000, which seems more reasonable for the number of fields users have edited manually, or fucked up during CSV import, etc:
+
+```
+dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
+UPDATE 2309
+```
+
+- I will apply this on CGSpace right now
 - In other news, I was playing with adding ORCID identifiers to a dump of CIAT's community via CSV in OpenRefine
 - Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the `cg.creator.id` field
 - For example, a GREL expression in a custom text facet to get all items with `dc.contributor.author[en_US]` of a certain author with several name variations (this is how you use a logical OR in OpenRefine):