Add notes for 2021-09-24

2025-01-27 05:49:12 +01:00 · 2021-09-24 14:24:00 +03:00
parent 992c58601f
commit bbf478c410
26 changed files with 127 additions and 31 deletions
--- a/content/posts/2021-09.md
+++ b/content/posts/2021-09.md
@ -241,4 +241,51 @@ localhost/dspace63= > \COPY (SELECT collection_id,uuid FROM collection WHERE col
 COPY 1139
 ```

+## 2021-09-24
+
+- Peter and Abenet agreed that we should consider converting more of our UPPER CASE metadata values to Title Case
+  - It seems that these fields are all still using UPPER CASE:
+    - cg.subject.alliancebiovciat
+    - cg.species.breed
+    - cg.subject.bioversity
+    - cg.subject.ccafs
+    - cg.subject.ciat
+    - cg.subject.cip
+    - cg.identifier.iitatheme
+    - cg.subject.iita
+    - cg.subject.ilri
+    - cg.subject.pabra
+    - cg.river.basin
+    - cg.coverage.subregion (done)
+    - dcterms.audience (done)
+    - cg.subject.wle
+  - We can do some of these without even asking anyone, for example `cg.coverage.subregion`, `cg.river.basin`, and `dcterms.audience`
+- First, I will look at `cg.coverage.subregion`
+  - These should ideally come from ISO 3166-2 subdivisions
+  - I will sentence case them and then create a controlled vocabulary from those that are matching (and worry about cleaning the rest up later)
+
+```console
+localhost/dspace63= > UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=231;
+UPDATE 2903
+localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.coverage.subregion" FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 231) to /tmp/2021-09-24-subregions.txt;
+COPY 1200
+```
+
+- Then I process the list for matches with my `subdivision-lookup.py` script, and extract only the values that matched:
+
+```console
+$ ./ilri/subdivision-lookup.py -i /tmp/2021-09-24-subregions.txt -o /tmp/subregions.csv
+$ csvgrep -c matched -m 'true' /tmp/subregions.csv | csvcut -c 1 | sed 1d > /tmp/subregions-matched.txt
+$ wc -l /tmp/subregions-matched.txt 
+81 /tmp/subregions-matched.txt
+```
+
+- Then I updated the controlled vocabulary in the submission forms
+- I did the same for `dcterms.audience`, taking special care to a few all-caps values:
+
+```console
+localhost/dspace63= > UPDATE metadatavalue SET text_value=INITCAP(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value != 'NGOS' AND text_value != 'CGIAR';
+localhost/dspace63= > UPDATE metadatavalue SET text_value='NGOs' WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value = 'NGOS';
+```
+
 <!-- vim: set sw=2 ts=2: -->