From 7be53639dccf007e3e2b632311af5920799ebdad Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Fri, 16 Aug 2024 19:57:30 -0700
Subject: [PATCH] Add content/posts/2024-08.md

---
 content/posts/2024-08.md | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)
 create mode 100644 content/posts/2024-08.md

diff --git a/content/posts/2024-08.md b/content/posts/2024-08.md
new file mode 100644
index 000000000..0dc849558
--- /dev/null
+++ b/content/posts/2024-08.md
@@ -0,0 +1,39 @@
+---
+title: "August, 2024"
+date: 2024-08-08T23:07:00-07:00
+author: "Alan Orth"
+categories: ["Notes"]
+---
+
+## 2024-08-08
+
+- While working on the CGIAR Climate Change Synthesis I learned some new tricks with OpenRefine
+
+
+
+- The first was to retrieve affiliations from OpenAlex and extract them from JSON with this GREL:
+
+```
+forEach(
+  value.parseJson()['authorships'],
+  a,
+  forEach(
+    a.parseJson()['institutions'],
+    i,
+    i['display_name']
+  ).join("||")
+).join("||")
+```
+
+- It is a nested `forEach` to extract all institutions for all authors
+- Second was a better way to deduplicate lists in Jython while preserving list order:
+
+```python
+# better dedupe preserves order
+seen = set()
+deduped_list = [x for x in value.split("||") if x not in seen and not seen.add(x)]
+
+return "||".join(deduped_list)
+```
+
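For checking the logic outside OpenRefine, here is a minimal sketch in plain Python 3 of both steps from this post. It walks an OpenAlex work's `authorships` and their `institutions` with two nested loops, mirroring the nested GREL `forEach`, and then deduplicates the `||`-separated result while keeping the original order. The sample record is a hypothetical, heavily trimmed OpenAlex work, and the function names `extract_institutions` and `dedupe_preserving_order` are made up for illustration. Only the authorships/institutions/display_name nesting is taken from the GREL above.

```python
import json

# Hypothetical, trimmed-down OpenAlex work record (real records have many more
# fields); only the authorships -> institutions -> display_name nesting matters.
WORK_JSON = """
{
  "authorships": [
    {"institutions": [{"display_name": "University of Example"},
                      {"display_name": "Example Research Institute"}]},
    {"institutions": [{"display_name": "University of Example"}]}
  ]
}
"""


def extract_institutions(raw_json, sep="||"):
    """Hypothetical helper mirroring the nested GREL forEach."""
    work = json.loads(raw_json)
    per_author = []
    # Outer loop: one entry per author (the outer forEach over authorships)
    for authorship in work.get("authorships", []):
        # Inner loop: this author's institution names (the inner forEach),
        # joined with the separator; an author with no institutions gives ""
        names = [inst["display_name"] for inst in authorship.get("institutions", [])]
        per_author.append(sep.join(names))
    # Join the per-author strings, like the outer .join("||")
    return sep.join(per_author)


def dedupe_preserving_order(value, sep="||"):
    """Hypothetical helper using the same set-based trick as the Jython snippet."""
    seen = set()
    # seen.add(x) returns None, so `not seen.add(x)` is always True; it only
    # records x so that later duplicates fail the `x not in seen` test
    deduped = [x for x in value.split(sep) if x not in seen and not seen.add(x)]
    return sep.join(deduped)


if __name__ == "__main__":
    affiliations = extract_institutions(WORK_JSON)
    print(affiliations)
    # -> University of Example||Example Research Institute||University of Example
    print(dedupe_preserving_order(affiliations))
    # -> University of Example||Example Research Institute
```

On Python 3.7+ `list(dict.fromkeys(items))` would also dedupe in order, but OpenRefine's Jython follows Python 2.7 semantics, where plain dicts do not guarantee insertion order, so the set-plus-comprehension approach is the safer choice there.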