mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2020-07-08
@@ -337,4 +337,23 @@ dc.contributor.author,correction
- I wrote a quick script to look up organizations (affiliations) in the Research Organization Registry (ROR) JSON data release v5
- I want to use this to evaluate ROR as a controlled vocabulary for CGSpace and MELSpace
- I exported a list of affiliations from CGSpace:
```
dspace=# \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2020-07-08-affiliations.csv WITH CSV HEADER;
```
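- The export above is a two-column CSV (affiliation, count) with a header row and quoted values where needed. As a rough sketch only, Python's `csv` module is one way to reduce such a file to a plain list of names; the helper name `affiliation_names` and the sample rows are made up for illustration:

```python
import csv
import io


def affiliation_names(csv_file):
    """Yield the first column of each data row, skipping the header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    for row in reader:
        # Ignore blank lines and rows with an empty first column
        if row and row[0].strip():
            yield row[0].strip()


# Made-up sample in the same shape as the export (name, count)
sample = io.StringIO(
    'cg.contributor.affiliation,count\n'
    '"Example University, Kenya",12\n'
    'Example Institute,3\n'
)
names = list(affiliation_names(sample))
```

- For a real file, something like `open(path, newline="", encoding="utf-8")` would take the place of the `io.StringIO` sample. Using `csv.reader` rather than stripping quote characters by hand keeps names that contain commas intact.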
- Then I stripped the header and quotes to make it a plain text file and ran `ror-lookup.py`:
```
$ ./ror-lookup.py -i /tmp/2020-07-08-affiliations.txt -r ror.json -o 2020-07-08-affiliations-ror.csv -d
$ csvgrep -c 2 -m true 2020-07-08-affiliations-ror.csv | wc -l
1378
$ csvgrep -c 2 -m false 2020-07-08-affiliations-ror.csv | wc -l
4490
```
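- For illustration, a lookup of this kind could work roughly as sketched below. This is not the actual `ror-lookup.py`: it assumes each record in the ROR dump has a `name` string and an `aliases` list, that matching is exact and case-insensitive, and that the output has a `true`/`false` value in the second column (as the `csvgrep -c 2` calls above suggest). The function names, header names, and sample records are hypothetical:

```python
import csv
import io
import json


def build_ror_names(records):
    """Collect lowercased names and aliases from ROR records (assumed 'name'/'aliases' keys)."""
    names = set()
    for record in records:
        names.add(record["name"].strip().lower())
        for alias in record.get("aliases", []):
            names.add(alias.strip().lower())
    return names


def write_matches(affiliations, ror_names, out_file):
    """Write one row per affiliation with a true/false match column; return the match count."""
    writer = csv.writer(out_file)
    writer.writerow(["organization", "match"])  # hypothetical header names
    matched = 0
    for affiliation in affiliations:
        found = affiliation.strip().lower() in ror_names
        matched += found
        writer.writerow([affiliation, "true" if found else "false"])
    return matched


# Made-up records in the assumed shape of the ROR dump
records = json.loads(
    '[{"id": "https://ror.org/example1", "name": "Example University", "aliases": ["ExU"]},'
    ' {"id": "https://ror.org/example2", "name": "Sample Institute", "aliases": []}]'
)
ror_names = build_ror_names(records)

out = io.StringIO()
matched = write_matches(["Example University", "exu", "Unknown Org"], ror_names, out)
```

- Exact matching like this would miss spelling variants and translated names, so it likely understates the overlap. Also note that `csvgrep` prints the header row, so the `wc -l` counts above are one higher than the number of matching rows.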
<!-- vim: set sw=2 ts=2: -->