mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2020-01-16
@ -112,4 +112,22 @@ COPY 1325
$ ./fix-metadata-values.py -i 2020-01-15-fix-8-ilri-subjects.csv -db dspace -u dspace -p 'fuuu' -f cg.subject.ilri -m 203 -t correct -d
```
## 2020-01-16
- Extract a list of CIAT subjects from CGSpace for Elizabeth Arnaud from Bioversity:
```
dspace=# \COPY (SELECT DISTINCT text_value as "cg.subject.ciat", count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 122 GROUP BY text_value ORDER BY count DESC) to /tmp/2020-01-16-ciat-subjects.csv WITH CSV HEADER;
COPY 35
```
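For reference, a minimal Python sketch of what that query computes: one row per distinct subject value with its count, most frequent first, written out as CSV with a header. It runs against an in-memory SQLite table rather than the real PostgreSQL database, and the sample rows, the `n` alias, and the helper names `subject_counts` and `to_csv` are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows: (resource_type_id, metadata_field_id, text_value)
SAMPLE_ROWS = [
    (2, 122, "CASSAVA"),
    (2, 122, "BEANS"),
    (2, 122, "CASSAVA"),
    (2, 122, "CASSAVA"),
    (2, 122, "BEANS"),
    (2, 122, "RICE"),
    (2, 203, "LIVESTOCK"),  # different metadata field, filtered out
    (3, 122, "CASSAVA"),    # different resource type, filtered out
]


def subject_counts(rows):
    """Count each distinct text_value for field 122 on resource type 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadatavalue ("
        "resource_type_id INTEGER, metadata_field_id INTEGER, text_value TEXT)"
    )
    conn.executemany("INSERT INTO metadatavalue VALUES (?, ?, ?)", rows)
    result = conn.execute(
        'SELECT text_value AS "cg.subject.ciat", count(*) AS n '
        "FROM metadatavalue "
        "WHERE resource_type_id = 2 AND metadata_field_id = 122 "
        "GROUP BY text_value ORDER BY n DESC"
    ).fetchall()
    conn.close()
    return result


def to_csv(counts):
    """Serialize the counts as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["cg.subject.ciat", "count"])
    writer.writerows(counts)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(subject_counts(SAMPLE_ROWS)), end="")
```

Because `GROUP BY text_value` already yields one row per value, the `DISTINCT` in the original query does not change the result.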
- Start examining the 175 IITA records that Bosede originally sent in October, 2019 (201907.xls)
- We had delayed processing them because DSpace Test (linode19) had been used to test the CG Core v2 implementation for the last few months
- Sisay uploaded the records to DSpace Test as [IITA_201907_Jan13](https://dspacetest.cgiar.org/handle/10568/106567)
- I started with basic sanity checks using my csv-metadata-quality tool and found twenty-two items with extra whitespace, invalid multi-value separators, and duplicates, which means Sisay did not do any quality checking on the data
- I corrected one invalid AGROVOC subject
- Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine:
  - `$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id`
  - I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: `if(cell.recon.matched, cell.recon.match.name, value)`
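The three kinds of problems found in the sanity checks above (extra whitespace, invalid multi-value separators, and duplicates) can be roughly illustrated in Python. This is only a sketch and not how csv-metadata-quality is implemented: it assumes `||` is the multi-value separator, and the function names and sample values are hypothetical.

```python
import re

SEPARATOR = "||"  # assumed multi-value separator


def has_extra_whitespace(value):
    """True for leading/trailing whitespace or runs of two or more spaces."""
    return value != value.strip() or re.search(r"\s{2,}", value) is not None


def has_invalid_separator(value):
    """True if any pipe remains after removing the valid '||' separators."""
    return "|" in value.replace(SEPARATOR, "")


def has_duplicates(value):
    """True if the same value appears more than once in a multi-value field."""
    parts = [part.strip() for part in value.split(SEPARATOR)]
    return len(parts) != len(set(parts))


def check_value(value):
    """Return the list of problems found in a single metadata value."""
    problems = []
    if has_extra_whitespace(value):
        problems.append("extra whitespace")
    if has_invalid_separator(value):
        problems.append("invalid separator")
    if has_duplicates(value):
        problems.append("duplicate values")
    return problems


if __name__ == "__main__":
    for sample in [" MAIZE||MAIZE ", "MAIZE|BEANS", "MAIZE||BEANS"]:
        print(repr(sample), check_value(sample))
```

Running something like this over every cell of the CSV would flag the same classes of issue, though the real tool also fixes them and covers more cases.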
<!-- vim: set sw=2 ts=2: -->