mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2019-10-01
This commit is contained in:
31
content/posts/2019-10.md
Normal file
31
content/posts/2019-10.md
Normal file
@ -0,0 +1,31 @@
|
||||
---
|
||||
title: "October, 2019"
|
||||
date: 2019-10-01T13:20:51+03:00
|
||||
author: "Alan Orth"
|
||||
tags: ["Notes"]
|
||||
---
|
||||
|
||||
## 2019-10-01
|
||||
|
||||
- Udana from IWMI asked me for a CSV export of their community on CGSpace
|
||||
- I exported it, but a quick run through the `csv-metadata-quality` tool shows that there are some low-hanging fruits we can fix before I send him the data
|
||||
- I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script's "unneccesary Unicode" fix:
|
||||
|
||||
```
|
||||
$ csvcut -c 'id,dc.title[en_US],cg.coverage.region[en_US],cg.coverage.subregion[en_US],cg.river.basin[en_US]' ~/Downloads/10568-16814.csv > /tmp/iwmi-title-region-subregion-river.csv
|
||||
```
|
||||
|
||||
- Then I replace them in vim with `:% s/\%u00a0/ /g` because I can't figure out the correct sed syntax to do it directly from the pipe above
|
||||
- I uploaded those to CGSpace and then re-exported the metadata
|
||||
- Now that I think about it, I shouldn't be removing non-breaking spaces (U+00A0), I should be replacing them with normal spaces!
|
||||
- I modified the script so it replaces the non-breaking spaces instead of removing them
|
||||
- Then I ran the csv-metadata-quality script to do some general cleanups (though I temporarily commented out the whitespace fixes because it was too many thousands of rows):
|
||||
|
||||
```
|
||||
$ csv-metadata-quality -i ~/Downloads/10568-16814.csv -o /tmp/iwmi.csv -x 'dc.date.issued,dc.date.issued[],dc.date.issued[en_US]' -u
|
||||
```
|
||||
|
||||
- That fixed 153 items (unnecessary Unicode, duplicates, comma–space fixes, etc)
|
||||
- Release [version 0.3.1 of the csv-metadata-quality script](https://github.com/ilri/csv-metadata-quality/releases/tag/v0.3.1) with the non-breaking spaces change
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
Reference in New Issue
Block a user