From a83e2385417632f71c28c604c639c174e3b2e397 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Mon, 10 Jun 2019 17:17:58 +0300
Subject: [PATCH] Update notes for 2019-06-10

---
 content/posts/2019-06.md | 18 ++++++++++++++++++
 docs/sitemap.xml         | 10 +++++-----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/content/posts/2019-06.md b/content/posts/2019-06.md
index 5bc325c2d..ba07bc83b 100644
--- a/content/posts/2019-06.md
+++ b/content/posts/2019-06.md
@@ -56,5 +56,23 @@ statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.Sol
 - Rename the AReS repository on GitHub to OpenRXV: https://github.com/ilri/OpenRXV
 - Create a new AReS repository: https://github.com/ilri/AReS
+- Start looking at the 203 IITA records on DSpace Test from last month ([IITA_May_16](https://dspacetest.cgiar.org/handle/10568/102032) aka "20194th.xls") using OpenRefine
+  - Trim leading, trailing, and consecutive whitespace on all columns, but I didn't notice very many issues
+  - Validate affiliations against the latest list of top 1500 terms using reconcile-csv, correcting and standardizing about twenty-seven
+  - Validate countries against the latest list of countries using reconcile-csv, correcting three
+  - Convert all DOIs to "https://dx.doi.org" format
+  - Normalize all Google Books URLs in `cg.identifier.url` to "books.google.com"
+  - Correct some inconsistencies in IITA subjects
+  - Correct two incorrect "Peer Review" values in `dc.description.version`
+  - About fifteen items have incorrect ISBNs (looks like an Excel error, because the values are in scientific notation)
+  - I managed to get as far as the subjects, so I'll continue from there when I start working next
+- Generate a new list of countries from the database for use with reconcile-csv
+  - After dumping, use csvcut to add line numbers, then change the CSV header to match the column names used in reconcile-csv, for example `id` and `name`:
+```
+dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 228 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC) to /tmp/countries.csv WITH CSV HEADER
+COPY 192
+$ csvcut -l -c 0 /tmp/countries.csv > 2019-06-10-countries.csv
+```
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 821994944..2afa2d444 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,30 +4,30 @@
 https://alanorth.github.io/cgspace-notes/
-2019-06-07T15:01:38+03:00
+2019-06-10T12:58:33+03:00
 0
 https://alanorth.github.io/cgspace-notes/2019-05/
-2019-06-07T15:01:38+03:00
+2019-06-10T12:58:33+03:00
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2019-06-07T15:01:38+03:00
+2019-06-10T12:58:33+03:00
 0
 https://alanorth.github.io/cgspace-notes/posts/
-2019-06-07T15:01:38+03:00
+2019-06-10T12:58:33+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2019-06-07T15:01:38+03:00
+2019-06-10T12:58:33+03:00
 0
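Three of the OpenRefine cleanup steps in these notes can also be approximated in a short script: trimming and collapsing whitespace, converting DOIs to the "https://dx.doi.org" form, and spotting ISBNs that Excel turned into scientific notation. What follows is a minimal Python sketch under stated assumptions. The helper names, the DOI regex, and the set of accepted DOI prefixes are hypothetical, and none of them are what OpenRefine or these notes actually used.

```python
import re

# Hypothetical helpers approximating OpenRefine cleanup steps from the notes.
# The names and regexes are illustrative assumptions, not OpenRefine's own logic.

# Accepts a bare DOI, or one prefixed with "doi:", doi.org, or dx.doi.org
# (assumed prefixes; real data may contain other variants).
DOI_PATTERN = re.compile(
    r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)
# Matches Excel-style scientific notation such as "9.78979E+12".
SCI_NOTATION = re.compile(r"^\d(?:\.\d+)?E\+\d+$", re.IGNORECASE)


def collapse_whitespace(value: str) -> str:
    """Trim leading/trailing whitespace and collapse internal runs to one space."""
    return " ".join(value.split())


def normalize_doi(value: str) -> str:
    """Rewrite a bare or prefixed DOI into the https://dx.doi.org/ form.

    Values that do not look like a DOI are returned trimmed but otherwise
    unchanged, so they can be reviewed by hand.
    """
    value = value.strip()
    match = DOI_PATTERN.match(value)
    if match is None:
        return value
    return "https://dx.doi.org/" + match.group(1)


def looks_like_excel_number(isbn: str) -> bool:
    """Flag ISBNs that appear to have been mangled into scientific notation."""
    return SCI_NOTATION.match(isbn.strip()) is not None


if __name__ == "__main__":
    print(collapse_whitespace("  International   Institute of Tropical Agriculture "))
    # → International Institute of Tropical Agriculture
    print(normalize_doi("doi:10.1234/abc123"))
    # → https://dx.doi.org/10.1234/abc123
    print(normalize_doi("https://doi.org/10.1234/abc123"))
    # → https://dx.doi.org/10.1234/abc123
    print(looks_like_excel_number("9.78979E+12"))
    # → True
```

The ISBN check only flags values for manual correction. If the exported text is all that remains, the missing digits cannot be reconstructed from it, so flagged items still need to be checked against the source.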