diff --git a/content/posts/2019-06.md b/content/posts/2019-06.md
index 80b719acb..b18ba70de 100644
--- a/content/posts/2019-06.md
+++ b/content/posts/2019-06.md
@@ -132,4 +132,25 @@ UPDATE 2
 - Upload 202 IITA records from earlier this month (20194th.xls) to CGSpace
 - Communicate with Bioversity contractor in charge of their migration from Typo3 to CGSpace

-
+## 2019-06-28
+
+- Start looking at the fifty-seven AfricaRice records sent by Ibnou earlier this month
+  - First, I see there are several items with type "Book" and "Book Chapter" that should go in an "AfricaRice books and book chapters" collection, but none exists in the AfricaRice community
+  - Trim and collapse consecutive whitespace on author, affiliation, authorship types, title, subjects, doi, issn, source, citation, country, sponsors
+  - Standardize and correct affiliations like "Africa Rice Cente" and "Africa Rice Centre", including syntax errors with multi-value separators
+  - Lots of variation in affiliations, for example:
+    - Université Abomey-Calavi
+    - Université d'Abomey
+    - Université d'Abomey Calavi
+    - Université d'Abomey-Calavi
+    - University of Abomey-Calavi
+  - Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine:
+    - `$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id`
+    - I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: `if(cell.recon.matched, cell.recon.match.name, value)`
+  - Replace smart quotes with standard ASCII ones
+  - Fix typos in authorship types
+  - Validate and normalize subjects against our 2019-06 list using reconcile-csv and OpenRefine:
+    - `$ lein run ~/src/git/DSpace/2019-06-10-subjects-matched.csv name id`
+  - Also add about 30 new AGROVOC subjects to our list that I verified manually
+  - There is one duplicate, both have the same DOI: https://doi.org/10.1016/j.agwat.2018.06.018
+  - Fix four ISBNs that were in the ISSN field
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 7109c07a3..12410230e 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,30 +4,30 @@
 https://alanorth.github.io/cgspace-notes/
-2019-06-25T20:10:57+03:00
+2019-06-25T21:00:27+03:00
 0
 https://alanorth.github.io/cgspace-notes/2019-05/
-2019-06-25T20:10:57+03:00
+2019-06-25T21:00:27+03:00
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2019-06-25T20:10:57+03:00
+2019-06-25T21:00:27+03:00
 0
 https://alanorth.github.io/cgspace-notes/posts/
-2019-06-25T20:10:57+03:00
+2019-06-25T21:00:27+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2019-06-25T20:10:57+03:00
+2019-06-25T21:00:27+03:00
 0
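
Several of the cleanup steps in the 2019-06-28 entry (collapsing whitespace, replacing smart quotes, spotting ISBNs in the ISSN field, finding the duplicate DOI) were done in OpenRefine according to the notes. The following is only a rough Python sketch of the same checks, not the method actually used; the function names `clean_value`, `looks_like_issn`, and `duplicate_dois` are hypothetical, and the ISSN check assumes the standard `NNNN-NNNC` form with a modulus-11 check digit.

```python
import re
from collections import Counter

# Typographic ("smart") quotes mapped to their ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def clean_value(value: str) -> str:
    """Replace smart quotes, then trim and collapse consecutive whitespace."""
    value = value.translate(SMART_QUOTES)
    return re.sub(r"\s+", " ", value).strip()


def looks_like_issn(value: str) -> bool:
    """True if value has the NNNN-NNNC shape and a valid ISSN check digit."""
    if not re.fullmatch(r"\d{4}-\d{3}[\dXx]", value):
        return False  # e.g. an ISBN that ended up in the ISSN field
    digits = value.replace("-", "")
    # Weights 8..2 over the first seven digits, modulus 11.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7].upper() == expected


def duplicate_dois(dois):
    """Return the DOIs that occur more than once (case-insensitive)."""
    counts = Counter(d.strip().lower() for d in dois if d.strip())
    return sorted(d for d, n in counts.items() if n > 1)
```

Anything in the ISSN column for which `looks_like_issn` returns false would be a candidate to review by hand, which is how a misplaced ISBN might surface.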