From 982ed47d553dbe86ecef62baec5deae3c91ab5ec Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Tue, 12 Jun 2018 10:42:43 +0300
Subject: [PATCH] Update notes for 2018-06-12

---
 content/posts/2018-06.md | 61 +++++++++++++++++++++++++++++++
 docs/2018-06/index.html  | 78 ++++++++++++++++++++++++++++++++++++++--
 docs/sitemap.xml         | 10 +++---
 3 files changed, 141 insertions(+), 8 deletions(-)

diff --git a/content/posts/2018-06.md b/content/posts/2018-06.md
index abbec36dc..1fcb1cbb5 100644
--- a/content/posts/2018-06.md
+++ b/content/posts/2018-06.md
@@ -127,3 +127,64 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
 - I was curious to see if I could create a GREL for use with a custom text facet in Open Refine to find cells with two or more consecutive spaces
 - I always use the built-in trim and collapse transformations anyways, but this seems to work to find the offending cells: `isNotNull(value.match(/.*?\s{2,}.*?/))`
 - I wonder if I should start checking for "smart" quotes like ’ (hex 2019)
+
+## 2018-06-12
+
+- Udana from IWMI asked about the OAI base URL for their community on CGSpace
+- I think it should be this: https://cgspace.cgiar.org/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_10568_16814
+- The style sheet obfuscates the data, but if you look at the source it is all there, including information about pagination of results
+- Regarding Udana's Book Chapters and Reports on DSpace Test last week, Abenet told him to fix some character encoding and CRP issues, then I told him I'd check them after that
+- The latest batch of IITA's 200 records (based on Abenet's version `Mercy1805_AY.xls`) are now in the [IITA_Jan_9_II_Ab](https://dspacetest.cgiar.org/handle/10568/96071) collection
+- So here are some corrections:
+  - use of Unicode smart quote (hex 2019) in countries and affiliations, for example "COTE D’IVOIRE" and "Institut d’Economic Rurale, Mali"
+  - inconsistencies in `cg.contributor.affiliation`:
+    - "Centro Internacional de Agricultura Tropical" and "Centro International de Agricultura Tropical" should use the English name of CIAT (International Center for Tropical Agriculture)
+    - "Institut International d'Agriculture Tropicale" should use the English name of IITA (International Institute of Tropical Agriculture)
+    - "East and Southern Africa Regional Center" and "Eastern and Southern Africa Regional Centre"
+    - "Institut de la Recherche Agronomique, Cameroon" and "Institut de Recherche Agronomique, Cameroon"
+    - "Institut des Recherches Agricoles du Bénin" and "Institut National des Recherche Agricoles du Benin" and "National Agricultural Research Institute, Benin"
+    - "Institute of Agronomic Research, Cameroon" and "Institute of Agronomy Research, Cameroon"
+    - "Rivers State University" and "Rivers State University of Science and Technology"
+    - "Universität Hannover" and "University of Hannover"
+  - inconsistencies in `cg.subject.iita`:
+    - "AMELIORATION DES PLANTES" and "AMÉLIORATION DES PLANTES"
+    - "PRODUCTION VEGETALE" and "PRODUCTION VÉGÉTALE"
+    - "CONTRÔLE DE MALADIES" and "CONTROLE DES MALADIES"
+    - "HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCT" and "HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCTS"
+    - "RAVAGEURS DE PLANTES" and "RAVAGEURS DES PLANTES"
+    - "SANTE DES PLANTES" and "SANTÉ DES PLANTES"
+    - "SOCIOECONOMIE" and "SOCIOECONOMY"
+  - inconsistencies in `dc.description.sponsorship`:
+    - "Belgian Corporation" and "Belgium Corporation"
+  - inconsistencies in `dc.subject`:
+    - "AFRICAN CASSAVA MOSAIC" and "AFRICAN CASSAVA MOSAIC DISEASE"
+    - "ASPERGILLU FLAVUS" and "ASPERGILLUS FLAVUS"
+    - "BIOTECHNOLOGIES" and "BIOTECHNOLOGY"
+    - "CASSAVA MOSAIC DISEASE" and "CASSAVA MOSAIC DISEASES" and "CASSAVA MOSAIC VIRUS"
+    - "CASSAVA PROCESSING" and "CASSAVA PROCESSING TECHNOLOGY"
+    - "CROPPING SYSTEM" and "CROPPING SYSTEMS"
+    - "DRY SEASON" and "DRY-SEASON"
+    - "FERTILIZER" and "FERTILIZERS"
+    - "LEGUME" and "LEGUMES"
+    - "LEGUMINOSAE" and "LEGUMINOUS"
+    - "LEGUMINOUS COVER CROP" and "LEGUMINOUS COVER CROPS"
+    - "MATÉRIEL DE PLANTATION" and "MATÉRIELS DE PLANTATION"
+  - I noticed that some records do have encoding errors in the `dc.description.abstract` field, but only four of them, so probably not from Abenet's handling of the XLS file
+  - Based on manually eyeballing the text, I used a custom text facet with this GREL to identify the records:
+
+```
+or(
+  value.contains('€'),
+  value.contains('6g'),
+  value.contains('6m'),
+  value.contains('6d'),
+  value.contains('6e')
+)
+```
+  - So IITA should double-check the abstracts for these:
+    - https://dspacetest.cgiar.org/10568/96184
+    - https://dspacetest.cgiar.org/10568/96141
+    - https://dspacetest.cgiar.org/10568/96118
+    - https://dspacetest.cgiar.org/10568/96113
+
+# vim: set sw=2 ts=2:
diff --git a/docs/2018-06/index.html b/docs/2018-06/index.html
index d01930120..16820668a 100644
--- a/docs/2018-06/index.html
+++ b/docs/2018-06/index.html
@@ -41,7 +41,7 @@ sys 2m7.289s
-
+
@@ -93,9 +93,9 @@ sys 2m7.289s
 "@type": "BlogPosting",
 "headline": "June, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-06/",
-"wordCount": "1025",
+"wordCount": "1462",
 "datePublished": "2018-06-04T19:49:54-07:00",
-"dateModified": "2018-06-10T19:32:12+03:00",
+"dateModified": "2018-06-11T15:21:14+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -312,6 +312,78 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 412e6fb16..16cf360b9 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2018-06/
-2018-06-10T19:32:12+03:00
+2018-06-11T15:21:14+03:00
@@ -169,7 +169,7 @@
 https://alanorth.github.io/cgspace-notes/
-2018-06-10T19:32:12+03:00
+2018-06-11T15:21:14+03:00
 0
@@ -180,7 +180,7 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2018-06-10T19:32:12+03:00
+2018-06-11T15:21:14+03:00
 0
@@ -192,13 +192,13 @@
 https://alanorth.github.io/cgspace-notes/posts/
-2018-06-10T19:32:12+03:00
+2018-06-11T15:21:14+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2018-06-10T19:32:12+03:00
+2018-06-11T15:21:14+03:00
 0
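The context lines above mention two checks: the GREL facet for two or more consecutive spaces, and the open question of smart quotes like U+2019. Both can also be run outside OpenRefine, for example on a CSV export. Below is a minimal Python sketch under that assumption. The function names are hypothetical, and the quote set (U+2018, U+2019, U+201C, U+201D) goes slightly beyond the single U+2019 mentioned in the notes.

```python
import re

# Same idea as the GREL facet: two or more consecutive whitespace characters.
DOUBLE_SPACE = re.compile(r"\s{2,}")
# Typographic quotes: U+2018, U+2019 (the one noted above), U+201C, U+201D.
SMART_QUOTES = re.compile("[\u2018\u2019\u201c\u201d]")


def has_double_space(value):
    """True if a cell contains a run of two or more whitespace characters."""
    return value is not None and DOUBLE_SPACE.search(value) is not None


def has_smart_quote(value):
    """True if a cell contains any typographic quote character."""
    return value is not None and SMART_QUOTES.search(value) is not None


def collapse_whitespace(value):
    """Roughly what OpenRefine's trim plus collapse-whitespace transforms do."""
    return re.sub(r"\s+", " ", value).strip()
```

One difference from the GREL version: on a `str`, Python's `\s` also matches Unicode whitespace such as U+00A0 (no-break space). Java's regex `\s`, which GREL presumably relies on, matches only ASCII whitespace by default.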
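The "information about pagination" visible in the page source of the OAI URL above is the OAI-PMH `resumptionToken`. A harvester for the IWMI set has to keep following it until it comes back empty. Per the OAI-PMH 2.0 spec, a follow-up request carries only `verb` and `resumptionToken`. The following is a minimal Python sketch of that loop, assuming the standard OAI-PMH 2.0 namespace. The helper names are hypothetical, and the sketch has not been run against CGSpace.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://cgspace.cgiar.org/oai/request"


def first_request_url(base_url, set_spec, prefix="oai_dc"):
    """Build the initial ListRecords URL for one set (e.g. a community)."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": prefix, "set": set_spec}
    )
    return f"{base_url}?{query}"


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None


def next_request_url(base_url, token):
    """Follow-up requests carry only verb and resumptionToken (an exclusive argument)."""
    query = urllib.parse.urlencode({"verb": "ListRecords", "resumptionToken": token})
    return f"{base_url}?{query}"


def harvest_identifiers(set_spec, base_url=BASE_URL):
    """Walk every page of a set; this performs real HTTP requests."""
    url = first_request_url(base_url, set_spec)
    while url:
        with urllib.request.urlopen(url) as resp:
            ids, token = parse_page(resp.read())
        yield from ids
        url = next_request_url(base_url, token) if token else None
```

For example, `first_request_url(BASE_URL, "com_10568_16814")` rebuilds the exact URL given to Udana. `urlencode` takes care of percent-encoding tokens that contain slashes or colons.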
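Many of the inconsistencies listed for `cg.subject.iita` and `dc.subject` differ only in accents, typographic apostrophes, hyphens, or case, so they can be surfaced mechanically before a human review. Below is a minimal Python sketch, assuming the field values have been exported to a plain list. `variant_key` and `find_variants` are hypothetical helpers, not part of DSpace or OpenRefine.

```python
import unicodedata
from collections import defaultdict


def variant_key(value):
    """Normalization key: values that share a key are likely spelling variants."""
    # Replace typographic apostrophes (e.g. U+2019 in "COTE D’IVOIRE") with ASCII.
    value = value.replace("\u2019", "'").replace("\u2018", "'")
    # Decompose accented letters (É -> E + combining accent), then drop the accents.
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Treat hyphens like spaces ("DRY-SEASON" vs "DRY SEASON"), then ignore case
    # and collapse runs of whitespace.
    stripped = stripped.replace("-", " ")
    return " ".join(stripped.casefold().split())


def find_variants(values):
    """Group distinct values whose keys collide; singletons are dropped."""
    groups = defaultdict(set)
    for v in values:
        groups[variant_key(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

This catches pairs such as "PRODUCTION VEGETALE"/"PRODUCTION VÉGÉTALE" and "DRY SEASON"/"DRY-SEASON". Plurals ("LEGUME"/"LEGUMES"), wording differences ("CONTRÔLE DE MALADIES"/"CONTROLE DES MALADIES"), and translated institute names still need a human eye. OpenRefine's key-collision clustering covers similar ground inside the tool.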
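The `or(...)` text facet in the notes only answers true or false. When sending the affected abstracts back to IITA, it can help to report which marker triggered. Below is a Python sketch of the same check that does this. The markers are copied from the GREL and were picked by eyeballing this batch, so a legitimate "6g" (a quantity in grams, say) would also be flagged. The `(url, abstract)` input shape is an assumption.

```python
# The same substrings the GREL facet checks for; heuristics tuned to this
# batch, not a general mojibake detector.
SUSPECT_MARKERS = ("€", "6g", "6m", "6d", "6e")


def suspect_markers(value):
    """Return the markers found in a value (empty list for clean or missing values)."""
    if value is None:
        return []
    return [m for m in SUSPECT_MARKERS if m in value]


def flag_abstracts(records):
    """records: iterable of (url, abstract) pairs; yields (url, markers) for suspicious ones."""
    for url, abstract in records:
        found = suspect_markers(abstract)
        if found:
            yield url, found
```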