Update notes for 2018-06-12

Alan Orth 2018-06-12 10:42:43 +03:00
parent a8715c203c
commit 982ed47d55
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 141 additions and 8 deletions


@ -127,3 +127,64 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
- I was curious to see if I could create a GREL for use with a custom text facet in Open Refine to find cells with two or more consecutive spaces
- I always use the built-in trim and collapse transformations anyway, but this seems to find the offending cells: `isNotNull(value.match(/.*?\s{2,}.*?/))`
- I wonder if I should start checking for "smart" quotes like ’ (hex 2019)
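
For running the same checks outside of OpenRefine, here is a minimal Python sketch. The function name `find_suspect_values` and the sample values are hypothetical. It flags values that contain two or more consecutive whitespace characters or a U+2019 right single quotation mark:

```python
import re

# Two or more consecutive whitespace characters, like the GREL regex above
MULTIPLE_SPACES = re.compile(r"\s{2,}")
# U+2019 RIGHT SINGLE QUOTATION MARK, the "smart" apostrophe
SMART_QUOTE = "\u2019"


def find_suspect_values(values):
    """Return (value, reasons) pairs for values that need cleanup."""
    suspects = []
    for value in values:
        reasons = []
        if MULTIPLE_SPACES.search(value):
            reasons.append("consecutive spaces")
        if SMART_QUOTE in value:
            reasons.append("smart quote")
        if reasons:
            suspects.append((value, reasons))
    return suspects


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real export
    sample = ["CROPPING  SYSTEMS", "COTE D\u2019IVOIRE", "LEGUMES"]
    for value, reasons in find_suspect_values(sample):
        print(repr(value), reasons)
```
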
## 2018-06-12
- Udana from IWMI asked about the OAI base URL for their community on CGSpace
- I think it should be this: https://cgspace.cgiar.org/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_10568_16814
- The style sheet obfuscates the data, but if you look at the source it is all there, including information about pagination of results
- Regarding Udana's Book Chapters and Reports on DSpace Test last week, Abenet told him to fix some character encoding and CRP issues, then I told him I'd check them after that
- The latest batch of IITA's 200 records (based on Abenet's version `Mercy1805_AY.xls`) is now in the [IITA_Jan_9_II_Ab](https://dspacetest.cgiar.org/handle/10568/96071) collection
- So here are some corrections:
  - use of Unicode smart quote (hex 2019) in countries and affiliations, for example "COTE D’IVOIRE" and "Institut d’Economic Rurale, Mali"
  - inconsistencies in `cg.contributor.affiliation`:
    - "Centro Internacional de Agricultura Tropical" and "Centro International de Agricultura Tropical" should use the English name of CIAT (International Center for Tropical Agriculture)
    - "Institut International d'Agriculture Tropicale" should use the English name of IITA (International Institute of Tropical Agriculture)
    - "East and Southern Africa Regional Center" and "Eastern and Southern Africa Regional Centre"
    - "Institut de la Recherche Agronomique, Cameroon" and "Institut de Recherche Agronomique, Cameroon"
    - "Institut des Recherches Agricoles du Bénin" and "Institut National des Recherche Agricoles du Benin" and "National Agricultural Research Institute, Benin"
    - "Institute of Agronomic Research, Cameroon" and "Institute of Agronomy Research, Cameroon"
    - "Rivers State University" and "Rivers State University of Science and Technology"
    - "Universität Hannover" and "University of Hannover"
  - inconsistencies in `cg.subject.iita`:
    - "AMELIORATION DES PLANTES" and "AMÉLIORATION DES PLANTES"
    - "PRODUCTION VEGETALE" and "PRODUCTION VÉGÉTALE"
    - "CONTRÔLE DE MALADIES" and "CONTROLE DES MALADIES"
    - "HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCT" and "HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCTS"
    - "RAVAGEURS DE PLANTES" and "RAVAGEURS DES PLANTES"
    - "SANTE DES PLANTES" and "SANTÉ DES PLANTES"
    - "SOCIOECONOMIE" and "SOCIOECONOMY"
  - inconsistencies in `dc.description.sponsorship`:
    - "Belgian Corporation" and "Belgium Corporation"
  - inconsistencies in `dc.subject`:
    - "AFRICAN CASSAVA MOSAIC" and "AFRICAN CASSAVA MOSAIC DISEASE"
    - "ASPERGILLU FLAVUS" and "ASPERGILLUS FLAVUS"
    - "BIOTECHNOLOGIES" and "BIOTECHNOLOGY"
    - "CASSAVA MOSAIC DISEASE" and "CASSAVA MOSAIC DISEASES" and "CASSAVA MOSAIC VIRUS"
    - "CASSAVA PROCESSING" and "CASSAVA PROCESSING TECHNOLOGY"
    - "CROPPING SYSTEM" and "CROPPING SYSTEMS"
    - "DRY SEASON" and "DRY-SEASON"
    - "FERTILIZER" and "FERTILIZERS"
    - "LEGUME" and "LEGUMES"
    - "LEGUMINOSAE" and "LEGUMINOUS"
    - "LEGUMINOUS COVER CROP" and "LEGUMINOUS COVER CROPS"
    - "MATÉRIEL DE PLANTATION" and "MATÉRIELS DE PLANTATION"
- I noticed that some records do have encoding errors in the `dc.description.abstract` field, but only four of them, so they are probably not from Abenet's handling of the XLS file
- After manually eyeballing the text, I used a custom text facet with this GREL to identify the records:
```
or(
value.contains('€'),
value.contains('6g'),
value.contains('6m'),
value.contains('6d'),
value.contains('6e')
)
```
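
A rough Python equivalent of the GREL expression above, as a sketch only. The substrings are copied from the facet; the function name `has_encoding_debris` and the sample abstracts are hypothetical:

```python
# Substrings copied from the GREL expression above; they were picked by
# eye for this particular batch and are not a general encoding test
SUSPECT_SUBSTRINGS = ("€", "6g", "6m", "6d", "6e")


def has_encoding_debris(abstract):
    """Return True if the abstract contains any of the suspect substrings."""
    return any(substring in abstract for substring in SUSPECT_SUBSTRINGS)


if __name__ == "__main__":
    # Hypothetical abstracts keyed by hypothetical record labels
    abstracts = {
        "record-a": "Yield increased in the second season.",
        "record-b": "The 6g marker appeared in the middle of a word.",
    }
    for label, text in abstracts.items():
        if has_encoding_debris(text):
            print(label)
```

Like the facet, this can also match legitimate text (for example "16mm" contains "6m"), so the matches still need a manual look.
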
- So IITA should double check the abstracts for these:
  - https://dspacetest.cgiar.org/10568/96184
  - https://dspacetest.cgiar.org/10568/96141
  - https://dspacetest.cgiar.org/10568/96118
  - https://dspacetest.cgiar.org/10568/96113
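
Some of the value inconsistencies listed earlier in this entry (missing accents, hyphen versus space) could be found automatically by grouping values on a normalized key. The following is a simplified sketch, similar in spirit to OpenRefine's key collision clustering but not a reimplementation of it. The function names `fingerprint` and `cluster_variants` are hypothetical, and treating every non-alphanumeric character as a separator is my own assumption:

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Normalize a value so that trivial variants share one key."""
    # Decompose accented characters and drop the combining marks
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase and treat anything that is not a letter or digit as a separator
    tokens = re.split(r"[^a-z0-9]+", ascii_only.lower())
    # Sort and de-duplicate tokens so word order does not matter
    return " ".join(sorted(set(t for t in tokens if t)))


def cluster_variants(values):
    """Group distinct values that share a fingerprint."""
    clusters = defaultdict(set)
    for value in values:
        clusters[fingerprint(value)].add(value)
    return [sorted(group) for group in clusters.values() if len(group) > 1]


if __name__ == "__main__":
    # Values taken from the corrections listed in this entry
    subjects = [
        "SANTE DES PLANTES",
        "SANTÉ DES PLANTES",
        "DRY SEASON",
        "DRY-SEASON",
        "LEGUME",
        "LEGUMES",
    ]
    for group in cluster_variants(subjects):
        print(group)
```

This will not catch singular and plural pairs like "LEGUME" and "LEGUMES", or translated names, so those still need a manual review or a fuzzier matching method.
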
# vim: set sw=2 ts=2:


@ -41,7 +41,7 @@ sys 2m7.289s
<meta property="article:published_time" content="2018-06-04T19:49:54-07:00"/>
<meta property="article:modified_time" content="2018-06-10T19:32:12&#43;03:00"/>
<meta property="article:modified_time" content="2018-06-11T15:21:14&#43;03:00"/>
@ -93,9 +93,9 @@ sys 2m7.289s
"@type": "BlogPosting",
"headline": "June, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-06/",
"wordCount": "1025",
"wordCount": "1462",
"datePublished": "2018-06-04T19:49:54-07:00",
"dateModified": "2018-06-10T19:32:12&#43;03:00",
"dateModified": "2018-06-11T15:21:14&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -312,6 +312,78 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
<li>I wonder if I should start checking for &ldquo;smart&rdquo; quotes like ’ (hex 2019)</li>
</ul>
<h2 id="2018-06-12">2018-06-12</h2>
<ul>
<li>Udana from IWMI asked about the OAI base URL for their community on CGSpace</li>
<li>I think it should be this: <a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=com_10568_16814">https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=com_10568_16814</a></li>
<li>The style sheet obfuscates the data, but if you look at the source it is all there, including information about pagination of results</li>
<li>Regarding Udana&rsquo;s Book Chapters and Reports on DSpace Test last week, Abenet told him to fix some character encoding and CRP issues, then I told him I&rsquo;d check them after that</li>
<li>The latest batch of IITA&rsquo;s 200 records (based on Abenet&rsquo;s version <code>Mercy1805_AY.xls</code>) are now in the <a href="https://dspacetest.cgiar.org/handle/10568/96071">IITA_Jan_9_II_Ab</a> collection</li>
<li>So here are some corrections:
<ul>
<li>use of Unicode smart quote (hex 2019) in countries and affiliations, for example &ldquo;COTE D’IVOIRE&rdquo; and &ldquo;Institut d’Economic Rurale, Mali&rdquo;</li>
<li>inconsistencies in <code>cg.contributor.affiliation</code>:</li>
<li>&ldquo;Centro Internacional de Agricultura Tropical&rdquo; and &ldquo;Centro International de Agricultura Tropical&rdquo; should use the English name of CIAT (International Center for Tropical Agriculture)</li>
<li>&ldquo;Institut International d&rsquo;Agriculture Tropicale&rdquo; should use the English name of IITA (International Institute of Tropical Agriculture)</li>
<li>&ldquo;East and Southern Africa Regional Center&rdquo; and &ldquo;Eastern and Southern Africa Regional Centre&rdquo;</li>
<li>&ldquo;Institut de la Recherche Agronomique, Cameroon&rdquo; and &ldquo;Institut de Recherche Agronomique, Cameroon&rdquo;</li>
<li>&ldquo;Institut des Recherches Agricoles du Bénin&rdquo; and &ldquo;Institut National des Recherche Agricoles du Benin&rdquo; and &ldquo;National Agricultural Research Institute, Benin&rdquo;</li>
<li>&ldquo;Institute of Agronomic Research, Cameroon&rdquo; and &ldquo;Institute of Agronomy Research, Cameroon&rdquo;</li>
<li>&ldquo;Rivers State University&rdquo; and &ldquo;Rivers State University of Science and Technology&rdquo;</li>
<li>&ldquo;Universität Hannover&rdquo; and &ldquo;University of Hannover&rdquo;</li>
<li>inconsistencies in <code>cg.subject.iita</code>:</li>
<li>&ldquo;AMELIORATION DES PLANTES&rdquo; and &ldquo;AMÉLIORATION DES PLANTES&rdquo;</li>
<li>&ldquo;PRODUCTION VEGETALE&rdquo; and &ldquo;PRODUCTION VÉGÉTALE&rdquo;</li>
<li>&ldquo;CONTRÔLE DE MALADIES&rdquo; and &ldquo;CONTROLE DES MALADIES&rdquo;</li>
<li>&ldquo;HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCT&rdquo; and &ldquo;HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCTS&rdquo;</li>
<li>&ldquo;RAVAGEURS DE PLANTES&rdquo; and &ldquo;RAVAGEURS DES PLANTES&rdquo;</li>
<li>&ldquo;SANTE DES PLANTES&rdquo; and &ldquo;SANTÉ DES PLANTES&rdquo;</li>
<li>&ldquo;SOCIOECONOMIE&rdquo; and &ldquo;SOCIOECONOMY&rdquo;</li>
<li>inconsistencies in <code>dc.description.sponsorship</code>:</li>
<li>&ldquo;Belgian Corporation&rdquo; and &ldquo;Belgium Corporation&rdquo;</li>
<li>inconsistencies in <code>dc.subject</code>:</li>
<li>&ldquo;AFRICAN CASSAVA MOSAIC&rdquo; and &ldquo;AFRICAN CASSAVA MOSAIC DISEASE&rdquo;</li>
<li>&ldquo;ASPERGILLU FLAVUS&rdquo; and &ldquo;ASPERGILLUS FLAVUS&rdquo;</li>
<li>&ldquo;BIOTECHNOLOGIES&rdquo; and &ldquo;BIOTECHNOLOGY&rdquo;</li>
<li>&ldquo;CASSAVA MOSAIC DISEASE&rdquo; and &ldquo;CASSAVA MOSAIC DISEASES&rdquo; and &ldquo;CASSAVA MOSAIC VIRUS&rdquo;</li>
<li>&ldquo;CASSAVA PROCESSING&rdquo; and &ldquo;CASSAVA PROCESSING TECHNOLOGY&rdquo;</li>
<li>&ldquo;CROPPING SYSTEM&rdquo; and &ldquo;CROPPING SYSTEMS&rdquo;</li>
<li>&ldquo;DRY SEASON&rdquo; and &ldquo;DRY-SEASON&rdquo;</li>
<li>&ldquo;FERTILIZER&rdquo; and &ldquo;FERTILIZERS&rdquo;</li>
<li>&ldquo;LEGUME&rdquo; and &ldquo;LEGUMES&rdquo;</li>
<li>&ldquo;LEGUMINOSAE&rdquo; and &ldquo;LEGUMINOUS&rdquo;</li>
<li>&ldquo;LEGUMINOUS COVER CROP&rdquo; and &ldquo;LEGUMINOUS COVER CROPS&rdquo;</li>
<li>&ldquo;MATÉRIEL DE PLANTATION&rdquo; and &ldquo;MATÉRIELS DE PLANTATION&rdquo;</li>
<li>I noticed that some records do have encoding errors in the <code>dc.description.abstract</code> field, but only four of them so probably not from Abenet&rsquo;s handling of the XLS file</li>
<li>Based on manually eyeballing the text I used a custom text facet with this GREL to identify the records:</li>
</ul></li>
</ul>
<pre><code>or(
value.contains('€'),
value.contains('6g'),
value.contains('6m'),
value.contains('6d'),
value.contains('6e')
)
</code></pre>
<ul>
<li>So IITA should double check the abstracts for these:
<ul>
<li><a href="https://dspacetest.cgiar.org/10568/96184">https://dspacetest.cgiar.org/10568/96184</a></li>
<li><a href="https://dspacetest.cgiar.org/10568/96141">https://dspacetest.cgiar.org/10568/96141</a></li>
<li><a href="https://dspacetest.cgiar.org/10568/96118">https://dspacetest.cgiar.org/10568/96118</a></li>
<li><a href="https://dspacetest.cgiar.org/10568/96113">https://dspacetest.cgiar.org/10568/96113</a>
<br /></li>
</ul></li>
</ul>
<h1 id="vim-set-sw-2-ts-2">vim: set sw=2 ts=2:</h1>


@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-06/</loc>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<lastmod>2018-06-11T15:21:14+03:00</lastmod>
</url>
<url>
@ -169,7 +169,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<lastmod>2018-06-11T15:21:14+03:00</lastmod>
<priority>0</priority>
</url>
@ -180,7 +180,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<lastmod>2018-06-11T15:21:14+03:00</lastmod>
<priority>0</priority>
</url>
@ -192,13 +192,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<lastmod>2018-06-11T15:21:14+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<lastmod>2018-06-11T15:21:14+03:00</lastmod>
<priority>0</priority>
</url>