mirror of
https://github.com/alanorth/cgspace-notes.git
Update notes for 2018-06-12
parent a8715c203c
commit 982ed47d55
@ -127,3 +127,64 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
- I was curious to see if I could create a GREL expression for use with a custom text facet in OpenRefine to find cells with two or more consecutive spaces
- I always use the built-in trim and collapse transformations anyway, but this seems to work to find the offending cells: `isNotNull(value.match(/.*?\s{2,}.*?/))`
- I wonder if I should start checking for "smart" quotes like ’ (hex 2019)
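
The two checks above (consecutive whitespace and the U+2019 smart quote) could also be run outside OpenRefine. This is only a minimal sketch, assuming the column values have already been exported to a Python list; the function name `find_suspect_values` and the sample values are hypothetical:

```python
import re

# Two or more consecutive whitespace characters anywhere in the value,
# mirroring the \s{2,} part of the GREL expression above.
MULTI_SPACE = re.compile(r"\s{2,}")
# RIGHT SINGLE QUOTATION MARK, the "smart" quote mentioned above.
SMART_QUOTE = "\u2019"


def find_suspect_values(values):
    """Return (value, reasons) pairs for values that probably need cleanup."""
    suspects = []
    for value in values:
        reasons = []
        if MULTI_SPACE.search(value):
            reasons.append("consecutive whitespace")
        if SMART_QUOTE in value:
            reasons.append("smart quote")
        if reasons:
            suspects.append((value, reasons))
    return suspects


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real export.
    sample = ["COTE D\u2019IVOIRE", "Rivers  State University", "University of Hannover"]
    for value, reasons in find_suspect_values(sample):
        print(repr(value), reasons)
```

Unlike GREL's `match`, which has to match the whole string, `re.search` scans for the pattern anywhere in the value, so the surrounding `.*?` is not needed here.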

## 2018-06-12

- Udana from IWMI asked about the OAI base URL for their community on CGSpace
- I think it should be this: https://cgspace.cgiar.org/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_10568_16814
- The style sheet obfuscates the data, but if you look at the source it is all there, including information about pagination of results
- Regarding Udana's Book Chapters and Reports on DSpace Test last week, Abenet told him to fix some character encoding and CRP issues, then I told him I'd check them after that
- The latest batch of IITA's 200 records (based on Abenet's version `Mercy1805_AY.xls`) is now in the [IITA_Jan_9_II_Ab](https://dspacetest.cgiar.org/handle/10568/96071) collection
- So here are some corrections:
- use of Unicode smart quote (hex 2019) in countries and affiliations, for example "COTE D’IVOIRE" and "Institut d’Economic Rurale, Mali"
- inconsistencies in `cg.contributor.affiliation`:
- "Centro Internacional de Agricultura Tropical" and "Centro International de Agricultura Tropical" should use the English name of CIAT (International Center for Tropical Agriculture)
- "Institut International d'Agriculture Tropicale" should use the English name of IITA (International Institute of Tropical Agriculture)
- "East and Southern Africa Regional Center" and "Eastern and Southern Africa Regional Centre"
- "Institut de la Recherche Agronomique, Cameroon" and "Institut de Recherche Agronomique, Cameroon"
- "Institut des Recherches Agricoles du Bénin" and "Institut National des Recherche Agricoles du Benin" and "National Agricultural Research Institute, Benin"
- "Institute of Agronomic Research, Cameroon" and "Institute of Agronomy Research, Cameroon"
- "Rivers State University" and "Rivers State University of Science and Technology"
- "Universität Hannover" and "University of Hannover"
- inconsistencies in `cg.subject.iita`:
- "AMELIORATION DES PLANTES" and "AMÉLIORATION DES PLANTES"
- "PRODUCTION VEGETALE" and "PRODUCTION VÉGÉTALE"
- "CONTRÔLE DE MALADIES" and "CONTROLE DES MALADIES"
- "HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCT" and "HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCTS"
- "RAVAGEURS DE PLANTES" and "RAVAGEURS DES PLANTES"
- "SANTE DES PLANTES" and "SANTÉ DES PLANTES"
- "SOCIOECONOMIE" and "SOCIOECONOMY"
- inconsistencies in `dc.description.sponsorship`:
- "Belgian Corporation" and "Belgium Corporation"
- inconsistencies in `dc.subject`:
- "AFRICAN CASSAVA MOSAIC" and "AFRICAN CASSAVA MOSAIC DISEASE"
- "ASPERGILLU FLAVUS" and "ASPERGILLUS FLAVUS"
- "BIOTECHNOLOGIES" and "BIOTECHNOLOGY"
- "CASSAVA MOSAIC DISEASE" and "CASSAVA MOSAIC DISEASES" and "CASSAVA MOSAIC VIRUS"
- "CASSAVA PROCESSING" and "CASSAVA PROCESSING TECHNOLOGY"
- "CROPPING SYSTEM" and "CROPPING SYSTEMS"
- "DRY SEASON" and "DRY-SEASON"
- "FERTILIZER" and "FERTILIZERS"
- "LEGUME" and "LEGUMES"
- "LEGUMINOSAE" and "LEGUMINOUS"
- "LEGUMINOUS COVER CROP" and "LEGUMINOUS COVER CROPS"
- "MATÉRIEL DE PLANTATION" and "MATÉRIELS DE PLANTATION"
- I noticed that some records do have encoding errors in the `dc.description.abstract` field, but only four of them, so this is probably not from Abenet's handling of the XLS file
- Based on manually eyeballing the text, I used a custom text facet with this GREL to identify the records:

```
or(
  value.contains('€'),
  value.contains('6g'),
  value.contains('6m'),
  value.contains('6d'),
  value.contains('6e')
)
```
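
For reference, the same facet logic can be sketched in Python. This assumes the abstracts are available as a mapping of handle to text; the names `has_suspect_encoding` and `flag_records` and the sample records are hypothetical, and the marker list is copied from the GREL above rather than being a general test for encoding errors:

```python
# Substrings taken from the GREL facet above. They were picked by eye,
# so this is a heuristic and may produce false positives on clean text.
SUSPECT_MARKERS = ("€", "6g", "6m", "6d", "6e")


def has_suspect_encoding(abstract):
    """Return True if the abstract contains any of the marker substrings."""
    return any(marker in abstract for marker in SUSPECT_MARKERS)


def flag_records(records):
    """Return the handles whose abstract matches, in input order.

    `records` is assumed to map a handle string to its abstract text.
    """
    return [handle for handle, abstract in records.items()
            if has_suspect_encoding(abstract)]


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = {
        "10568/00001": "Yields increased by 6g over the control",
        "10568/00002": "A clean abstract with no suspicious markers",
    }
    print(flag_records(records))
```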
- So IITA should double check the abstracts for these:
- https://dspacetest.cgiar.org/10568/96184
- https://dspacetest.cgiar.org/10568/96141
- https://dspacetest.cgiar.org/10568/96118
- https://dspacetest.cgiar.org/10568/96113
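
Some of the inconsistencies listed in the corrections above differ only by accents, case or hyphenation (for example "SANTE DES PLANTES" and "SANTÉ DES PLANTES", or "DRY SEASON" and "DRY-SEASON"). A rough sketch of how such variants could be grouped, assuming the values of one metadata field are in a Python list; `normalize_key` and `group_variants` are hypothetical names, and this does not catch misspellings, plurals or translated names:

```python
import unicodedata
from collections import defaultdict


def normalize_key(value):
    """Build a comparison key: no accents, no hyphens, case-folded, single spaces."""
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop combining marks so that "É" compares equal to "E".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.replace("-", " ").casefold().split())


def group_variants(values):
    """Return {key: sorted variants} for keys that have more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize_key(value)].add(value)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    subjects = [
        "SANTE DES PLANTES",
        "SANT\u00c9 DES PLANTES",
        "DRY SEASON",
        "DRY-SEASON",
        "LEGUME",
    ]
    for key, variants in group_variants(subjects).items():
        print(key, variants)
```

Pairs such as "LEGUME" and "LEGUMES" would still need a manual review or a clustering method like the ones built into OpenRefine.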

# vim: set sw=2 ts=2:

@ -41,7 +41,7 @@ sys 2m7.289s

<meta property="article:published_time" content="2018-06-04T19:49:54-07:00"/>

<meta property="article:modified_time" content="2018-06-10T19:32:12+03:00"/>
<meta property="article:modified_time" content="2018-06-11T15:21:14+03:00"/>

@ -93,9 +93,9 @@ sys 2m7.289s
"@type": "BlogPosting",
"headline": "June, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-06/",
"wordCount": "1025",
"wordCount": "1462",
"datePublished": "2018-06-04T19:49:54-07:00",
"dateModified": "2018-06-10T19:32:12+03:00",
"dateModified": "2018-06-11T15:21:14+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -312,6 +312,78 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
<li>I wonder if I should start checking for “smart” quotes like ’ (hex 2019)</li>
</ul>

<h2 id="2018-06-12">2018-06-12</h2>

<ul>
<li>Udana from IWMI asked about the OAI base URL for their community on CGSpace</li>
<li>I think it should be this: <a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_10568_16814">https://cgspace.cgiar.org/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_10568_16814</a></li>
<li>The style sheet obfuscates the data, but if you look at the source it is all there, including information about pagination of results</li>
<li>Regarding Udana’s Book Chapters and Reports on DSpace Test last week, Abenet told him to fix some character encoding and CRP issues, then I told him I’d check them after that</li>
<li>The latest batch of IITA’s 200 records (based on Abenet’s version <code>Mercy1805_AY.xls</code>) is now in the <a href="https://dspacetest.cgiar.org/handle/10568/96071">IITA_Jan_9_II_Ab</a> collection</li>
<li>So here are some corrections:

<ul>
<li>use of Unicode smart quote (hex 2019) in countries and affiliations, for example “COTE D’IVOIRE” and “Institut d’Economic Rurale, Mali”</li>
<li>inconsistencies in <code>cg.contributor.affiliation</code>:</li>
<li>“Centro Internacional de Agricultura Tropical” and “Centro International de Agricultura Tropical” should use the English name of CIAT (International Center for Tropical Agriculture)</li>
<li>“Institut International d’Agriculture Tropicale” should use the English name of IITA (International Institute of Tropical Agriculture)</li>
<li>“East and Southern Africa Regional Center” and “Eastern and Southern Africa Regional Centre”</li>
<li>“Institut de la Recherche Agronomique, Cameroon” and “Institut de Recherche Agronomique, Cameroon”</li>
<li>“Institut des Recherches Agricoles du Bénin” and “Institut National des Recherche Agricoles du Benin” and “National Agricultural Research Institute, Benin”</li>
<li>“Institute of Agronomic Research, Cameroon” and “Institute of Agronomy Research, Cameroon”</li>
<li>“Rivers State University” and “Rivers State University of Science and Technology”</li>
<li>“Universität Hannover” and “University of Hannover”</li>
<li>inconsistencies in <code>cg.subject.iita</code>:</li>
<li>“AMELIORATION DES PLANTES” and “AMÉLIORATION DES PLANTES”</li>
<li>“PRODUCTION VEGETALE” and “PRODUCTION VÉGÉTALE”</li>
<li>“CONTRÔLE DE MALADIES” and “CONTROLE DES MALADIES”</li>
<li>“HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCT” and “HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCTS”</li>
<li>“RAVAGEURS DE PLANTES” and “RAVAGEURS DES PLANTES”</li>
<li>“SANTE DES PLANTES” and “SANTÉ DES PLANTES”</li>
<li>“SOCIOECONOMIE” and “SOCIOECONOMY”</li>
<li>inconsistencies in <code>dc.description.sponsorship</code>:</li>
<li>“Belgian Corporation” and “Belgium Corporation”</li>
<li>inconsistencies in <code>dc.subject</code>:</li>
<li>“AFRICAN CASSAVA MOSAIC” and “AFRICAN CASSAVA MOSAIC DISEASE”</li>
<li>“ASPERGILLU FLAVUS” and “ASPERGILLUS FLAVUS”</li>
<li>“BIOTECHNOLOGIES” and “BIOTECHNOLOGY”</li>
<li>“CASSAVA MOSAIC DISEASE” and “CASSAVA MOSAIC DISEASES” and “CASSAVA MOSAIC VIRUS”</li>
<li>“CASSAVA PROCESSING” and “CASSAVA PROCESSING TECHNOLOGY”</li>
<li>“CROPPING SYSTEM” and “CROPPING SYSTEMS”</li>
<li>“DRY SEASON” and “DRY-SEASON”</li>
<li>“FERTILIZER” and “FERTILIZERS”</li>
<li>“LEGUME” and “LEGUMES”</li>
<li>“LEGUMINOSAE” and “LEGUMINOUS”</li>
<li>“LEGUMINOUS COVER CROP” and “LEGUMINOUS COVER CROPS”</li>
<li>“MATÉRIEL DE PLANTATION” and “MATÉRIELS DE PLANTATION”</li>
<li>I noticed that some records do have encoding errors in the <code>dc.description.abstract</code> field, but only four of them, so this is probably not from Abenet’s handling of the XLS file</li>
<li>Based on manually eyeballing the text, I used a custom text facet with this GREL to identify the records:</li>
</ul></li>
</ul>

<pre><code>or(
  value.contains('€'),
  value.contains('6g'),
  value.contains('6m'),
  value.contains('6d'),
  value.contains('6e')
)
</code></pre>

<ul>
<li>So IITA should double check the abstracts for these:

<ul>
<li><a href="https://dspacetest.cgiar.org/10568/96184">https://dspacetest.cgiar.org/10568/96184</a></li>
<li><a href="https://dspacetest.cgiar.org/10568/96141">https://dspacetest.cgiar.org/10568/96141</a></li>
<li><a href="https://dspacetest.cgiar.org/10568/96118">https://dspacetest.cgiar.org/10568/96118</a></li>
<li><a href="https://dspacetest.cgiar.org/10568/96113">https://dspacetest.cgiar.org/10568/96113</a>
<br /></li>
</ul></li>
</ul>

<h1 id="vim-set-sw-2-ts-2">vim: set sw=2 ts=2:</h1>

@ -4,7 +4,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-06/</loc>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<lastmod>2018-06-11T15:21:14+03:00</lastmod>
</url>

<url>
@ -169,7 +169,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<lastmod>2018-06-11T15:21:14+03:00</lastmod>
<priority>0</priority>
</url>

@ -180,7 +180,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<lastmod>2018-06-11T15:21:14+03:00</lastmod>
<priority>0</priority>
</url>

@ -192,13 +192,13 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<lastmod>2018-06-11T15:21:14+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<lastmod>2018-06-11T15:21:14+03:00</lastmod>
<priority>0</priority>
</url>
