mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2018-06-12
@ -127,3 +127,64 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
- I was curious to see if I could create a GREL expression for use with a custom text facet in OpenRefine to find cells with two or more consecutive spaces
- I always use the built-in trim and collapse transformations anyway, but this seems to work to find the offending cells: `isNotNull(value.match(/.*?\s{2,}.*?/))`
- I wonder if I should start checking for "smart" quotes like ’ (hex 2019)

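The two checks above (consecutive spaces and "smart" quotes) could also be scripted outside OpenRefine. A minimal sketch in Python, assuming the cell values are available as plain strings. The function names and the set of quote characters are illustrative choices, not anything provided by OpenRefine, and Python's `\s` matches more Unicode whitespace than the Java regex engine behind GREL does by default:

```python
import re

# Two or more consecutive whitespace characters, similar to \s{2,} in the GREL above
MULTI_SPACE = re.compile(r"\s{2,}")

# Curly quotes to flag (assumed set; U+2019 is the one mentioned in the notes)
SMART_QUOTES = "\u2018\u2019\u201c\u201d"


def has_consecutive_spaces(value: str) -> bool:
    """True if the value contains two or more whitespace characters in a row."""
    return MULTI_SPACE.search(value) is not None


def find_smart_quotes(value: str) -> list:
    """Return the distinct smart quote characters found, in order of appearance."""
    found = []
    for ch in value:
        if ch in SMART_QUOTES and ch not in found:
            found.append(ch)
    return found
```

Either function could be mapped over a metadata column to list the offending cells before applying the trim and collapse transformations.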
## 2018-06-12

- Udana from IWMI asked about the OAI base URL for their community on CGSpace
- I think it should be this: https://cgspace.cgiar.org/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_10568_16814
- The style sheet obscures the data when viewed in a browser, but if you look at the page source it is all there, including information about pagination of results
- Regarding Udana's Book Chapters and Reports on DSpace Test last week, Abenet told him to fix some character encoding and CRP issues, and I told him I'd check them after that
- The latest batch of IITA's 200 records (based on Abenet's version `Mercy1805_AY.xls`) is now in the [IITA_Jan_9_II_Ab](https://dspacetest.cgiar.org/handle/10568/96071) collection
- So here are some corrections:
  - use of Unicode smart quote (hex 2019) in countries and affiliations, for example "COTE D’IVOIRE" and "Institut d’Economic Rurale, Mali"
  - inconsistencies in `cg.contributor.affiliation`:
    - "Centro Internacional de Agricultura Tropical" and "Centro International de Agricultura Tropical" should use the English name of CIAT (International Center for Tropical Agriculture)
    - "Institut International d'Agriculture Tropicale" should use the English name of IITA (International Institute of Tropical Agriculture)
    - "East and Southern Africa Regional Center" and "Eastern and Southern Africa Regional Centre"
    - "Institut de la Recherche Agronomique, Cameroon" and "Institut de Recherche Agronomique, Cameroon"
    - "Institut des Recherches Agricoles du Bénin" and "Institut National des Recherche Agricoles du Benin" and "National Agricultural Research Institute, Benin"
    - "Institute of Agronomic Research, Cameroon" and "Institute of Agronomy Research, Cameroon"
    - "Rivers State University" and "Rivers State University of Science and Technology"
    - "Universität Hannover" and "University of Hannover"
  - inconsistencies in `cg.subject.iita`:
    - "AMELIORATION DES PLANTES" and "AMÉLIORATION DES PLANTES"
    - "PRODUCTION VEGETALE" and "PRODUCTION VÉGÉTALE"
    - "CONTRÔLE DE MALADIES" and "CONTROLE DES MALADIES"
    - "HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCT" and "HANDLING, TRANSPORT, STORAGE AND PROTECTION OF AGRICULTURAL PRODUCTS"
    - "RAVAGEURS DE PLANTES" and "RAVAGEURS DES PLANTES"
    - "SANTE DES PLANTES" and "SANTÉ DES PLANTES"
    - "SOCIOECONOMIE" and "SOCIOECONOMY"
  - inconsistencies in `dc.description.sponsorship`:
    - "Belgian Corporation" and "Belgium Corporation"
  - inconsistencies in `dc.subject`:
    - "AFRICAN CASSAVA MOSAIC" and "AFRICAN CASSAVA MOSAIC DISEASE"
    - "ASPERGILLU FLAVUS" and "ASPERGILLUS FLAVUS"
    - "BIOTECHNOLOGIES" and "BIOTECHNOLOGY"
    - "CASSAVA MOSAIC DISEASE" and "CASSAVA MOSAIC DISEASES" and "CASSAVA MOSAIC VIRUS"
    - "CASSAVA PROCESSING" and "CASSAVA PROCESSING TECHNOLOGY"
    - "CROPPING SYSTEM" and "CROPPING SYSTEMS"
    - "DRY SEASON" and "DRY-SEASON"
    - "FERTILIZER" and "FERTILIZERS"
    - "LEGUME" and "LEGUMES"
    - "LEGUMINOSAE" and "LEGUMINOUS"
    - "LEGUMINOUS COVER CROP" and "LEGUMINOUS COVER CROPS"
    - "MATÉRIEL DE PLANTATION" and "MATÉRIELS DE PLANTATION"
- I noticed that some records do have encoding errors in the `dc.description.abstract` field, but only four of them, so they are probably not from Abenet's handling of the XLS file
- Based on manually eyeballing the text, I used a custom text facet with this GREL to identify the records:

```
or(
  value.contains('€'),
  value.contains('6g'),
  value.contains('6m'),
  value.contains('6d'),
  value.contains('6e')
)
```

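The same facet could be reproduced outside OpenRefine, for instance over a CSV export of the abstracts. A minimal sketch in Python, where `SUSPECT_MARKERS` and `looks_garbled` are hypothetical names. The markers are just the substrings from the GREL above, so this is not a general encoding-error detector, and short markers like `6g` may also match legitimate text:

```python
# Substrings that looked like encoding damage when eyeballing the abstracts
# (same list as the GREL facet above; an assumption, not a complete list)
SUSPECT_MARKERS = ("€", "6g", "6m", "6d", "6e")


def looks_garbled(abstract: str) -> bool:
    """True if the abstract contains any of the suspect substrings."""
    return any(marker in abstract for marker in SUSPECT_MARKERS)
```

Anything it flags would still need a manual look, as with the facet.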
- So IITA should double check the abstracts for these:
  - https://dspacetest.cgiar.org/handle/10568/96184
  - https://dspacetest.cgiar.org/handle/10568/96141
  - https://dspacetest.cgiar.org/handle/10568/96118
  - https://dspacetest.cgiar.org/handle/10568/96113
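Many of the affiliation and subject inconsistencies listed earlier differ only by accents, case, or a hyphen, so candidates like these could be surfaced by grouping values on a normalized key. A rough sketch in Python: the normalization rules (strip accents, lowercase, hyphens to spaces, collapse whitespace) are an assumption, and the function names are illustrative. It would not catch plural or spelling variants such as "LEGUME" and "LEGUMES" or "Internacional" and "International":

```python
import re
import unicodedata
from collections import defaultdict


def normalize_key(value: str) -> str:
    """Fold a value to a comparison key: no accents, lowercase, single spaces."""
    # Decompose accented characters and drop the combining marks
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat hyphens as spaces, then collapse runs of whitespace
    stripped = stripped.replace("-", " ")
    return re.sub(r"\s+", " ", stripped).strip().casefold()


def group_variants(values):
    """Return sorted lists of distinct values that share the same normalized key."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize_key(value)].add(value)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]
```

Calling `group_variants` on the distinct values of a field such as `cg.subject.iita` would return one list per cluster of look-alike values, which can then be reviewed by hand.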
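For the OAI request for IWMI's community noted at the start of this entry, the pagination information is presumably the `resumptionToken` element defined by OAI-PMH. A minimal sketch in Python of building the request URLs and reading that token from a response, assuming standard OAI-PMH behaviour. The function names are illustrative, and nothing here fetches from the server:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_BASE = "https://cgspace.cgiar.org/oai/request"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(set_spec=None, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords URL for a first page or for a follow-up page."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return OAI_BASE + "?" + urlencode(params)


def next_token(response_xml: str):
    """Return the resumptionToken from a response, or None if there is none."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()
```

A harvester would request `list_records_url(set_spec="com_10568_16814")` first, then keep requesting `list_records_url(resumption_token=...)` until `next_token` returns `None`.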
# vim: set sw=2 ts=2: