mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2018-06-11
@ -105,3 +105,25 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
- dÕpassÕ
- Also the abstracts have missing accents, ie "recherche sur le d veloppement"
- I will have to tell IITA people to redo these entirely I think...
## 2018-06-11
- Sisay sent a new version of the last IITA records that he created from the original CSV from IITA
- The 200 records are in the [IITA_Junel_11 (10568/95870)](https://dspacetest.cgiar.org/handle/10568/95870) collection
- Many errors:
  - Authorship types: "CGIAR ans advanced research institute", "CGAIR and advanced research institute", "CGIAR and advanced research institutes", "CGAIR single center"
  - Lots of inconsistencies and misspellings in author affiliations:
    - "Institut des Recherches Agricoles du Bénin" and "Institut National des Recherche Agricoles du Benin" and "National Agricultural Research Institute, Benin"
    - International Insitute of Tropical Agriculture
    - Centro Internacional de Agricultura Tropical
    - "Rivers State University of Science and Technology" and "Rivers State University"
    - "Institut de la Recherche Agronomique, Cameroon" and "Institut de Recherche Agronomique, Cameroon"
  - Inconsistency in countries: "COTE D’IVOIRE" and "COTE D'IVOIRE"
  - A few DOIs with spaces or invalid characters
  - Inconsistency in IITA subjects, for example "PRODUCTION VEGETALE" and "PRODUCTION VÉGÉTALE" and several others
- I ran `value.unescape('javascript')` on the abstract and citation fields because it looks like this data came from a SQL database and some stuff was escaped
- It turns out that Abenet actually did a lot of small corrections on this data so when Sisay uses Bosede's original file it doesn't have all those corrections
- So I told Sisay to re-create the collection using Abenet's XLS from last week (`Mercy1805_AY.xls`)
- I was curious to see if I could write a GREL expression for a custom text facet in OpenRefine to find cells with two or more consecutive spaces
- I always use the built-in trim and collapse transformations anyway, but this seems to work to find the offending cells: `isNotNull(value.match(/.*?\s{2,}.*?/))`
- I wonder if I should start checking for "smart" quotes like ’ (U+2019)
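
The whitespace and quote checks above could also be run outside OpenRefine. This is only a rough Python sketch, assuming the metadata values are already loaded as plain strings. The helper names (`has_consecutive_spaces`, `find_smart_quotes`, `fold_key`) and the `SMART_QUOTES` table are made up for illustration and are not part of any existing tooling, and the quote table is not exhaustive. `fold_key` is one possible way to spot values that differ only by quote style, accents, case or spacing, such as the country and subject variants noted earlier.

```python
import re
import unicodedata

# Two or more consecutive whitespace characters, like \s{2,} in the GREL facet
MULTI_SPACE = re.compile(r"\s{2,}")

# Typographic quotes mapped to ASCII equivalents (illustrative, not exhaustive)
SMART_QUOTES = {
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
}


def has_consecutive_spaces(value):
    """Return True if the value contains a run of two or more whitespace characters."""
    return MULTI_SPACE.search(value) is not None


def find_smart_quotes(value):
    """Return the sorted list of distinct smart quote characters found in the value."""
    return sorted({ch for ch in value if ch in SMART_QUOTES})


def fold_key(value):
    """Build a comparison key that ignores quote style, accents, case and extra spaces."""
    for smart, plain in SMART_QUOTES.items():
        value = value.replace(smart, plain)
    # Decompose accented characters, then drop the combining marks
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse whitespace and compare case-insensitively
    return " ".join(stripped.split()).casefold()
```

Values that share a `fold_key` but are not byte-identical are candidates for manual review, not automatic replacement, since some accent differences are real corrections.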