mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-22 06:35:03 +01:00
Update notes for 2018-06-11
This commit is contained in:
parent
da85011fae
commit
a8715c203c
@ -105,3 +105,25 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
- dÕpassÕ
- Also the abstracts have missing accents, i.e. "recherche sur le d veloppement"
- I will have to tell IITA people to redo these entirely I think...

## 2018-06-11

- Sisay sent a new version of the last IITA records that he created from the original CSV from IITA
- The 200 records are in the [IITA_Junel_11 (10568/95870)](https://dspacetest.cgiar.org/handle/10568/95870) collection
- Many errors:
  - Authorship types: "CGIAR ans advanced research institute", "CGAIR and advanced research institute", "CGIAR and advanced research institutes", "CGAIR single center"
  - Lots of inconsistencies and misspellings in author affiliations:
  - "Institut des Recherches Agricoles du Bénin" and "Institut National des Recherche Agricoles du Benin" and "National Agricultural Research Institute, Benin"
  - International Insitute of Tropical Agriculture
  - Centro Internacional de Agricultura Tropical
  - "Rivers State University of Science and Technology" and "Rivers State University"
  - "Institut de la Recherche Agronomique, Cameroon" and "Institut de Recherche Agronomique, Cameroon"
  - Inconsistency in countries: "COTE D’IVOIRE" and "COTE D'IVOIRE"
  - A few DOIs with spaces or invalid characters
  - Inconsistency in IITA subjects, for example "PRODUCTION VEGETALE" and "PRODUCTION VÉGÉTALE" and several others
  - I ran `value.unescape('javascript')` on the abstract and citation fields because it looks like this data came from a SQL database and some characters were escaped
- It turns out that Abenet actually did a lot of small corrections on this data so when Sisay uses Bosede's original file it doesn't have all those corrections
- So I told Sisay to re-create the collection using Abenet's XLS from last week (`Mercy1805_AY.xls`)
- I was curious to see if I could write a GREL expression for use with a custom text facet in OpenRefine to find cells with two or more consecutive spaces
- I always use the built-in trim and collapse transformations anyway, but this seems to work to find the offending cells: `isNotNull(value.match(/.*?\s{2,}.*?/))`
- I wonder if I should start checking for "smart" quotes like ’ (hex 2019)
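The two checks mentioned above (consecutive spaces and "smart" quotes) could also be run outside OpenRefine. This is only a rough Python sketch, not something from the original workflow: the names `find_issues`, `MULTI_SPACE` and `SMART_QUOTES` are made up for illustration, and the sample cells are borrowed from the errors listed above. Python's `re.search` scans anywhere in the string, so the leading and trailing `.*?` from the GREL expression are not needed here.

```python
import re

# Two or more consecutive whitespace characters, like \s{2,} in the GREL facet
MULTI_SPACE = re.compile(r"\s{2,}")
# Typographic quotes: U+2018, U+2019, U+201C, U+201D
SMART_QUOTES = re.compile("[\u2018\u2019\u201c\u201d]")


def find_issues(value):
    """Return a list of issue labels for a single cell value (hypothetical helper)."""
    issues = []
    if MULTI_SPACE.search(value):
        issues.append("consecutive-spaces")
    if SMART_QUOTES.search(value):
        issues.append("smart-quotes")
    return issues


# Sample cells: a smart apostrophe, a plain apostrophe, and a double space
cells = ["COTE D\u2019IVOIRE", "COTE D'IVOIRE", "Rivers State  University"]
for cell in cells:
    print(repr(cell), find_issues(cell))
```

Flagging the cells rather than rewriting them keeps the decision about which spelling is canonical with whoever curates the metadata.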
@ -41,7 +41,7 @@ sys 2m7.289s
<meta property="article:published_time" content="2018-06-04T19:49:54-07:00"/>

<meta property="article:modified_time" content="2018-06-10T14:12:07+03:00"/>
<meta property="article:modified_time" content="2018-06-10T19:32:12+03:00"/>
@ -93,9 +93,9 @@ sys 2m7.289s
"@type": "BlogPosting",
"headline": "June, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-06/",
"wordCount": "750",
"wordCount": "1025",
"datePublished": "2018-06-04T19:49:54-07:00",
"dateModified": "2018-06-10T14:12:07+03:00",
"dateModified": "2018-06-10T19:32:12+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -285,6 +285,33 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
<li>I will have to tell IITA people to redo these entirely I think…</li>
</ul>

<h2 id="2018-06-11">2018-06-11</h2>

<ul>
<li>Sisay sent a new version of the last IITA records that he created from the original CSV from IITA</li>
<li>The 200 records are in the <a href="https://dspacetest.cgiar.org/handle/10568/95870">IITA_Junel_11 (<sup>10568</sup>⁄<sub>95870</sub>)</a> collection</li>
<li>Many errors:

<ul>
<li>Authorship types: “CGIAR ans advanced research institute”, “CGAIR and advanced research institute”, “CGIAR and advanced research institutes”, “CGAIR single center”</li>
<li>Lots of inconsistencies and misspellings in author affiliations:</li>
<li>“Institut des Recherches Agricoles du Bénin” and “Institut National des Recherche Agricoles du Benin” and “National Agricultural Research Institute, Benin”</li>
<li>International Insitute of Tropical Agriculture</li>
<li>Centro Internacional de Agricultura Tropical</li>
<li>“Rivers State University of Science and Technology” and “Rivers State University”</li>
<li>“Institut de la Recherche Agronomique, Cameroon” and “Institut de Recherche Agronomique, Cameroon”</li>
<li>Inconsistency in countries: “COTE D’IVOIRE” and “COTE D’IVOIRE”</li>
<li>A few DOIs with spaces or invalid characters</li>
<li>Inconsistency in IITA subjects, for example “PRODUCTION VEGETALE” and “PRODUCTION VÉGÉTALE” and several others</li>
<li>I ran <code>value.unescape('javascript')</code> on the abstract and citation fields because it looks like this data came from a SQL database and some characters were escaped</li>
</ul></li>
<li>It turns out that Abenet actually did a lot of small corrections on this data so when Sisay uses Bosede’s original file it doesn’t have all those corrections</li>
<li>So I told Sisay to re-create the collection using Abenet’s XLS from last week (<code>Mercy1805_AY.xls</code>)</li>
<li>I was curious to see if I could write a GREL expression for use with a custom text facet in OpenRefine to find cells with two or more consecutive spaces</li>
<li>I always use the built-in trim and collapse transformations anyway, but this seems to work to find the offending cells: <code>isNotNull(value.match(/.*?\s{2,}.*?/))</code></li>
<li>I wonder if I should start checking for “smart” quotes like ’ (hex 2019)</li>
</ul>
@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-06/</loc>
<lastmod>2018-06-10T14:12:07+03:00</lastmod>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
</url>

<url>
@ -169,7 +169,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-06-10T14:12:07+03:00</lastmod>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<priority>0</priority>
</url>
@ -180,7 +180,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-06-10T14:12:07+03:00</lastmod>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<priority>0</priority>
</url>
@ -192,13 +192,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-06-10T14:12:07+03:00</lastmod>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-06-10T14:12:07+03:00</lastmod>
<lastmod>2018-06-10T19:32:12+03:00</lastmod>
<priority>0</priority>
</url>