Update notes for 2019-10-12

Alan Orth 2019-10-12 23:28:50 +03:00
parent 05433a338d
commit e6d361c7fe
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 18 additions and 10 deletions


@@ -125,7 +125,11 @@ International Maize and Wheat Improvement Centre,International Maize and Wheat I
$ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuuu' -f from -m 211 -t to
```
- I did some manual curation of ~227 authors in preparation for telling Peter and Abenet that the migration is almost ready
- I did some manual curation of about 300 authors in OpenRefine in preparation for telling Peter and Abenet that the migration is almost ready
- I would still like to perhaps (re)move institutional authors from `dc.contributor.author` to `cg.contributor.affiliation`, but I will have to run that by Francesca, Carol, and Abenet
- I could use a custom text facet like this in OpenRefine to find authors that likely match the "Last, F." pattern: `isNotNull(value.match(/^.*, \p{Lu}\.?.*$/))`
- The `\p{Lu}` is a cool [regex character class](https://www.regular-expressions.info/unicode.html) to make sure this works for letters with accents
- As cool as that is, it's actually more effective to just search for authors that have "." in them!
- I've decided to add a `cg.contributor.affiliation` column to 1,025 items based on the logic above where the author name is not an actual person
<!-- vim: set sw=2 ts=2: -->
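
The two author checks described in the notes above (the "Last, F." facet and the simpler search for ".") can be sketched outside OpenRefine roughly as follows. This is only an illustration, not the GREL expression itself: the function names are hypothetical, and because Python's built-in `re` module does not support `\p{Lu}`, the sketch assumes `unicodedata.category(...) == "Lu"` is an acceptable stand-in for that character class.

```python
import unicodedata


def looks_like_last_initial(name: str) -> bool:
    # Rough stand-in for the facet regex: the name contains ", " followed
    # immediately by a Unicode uppercase letter (general category "Lu").
    for i in range(len(name) - 2):
        if name[i:i + 2] == ", " and unicodedata.category(name[i + 2]) == "Lu":
            return True
    return False


def has_period(name: str) -> bool:
    # The simpler check from the notes: does the name contain "." at all?
    return "." in name


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only
    for author in ["Orth, A.", "Ørsted, Å.", "International Livestock Research Institute"]:
        print(author, looks_like_last_initial(author), has_period(author))
```

Like the original facet, the first check is only a heuristic: a value such as "Centre, International" also has a comma followed by an uppercase letter, so matches would still need a manual look.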


@@ -11,7 +11,7 @@
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-10/" />
<meta property="article:published_time" content="2019-10-01T13:20:51+03:00" />
<meta property="article:modified_time" content="2019-10-12T14:28:43+03:00" />
<meta property="article:modified_time" content="2019-10-12T19:21:30+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2019"/>
@@ -27,9 +27,9 @@
"@type": "BlogPosting",
"headline": "October, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-10\/",
"wordCount": "965",
"wordCount": "1051",
"datePublished": "2019-10-01T13:20:51+03:00",
"dateModified": "2019-10-12T14:28:43+03:00",
"dateModified": "2019-10-12T19:21:30+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -267,10 +267,14 @@ International Maize and Wheat Improvement Centre,International Maize and Wheat I
<pre><code>$ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuuu' -f from -m 211 -t to
</code></pre></li>
<li><p>I did some manual curation of ~227 authors in preparation for telling Peter and Abenet that the migration is almost ready</p>
<li><p>I did some manual curation of about 300 authors in OpenRefine in preparation for telling Peter and Abenet that the migration is almost ready</p>
<ul>
<li>I would still like to perhaps (re)move institutional authors from <code>dc.contributor.author</code> to <code>cg.contributor.affiliation</code>, but I will have to run that by Francesca, Carol, and Abenet</li>
<li>I could use a custom text facet like this in OpenRefine to find authors that likely match the &ldquo;Last, F.&rdquo; pattern: <code>isNotNull(value.match(/^.*, \p{Lu}\.?.*$/))</code></li>
<li>The <code>\p{Lu}</code> is a cool <a href="https://www.regular-expressions.info/unicode.html">regex character class</a> to make sure this works for letters with accents</li>
<li>As cool as that is, it&rsquo;s actually more effective to just search for authors that have &ldquo;.&rdquo; in them!</li>
<li>I&rsquo;ve decided to add a <code>cg.contributor.affiliation</code> column to 1,025 items based on the logic above where the author name is not an actual person</li>
</ul></li>
</ul>


@@ -4,27 +4,27 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-10-12T14:28:43+03:00</lastmod>
<lastmod>2019-10-12T19:21:30+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-10-12T14:28:43+03:00</lastmod>
<lastmod>2019-10-12T19:21:30+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-10/</loc>
<lastmod>2019-10-12T14:28:43+03:00</lastmod>
<lastmod>2019-10-12T19:21:30+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-10-12T14:28:43+03:00</lastmod>
<lastmod>2019-10-12T19:21:30+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-10-12T14:28:43+03:00</lastmod>
<lastmod>2019-10-12T19:21:30+03:00</lastmod>
</url>
<url>