Update notes for 2019-10-12

Alan Orth 2019-10-12 19:21:30 +03:00
parent 527c320f6b
commit 05433a338d
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 61 additions and 8 deletions

@@ -103,5 +103,29 @@ UPDATE 1
- More work on identifying duplicates in the Bioversity migration data on DSpace Test
- I mapped twenty-five more items on CGSpace and deleted them from the migration test collection on DSpace Test
- After a few hours I think I finished all the duplicates that were identified by Atmire's Duplicate Checker module
- According to my spreadsheet there were fifty-two in total
- I was preparing to check the affiliations on the Bioversity records when I noticed that the last list of top affiliations I generated has some anomalies
- I made some corrections in a CSV:
```
from,to
CIAT,International Center for Tropical Agriculture
International Centre for Tropical Agriculture,International Center for Tropical Agriculture
International Maize and Wheat Improvement Center (CIMMYT),International Maize and Wheat Improvement Center
International Centre for Agricultural Research in the Dry Areas,International Center for Agricultural Research in the Dry Areas
International Maize and Wheat Improvement Centre,International Maize and Wheat Improvement Center
"Agricultural Information Resource Centre, Kenya.","Agricultural Information Resource Centre, Kenya"
"Centre for Livestock and Agricultural Development, Cambodia","Centre for Livestock and Agriculture Development, Cambodia"
```
- Then I applied it with my `fix-metadata-values.py` script on CGSpace:
```
$ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuuu' -f from -m 211 -t to
```
- I did some manual curation of ~227 authors in preparation for telling Peter and Abenet that the migration is almost ready
- I would still like to perhaps (re)move institutional authors from `dc.contributor.author` to `cg.contributor.affiliation`, but I will have to run that by Francesca, Carol, and Abenet
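- For reference, a minimal sketch of how a `from,to` CSV like the one above could be applied to a list of values in plain Python. This is an illustration only, not the actual logic of `fix-metadata-values.py` (which updates the database); the names `load_corrections` and `apply_corrections` and the sample input values are assumptions:
```python
import csv
import io


def load_corrections(csv_text, from_field="from", to_field="to"):
    """Parse a from,to CSV into a dict of exact-match replacements."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[from_field]: row[to_field] for row in reader}


def apply_corrections(values, corrections):
    """Replace each value that has a correction; leave the others untouched."""
    return [corrections.get(value, value) for value in values]


# Two rows taken from the CSV above; quoted fields may contain commas
SAMPLE = '''from,to
CIAT,International Center for Tropical Agriculture
"Agricultural Information Resource Centre, Kenya.","Agricultural Information Resource Centre, Kenya"
'''

corrections = load_corrections(SAMPLE)

# Hypothetical input values, the last one has no correction
fixed = apply_corrections(
    [
        "CIAT",
        "Agricultural Information Resource Centre, Kenya.",
        "Bioversity International",
    ],
    corrections,
)
print(fixed)
```
- Matching is exact and case-sensitive in this sketch, so a trailing period or a "Centre"/"Center" spelling difference needs its own row, as in the CSV above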
<!-- vim: set sw=2 ts=2: -->

@@ -11,7 +11,7 @@
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-10/" />
<meta property="article:published_time" content="2019-10-01T13:20:51+03:00" />
<meta property="article:modified_time" content="2019-10-11T12:06:40+03:00" />
<meta property="article:modified_time" content="2019-10-12T14:28:43+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2019"/>
@@ -27,9 +27,9 @@
"@type": "BlogPosting",
"headline": "October, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-10\/",
"wordCount": "755",
"wordCount": "965",
"datePublished": "2019-10-01T13:20:51+03:00",
"dateModified": "2019-10-11T12:06:40+03:00",
"dateModified": "2019-10-12T14:28:43+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -242,6 +242,35 @@ UPDATE 1
<ul>
<li>I mapped twenty-five more items on CGSpace and deleted them from the migration test collection on DSpace Test</li>
<li>After a few hours I think I finished all the duplicates that were identified by Atmire&rsquo;s Duplicate Checker module</li>
<li>According to my spreadsheet there were fifty-two in total</li>
</ul></li>
<li><p>I was preparing to check the affiliations on the Bioversity records when I noticed that the last list of top affiliations I generated has some anomalies</p>
<ul>
<li><p>I made some corrections in a CSV:</p>
<pre><code>from,to
CIAT,International Center for Tropical Agriculture
International Centre for Tropical Agriculture,International Center for Tropical Agriculture
International Maize and Wheat Improvement Center (CIMMYT),International Maize and Wheat Improvement Center
International Centre for Agricultural Research in the Dry Areas,International Center for Agricultural Research in the Dry Areas
International Maize and Wheat Improvement Centre,International Maize and Wheat Improvement Center
&quot;Agricultural Information Resource Centre, Kenya.&quot;,&quot;Agricultural Information Resource Centre, Kenya&quot;
&quot;Centre for Livestock and Agricultural Development, Cambodia&quot;,&quot;Centre for Livestock and Agriculture Development, Cambodia&quot;
</code></pre></li>
</ul></li>
<li><p>Then I applied it with my <code>fix-metadata-values.py</code> script on CGSpace:</p>
<pre><code>$ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuuu' -f from -m 211 -t to
</code></pre></li>
<li><p>I did some manual curation of ~227 authors in preparation for telling Peter and Abenet that the migration is almost ready</p>
<ul>
<li>I would still like to perhaps (re)move institutional authors from <code>dc.contributor.author</code> to <code>cg.contributor.affiliation</code>, but I will have to run that by Francesca, Carol, and Abenet</li>
</ul></li>
</ul>

@@ -4,27 +4,27 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-10-11T12:06:40+03:00</lastmod>
<lastmod>2019-10-12T14:28:43+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-10-11T12:06:40+03:00</lastmod>
<lastmod>2019-10-12T14:28:43+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-10/</loc>
<lastmod>2019-10-11T12:06:40+03:00</lastmod>
<lastmod>2019-10-12T14:28:43+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-10-11T12:06:40+03:00</lastmod>
<lastmod>2019-10-12T14:28:43+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-10-11T12:06:40+03:00</lastmod>
<lastmod>2019-10-12T14:28:43+03:00</lastmod>
</url>
<url>