diff --git a/content/posts/2019-10.md b/content/posts/2019-10.md
index ebcf9cce3..82c0b0179 100644
--- a/content/posts/2019-10.md
+++ b/content/posts/2019-10.md
@@ -103,5 +103,29 @@ UPDATE 1
 - More work on identifying duplicates in the Bioversity migration data on DSpace Test
   - I mapped twenty-five more items on CGSpace and deleted them from the migration test collection on DSpace Test
+  - After a few hours I think I finished all the duplicates that were identified by Atmire's Duplicate Checker module
+  - According to my spreadsheet there were fifty-two in total
+- I was preparing to check the affiliations on the Bioversity records when I noticed that the last list of top affiliations I generated has some anomalies
+  - I made some corrections in a CSV:
+
+```
+from,to
+CIAT,International Center for Tropical Agriculture
+International Centre for Tropical Agriculture,International Center for Tropical Agriculture
+International Maize and Wheat Improvement Center (CIMMYT),International Maize and Wheat Improvement Center
+International Centre for Agricultural Research in the Dry Areas,International Center for Agricultural Research in the Dry Areas
+International Maize and Wheat Improvement Centre,International Maize and Wheat Improvement Center
+"Agricultural Information Resource Centre, Kenya.","Agricultural Information Resource Centre, Kenya"
+"Centre for Livestock and Agricultural Development, Cambodia","Centre for Livestock and Agriculture Development, Cambodia"
+```
+
+- Then I applied it with my `fix-metadata-values.py` script on CGSpace:
+
+```
+$ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuuu' -f from -m 211 -t to
+```
+
+- I did some manual curation of ~227 authors in preparation for telling Peter and Abenet that the migration is almost ready
+  - I would still like to perhaps (re)move institutional authors from `dc.contributor.author` to `cg.contributor.affiliation`, but I will have to run that by Francesca, Carol, and Abenet
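The corrections CSV maps each anomalous affiliation ("from") to its preferred form ("to"). The sketch below only illustrates that exact-match replacement idea on an in-memory list. It is not the author's `fix-metadata-values.py`, which appears to connect to the database via its `-db`/`-u`/`-p` flags; the assumption here is that `-f from` and `-t to` name the CSV columns, and `load_corrections`/`apply_corrections` are hypothetical helper names.

```python
import csv
import io

# The corrections from the post; the header row names the "from" and "to" columns.
CORRECTIONS_CSV = '''from,to
CIAT,International Center for Tropical Agriculture
International Centre for Tropical Agriculture,International Center for Tropical Agriculture
International Maize and Wheat Improvement Center (CIMMYT),International Maize and Wheat Improvement Center
International Centre for Agricultural Research in the Dry Areas,International Center for Agricultural Research in the Dry Areas
International Maize and Wheat Improvement Centre,International Maize and Wheat Improvement Center
"Agricultural Information Resource Centre, Kenya.","Agricultural Information Resource Centre, Kenya"
"Centre for Livestock and Agricultural Development, Cambodia","Centre for Livestock and Agriculture Development, Cambodia"
'''


def load_corrections(fileobj, from_field="from", to_field="to"):
    """Build a {wrong value: correct value} dict from a CSV file object."""
    reader = csv.DictReader(fileobj)
    return {row[from_field]: row[to_field] for row in reader}


def apply_corrections(values, corrections):
    """Replace values that exactly match a "from" entry; keep all others as-is."""
    return [corrections.get(value, value) for value in values]


if __name__ == "__main__":
    corrections = load_corrections(io.StringIO(CORRECTIONS_CSV))
    affiliations = [
        "CIAT",
        "Agricultural Information Resource Centre, Kenya.",
        "Bioversity International",
    ]
    for value in apply_corrections(affiliations, corrections):
        print(value)
```

Matching whole values rather than substrings keeps a short key like `CIAT` from being rewritten inside longer names that merely contain it.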
diff --git a/docs/2019-10/index.html b/docs/2019-10/index.html
index f2ed7196f..0cc244de7 100644
--- a/docs/2019-10/index.html
+++ b/docs/2019-10/index.html
@@ -11,7 +11,7 @@
-
+
@@ -27,9 +27,9 @@
   "@type": "BlogPosting",
   "headline": "October, 2019",
   "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-10\/",
-  "wordCount": "755",
+  "wordCount": "965",
   "datePublished": "2019-10-01T13:20:51+03:00",
-  "dateModified": "2019-10-11T12:06:40+03:00",
+  "dateModified": "2019-10-12T14:28:43+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -242,6 +242,35 @@ UPDATE 1
+I was preparing to check the affiliations on the Bioversity records when I noticed that the last list of top affiliations I generated has some anomalies
+I made some corrections in a CSV:
+from,to
+CIAT,International Center for Tropical Agriculture
+International Centre for Tropical Agriculture,International Center for Tropical Agriculture
+International Maize and Wheat Improvement Center (CIMMYT),International Maize and Wheat Improvement Center
+International Centre for Agricultural Research in the Dry Areas,International Center for Agricultural Research in the Dry Areas
+International Maize and Wheat Improvement Centre,International Maize and Wheat Improvement Center
+"Agricultural Information Resource Centre, Kenya.","Agricultural Information Resource Centre, Kenya"
+"Centre for Livestock and Agricultural Development, Cambodia","Centre for Livestock and Agriculture Development, Cambodia"
+Then I applied it with my fix-metadata-values.py script on CGSpace:
+$ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuuu' -f from -m 211 -t to
+I did some manual curation of ~227 authors in preparation for telling Peter and Abenet that the migration is almost ready
+I would still like to perhaps (re)move institutional authors from dc.contributor.author to cg.contributor.affiliation, but I will have to run that by Francesca, Carol, and Abenet
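The possible (re)move of institutional authors from `dc.contributor.author` to `cg.contributor.affiliation` is still only an idea in these notes. Here is a minimal sketch of what it could look like for one record, assuming a record is a dict of field name to list of values and that the set of institutional names is agreed on beforehand. `move_institutional_authors`, `EXAMPLE_INSTITUTIONS`, and the author `Doe, J.` are all hypothetical placeholders.

```python
from typing import Dict, Iterable, List

# Hypothetical placeholder set: which names count as "institutional authors"
# would still need to be agreed on before doing this for real.
EXAMPLE_INSTITUTIONS = {
    "Bioversity International",
    "International Center for Tropical Agriculture",
}

Record = Dict[str, List[str]]


def move_institutional_authors(record: Record, institutions: Iterable[str]) -> Record:
    """Return a copy of record with institutional names moved from the
    author field to the affiliation field, skipping duplicate affiliations."""
    institution_set = set(institutions)
    people: List[str] = []
    affiliations = list(record.get("cg.contributor.affiliation", []))
    for author in record.get("dc.contributor.author", []):
        if author in institution_set:
            # Only add the affiliation if the record does not already have it
            if author not in affiliations:
                affiliations.append(author)
        else:
            people.append(author)
    moved = dict(record)  # shallow copy so the input record is not modified
    moved["dc.contributor.author"] = people
    moved["cg.contributor.affiliation"] = affiliations
    return moved


if __name__ == "__main__":
    item = {
        "dc.contributor.author": ["Doe, J.", "Bioversity International"],
        "cg.contributor.affiliation": ["Bioversity International"],
    }
    print(move_institutional_authors(item, EXAMPLE_INSTITUTIONS))
```

Returning a new record rather than editing in place makes it easy to diff the before and after versions when running the idea by Francesca, Carol, and Abenet.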