diff --git a/content/posts/2019-10.md b/content/posts/2019-10.md
index ebcf9cce3..82c0b0179 100644
--- a/content/posts/2019-10.md
+++ b/content/posts/2019-10.md
@@ -103,5 +103,29 @@ UPDATE 1
 - More work on identifying duplicates in the Bioversity migration data on DSpace Test
   - I mapped twenty-five more items on CGSpace and deleted them from the migration test collection on DSpace Test
+  - After a few hours I think I finished all the duplicates that were identified by Atmire's Duplicate Checker module
+  - According to my spreadsheet there were fifty-two in total
+- I was preparing to check the affiliations on the Bioversity records when I noticed that the last list of top affiliations I generated has some anomalies
+  - I made some corrections in a CSV:
+
+```
+from,to
+CIAT,International Center for Tropical Agriculture
+International Centre for Tropical Agriculture,International Center for Tropical Agriculture
+International Maize and Wheat Improvement Center (CIMMYT),International Maize and Wheat Improvement Center
+International Centre for Agricultural Research in the Dry Areas,International Center for Agricultural Research in the Dry Areas
+International Maize and Wheat Improvement Centre,International Maize and Wheat Improvement Center
+"Agricultural Information Resource Centre, Kenya.","Agricultural Information Resource Centre, Kenya"
+"Centre for Livestock and Agricultural Development, Cambodia","Centre for Livestock and Agriculture Development, Cambodia"
+```
+
+- Then I applied it with my `fix-metadata-values.py` script on CGSpace:
+
+```
+$ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuuu' -f from -m 211 -t to
+```
+
+- I did some manual curation of ~227 authors in preparation for telling Peter and Abenet that the migration is almost ready
+  - I would still like to perhaps (re)move institutional authors from `dc.contributor.author` to `cg.contributor.affiliation`, but I will have to run that by Francesca, Carol, and Abenet
diff --git a/docs/2019-10/index.html b/docs/2019-10/index.html
index f2ed7196f..0cc244de7 100644
--- a/docs/2019-10/index.html
+++ b/docs/2019-10/index.html
@@ -11,7 +11,7 @@
-
+
@@ -27,9 +27,9 @@
   "@type": "BlogPosting",
   "headline": "October, 2019",
   "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-10\/",
-  "wordCount": "755",
+  "wordCount": "965",
   "datePublished": "2019-10-01T13:20:51+03:00",
-  "dateModified": "2019-10-11T12:06:40+03:00",
+  "dateModified": "2019-10-12T14:28:43+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -242,6 +242,35 @@ UPDATE 1
  • I was preparing to check the affiliations on the Bioversity records when I noticed that the last list of top affiliations I generated has some anomalies
  • Then I applied it with my fix-metadata-values.py script on CGSpace:
    $ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuuu' -f from -m 211 -t to
  • I did some manual curation of ~227 authors in preparation for telling Peter and Abenet that the migration is almost ready
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 3f62c79de..4376f1f24 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
 https://alanorth.github.io/cgspace-notes/
-2019-10-11T12:06:40+03:00
+2019-10-12T14:28:43+03:00
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2019-10-11T12:06:40+03:00
+2019-10-12T14:28:43+03:00
 https://alanorth.github.io/cgspace-notes/2019-10/
-2019-10-11T12:06:40+03:00
+2019-10-12T14:28:43+03:00
 https://alanorth.github.io/cgspace-notes/posts/
-2019-10-11T12:06:40+03:00
+2019-10-12T14:28:43+03:00
 https://alanorth.github.io/cgspace-notes/tags/
-2019-10-11T12:06:40+03:00
+2019-10-12T14:28:43+03:00
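
The notes in this commit pair a `from,to` CSV of affiliation corrections with the `fix-metadata-values.py` script. The following is a minimal sketch of the general idea only, not the script itself: it assumes exact, whole-value matching and works on an in-memory list, whereas the command in the notes is given database credentials and a metadata field ID. The function names `load_corrections` and `apply_corrections` and the sample `affiliations` list are hypothetical, introduced purely for illustration.

```python
import csv
import io

# Two rows copied from the corrections CSV in the notes. The quoted row
# contains a comma inside the value, which is why a real CSV parser is
# used here instead of splitting each line on ",".
CORRECTIONS_CSV = '''from,to
CIAT,International Center for Tropical Agriculture
"Agricultural Information Resource Centre, Kenya.","Agricultural Information Resource Centre, Kenya"
'''


def load_corrections(csv_text, from_field="from", to_field="to"):
    """Return a dict mapping each old value to its replacement."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[from_field]: row[to_field] for row in reader}


def apply_corrections(values, corrections):
    """Replace exact matches only; leave every other value untouched."""
    return [corrections.get(value, value) for value in values]


corrections = load_corrections(CORRECTIONS_CSV)

# Hypothetical affiliation values, for illustration only.
affiliations = [
    "CIAT",
    "Agricultural Information Resource Centre, Kenya.",
    "Bioversity International",
]
fixed = apply_corrections(affiliations, corrections)
print(fixed)
```

Running a mapping like this against a copy of the values first is a cheap way to confirm that each `from` entry matches what is actually stored (trailing periods and Centre/Center spellings are easy to miss) before changing the real records.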