diff --git a/content/posts/2019-08.md b/content/posts/2019-08.md
index 5ad98a482..d557391f4 100644
--- a/content/posts/2019-08.md
+++ b/content/posts/2019-08.md
@@ -283,5 +283,33 @@ sys 2m24.715s
 ## 2019-08-28
 - Skype with Jane about AReS Phase III priorities
+- I did a test to automatically fix some authors in the database using my csv-metadata-quality script
+  - First I dumped a list of all unique authors:
+
+```
+dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-08-28-all-authors.csv with csv header;
+COPY 65597
+```
+
+- Then I created a new CSV with two author columns (edit title of second column after):
+
+```
+$ csvcut -c dc.contributor.author,dc.contributor.author /tmp/2019-08-28-all-authors.csv > /tmp/all-authors.csv
+```
+
+- Then I ran my script on the new CSV, skipping one of the author columns:
+
+```
+$ csv-metadata-quality -u -i /tmp/all-authors.csv -o /tmp/authors.csv -x dc.contributor.author
+```
+
+- This fixed a bunch of issues with spaces, commas, unnecessary Unicode characters, etc
+- Then I ran the corrections on my test server and there were 185 of them!
+
+```
+$ ./fix-metadata-values.py -i /tmp/authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t correctauthor
+```
+
+- I very well might run these on CGSpace soon...
diff --git a/docs/2019-08/index.html b/docs/2019-08/index.html
index cb4744b87..fe7fb88c3 100644
--- a/docs/2019-08/index.html
+++ b/docs/2019-08/index.html
@@ -27,7 +27,7 @@ Run system updates on DSpace Test (linode19) and reboot it
-
+
@@ -59,9 +59,9 @@ Run system updates on DSpace Test (linode19) and reboot it
 "@type": "BlogPosting",
 "headline": "August, 2019",
 "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-08\/",
- "wordCount": "2065",
+ "wordCount": "2233",
 "datePublished": "2019-08-03T12:39:51\x2b03:00",
- "dateModified": "2019-08-28T00:12:00\x2b03:00",
+ "dateModified": "2019-08-28T11:19:52\x2b03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -507,6 +507,35 @@ sys 2m24.715s
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index eaa4c856e..63cd6b011 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
 https://alanorth.github.io/cgspace-notes/2019-08/
- 2019-08-28T00:12:00+03:00
+ 2019-08-28T11:19:52+03:00
 https://alanorth.github.io/cgspace-notes/
- 2019-08-28T00:12:00+03:00
+ 2019-08-28T11:19:52+03:00
 https://alanorth.github.io/cgspace-notes/tags/notes/
- 2019-08-28T00:12:00+03:00
+ 2019-08-28T11:19:52+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2019-08-28T00:12:00+03:00
+ 2019-08-28T11:19:52+03:00
 https://alanorth.github.io/cgspace-notes/tags/
- 2019-08-28T00:12:00+03:00
+ 2019-08-28T11:19:52+03:00