+++
date = "2016-10-03T15:53:00+03:00"
author = "Alan Orth"
title = "October, 2016"
tags = ["Notes"]
+++

## 2016-10-03

- Testing adding [ORCIDs to a CSV](https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing) file for a single item to see if the author order gets messed up
- Need to test the following scenarios to see how author order is affected:
  - ORCIDs only
  - ORCIDs plus normal authors
- I exported a random item's metadata as CSV, deleted *all columns* except id and collection, and made a new column called `ORCID:dc.contributor.author` with the following random ORCIDs from the ORCID registry:

```
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
```

- Hmm, with the `dc.contributor.author` column removed, DSpace doesn't detect any changes
- With a blank `dc.contributor.author` column, DSpace wants to remove all non-ORCID authors and add the new ORCID authors
- I added the [disclaimer text](https://github.com/ilri/DSpace/issues/234) to the About page, then added a footer link to the disclaimer's ID, but there is a Bootstrap issue that causes the page content to disappear when using in-page anchors: https://github.com/twbs/bootstrap/issues/1768

![Bootstrap issue with in-page anchors](2016/10/bootstrap-issue.png)

- Looks like we'll just have to add the text to the About page (without a link) or add a separate page

## 2016-10-04

- Start testing cleanups of authors that Peter sent last week
- Out of 40,000+ rows, Peter had indicated corrections for ~3,200 of them, which is too many to look through carefully, so I did some basic quality checking:
  - Trim leading/trailing whitespace
  - Find invalid characters
  - Cluster values to merge obvious authors
- That left us with 3,180 valid corrections and 3 deletions:

```
$ ./fix-metadata-values.py -i authors-fix-3180.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
$ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -m 3 -d dspacetest -u dspacetest -p fuuu
```

- Remove old about page ([#284](https://github.com/ilri/DSpace/pull/284))
- CGSpace crashed a few times today
- Generate list of unique authors in CCAFS collections:

```
dspacetest=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/32729', '10568/5472', '10568/5473', '10568/10288', '10568/70974', '10568/3547', '10568/3549', '10568/3531','10568/16890','10568/5470','10568/3546', '10568/36024', '10568/66581', '10568/21789', '10568/5469', '10568/5468', '10568/3548', '10568/71053', '10568/25167'))) group by text_value order by count desc) to /tmp/ccafs-authors.csv with csv;
```

## 2016-10-05

- Work on more infrastructure cleanups for Ansible DSpace role
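
- For reference, the first two quality checks on the author corrections from 2016-10-04 (trimming whitespace and finding invalid characters) could be scripted. This is only a minimal sketch, not what was actually run: the function names are made up, the `correct` column name is an assumption based on the `-t correct` flag above, and "invalid" is assumed to mean control/format characters or the U+FFFD replacement character. Clustering similar names is not covered here.

```python
import unicodedata


def clean_value(value: str) -> str:
    """Trim leading/trailing whitespace and collapse internal whitespace runs."""
    return " ".join(value.split())


def has_invalid_chars(value: str) -> bool:
    """Flag control/format characters and the U+FFFD replacement character."""
    return any(
        ch == "\ufffd" or unicodedata.category(ch) in ("Cc", "Cf")
        for ch in value
    )


def check_rows(rows, column):
    """Split rows into (cleaned, rejected) based on one column.

    `rows` is any iterable of dicts, e.g. from csv.DictReader; the
    column layout is hypothetical.
    """
    cleaned, rejected = [], []
    for row in rows:
        value = clean_value(row[column])
        if not value or has_invalid_chars(value):
            # Empty or suspicious values need a manual look
            rejected.append(row)
        else:
            cleaned.append({**row, column: value})
    return cleaned, rejected
```

  Something like `check_rows(csv.DictReader(f), "correct")` would then give one list that is safe to write back out and another to review by hand.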