diff --git a/content/posts/2018-10.md b/content/posts/2018-10.md
index cf8f8dcde..a044f6f4f 100644
--- a/content/posts/2018-10.md
+++ b/content/posts/2018-10.md
@@ -346,4 +346,51 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?limit=100&offset=
 - I sent a mail to dspace-tech to ask how to profile this...
+## 2018-10-17
+
+- I decided to update most of the existing metadata values that we have in `dc.rights` on CGSpace to be machine readable in SPDX format (with Creative Commons version if it was included)
+- Most of them are from Bioversity, and I asked Maria for permission before updating them
+- I manually went through and looked at the existing values and updated them in several batches:
+
+```
+UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%CC BY %';
+UPDATE metadatavalue SET text_value='CC-BY-NC-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-ND%' AND text_value LIKE '%by-nc-nd%';
+UPDATE metadatavalue SET text_value='CC-BY-NC-SA-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-SA%' AND text_value LIKE '%by-nc-sa%';
+UPDATE metadatavalue SET text_value='CC-BY-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value LIKE '%/by/%';
+UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%/by/%' AND text_value NOT LIKE '%zero%';
+UPDATE metadatavalue SET text_value='CC-BY-NC-2.5' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%/by-nc%' AND text_value LIKE '%2.5%';
+UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%/by-nc%' AND text_value LIKE '%4.0%';
+UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%zero%';
+UPDATE metadatavalue SET text_value='CC-BY-NC-SA-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value LIKE '%4.0%' AND text_value LIKE '%Attribution-NonCommercial-ShareAlike%';
+UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution-NonCommercial %';
+UPDATE metadatavalue SET text_value='CC-BY-NC-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution-NonCommercial %';
+UPDATE metadatavalue SET text_value='CC-BY-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution %';
+UPDATE metadatavalue SET text_value='CC-BY-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78184;
+UPDATE metadatavalue SET text_value='CC-BY' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value NOT LIKE '%CC0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%CC-%';
+UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78564;
+```
+
+- I will update the values on CGSpace soon
+- We also need to re-think the `dc.rights` field in the submission form: we should probably use a popup controlled vocabulary and list the Creative Commons values with version numbers and allow the user to enter their own (like the ORCID identifier field)
+- Ask Jane if we can use some of the BDP money to host AReS explorer on a more powerful server
+- IWMI sent me a list of new ORCID identifiers for their staff so I combined them with our list, updated the names with my [resolve-orcids.py](https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b) script, and regenerated the controlled vocabulary:
+
+```
+$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json MEL\ ORCID_V2.json 2018-10-17-IWMI-ORCIDs.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > 2018-10-17-orcids.txt
+$ ./resolve-orcids.py -i 2018-10-17-orcids.txt -o 2018-10-17-names.txt -d
+$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
+```
+
+- I also decided to add the ORCID identifiers that MEL had sent us a few months ago...
+- One problem I had with the `resolve-orcids.py` script is that one user seems to have disabled their profile data since we last updated:
+
+```
+Looking up the names associated with ORCID iD: 0000-0001-7930-5752
+Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
+```
+
+- So I need to handle that situation in the script for sure, but I'm not sure what to do organizationally or ethically, since that user disabled their name! Do we remove him from the list?
+
diff --git a/docs/2018-10/index.html b/docs/2018-10/index.html
index ffeefec06..ce8395f09 100644
--- a/docs/2018-10/index.html
+++ b/docs/2018-10/index.html
@@ -9,7 +9,7 @@
-
+
@@ -24,9 +24,9 @@
  "@type": "BlogPosting",
  "headline": "October, 2018",
  "url": "https://alanorth.github.io/cgspace-notes/2018-10/",
- "wordCount": "2451",
+ "wordCount": "3021",
  "datePublished": "2018-10-01T22:31:54+03:00",
- "dateModified": "2018-10-16T17:26:18+03:00",
+ "dateModified": "2018-10-17T00:33:01+03:00",
  "author": {
  "@type": "Person",
  "name": "Alan Orth"
@@ -488,6 +488,58 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
  • I sent a mail to dspace-tech to ask how to profile this…
  • +

    2018-10-17

    + + + +
    UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%CC BY %';
    +UPDATE metadatavalue SET text_value='CC-BY-NC-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-ND%' AND text_value LIKE '%by-nc-nd%';
    +UPDATE metadatavalue SET text_value='CC-BY-NC-SA-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-SA%' AND text_value LIKE '%by-nc-sa%';
    +UPDATE metadatavalue SET text_value='CC-BY-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value LIKE '%/by/%';
    +UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%/by/%' AND text_value NOT LIKE '%zero%';
    +UPDATE metadatavalue SET text_value='CC-BY-NC-2.5' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%/by-nc%' AND text_value LIKE '%2.5%';
    +UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%/by-nc%' AND text_value LIKE '%4.0%';
    +UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%zero%';
    +UPDATE metadatavalue SET text_value='CC-BY-NC-SA-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value LIKE '%4.0%' AND text_value LIKE '%Attribution-NonCommercial-ShareAlike%';
    +UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution-NonCommercial %';
    +UPDATE metadatavalue SET text_value='CC-BY-NC-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution-NonCommercial %';
    +UPDATE metadatavalue SET text_value='CC-BY-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution %';
    +UPDATE metadatavalue SET text_value='CC-BY-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78184;
    +UPDATE metadatavalue SET text_value='CC-BY' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value NOT LIKE '%CC0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%CC-%';
    +UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78564;
    +
    + + + +
    $ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json MEL\ ORCID_V2.json 2018-10-17-IWMI-ORCIDs.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > 2018-10-17-orcids.txt
    +$ ./resolve-orcids.py -i 2018-10-17-orcids.txt -o 2018-10-17-names.txt -d
    +$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
    +
    + + + +
    Looking up the names associated with ORCID iD: 0000-0001-7930-5752
    +Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
    +
    + + + +
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 1922e2a6a..76f3a9f8e 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2018-10/
- 2018-10-16T17:26:18+03:00
+ 2018-10-17T00:33:01+03:00
@@ -189,7 +189,7 @@
 https://alanorth.github.io/cgspace-notes/
- 2018-10-16T17:26:18+03:00
+ 2018-10-17T00:33:01+03:00
 0
@@ -200,7 +200,7 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
- 2018-10-16T17:26:18+03:00
+ 2018-10-17T00:33:01+03:00
 0
@@ -212,13 +212,13 @@
 https://alanorth.github.io/cgspace-notes/posts/
- 2018-10-16T17:26:18+03:00
+ 2018-10-17T00:33:01+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
- 2018-10-16T17:26:18+03:00
+ 2018-10-17T00:33:01+03:00
 0
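
Note on the deactivated ORCID profile from the 2018-10-17 entry: a minimal sketch of one way to flag such entries before regenerating the controlled vocabulary. It assumes the resolved names file has one `Name: ORCID iD` entry per line, as in the log output in the notes. The function `split_resolved` and the sample name "Jane Example" are hypothetical and not part of `resolve-orcids.py`. Whether to drop or keep a deactivated entry is still a policy question; this only separates the two groups.

```python
import re

# Same pattern as the grep in the notes. Real ORCID iDs are digits with an
# optional trailing X, so this is slightly more permissive than necessary.
ORCID_RE = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")

# Placeholder name that appeared in the log for the deactivated profile.
DEACTIVATED_NAME = "Given Names Deactivated Family Name Deactivated"


def split_resolved(lines):
    """Split 'Name: ORCID' lines into (active, deactivated) lists of (name, orcid)."""
    active, deactivated = [], []
    for line in lines:
        line = line.strip()
        match = ORCID_RE.search(line)
        if not match:
            continue  # skip blank lines and anything without an identifier
        orcid = match.group(0)
        # Treat everything before the identifier, minus the ':' separator, as the name.
        name = line[: match.start()].rstrip().rstrip(":").rstrip()
        if name == DEACTIVATED_NAME:
            deactivated.append((name, orcid))
        else:
            active.append((name, orcid))
    return active, deactivated


if __name__ == "__main__":
    sample = [
        "Jane Example: 0000-0002-1825-0097",  # hypothetical entry
        "Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752",
    ]
    active, deactivated = split_resolved(sample)
    for name, orcid in deactivated:
        print(f"Deactivated profile, needs a decision: {orcid}")
```

Keeping the deactivated identifiers in a separate list, rather than silently dropping them, leaves room to review each case by hand.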