+++
date = "2016-06-01T10:53:00+03:00"
author = "Alan Orth"
title = "June, 2016"
tags = ["notes"]
image = "../images/bg.jpg"
+++

## 2016-06-01

- Experimenting with IFPRI OAI (we want to harvest their publications)
- After reading the [ContentDM documentation](https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html), I found IFPRI's OAI endpoint: http://ebrary.ifpri.org/oai/oai.php
- After reading the [OAI documentation](https://www.openarchives.org/OAI/openarchivesprotocol.html) and testing with an [OAI validator](http://validator.oaipmh.com/), I found out how to get their publications
- This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc
- You can see the others by using the OAI `ListSets` verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
- Working on the second phase of the metadata migration; it looks like this will work for moving CPWF-specific data in `dc.identifier.fund` to `cg.identifier.cpwfproject` and then the rest to `dc.description.sponsorship`:

```
dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
UPDATE 497
dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75;
UPDATE 14
```

- Fix a few minor miscellaneous issues in `dspace.cfg` ([#227](https://github.com/ilri/DSpace/pull/227))

## 2016-06-02

- While testing the configuration and theme changes for the upcoming metadata migration, I found some issues with `cg.coverage.admin-unit`
- It seems that the Browse configuration in `dspace.cfg` can't handle the '-' in the field name:

```
webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text
```

- But actually, I think that since DSpace 4 or 5 (we are on 5.1) the Browse indexes come from Discovery (defined in `discovery.xml`), so this is really just a parsing error
- I've sent a message to the DSpace mailing list to ask about the Browse index definition
- A user was having problems with submission, and from the stack trace it looks like a Sherpa/Romeo issue
- I found a thread on the mailing list discussing it, and there is a bug report and a patch: https://jira.duraspace.org/browse/DS-2740
- The patch applies successfully on DSpace 5.1, so I will try it later

## 2016-06-03

- Investigating the CCAFS authority issue, I exported the metadata for the Videos collection
- The top two authors are:

```
CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500
CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600
```

- So the only difference is the "confidence"
- Ok, well THAT is interesting:

```
dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence
------------+--------------------------------------+------------
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, Alan |                                      |         -1
 Orth, Alan |                                      |         -1
 Orth, Alan |                                      |         -1
 Orth, Alan |                                      |         -1
 Orth, A.   | 05c2c622-d252-4efb-b9ed-95a07d3adf11 |         -1
 Orth, A.   | 05c2c622-d252-4efb-b9ed-95a07d3adf11 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 |        600
 Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 |        600
(13 rows)
```

- And now an actually relevant example:

```
dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500;
 count
-------
   707
(1 row)

dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500;
 count
-------
   253
(1 row)
```

- Trying something experimental:

```
dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
UPDATE 960
```

- And then re-indexing authority and Discovery...?
- After the Discovery reindex, the CCAFS authors are all grouped together in the Authors sidebar facet
- The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as well:

```
webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority
```

- That would only be for the "Browse by" function... so we'll have to see what effect that has later

## 2016-06-04

- Re-sync DSpace Test with CGSpace and perform a test of the metadata migration again
- Run phase two of metadata migrations on CGSpace (see the [migration notes](https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c))
- Run all system updates and reboot CGSpace server

## 2016-06-07

- Figured out how to export a list of the unique values from a metadata field ordered by count:

```
dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
```

- Identified the next round of fields to migrate:
  - dc.title.jtitle → dc.source
  - dc.crsubject.crpsubject → cg.contributor.crp
  - dc.contributor.affiliation → cg.contributor.affiliation
  - dc.Species → cg.species
  - dc.contributor.corporate → dc.contributor
  - dc.identifier.url → cg.identifier.url
  - dc.identifier.doi → cg.identifier.doi
  - dc.identifier.googleurl → cg.identifier.googleurl
  - dc.identifier.dataurl → cg.identifier.dataurl
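On 2016-06-01 the field moves were done with hard-coded `metadata_field_id` numbers (75, 130, 29). For the round of renames listed for 2016-06-07, here is a minimal Python sketch that generates the `UPDATE` statements by looking field IDs up by name instead. This is an alternative, not the method used in these notes. It assumes the DSpace 5 registry tables `metadataschemaregistry` (`metadata_schema_id`, `short_id`) and `metadatafieldregistry` (`metadata_field_id`, `metadata_schema_id`, `element`, `qualifier`). It also assumes every target field already exists in the registry; if one is missing, its subquery returns NULL and the `UPDATE` would try to write NULL. The helper names (`field_id_subquery`, `migration_sql`) are hypothetical.

```python
# Field renames identified on 2016-06-07 (source -> target)
MIGRATIONS = [
    ("dc.title.jtitle", "dc.source"),
    ("dc.crsubject.crpsubject", "cg.contributor.crp"),
    ("dc.contributor.affiliation", "cg.contributor.affiliation"),
    ("dc.Species", "cg.species"),
    ("dc.contributor.corporate", "dc.contributor"),
    ("dc.identifier.url", "cg.identifier.url"),
    ("dc.identifier.doi", "cg.identifier.doi"),
    ("dc.identifier.googleurl", "cg.identifier.googleurl"),
    ("dc.identifier.dataurl", "cg.identifier.dataurl"),
]


def sql_literal(text):
    """Quote a string for SQL by doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def field_id_subquery(field):
    """Return a subquery selecting the metadata_field_id of 'schema.element[.qualifier]'.

    Assumes the DSpace 5 registry tables metadataschemaregistry and
    metadatafieldregistry; element names are matched case-sensitively.
    """
    schema, element, *rest = field.split(".")
    if len(rest) > 1:
        raise ValueError(f"unexpected field name: {field}")
    qualifier_sql = (
        "qualifier IS NULL" if not rest else f"qualifier = {sql_literal(rest[0])}"
    )
    return (
        "(SELECT f.metadata_field_id FROM metadatafieldregistry f "
        "JOIN metadataschemaregistry s ON s.metadata_schema_id = f.metadata_schema_id "
        f"WHERE s.short_id = {sql_literal(schema)} AND f.element = {sql_literal(element)} "
        f"AND f.{qualifier_sql})"
    )


def migration_sql(migrations):
    """Return one UPDATE per rename, preceded by BEGIN and deliberately without COMMIT."""
    statements = ["BEGIN;"]
    for source, target in migrations:
        statements.append(
            f"UPDATE metadatavalue SET metadata_field_id = {field_id_subquery(target)} "
            f"WHERE metadata_field_id = {field_id_subquery(source)};"
        )
    return "\n".join(statements)


if __name__ == "__main__":
    print(migration_sql(MIGRATIONS))
```

If the printed script is loaded with `\i` in an interactive `psql` session on DSpace Test, the transaction stays open. Each `UPDATE n` count can then be checked before a manual `COMMIT` or `ROLLBACK`.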
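The OAI-PMH exploration from 2016-06-01 could be turned into a harvester along the lines of this minimal Python sketch, which uses only the standard library. It assumes the endpoint returns standard OAI-PMH 2.0 XML with `oai_dc` metadata. Large result lists are split into pages, and per the protocol each follow-up request sends only the `resumptionToken` alongside the verb. The function names are hypothetical. The IFPRI endpoint and set (`p15738coll2`) are simply the ones noted above and have not been re-tested here.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace URIs from the OAI-PMH 2.0 specification and the oai_dc schema
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}


def parse_list_records(xml_bytes):
    """Parse one page of a ListRecords response.

    Returns (records, token): records is a list of dicts holding the OAI
    identifier and a dict of Dublin Core element -> list of values; token is
    the resumptionToken for the next page, or None on the last page.
    """
    root = ET.fromstring(xml_bytes)

    error = root.find("oai:error", NS)
    if error is not None:
        # noRecordsMatch just means the set/date range is empty
        if error.get("code") == "noRecordsMatch":
            return [], None
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")

    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        header = record.find("oai:header", NS)
        # Deleted records carry a header with status="deleted" and no metadata
        if header is None or header.get("status") == "deleted":
            continue
        dc_values = {}
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for elem in dc:
                name = elem.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
                dc_values.setdefault(name, []).append((elem.text or "").strip())
        records.append({
            "oai_identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "dc": dc_values,
        })

    # An empty or missing resumptionToken marks the end of the list
    token_elem = root.find("oai:ListRecords/oai:resumptionToken", NS)
    token = (token_elem.text or "").strip() if token_elem is not None else ""
    return records, token or None


def harvest(base_url, set_spec, from_date, metadata_prefix="oai_dc"):
    """Yield every record in a set, following resumptionTokens page by page."""
    params = {
        "verb": "ListRecords",
        "from": from_date,
        "set": set_spec,
        "metadataPrefix": metadata_prefix,
    }
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url, timeout=60) as response:
            records, token = parse_list_records(response.read())
        yield from records
        if token is None:
            break
        # resumptionToken is an exclusive argument: send it with the verb only
        params = {"verb": "ListRecords", "resumptionToken": token}
```

For example, `harvest("http://ebrary.ifpri.org/oai/oai.php", "p15738coll2", "2016-01-01")` would yield one dict per non-deleted record, with Dublin Core elements such as `title` or `creator` as lists of strings. Keeping `parse_list_records` separate from the network loop means the parsing can be tested against a saved response.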
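On 2016-06-03, the same CCAFS author turned up with one authority key but two confidence values. A small Python sketch like the following could run that check across a whole metadata CSV export rather than by eye. It assumes the DSpace metadata CSV convention of `text::authority::confidence` for authority-controlled values and `||` between multiple values in one cell. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed DSpace metadata CSV conventions: "::" joins a value with its
# authority key and confidence; "||" separates multiple values in one cell
AUTHORITY_SEP = "::"
VALUE_SEP = "||"


def parse_value(raw):
    """Split 'text::authority::confidence'; plain values become (text, None, None)."""
    parts = raw.rsplit(AUTHORITY_SEP, 2)
    if len(parts) == 3 and parts[2].lstrip("-").isdigit():
        return parts[0], parts[1], int(parts[2])
    return raw, None, None


def confidence_by_authority(cells):
    """Count how often each confidence appears for every (text, authority) pair."""
    groups = defaultdict(Counter)
    for cell in cells:
        for raw in cell.split(VALUE_SEP):
            if raw:
                text, authority, confidence = parse_value(raw)
                groups[(text, authority)][confidence] += 1
    return groups


def mixed_confidence(groups):
    """Keep only the (text, authority) pairs stored with more than one confidence."""
    return {key: dict(counts) for key, counts in groups.items() if len(counts) > 1}
```

For example, it could be fed the author column of every row read with `csv.DictReader`. The exact column header (with or without a language suffix such as `[en_US]`) depends on the export. `mixed_confidence` then lists only the names whose confidence values disagree, such as the CCAFS entry that the `UPDATE 960` above normalized.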