+++
date = "2016-06-01T10:53:00+03:00"
author = "Alan Orth"
title = "June, 2016"
tags = ["notes"]
image = "../images/bg.jpg"
+++
## 2016-06-01
- Experimenting with IFPRI OAI (we want to harvest their publications)
- After reading the [ContentDM documentation](https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html) I found IFPRI's OAI endpoint: http://ebrary.ifpri.org/oai/oai.php
- After reading the [OAI documentation](https://www.openarchives.org/OAI/openarchivesprotocol.html) and testing with an [OAI validator](http://validator.oaipmh.com/) I found out how to get their publications
- This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc
- You can see the others by using the OAI `ListSets` verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
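- A minimal Python sketch of how those request URLs are put together, in case the harvest gets scripted later. The endpoint, set name and date come from the URLs above; the helper name `build_oai_url` is made up for illustration, and nothing here contacts the server:

```python
from urllib.parse import urlencode

# Endpoint taken from the notes above; the helper below is illustrative only.
BASE_URL = "http://ebrary.ifpri.org/oai/oai.php"


def build_oai_url(base_url, params):
    """Return an OAI-PMH request URL for the given query parameters.

    `params` is a dict because `from` is a reserved word in Python and
    cannot be passed as a keyword argument.
    """
    return base_url + "?" + urlencode(params)


# Same request as the publications set URL in the notes
list_records = build_oai_url(
    BASE_URL,
    {
        "verb": "ListRecords",
        "from": "2016-01-01",
        "set": "p15738coll2",
        "metadataPrefix": "oai_dc",
    },
)

# Same request as the ListSets URL in the notes
list_sets = build_oai_url(BASE_URL, {"verb": "ListSets"})

print(list_records)
print(list_sets)
```

- Changing `from` or `set` in the dict is enough to ask for a different date range or collection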
- Working on the second phase of the metadata migration; it looks like this will work for moving CPWF-specific data in `dc.identifier.fund` to `cg.identifier.cpwfproject` and then the rest to `dc.description.sponsorship`:
```
dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
UPDATE 497
dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75;
UPDATE 14
```
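- The order of those two statements matters: the second one sweeps up whatever the first left in field 75. A rough way to rehearse that away from the real database is an in-memory SQLite table in Python; the sample rows are invented, the table has only the two columns the queries touch, and SQLite's `LIKE` ignores ASCII case while PostgreSQL's does not, so this is only a sketch of the logic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue (metadata_field_id INTEGER, text_value TEXT)"
)

# Invented sample values; only the field IDs (75, 130, 29) come from the notes
rows = [
    (75, "PN10"),
    (75, "PHASE 2"),
    (75, "CBA"),
    (75, "IA"),
    (75, "Example Donor"),
    (29, "Another Example Donor"),
]
conn.executemany("INSERT INTO metadatavalue VALUES (?, ?)", rows)

# Step 1: move the CPWF-specific values out of field 75 into field 130
cpwf = conn.execute(
    "UPDATE metadatavalue SET metadata_field_id = 130 "
    "WHERE metadata_field_id = 75 AND (text_value LIKE 'PN%' "
    "OR text_value LIKE 'PHASE%' OR text_value = 'CBA' OR text_value = 'IA')"
).rowcount

# Step 2: everything still left in field 75 goes to field 29
rest = conn.execute(
    "UPDATE metadatavalue SET metadata_field_id = 29 "
    "WHERE metadata_field_id = 75"
).rowcount

remaining = conn.execute(
    "SELECT COUNT(*) FROM metadatavalue WHERE metadata_field_id = 75"
).fetchone()[0]

print(cpwf, rest, remaining)
```

- Running step 2 first would move every value to field 29 and leave nothing for step 1 to match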
- Fix a few minor miscellaneous issues in `dspace.cfg` ([#227](https://github.com/ilri/DSpace/pull/227))
## 2016-06-02
- Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with `cg.coverage.admin-unit`
- Seems that the Browse configuration in `dspace.cfg` can't handle the '-' in the field name:
```
webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text
```
- But actually, I think since DSpace 4 or 5 (we are on 5.1) the Browse indexes come from Discovery (defined in `discovery.xml`), so this is really just a parsing error
- I've sent a message to the DSpace mailing list to ask about the Browse index definition
- A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue
- I found a thread on the mailing list talking about it and there is a bug report and a patch: https://jira.duraspace.org/browse/DS-2740
- The patch applies successfully on DSpace 5.1 so I will try it later
## 2016-06-03
- Investigating the CCAFS authority issue, I exported the metadata for the Videos collection
- The top two authors are:
```
CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500
CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600
```
- So the only difference is the "confidence"
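- A small Python sketch for pulling those exported strings apart, assuming the layout is `value::authority::confidence` as the two lines above suggest. The function name `parse_authority_value` is hypothetical:

```python
def parse_authority_value(raw):
    """Split 'value::authority::confidence' into its three parts.

    Splitting from the right means a value that itself contains '::'
    would still be kept in one piece.
    """
    value, authority, confidence = raw.rsplit("::", 2)
    return value, authority, int(confidence)


# The two author strings from the Videos collection export
first = parse_authority_value(
    "CGIAR Research Program on Climate Change, Agriculture and Food Security"
    "::acd00765-02f1-4b5b-92fa-bfa3877229ce::500"
)
second = parse_authority_value(
    "CGIAR Research Program on Climate Change, Agriculture and Food Security"
    "::acd00765-02f1-4b5b-92fa-bfa3877229ce::600"
)

same_name = first[0] == second[0]
same_authority = first[1] == second[1]
confidences = (first[2], second[2])

print(same_name, same_authority, confidences)
```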
- Ok, well THAT is interesting:
```
dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence
------------+--------------------------------------+------------
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, Alan |                                      |         -1
 Orth, Alan |                                      |         -1
 Orth, Alan |                                      |         -1
 Orth, Alan |                                      |         -1
 Orth, A.   | 05c2c622-d252-4efb-b9ed-95a07d3adf11 |         -1
 Orth, A.   | 05c2c622-d252-4efb-b9ed-95a07d3adf11 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, A.   | ab606e3a-2b04-4c7d-9423-14beccf54257 |         -1
 Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 |        600
 Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 |        600
(13 rows)
```
- And now an actually relevant example:
```
dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500;
 count
-------
   707
(1 row)

dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500;
 count
-------
   253
(1 row)
```
- Trying something experimental:
```
dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
UPDATE 960
```
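- Grouping by confidence would show the whole spread in one query instead of two counts, and makes it easy to compare before and after the update. A sketch of that idea against an in-memory SQLite table in Python; the rows and the helper name `count_by_confidence` are invented, and only the author name and field ID 3 come from the queries above:

```python
import sqlite3

CCAFS = "CGIAR Research Program on Climate Change, Agriculture and Food Security"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue "
    "(metadata_field_id INTEGER, text_value TEXT, confidence INTEGER)"
)
# Invented sample rows: four CCAFS values with mixed confidence, one other author
conn.executemany(
    "INSERT INTO metadatavalue VALUES (3, ?, ?)",
    [(CCAFS, 500), (CCAFS, 500), (CCAFS, 600), (CCAFS, -1), ("Orth, Alan", 600)],
)


def count_by_confidence(connection):
    """Return {confidence: number of rows} for the CCAFS author value."""
    query = (
        "SELECT confidence, COUNT(*) FROM metadatavalue "
        "WHERE metadata_field_id = 3 AND text_value = ? GROUP BY confidence"
    )
    return dict(connection.execute(query, (CCAFS,)).fetchall())


before = count_by_confidence(conn)

# Same idea as the experimental update: force one confidence for this author
updated = conn.execute(
    "UPDATE metadatavalue SET confidence = 500 "
    "WHERE metadata_field_id = 3 AND text_value = ?",
    (CCAFS,),
).rowcount

after = count_by_confidence(conn)

print(before, updated, after)
```

- The update count includes rows that were already at 500, which is why the real `UPDATE 960` equals 707 + 253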
- And then re-indexing authority and Discovery...?
- After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet
- The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as well:
```
webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority
```
- That would only be for the "Browse by" function... so we'll have to see what effect that has later
## 2016-06-04
- Re-sync DSpace Test with CGSpace and perform test of metadata migration again
- Run phase two of metadata migrations on CGSpace (see the [migration notes](https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c))
- Run all system updates and reboot CGSpace server
## 2016-06-07
- Figured out how to export a list of the unique values from a metadata field ordered by count:
```
dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
```
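- A hedged Python sketch for reading that export back in, assuming the file has two columns (value, count) and no header row, which is what `with csv` should produce here. The sample text stands in for `/tmp/sponsorship.csv` and its donor names are invented:

```python
import csv
import io

# Invented stand-in for the exported file; values containing commas are quoted
sample = (
    "Example Donor A,120\n"
    '"Example Foundation, Somewhere",45\n'
    "Example Donor B,3\n"
)


def read_counts(handle):
    """Return a list of (value, count) tuples from a two-column CSV."""
    return [(value, int(count)) for value, count in csv.reader(handle)]


counts = read_counts(io.StringIO(sample))

# Total number of metadata values, and the values used fewer than five times
total = sum(count for _, count in counts)
rare = [value for value, count in counts if count < 5]

print(total)
print(rare)
```

- For the real file, `open("/tmp/sponsorship.csv", newline="")` would replace the `io.StringIO` stand-in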
- Identified the next round of fields to migrate:
- dc.title.jtitle → dc.source
- dc.crsubject.crpsubject → cg.contributor.crp
- dc.contributor.affiliation → cg.contributor.affiliation
- dc.Species → cg.species
- dc.contributor.corporate → dc.contributor
- dc.identifier.url → cg.identifier.url
- dc.identifier.doi → cg.identifier.doi
- dc.identifier.googleurl → cg.identifier.googleurl
- dc.identifier.dataurl → cg.identifier.dataurl