diff --git a/content/2016-06.md b/content/2016-06.md index 637b6a5b3..54ea854ad 100644 --- a/content/2016-06.md +++ b/content/2016-06.md @@ -1,5 +1,5 @@ +++ -date = "2016-05-01T10:53:00+03:00" +date = "2016-06-01T10:53:00+03:00" author = "Alan Orth" title = "June, 2016" tags = ["notes"] @@ -110,3 +110,23 @@ webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority - Re-sync DSpace Test with CGSpace and perform test of metadata migration again - Run phase two of metadata migrations on CGSpace (see the [migration notes](https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c)) - Run all system updates and reboot CGSpace server + +## 2016-06-07 + +- Figured out how to export a list of the unique values from a metadata field ordered by count: + +``` +dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv; +``` + +- Identified the next round of fields to migrate: + - dc.title.jtitle → dc.source + - dc.crsubject.crpsubject → cg.contributor.crp + - dc.contributor.affiliation → cg.contributor.affiliation + - dc.Species → cg.species + - dc.contributor.corporate → dc.contributor + - dc.identifier.url → cg.identifier.url + - dc.identifier.doi → cg.identifier.doi + - dc.identifier.googleurl → cg.identifier.googleurl + - dc.identifier.dataurl → cg.identifier.dataurl + diff --git a/public/2016-04/index.html b/public/2016-04/index.html index 3b6f2d03d..21cc9ff38 100644 --- a/public/2016-04/index.html +++ b/public/2016-04/index.html @@ -550,7 +550,7 @@ dspace.log.2016-04-27:7271 - + diff --git a/public/2016-05/index.html b/public/2016-05/index.html index f353b57fd..029f6b7d2 100644 --- a/public/2016-05/index.html +++ b/public/2016-05/index.html @@ -393,10 +393,10 @@ sys 0m20.540s diff --git a/public/2016-06/index.html b/public/2016-06/index.html index 4053f3d52..a3212be31 100644 --- a/public/2016-06/index.html +++ 
b/public/2016-06/index.html @@ -11,7 +11,7 @@ - + @@ -65,8 +65,8 @@
Posted on -
@@ -197,6 +197,31 @@ UPDATE 960
  • Re-sync DSpace Test with CGSpace and perform test of metadata migration again
  • Run phase two of metadata migrations on CGSpace (see the migration notes)
  • Run all system updates and reboot CGSpace server

    2016-06-07

    dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
    + + @@ -216,10 +241,10 @@ UPDATE 960 diff --git a/public/index.html b/public/index.html index 5eeb8a276..96e7a7180 100644 --- a/public/index.html +++ b/public/index.html @@ -58,6 +58,34 @@

    June, 2016

    + 2016-06-01 Experimenting with IFPRI OAI (we want to harvest their publications) After reading the ContentDM documentation I found IFPRI’s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php After reading the OAI documentation and testing with an OAI validator I found out how to get their publications This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA'); UPDATE 497 dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75; UPDATE 14 Fix a few minor miscellaneous issues in dspace.cfg (#227) 2016-06-02 Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with cg.coverage.admin-unit Seems that the Browse configuration in dspace.cfg can’t handle the ‘-’ in the field name: webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error I’ve sent a message to the DSpace mailing list to ask about the Browse index definition A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue I found a thread on the mailing list talking about it and there is bug report and a patch: https://jira.duraspace.org/browse/DS-2740 The patch applies successfully on DSpace 5.1 so I will try it later 2016-06-03 Investigating the CCAFS 
authority issue, I exported the metadata for the Videos collection The top two authors are: CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500 CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600 So the only difference is the “confidence” Ok, well THAT is interesting: dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %'; text_value | authority | confidence ------------+--------------------------------------+------------ Orth, A. +

    May, 2016

    @@ -84,34 +112,6 @@ -

    June, 2016

    - 2016-06-01 Experimenting with IFPRI OAI (we want to harvest their publications) After reading the ContentDM documentation I found IFPRI’s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php After reading the OAI documentation and testing with an OAI validator I found out how to get their publications This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA'); UPDATE 497 dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75; UPDATE 14 Fix a few minor miscellaneous issues in dspace.cfg (#227) 2016-06-02 Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with cg.coverage.admin-unit Seems that the Browse configuration in dspace.cfg can’t handle the ‘-’ in the field name: webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error I’ve sent a message to the DSpace mailing list to ask about the Browse index definition A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue I found a thread on the mailing list talking about it and there is bug report and a patch: https://jira.duraspace.org/browse/DS-2740 The patch applies successfully on DSpace 5.1 so I will try it later 2016-06-03 Investigating the CCAFS 
authority issue, I exported the metadata for the Videos collection The top two authors are: CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500 CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600 So the only difference is the “confidence” Ok, well THAT is interesting: dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %'; text_value | authority | confidence ------------+--------------------------------------+------------ Orth, A. -
    diff --git a/public/index.xml b/public/index.xml index afa8a8f93..03625f0a3 100644 --- a/public/index.xml +++ b/public/index.xml @@ -6,9 +6,164 @@ Recent content on CGSpace Notes Hugo -- gohugo.io en-us - Sun, 01 May 2016 23:06:00 +0300 + Wed, 01 Jun 2016 10:53:00 +0300 + + June, 2016 + /cgspace-notes/2016-06/ + Wed, 01 Jun 2016 10:53:00 +0300 + + /cgspace-notes/2016-06/ + + +<h2 id="2016-06-01:6783872e82b68b1517e00f494e6b6504">2016-06-01</h2> + +<ul> +<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> +<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> +<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> +<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> +<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> +<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li> +</ul> + +<pre><code>dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 
'IA'); +UPDATE 497 +dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75; +UPDATE 14 +</code></pre> + +<ul> +<li>Fix a few minor miscellaneous issues in <code>dspace.cfg</code> (<a href="https://github.com/ilri/DSpace/pull/227">#227</a>)</li> +</ul> + +<h2 id="2016-06-02:6783872e82b68b1517e00f494e6b6504">2016-06-02</h2> + +<ul> +<li>Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with <code>cg.coverage.admin-unit</code></li> +<li>Seems that the Browse configuration in <code>dspace.cfg</code> can&rsquo;t handle the &lsquo;-&rsquo; in the field name:</li> +</ul> + +<pre><code>webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text +</code></pre> + +<ul> +<li>But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error</li> +<li>I&rsquo;ve sent a message to the DSpace mailing list to ask about the Browse index definition</li> +<li>A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue</li> +<li>I found a thread on the mailing list talking about it and there is bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></li> +<li>The patch applies successfully on DSpace 5.1 so I will try it later</li> +</ul> + +<h2 id="2016-06-03:6783872e82b68b1517e00f494e6b6504">2016-06-03</h2> + +<ul> +<li>Investigating the CCAFS authority issue, I exported the metadata for the Videos collection</li> +<li>The top two authors are:</li> +</ul> + +<pre><code>CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500 +CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600 +</code></pre> + +<ul> +<li>So the only difference is the &ldquo;confidence&rdquo;</li> +<li>Ok, 
well THAT is interesting:</li> +</ul> + +<pre><code>dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %'; + text_value | authority | confidence +------------+--------------------------------------+------------ + Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 + Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 + Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 + Orth, Alan | | -1 + Orth, Alan | | -1 + Orth, Alan | | -1 + Orth, Alan | | -1 + Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1 + Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1 + Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 + Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 + Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600 + Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600 +(13 rows) +</code></pre> + +<ul> +<li>And now an actually relevent example:</li> +</ul> + +<pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500; + count +------- + 707 +(1 row) + +dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500; + count +------- + 253 +(1 row) +</code></pre> + +<ul> +<li>Trying something experimental:</li> +</ul> + +<pre><code>dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security'; +UPDATE 960 +</code></pre> + +<ul> +<li>And then re-indexing authority and Discovery&hellip;?</li> +<li>After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet</li> +<li>The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as 
well:</li> +</ul> + +<pre><code>webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority +</code></pre> + +<ul> +<li>That would only be for the &ldquo;Browse by&rdquo; function&hellip; so we&rsquo;ll have to see what effect that has later</li> +</ul> + +<h2 id="2016-06-04:6783872e82b68b1517e00f494e6b6504">2016-06-04</h2> + +<ul> +<li>Re-sync DSpace Test with CGSpace and perform test of metadata migration again</li> +<li>Run phase two of metadata migrations on CGSpace (see the <a href="https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c">migration notes</a>)</li> +<li>Run all system updates and reboot CGSpace server</li> +</ul> + +<h2 id="2016-06-07:6783872e82b68b1517e00f494e6b6504">2016-06-07</h2> + +<ul> +<li>Figured out how to export a list of the unique values from a metadata field ordered by count:</li> +</ul> + +<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv; +</code></pre> + +<ul> +<li>Identified the next round of fields to migrate: + +<ul> +<li>dc.title.jtitle → dc.source</li> +<li>dc.crsubject.crpsubject → cg.contributor.crp</li> +<li>dc.contributor.affiliation → cg.contributor.affiliation</li> +<li>dc.Species → cg.species</li> +<li>dc.contributor.corporate → dc.contributor</li> +<li>dc.identifier.url → cg.identifier.url</li> +<li>dc.identifier.doi → cg.identifier.doi</li> +<li>dc.identifier.googleurl → cg.identifier.googleurl</li> +<li>dc.identifier.dataurl → cg.identifier.dataurl</li> +</ul></li> +</ul> + + + May, 2016 /cgspace-notes/2016-05/ @@ -316,136 +471,6 @@ sys 0m20.540s - - June, 2016 - /cgspace-notes/2016-06/ - Sun, 01 May 2016 10:53:00 +0300 - - /cgspace-notes/2016-06/ - - -<h2 id="2016-06-01:6783872e82b68b1517e00f494e6b6504">2016-06-01</h2> - -<ul> -<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> -<li>After reading the <a 
href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> -<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> -<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> -<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> -<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li> -</ul> - -<pre><code>dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA'); -UPDATE 497 -dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75; -UPDATE 14 -</code></pre> - -<ul> -<li>Fix a few minor miscellaneous issues in <code>dspace.cfg</code> (<a href="https://github.com/ilri/DSpace/pull/227">#227</a>)</li> -</ul> - -<h2 id="2016-06-02:6783872e82b68b1517e00f494e6b6504">2016-06-02</h2> - -<ul> -<li>Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with <code>cg.coverage.admin-unit</code></li> -<li>Seems that the Browse configuration in 
<code>dspace.cfg</code> can&rsquo;t handle the &lsquo;-&rsquo; in the field name:</li> -</ul> - -<pre><code>webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text -</code></pre> - -<ul> -<li>But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error</li> -<li>I&rsquo;ve sent a message to the DSpace mailing list to ask about the Browse index definition</li> -<li>A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue</li> -<li>I found a thread on the mailing list talking about it and there is bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></li> -<li>The patch applies successfully on DSpace 5.1 so I will try it later</li> -</ul> - -<h2 id="2016-06-03:6783872e82b68b1517e00f494e6b6504">2016-06-03</h2> - -<ul> -<li>Investigating the CCAFS authority issue, I exported the metadata for the Videos collection</li> -<li>The top two authors are:</li> -</ul> - -<pre><code>CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500 -CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600 -</code></pre> - -<ul> -<li>So the only difference is the &ldquo;confidence&rdquo;</li> -<li>Ok, well THAT is interesting:</li> -</ul> - -<pre><code>dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %'; - text_value | authority | confidence -------------+--------------------------------------+------------ - Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 - Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 - Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 - Orth, Alan | | -1 - Orth, Alan | | -1 - Orth, Alan | | -1 - Orth, Alan | | -1 - Orth, A. 
| 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1 - Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1 - Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 - Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 - Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600 - Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600 -(13 rows) -</code></pre> - -<ul> -<li>And now an actually relevent example:</li> -</ul> - -<pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500; - count -------- - 707 -(1 row) - -dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500; - count -------- - 253 -(1 row) -</code></pre> - -<ul> -<li>Trying something experimental:</li> -</ul> - -<pre><code>dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security'; -UPDATE 960 -</code></pre> - -<ul> -<li>And then re-indexing authority and Discovery&hellip;?</li> -<li>After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet</li> -<li>The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as well:</li> -</ul> - -<pre><code>webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority -</code></pre> - -<ul> -<li>That would only be for the &ldquo;Browse by&rdquo; function&hellip; so we&rsquo;ll have to see what effect that has later</li> -</ul> - -<h2 id="2016-06-04:6783872e82b68b1517e00f494e6b6504">2016-06-04</h2> - -<ul> -<li>Re-sync DSpace Test with CGSpace and perform test of metadata migration again</li> -<li>Run phase two of metadata migrations on CGSpace (see the <a 
href="https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c">migration notes</a>)</li> -<li>Run all system updates and reboot CGSpace server</li> -</ul> - - - April, 2016 /cgspace-notes/2016-04/ diff --git a/public/sitemap.xml b/public/sitemap.xml index aad337b86..c66b34b7f 100644 --- a/public/sitemap.xml +++ b/public/sitemap.xml @@ -3,20 +3,20 @@ /cgspace-notes/ - 2016-05-01T23:06:00+03:00 + 2016-06-01T10:53:00+03:00 0 + + /cgspace-notes/2016-06/ + 2016-06-01T10:53:00+03:00 + + /cgspace-notes/2016-05/ 2016-05-01T23:06:00+03:00 - - /cgspace-notes/2016-06/ - 2016-05-01T10:53:00+03:00 - - /cgspace-notes/2016-04/ 2016-04-04T11:06:00+03:00 diff --git a/public/tags/notes/index.html b/public/tags/notes/index.html index c991a23a9..794ef42ff 100644 --- a/public/tags/notes/index.html +++ b/public/tags/notes/index.html @@ -61,6 +61,32 @@

    Notes


    June, 2016

    + 2016-06-01 Experimenting with IFPRI OAI (we want to harvest their publications) After reading the ContentDM documentation I found IFPRI’s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php After reading the OAI documentation and testing with an OAI validator I found out how to get their publications This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA'); UPDATE 497 dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75; UPDATE 14 Fix a few minor miscellaneous issues in dspace.cfg (#227) 2016-06-02 Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with cg.coverage.admin-unit Seems that the Browse configuration in dspace.cfg can’t handle the ‘-’ in the field name: webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error I’ve sent a message to the DSpace mailing list to ask about the Browse index definition A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue I found a thread on the mailing list talking about it and there is bug report and a patch: https://jira.duraspace.org/browse/DS-2740 The patch applies successfully on DSpace 5.1 so I will try it later 2016-06-03 Investigating the CCAFS 
authority issue, I exported the metadata for the Videos collection The top two authors are: CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500 CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600 So the only difference is the “confidence” Ok, well THAT is interesting: dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %'; text_value | authority | confidence ------------+--------------------------------------+------------ Orth, A. +
    @@ -87,32 +113,6 @@

    June, 2016

    - 2016-06-01 Experimenting with IFPRI OAI (we want to harvest their publications) After reading the ContentDM documentation I found IFPRI’s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php After reading the OAI documentation and testing with an OAI validator I found out how to get their publications This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA'); UPDATE 497 dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75; UPDATE 14 Fix a few minor miscellaneous issues in dspace.cfg (#227) 2016-06-02 Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with cg.coverage.admin-unit Seems that the Browse configuration in dspace.cfg can’t handle the ‘-’ in the field name: webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error I’ve sent a message to the DSpace mailing list to ask about the Browse index definition A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue I found a thread on the mailing list talking about it and there is bug report and a patch: https://jira.duraspace.org/browse/DS-2740 The patch applies successfully on DSpace 5.1 so I will try it later 2016-06-03 Investigating the CCAFS 
authority issue, I exported the metadata for the Videos collection The top two authors are: CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500 CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600 So the only difference is the “confidence” Ok, well THAT is interesting: dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %'; text_value | authority | confidence ------------+--------------------------------------+------------ Orth, A. -
    diff --git a/public/tags/notes/index.xml b/public/tags/notes/index.xml index 38ec2fd40..4c5de4be6 100644 --- a/public/tags/notes/index.xml +++ b/public/tags/notes/index.xml @@ -6,9 +6,164 @@ Recent content in Notes on CGSpace Notes Hugo -- gohugo.io en-us - Sun, 01 May 2016 23:06:00 +0300 + Wed, 01 Jun 2016 10:53:00 +0300 + + June, 2016 + /cgspace-notes/2016-06/ + Wed, 01 Jun 2016 10:53:00 +0300 + + /cgspace-notes/2016-06/ + + +<h2 id="2016-06-01:6783872e82b68b1517e00f494e6b6504">2016-06-01</h2> + +<ul> +<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> +<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> +<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> +<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> +<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> +<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li> +</ul> + +<pre><code>dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 
'PHASE%' or text_value = 'CBA' or text_value = 'IA'); +UPDATE 497 +dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75; +UPDATE 14 +</code></pre> + +<ul> +<li>Fix a few minor miscellaneous issues in <code>dspace.cfg</code> (<a href="https://github.com/ilri/DSpace/pull/227">#227</a>)</li> +</ul> + +<h2 id="2016-06-02:6783872e82b68b1517e00f494e6b6504">2016-06-02</h2> + +<ul> +<li>Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with <code>cg.coverage.admin-unit</code></li> +<li>Seems that the Browse configuration in <code>dspace.cfg</code> can&rsquo;t handle the &lsquo;-&rsquo; in the field name:</li> +</ul> + +<pre><code>webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text +</code></pre> + +<ul> +<li>But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error</li> +<li>I&rsquo;ve sent a message to the DSpace mailing list to ask about the Browse index definition</li> +<li>A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue</li> +<li>I found a thread on the mailing list talking about it and there is bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></li> +<li>The patch applies successfully on DSpace 5.1 so I will try it later</li> +</ul> + +<h2 id="2016-06-03:6783872e82b68b1517e00f494e6b6504">2016-06-03</h2> + +<ul> +<li>Investigating the CCAFS authority issue, I exported the metadata for the Videos collection</li> +<li>The top two authors are:</li> +</ul> + +<pre><code>CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500 +CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600 +</code></pre> + +<ul> +<li>So the only difference 
is the &ldquo;confidence&rdquo;</li> +<li>Ok, well THAT is interesting:</li> +</ul> + +<pre><code>dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %'; + text_value | authority | confidence +------------+--------------------------------------+------------ + Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 + Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 + Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 + Orth, Alan | | -1 + Orth, Alan | | -1 + Orth, Alan | | -1 + Orth, Alan | | -1 + Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1 + Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1 + Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 + Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 + Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600 + Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600 +(13 rows) +</code></pre> + +<ul> +<li>And now an actually relevant example:</li> +</ul> + +<pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500; + count +------- + 707 +(1 row) + +dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500; + count +------- + 253 +(1 row) +</code></pre> + +<ul> +<li>Trying something experimental:</li> +</ul> + +<pre><code>dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security'; +UPDATE 960 +</code></pre> + +<ul> +<li>And then re-indexing authority and Discovery&hellip;?</li> +<li>After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet</li> +<li>The docs for the ORCiD and Authority stuff for DSpace 5 mention changing
the browse indexes to use the Authority as well:</li> +</ul> + +<pre><code>webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority +</code></pre> + +<ul> +<li>That would only be for the &ldquo;Browse by&rdquo; function&hellip; so we&rsquo;ll have to see what effect that has later</li> +</ul> + +<h2 id="2016-06-04:6783872e82b68b1517e00f494e6b6504">2016-06-04</h2> + +<ul> +<li>Re-sync DSpace Test with CGSpace and perform test of metadata migration again</li> +<li>Run phase two of metadata migrations on CGSpace (see the <a href="https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c">migration notes</a>)</li> +<li>Run all system updates and reboot CGSpace server</li> +</ul> + +<h2 id="2016-06-07:6783872e82b68b1517e00f494e6b6504">2016-06-07</h2> + +<ul> +<li>Figured out how to export a list of the unique values from a metadata field ordered by count:</li> +</ul> + +<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv; +</code></pre> + +<ul> +<li>Identified the next round of fields to migrate: + +<ul> +<li>dc.title.jtitle → dc.source</li> +<li>dc.crsubject.crpsubject → cg.contributor.crp</li> +<li>dc.contributor.affiliation → cg.contributor.affiliation</li> +<li>dc.Species → cg.species</li> +<li>dc.contributor.corporate → dc.contributor</li> +<li>dc.identifier.url → cg.identifier.url</li> +<li>dc.identifier.doi → cg.identifier.doi</li> +<li>dc.identifier.googleurl → cg.identifier.googleurl</li> +<li>dc.identifier.dataurl → cg.identifier.dataurl</li> +</ul></li> +</ul> + + + May, 2016 /cgspace-notes/2016-05/ @@ -316,136 +471,6 @@ sys 0m20.540s - - June, 2016 - /cgspace-notes/2016-06/ - Sun, 01 May 2016 10:53:00 +0300 - - /cgspace-notes/2016-06/ - - -<h2 id="2016-06-01:6783872e82b68b1517e00f494e6b6504">2016-06-01</h2> - -<ul> -<li>Experimenting with IFPRI OAI (we want to harvest their 
publications)</li> -<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> -<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> -<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> -<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> -<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li> -</ul> - -<pre><code>dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA'); -UPDATE 497 -dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75; -UPDATE 14 -</code></pre> - -<ul> -<li>Fix a few minor miscellaneous issues in <code>dspace.cfg</code> (<a href="https://github.com/ilri/DSpace/pull/227">#227</a>)</li> -</ul> - -<h2 id="2016-06-02:6783872e82b68b1517e00f494e6b6504">2016-06-02</h2> - -<ul> -<li>Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with <code>cg.coverage.admin-unit</code></li> 
-<li>Seems that the Browse configuration in <code>dspace.cfg</code> can&rsquo;t handle the &lsquo;-&rsquo; in the field name:</li> -</ul> - -<pre><code>webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text -</code></pre> - -<ul> -<li>But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error</li> -<li>I&rsquo;ve sent a message to the DSpace mailing list to ask about the Browse index definition</li> -<li>A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue</li> -<li>I found a thread on the mailing list talking about it and there is bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></li> -<li>The patch applies successfully on DSpace 5.1 so I will try it later</li> -</ul> - -<h2 id="2016-06-03:6783872e82b68b1517e00f494e6b6504">2016-06-03</h2> - -<ul> -<li>Investigating the CCAFS authority issue, I exported the metadata for the Videos collection</li> -<li>The top two authors are:</li> -</ul> - -<pre><code>CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500 -CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600 -</code></pre> - -<ul> -<li>So the only difference is the &ldquo;confidence&rdquo;</li> -<li>Ok, well THAT is interesting:</li> -</ul> - -<pre><code>dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %'; - text_value | authority | confidence -------------+--------------------------------------+------------ - Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 - Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 - Orth, A. 
| ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 - Orth, Alan | | -1 - Orth, Alan | | -1 - Orth, Alan | | -1 - Orth, Alan | | -1 - Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1 - Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1 - Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 - Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1 - Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600 - Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600 -(13 rows) -</code></pre> - -<ul> -<li>And now an actually relevent example:</li> -</ul> - -<pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500; - count -------- - 707 -(1 row) - -dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500; - count -------- - 253 -(1 row) -</code></pre> - -<ul> -<li>Trying something experimental:</li> -</ul> - -<pre><code>dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security'; -UPDATE 960 -</code></pre> - -<ul> -<li>And then re-indexing authority and Discovery&hellip;?</li> -<li>After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet</li> -<li>The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as well:</li> -</ul> - -<pre><code>webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority -</code></pre> - -<ul> -<li>That would only be for the &ldquo;Browse by&rdquo; function&hellip; so we&rsquo;ll have to see what effect that has later</li> -</ul> - -<h2 id="2016-06-04:6783872e82b68b1517e00f494e6b6504">2016-06-04</h2> - -<ul> -<li>Re-sync DSpace Test with CGSpace and perform test of 
metadata migration again</li> -<li>Run phase two of metadata migrations on CGSpace (see the <a href="https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c">migration notes</a>)</li> -<li>Run all system updates and reboot CGSpace server</li> -</ul> - - - April, 2016 /cgspace-notes/2016-04/