cgspace-notes/docs/2016-06/index.html

464 lines
22 KiB
HTML
Raw Normal View History

2018-02-11 17:28:23 +01:00
<!DOCTYPE html>
<html lang="en" >
2018-02-11 17:28:23 +01:00
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
2020-12-06 15:53:29 +01:00
2018-02-11 17:28:23 +01:00
<meta property="og:title" content="June, 2016" />
<meta property="og:description" content="2016-06-01
Experimenting with IFPRI OAI (we want to harvest their publications)
2020-01-27 15:20:44 +01:00
After reading the ContentDM documentation I found IFPRI&rsquo;s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php
2018-02-11 17:28:23 +01:00
After reading the OAI documentation and testing with an OAI validator I found out how to get their publications
This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
" />
<meta property="og:type" content="article" />
2019-02-02 13:12:57 +01:00
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-06/" />
2019-08-08 17:10:44 +02:00
<meta property="article:published_time" content="2016-06-01T10:53:00+03:00" />
2020-11-30 19:12:55 +01:00
<meta property="article:modified_time" content="2020-11-30T12:10:20+02:00" />
2018-09-30 07:23:48 +02:00
2020-12-06 15:53:29 +01:00
2018-02-11 17:28:23 +01:00
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="June, 2016"/>
<meta name="twitter:description" content="2016-06-01
Experimenting with IFPRI OAI (we want to harvest their publications)
2020-01-27 15:20:44 +01:00
After reading the ContentDM documentation I found IFPRI&rsquo;s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php
2018-02-11 17:28:23 +01:00
After reading the OAI documentation and testing with an OAI validator I found out how to get their publications
This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
"/>
2022-09-05 15:59:11 +02:00
<meta name="generator" content="Hugo 0.102.3" />
2018-02-11 17:28:23 +01:00
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "June, 2016",
2020-04-02 09:55:42 +02:00
"url": "https://alanorth.github.io/cgspace-notes/2016-06/",
2020-11-30 19:12:55 +01:00
"wordCount": "1551",
"datePublished": "2016-06-01T10:53:00+03:00",
2020-11-30 19:12:55 +01:00
"dateModified": "2020-11-30T12:10:20+02:00",
2018-02-11 17:28:23 +01:00
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2016-06/">
<title>June, 2016 | CGSpace Notes</title>
2018-02-11 17:28:23 +01:00
<!-- combined, minified CSS -->
2020-01-23 19:19:38 +01:00
2022-09-07 16:58:52 +02:00
<link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" integrity="sha256-xrqAvFBmlVdkWr4F&#43;GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous">
2018-02-11 17:28:23 +01:00
2020-01-28 11:01:42 +01:00
<!-- minified Font Awesome for SVG icons -->
2021-09-28 09:32:32 +02:00
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz&#43;lcnA=" crossorigin="anonymous"></script>
2020-01-28 11:01:42 +01:00
2019-04-14 15:59:47 +02:00
<!-- RSS 2.0 feed -->
2018-02-11 17:28:23 +01:00
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
2018-12-19 12:20:39 +01:00
2018-02-11 17:28:23 +01:00
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
2018-02-11 17:28:23 +01:00
</div>
</header>
2018-12-19 12:20:39 +01:00
2018-02-11 17:28:23 +01:00
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-06/">June, 2016</a></h2>
2020-11-16 09:54:00 +01:00
<p class="blog-post-meta">
<time datetime="2016-06-01T10:53:00+03:00">Wed Jun 01, 2016</time>
in
2018-02-11 17:28:23 +01:00
2022-06-23 07:40:53 +02:00
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/tags/notes/" rel="tag">Notes</a>
2018-02-11 17:28:23 +01:00
</p>
</header>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-01">2016-06-01</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
2020-01-27 15:20:44 +01:00
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
2018-02-11 17:28:23 +01:00
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li>
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code>dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like &#39;PN%&#39; or text_value like &#39;PHASE%&#39; or text_value = &#39;CBA&#39; or text_value = &#39;IA&#39;);
2018-02-11 17:28:23 +01:00
UPDATE 497
dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75;
UPDATE 14
2019-11-28 16:30:45 +01:00
</code></pre><ul>
2018-02-11 17:28:23 +01:00
<li>Fix a few minor miscellaneous issues in <code>dspace.cfg</code> (<a href="https://github.com/ilri/DSpace/pull/227">#227</a>)</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-02">2016-06-02</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with <code>cg.coverage.admin-unit</code></li>
2020-01-27 15:20:44 +01:00
<li>Seems that the Browse configuration in <code>dspace.cfg</code> can&rsquo;t handle the &lsquo;-&rsquo; in the field name:</li>
2019-11-28 16:30:45 +01:00
</ul>
2021-09-13 15:21:16 +02:00
<pre tabindex="0"><code>webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error</li>
2020-01-27 15:20:44 +01:00
<li>I&rsquo;ve sent a message to the DSpace mailing list to ask about the Browse index definition</li>
2019-11-28 16:30:45 +01:00
<li>A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue</li>
<li>I found a thread on the mailing list talking about it and there is bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></li>
<li>The patch applies successfully on DSpace 5.1 so I will try it later</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-03">2016-06-03</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Investigating the CCAFS authority issue, I exported the metadata for the Videos collection</li>
2019-11-28 16:30:45 +01:00
<li>The top two authors are:</li>
</ul>
2021-09-13 15:21:16 +02:00
<pre tabindex="0"><code>CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500
2018-02-11 17:28:23 +01:00
CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>So the only difference is the &ldquo;confidence&rdquo;</li>
<li>Ok, well THAT is interesting:</li>
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code>dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like &#39;%Orth, %&#39;;
2019-11-28 16:30:45 +01:00
text_value | authority | confidence
2018-02-11 17:28:23 +01:00
------------+--------------------------------------+------------
2019-11-28 16:30:45 +01:00
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
Orth, Alan | | -1
Orth, Alan | | -1
Orth, Alan | | -1
Orth, Alan | | -1
Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1
Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600
Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600
2018-02-11 17:28:23 +01:00
(13 rows)
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>And now an actually relevent example:</li>
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like &#39;CGIAR Research Program on Climate Change, Agriculture and Food Security&#39; and confidence = 500;
2019-11-28 16:30:45 +01:00
count
2018-02-11 17:28:23 +01:00
-------
2019-11-28 16:30:45 +01:00
707
2018-02-11 17:28:23 +01:00
(1 row)
2022-03-04 13:30:06 +01:00
dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like &#39;CGIAR Research Program on Climate Change, Agriculture and Food Security&#39; and confidence != 500;
2019-11-28 16:30:45 +01:00
count
2018-02-11 17:28:23 +01:00
-------
2019-11-28 16:30:45 +01:00
253
2018-02-11 17:28:23 +01:00
(1 row)
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>Trying something experimental:</li>
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code>dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like &#39;CGIAR Research Program on Climate Change, Agriculture and Food Security&#39;;
2018-02-11 17:28:23 +01:00
UPDATE 960
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>And then re-indexing authority and Discovery&hellip;?</li>
<li>After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet</li>
<li>The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as well:</li>
</ul>
2021-09-13 15:21:16 +02:00
<pre tabindex="0"><code>webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority
2019-11-28 16:30:45 +01:00
</code></pre><ul>
2020-01-27 15:20:44 +01:00
<li>That would only be for the &ldquo;Browse by&rdquo; function&hellip; so we&rsquo;ll have to see what effect that has later</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-04">2016-06-04</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Re-sync DSpace Test with CGSpace and perform test of metadata migration again</li>
<li>Run phase two of metadata migrations on CGSpace (see the <a href="https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c">migration notes</a>)</li>
<li>Run all system updates and reboot CGSpace server</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-07">2016-06-07</h2>
2018-02-11 17:28:23 +01:00
<ul>
2019-11-28 16:30:45 +01:00
<li>Figured out how to export a list of the unique values from a metadata field ordered by count:</li>
</ul>
2021-09-13 15:21:16 +02:00
<pre tabindex="0"><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>
<p>Identified the next round of fields to migrate:</p>
2018-02-11 17:28:23 +01:00
<ul>
2019-11-28 16:30:45 +01:00
<li>dc.title.jtitle → dc.source</li>
<li>dc.crsubject.crpsubject → cg.contributor.crp</li>
<li>dc.contributor.affiliation → cg.contributor.affiliation</li>
<li>dc.Species → cg.species</li>
<li>dc.contributor.corporate → dc.contributor</li>
<li>dc.identifier.url → cg.identifier.url</li>
<li>dc.identifier.doi → cg.identifier.doi</li>
<li>dc.identifier.googleurl → cg.identifier.googleurl</li>
<li>dc.identifier.dataurl → cg.identifier.dataurl</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-11-28 16:30:45 +01:00
</li>
<li>
2020-01-27 15:20:44 +01:00
<p>Discuss pulling data from IFPRI&rsquo;s ContentDM with Ryan Miller</p>
2019-11-28 16:30:45 +01:00
</li>
<li>
2020-01-27 15:20:44 +01:00
<p>Looks like OAI is kinda obtuse for this, and if we use ContentDM&rsquo;s API we&rsquo;ll be able to access their internal field names (rather than trying to figure out how they stuffed them into various, repeated Dublin Core fields)</p>
2019-11-28 16:30:45 +01:00
</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-08">2016-06-08</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Discuss controlled vocabularies for ~28 fields</li>
2020-04-13 16:24:05 +02:00
<li>Looks like this is all we need: <a href="https://wiki.lyrasis.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies">https://wiki.lyrasis.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies</a></li>
2020-11-30 19:12:55 +01:00
<li>I wrote an XPath expression to extract the ILRI subjects from <code>input-forms.xml</code> (from the xmlstarlet package):</li>
2019-11-28 16:30:45 +01:00
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code>$ xml sel -t -m &#39;//value-pairs[@value-pairs-name=&#34;ilrisubject&#34;]/pair/displayed-value/text()&#39; -c &#39;.&#39; -n dspace/config/input-forms.xml
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>Write to Atmire about the use of <code>atmire.orcid.id</code> to see if we can change it</li>
<li>Seems to be a virtual field that is queried from the authority cache&hellip; hmm</li>
2020-01-27 15:20:44 +01:00
<li>In other news, I found out that the About page that we haven&rsquo;t been using lives in <code>dspace/config/about.xml</code>, so now we can update the text</li>
2019-11-28 16:30:45 +01:00
<li>File bug about <code>closed=&quot;true&quot;</code> attribute of controlled vocabularies not working: <a href="https://jira.duraspace.org/browse/DS-3238">https://jira.duraspace.org/browse/DS-3238</a></li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-09">2016-06-09</h2>
2018-02-11 17:28:23 +01:00
<ul>
2020-01-27 15:20:44 +01:00
<li>Atmire explained that the <code>atmire.orcid.id</code> field doesn&rsquo;t exist in the schema, as it actually comes from the authority cache during XMLUI run time</li>
<li>This means we don&rsquo;t see it when harvesting via OAI or REST, for example</li>
2018-02-11 17:28:23 +01:00
<li>They opened a feature ticket on the DSpace tracker to ask for support of this: <a href="https://jira.duraspace.org/browse/DS-3239">https://jira.duraspace.org/browse/DS-3239</a></li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-10">2016-06-10</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Investigating authority confidences</li>
<li>It looks like the values are documented in <code>Choices.java</code></li>
2019-11-28 16:30:45 +01:00
<li>Experiment with setting all 960 CCAFS author values to be 500:</li>
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code>dspacetest=# SELECT authority, confidence FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value = &#39;CGIAR Research Program on Climate Change, Agriculture and Food Security&#39;;
2018-02-11 17:28:23 +01:00
2022-03-04 13:30:06 +01:00
dspacetest=# UPDATE metadatavalue set confidence = 500 where resource_type_id=2 AND metadata_field_id=3 AND text_value = &#39;CGIAR Research Program on Climate Change, Agriculture and Food Security&#39;;
2018-02-11 17:28:23 +01:00
UPDATE 960
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>After the database edit, I did a full Discovery re-index</li>
<li>And now there are exactly 960 items in the authors facet for &lsquo;CGIAR Research Program on Climate Change, Agriculture and Food Security&rsquo;</li>
<li>Now I ran the same on CGSpace</li>
<li>Merge controlled vocabulary functionality for animal breeds to <code>5_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/236">#236</a>)</li>
<li>Write python script to update metadata values in batch via PostgreSQL: <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></li>
<li>We need to use this to correct some pretty ugly values in fields like <code>dc.description.sponsorship</code></li>
<li>Merge item display tweaks from earlier this week (<a href="https://github.com/ilri/DSpace/pull/231">#231</a>)</li>
<li>Merge controlled vocabulary functionality for subregions (<a href="https://github.com/ilri/DSpace/pull/238">#238</a>)</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-11">2016-06-11</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Merge controlled vocabulary for sponsorship field (<a href="https://github.com/ilri/DSpace/pull/239">#239</a>)</li>
<li>Fix character encoding issues for animal breed lookup that I merged yesterday</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-17">2016-06-17</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Linode has free RAM upgrades for their 13th birthday so I migrated DSpace Test (4→8GB of RAM)</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-18">2016-06-18</h2>
2018-02-11 17:28:23 +01:00
<ul>
2019-11-28 16:30:45 +01:00
<li>
<p>Clean up titles and hints in <code>input-forms.xml</code> to use title/sentence case and a few more consistency things (<a href="https://github.com/ilri/DSpace/pull/241">#241</a>)</p>
</li>
<li>
<p>The final list of fields to migrate in the third phase of metadata migrations is:</p>
2018-02-11 17:28:23 +01:00
<ul>
2019-11-28 16:30:45 +01:00
<li>dc.title.jtitle → dc.source</li>
<li>dc.crsubject.crpsubject → cg.contributor.crp</li>
<li>dc.contributor.affiliation → cg.contributor.affiliation</li>
<li>dc.srplace.subregion → cg.coverage.subregion</li>
<li>dc.Species → cg.species</li>
<li>dc.contributor.corporate → dc.contributor</li>
<li>dc.identifier.url → cg.identifier.url</li>
<li>dc.identifier.doi → cg.identifier.doi</li>
<li>dc.identifier.googleurl → cg.identifier.googleurl</li>
<li>dc.identifier.dataurl → cg.identifier.dataurl</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-11-28 16:30:45 +01:00
</li>
<li>
<p>Interesting &ldquo;Sunburst&rdquo; visualization on a Digital Commons page: <a href="http://www.repository.law.indiana.edu/sunburst.html">http://www.repository.law.indiana.edu/sunburst.html</a></p>
</li>
<li>
<p>Final testing on metadata fix/delete for <code>dc.description.sponsorship</code> cleanup</p>
</li>
<li>
<p>Need to run <code>fix-metadata-values.py</code> and then <code>fix-metadata-values.py</code></p>
</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-20">2016-06-20</h2>
2018-02-11 17:28:23 +01:00
<ul>
2020-01-27 15:20:44 +01:00
<li>CGSpace&rsquo;s HTTPS certificate expired last night and I didn&rsquo;t notice, had to renew:</li>
2019-11-28 16:30:45 +01:00
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code># /opt/letsencrypt/letsencrypt-auto renew --standalone --pre-hook &#34;/usr/bin/service nginx stop&#34; --post-hook &#34;/usr/bin/service nginx start&#34;
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>I really need to fix that cron job&hellip;</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-24">2016-06-24</h2>
2018-02-11 17:28:23 +01:00
<ul>
2019-11-28 16:30:45 +01:00
<li>Run the replacements/deletes for <code>dc.description.sponsorship</code> (investors) on CGSpace:</li>
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i investors-not-blank-not-delete-85.csv -f dc.description.sponsorship -t &#39;correct investor&#39; -m 29 -d cgspace -p &#39;fuuu&#39; -u cgspace
$ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.sponsorship -m 29 -d cgspace -p &#39;fuuu&#39; -u cgspace
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>The scripts for this are here:
2018-02-11 17:28:23 +01:00
<ul>
<li><a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></li>
<li><a href="https://gist.github.com/alanorth/bd7d58c947f686401a2b1fadc78736be">delete-metadata-values.py</a></li>
</ul>
2019-11-28 16:30:45 +01:00
</li>
<li>Add new sponsors to controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/244">#244</a>)</li>
<li>Refine submission form labels and hints</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-28">2016-06-28</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Testing the cleanup of <code>dc.contributor.corporate</code> with 13 deletions and 121 replacements</li>
2020-01-27 15:20:44 +01:00
<li>There are still ~97 fields that weren&rsquo;t indicated to do anything</li>
2019-11-28 16:30:45 +01:00
<li>After the above deletions and replacements I regenerated a CSV and sent it to Peter <em>et al</em> to have a look</li>
</ul>
2021-09-13 15:21:16 +02:00
<pre tabindex="0"><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=126 group by text_value order by count desc) to /tmp/contributors-june28.csv with csv;
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>Re-evaluate <code>dc.contributor.corporate</code> and it seems we will move it to <code>dc.contributor.author</code> as this is more in line with how editors are actually using it</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-29">2016-06-29</h2>
2018-02-11 17:28:23 +01:00
<ul>
2019-11-28 16:30:45 +01:00
<li>Test run of <code>migrate-fields.sh</code> with the following re-mappings:</li>
</ul>
2021-09-13 15:21:16 +02:00
<pre tabindex="0"><code>72 55 #dc.source
2018-02-11 17:28:23 +01:00
86 230 #cg.contributor.crp
91 211 #cg.contributor.affiliation
94 212 #cg.species
107 231 #cg.coverage.subregion
126 3 #dc.contributor.author
73 219 #cg.identifier.url
74 220 #cg.identifier.doi
79 222 #cg.identifier.googleurl
89 223 #cg.identifier.dataurl
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>Run all cleanups and deletions of <code>dc.contributor.corporate</code> on CGSpace:</li>
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i Corporate-Authors-Fix-121.csv -f dc.contributor.corporate -t &#39;Correct style&#39; -m 126 -d cgspace -u cgspace -p &#39;fuuu&#39;
$ ./fix-metadata-values.py -i Corporate-Authors-Fix-PB.csv -f dc.contributor.corporate -t &#39;should be&#39; -m 126 -d cgspace -u cgspace -p &#39;fuuu&#39;
$ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-Delete-13.csv -m 126 -u cgspace -d cgspace -p &#39;fuuu&#39;
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>Re-deploy CGSpace and DSpace Test with latest June changes</li>
<li>Now the sharing and Altmetric bits are more prominent:</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-11-28 16:30:45 +01:00
<p><img src="/cgspace-notes/2016/06/xmlui-altmetric-sharing.png" alt="DSpace 5.1 XMLUI With Altmetric Badge"></p>
2018-02-11 17:28:23 +01:00
<ul>
<li>Run all system updates on the servers and reboot</li>
<li>Start working on config changes for phase three of the metadata migrations</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-06-30">2016-06-30</h2>
2018-02-11 17:28:23 +01:00
<ul>
2019-11-28 16:30:45 +01:00
<li>Wow, there are 95 authors in the database who have &lsquo;,&rsquo; at the end of their name:</li>
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code># select text_value from metadatavalue where metadata_field_id=3 and text_value like &#39;%,&#39;;
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>We need to use something like this to fix them, need to write a proper regex later:</li>
2019-05-05 15:45:12 +02:00
</ul>
2022-03-04 13:30:06 +01:00
<pre tabindex="0"><code># update metadatavalue set text_value = regexp_replace(text_value, &#39;(Poole, J),&#39;, &#39;\1&#39;) where metadata_field_id=3 and text_value = &#39;Poole, J,&#39;;
2019-11-28 16:30:45 +01:00
</code></pre>
2018-02-11 17:28:23 +01:00
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
2022-08-01 15:36:13 +02:00
<li><a href="/cgspace-notes/2022-08/">August, 2022</a></li>
2022-07-04 08:25:14 +02:00
<li><a href="/cgspace-notes/2022-07/">July, 2022</a></li>
2022-06-06 08:45:43 +02:00
<li><a href="/cgspace-notes/2022-06/">June, 2022</a></li>
2022-05-04 10:09:45 +02:00
<li><a href="/cgspace-notes/2022-05/">May, 2022</a></li>
2022-04-27 08:58:45 +02:00
<li><a href="/cgspace-notes/2022-04/">April, 2022</a></li>
2022-03-01 15:48:40 +01:00
2018-02-11 17:28:23 +01:00
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
2018-02-11 17:28:23 +01:00
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>