<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="July, 2016" />
<meta property="og:description" content="2016-07-01
Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, &#39;(^.&#43;?),$&#39;, &#39;\1&#39;) where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;;
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;;
text_value
------------
(0 rows)
In this case the select query was showing 95 results before the update
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-07/" />
<meta property="article:published_time" content="2016-07-01T10:53:00+03:00" />
<meta property="article:modified_time" content="2018-03-09T22:10:33+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="July, 2016"/>
<meta name="twitter:description" content="2016-07-01
Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, &#39;(^.&#43;?),$&#39;, &#39;\1&#39;) where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;;
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ &#39;^.&#43;?,$&#39;;
text_value
------------
(0 rows)
In this case the select query was showing 95 results before the update
"/>
<meta name="generator" content="Hugo 0.73.0" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "July, 2016",
"url": "https://alanorth.github.io/cgspace-notes/2016-07/",
"wordCount": "866",
"datePublished": "2016-07-01T10:53:00+03:00",
"dateModified": "2018-03-09T22:10:33+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2016-07/">
<title>July, 2016 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-07/">July, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-07-01T10:53:00+03:00">Fri Jul 01, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
</ul>
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
------------
(0 rows)
</code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
</ul>
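<ul>
<li>As a rough cross-check outside the database, the same trailing-comma pattern can be tried in Python. This is only a sketch: the helper name <code>strip_trailing_comma</code> and the sample names are hypothetical, and nothing here touches the <code>metadatavalue</code> table:</li>
</ul>

```python
import re

# Same pattern as the SQL regexp_replace above: lazily capture everything
# up to a single trailing comma and keep only the captured part.
TRAILING_COMMA = re.compile(r"(^.+?),$")


def strip_trailing_comma(name: str) -> str:
    """Return name without one trailing comma; other values pass through unchanged."""
    return TRAILING_COMMA.sub(r"\1", name)


# Hypothetical sample values, not taken from the real metadatavalue table.
samples = ["Orth, A.,", "Grace, D.", ","]
for original in samples:
    print(repr(original), "becomes", repr(strip_trailing_comma(original)))
```

<ul>
<li>A value that is only a comma is left alone, because the pattern requires at least one character before the trailing comma</li>
</ul>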
<h2 id="2016-07-02">2016-07-02</h2>
<ul>
<li>Comment on DSpace Jira ticket about author lookup search text (<a href="https://jira.duraspace.org/browse/DS-2329">DS-2329</a>)</li>
</ul>
<h2 id="2016-07-04">2016-07-04</h2>
<ul>
<li>Seems the database&rsquo;s author authority values mean nothing without the <code>authority</code> Solr core from the host where they were created!</li>
</ul>
<h2 id="2016-07-05">2016-07-05</h2>
<ul>
<li>Amend <code>backup-solr.sh</code> script so it backs up the entire Solr folder</li>
<li>We <em>really</em> only need <code>statistics</code> and <code>authority</code> but meh</li>
<li>Fix metadata for species on DSpace Test:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 94 -d dspacetest -u dspacetest -p 'fuuu'
</code></pre><ul>
<li>Will run later on CGSpace</li>
<li>A user is still having problems with Sherpa/Romeo causing crashes during the submission process when the journal is &ldquo;ungraded&rdquo;</li>
<li>I tested the <a href="https://jira.duraspace.org/browse/DS-2740">patch for DS-2740</a> that I had found last month and it seems to work</li>
<li>I will merge it to <code>5_x-prod</code></li>
</ul>
<h2 id="2016-07-06">2016-07-06</h2>
<ul>
<li>Delete 23 blank metadata values from CGSpace:</li>
</ul>
<pre><code>cgspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 23
</code></pre><ul>
<li>Complete phase three of metadata migration, for the following fields:
<ul>
<li>dc.title.jtitle → dc.source</li>
<li>dc.crsubject.crpsubject → cg.contributor.crp</li>
<li>dc.contributor.affiliation → cg.contributor.affiliation</li>
<li>dc.Species → cg.species</li>
<li>dc.srplace.subregion → cg.coverage.subregion</li>
<li>dc.contributor.corporate → dc.contributor.author</li>
<li>dc.identifier.url → cg.identifier.url</li>
<li>dc.identifier.doi → cg.identifier.doi</li>
<li>dc.identifier.googleurl → cg.identifier.googleurl</li>
<li>dc.identifier.dataurl → cg.identifier.dataurl</li>
</ul>
</li>
<li>Also, run fixes and deletes for species and author affiliations (over 1000 corrections!)</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 212 -d dspace -u dspace -p 'fuuu'
$ ./fix-metadata-values.py -i Affiliations-Fix-1045-Peter-Abenet.csv -f dc.contributor.affiliation -t Correct -m 211 -d dspace -u dspace -p 'fuuu'
$ ./delete-metadata-values.py -f dc.contributor.affiliation -i Affiliations-Delete-Peter-Abenet.csv -m 211 -u dspace -d dspace -p 'fuuu'
</code></pre><ul>
<li>I then ran all server updates and rebooted the server</li>
</ul>
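<ul>
<li>For reference, the phase three field renames listed above can be written down as a simple lookup table. This is only an illustrative sketch: the names <code>FIELD_MIGRATION</code>, <code>migrate_field</code> and <code>migrate_record</code>, and the <code>{field: [values]}</code> record shape, are assumptions and not how the migration was actually carried out:</li>
</ul>

```python
# Old-to-new field names, copied from the phase three list above.
FIELD_MIGRATION = {
    "dc.title.jtitle": "dc.source",
    "dc.crsubject.crpsubject": "cg.contributor.crp",
    "dc.contributor.affiliation": "cg.contributor.affiliation",
    "dc.Species": "cg.species",
    "dc.srplace.subregion": "cg.coverage.subregion",
    "dc.contributor.corporate": "dc.contributor.author",
    "dc.identifier.url": "cg.identifier.url",
    "dc.identifier.doi": "cg.identifier.doi",
    "dc.identifier.googleurl": "cg.identifier.googleurl",
    "dc.identifier.dataurl": "cg.identifier.dataurl",
}


def migrate_field(name: str) -> str:
    """Return the new field name, or the name unchanged if it is not migrated."""
    return FIELD_MIGRATION.get(name, name)


def migrate_record(record: dict) -> dict:
    """Rename the keys of a hypothetical {field: [values]} record.

    Values are merged when two fields end up with the same name, as happens
    with dc.contributor.corporate and dc.contributor.author.
    """
    migrated = {}
    for field, values in record.items():
        migrated.setdefault(migrate_field(field), []).extend(values)
    return migrated
```

<ul>
<li>Fields that are not in the table pass through unchanged, so the same helper could be applied to a whole record without special cases</li>
</ul>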
<h2 id="2016-07-11">2016-07-11</h2>
<ul>
<li>Doing some author cleanups from Peter and Abenet:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Authors-Fix-205-UTF8.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
$ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UTF8.csv -m 3 -u dspacetest -d dspacetest -p fuuu
</code></pre><h2 id="2016-07-13">2016-07-13</h2>
<ul>
<li>Run the author cleanups on CGSpace and start a full Discovery re-index</li>
</ul>
<h2 id="2016-07-14">2016-07-14</h2>
<ul>
<li>Test LDAP settings for new root LDAP</li>
<li>Seems to work when binding as a top-level user</li>
</ul>
<h2 id="2016-07-18">2016-07-18</h2>
<ul>
<li>Adjust identifiers in XMLUI item display to be more prominent</li>
<li>Add species and breed to the XMLUI item display</li>
<li>CGSpace crashed late at night and the DSpace logs were showing:</li>
</ul>
<pre><code>2016-07-18 20:26:30,941 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
...
</code></pre><ul>
<li>I suspect it&rsquo;s someone hitting REST too much:</li>
</ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort -n | uniq -c | sort -h | tail -n 3
710 66.249.78.38
1781 181.118.144.29
24904 70.32.99.142
</code></pre><ul>
<li>I just blocked access to <code>/rest</code> for that last IP for now:</li>
</ul>
<pre><code>    # log rest requests
    location /rest {
        access_log /var/log/nginx/rest.log;
        proxy_pass http://127.0.0.1:8443;
        deny 70.32.99.142;
    }
</code></pre><h2 id="2016-07-21">2016-07-21</h2>
<ul>
<li>Mitigate the <a href="https://httpoxy.org">HTTPoxy</a> vulnerability for Tomcat etc in nginx: <a href="https://github.com/ilri/rmg-ansible-public/pull/38">https://github.com/ilri/rmg-ansible-public/pull/38</a></li>
<li>Unblock 70.32.99.142 from <code>/rest</code> as it has been blocked for a few days</li>
</ul>
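<ul>
<li>Before unblocking a client it can help to re-check request counts. The <code>awk | sort | uniq -c</code> tally from 2016-07-18 can be sketched in Python as follows; the function name <code>top_clients</code> and the sample log lines are hypothetical, and the only assumption about the log format is that the client address is the first whitespace-separated field:</li>
</ul>

```python
from collections import Counter


def top_clients(log_lines, n=3):
    """Count requests per client, taking the first whitespace-separated
    field of each line as the client address (like awk's $1)."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    # most_common() sorts by count in descending order
    return counts.most_common(n)


# Hypothetical log lines using documentation-only addresses.
sample = [
    "192.0.2.10 - - GET /rest/items",
    "192.0.2.20 - - GET /rest/collections",
    "192.0.2.10 - - GET /rest/items/1",
]
for address, hits in top_clients(sample):
    print(hits, address)
```

<ul>
<li>Unlike the shell pipeline, this lists the busiest client first rather than last</li>
</ul>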
<h2 id="2016-07-22">2016-07-22</h2>
<ul>
<li>Help Paola from CCAFS with thumbnails for batch uploads</li>
<li>She has been struggling to get the dimensions right, and manually enlarging smaller thumbnails, renaming PNGs to JPG, etc</li>
<li>Altmetric reports having an issue with some of our authors being doubled&hellip;</li>
<li>This is related to authority and confidence!</li>
<li>We might need to use <code>index.authority.ignore-prefered=true</code> to tell the Discovery index to prefer the variation that exists in the metadatavalue rather than what it finds in the authority cache.</li>
<li>Trying these on DSpace Test after a discussion by Daniel Scharon on the dspace-tech mailing list:</li>
</ul>
<pre><code>index.authority.ignore-prefered.dc.contributor.author=true
index.authority.ignore-variants.dc.contributor.author=false
</code></pre><ul>
<li>After reindexing I don&rsquo;t see any change in Discovery&rsquo;s display of authors, and still have entries like:</li>
</ul>
<pre><code>Grace, D. (464)
Grace, D. (62)
</code></pre><ul>
<li>I asked for clarification of the following options on the DSpace mailing list:</li>
</ul>
<pre><code>index.authority.ignore
index.authority.ignore-prefered
index.authority.ignore-variants
</code></pre><ul>
<li>In the meantime, I will try these on DSpace Test (plus a reindex):</li>
</ul>
<pre><code>index.authority.ignore=true
index.authority.ignore-prefered=true
index.authority.ignore-variants=true
</code></pre><ul>
<li>Enabled usage of <code>X-Forwarded-For</code> in DSpace admin control panel (<a href="https://github.com/ilri/DSpace/pull/255">#255</a>)</li>
<li>It was misconfigured and disabled, but already working for some reason <em>sigh</em></li>
<li>&hellip; no luck. Trying with just:</li>
</ul>
<pre><code>index.authority.ignore=true
</code></pre><ul>
<li>After re-indexing and clearing the XMLUI cache nothing has changed</li>
</ul>
<h2 id="2016-07-25">2016-07-25</h2>
<ul>
<li>Trying a few more settings (plus reindex) for Discovery on DSpace Test:</li>
</ul>
<pre><code>index.authority.ignore-prefered.dc.contributor.author=true
index.authority.ignore-variants=true
</code></pre><ul>
<li>Run all OS updates and reboot DSpace Test server</li>
<li>No changes to Discovery after reindexing&hellip; hmm.</li>
<li>Integrate and massively clean up About page (<a href="https://github.com/ilri/DSpace/pull/256">#256</a>)</li>
</ul>
<p><img src="/cgspace-notes/2016/07/cgspace-about-page.png" alt="About page"></p>
<ul>
<li>The DSpace source code mentions the configuration key <code>discovery.index.authority.ignore-prefered.*</code> (with a <code>discovery</code> prefix, despite the docs saying otherwise), so I&rsquo;m trying the following on DSpace Test:</li>
</ul>
<pre><code>discovery.index.authority.ignore-prefered.dc.contributor.author=true
discovery.index.authority.ignore-variants=true
</code></pre><ul>
<li>Still no change!</li>
<li>Deploy species, breed, and identifier changes to CGSpace, as well as About page</li>
<li>Run Linode RAM upgrade (8→12GB)</li>
<li>Re-sync DSpace Test with CGSpace</li>
<li>I noticed that our backup scripts don&rsquo;t send Solr cores to S3 so I amended the script</li>
</ul>
<h2 id="2016-07-31">2016-07-31</h2>
<ul>
<li>Work on removing Dryland Systems and Humidtropics subjects from Discovery sidebar and Browse by</li>
<li>Also change &ldquo;Subjects&rdquo; to &ldquo;AGROVOC keywords&rdquo; in Discovery sidebar/search and Browse by (<a href="https://github.com/ilri/DSpace/issues/257">#257</a>)</li>
</ul>
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
<li><a href="/cgspace-notes/2020-04/">April, 2020</a></li>
<li><a href="/cgspace-notes/2020-03/">March, 2020</a></li>
<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>