<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="July, 2016" /> <meta property="og:description" content="2016-07-01 Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232) I think this query should find and replace all authors that have “,” at the end of their names: dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; UPDATE 95 dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; text_value ------------ (0 rows) In this case the select query was showing 95 results before the update " /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-07/" /> <meta property="article:published_time" content="2016-07-01T10:53:00+03:00"/> <meta property="article:modified_time" content="2017-01-09T16:18:07+02:00"/> <meta name="twitter:card" content="summary"/> <meta name="twitter:text:title" content="July, 2016"/> <meta name="twitter:title" content="July, 2016"/> <meta name="twitter:description" content="2016-07-01 Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232) I think this query should find and replace all authors that have “,” at the end of their names: dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; UPDATE 95 dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$'; text_value ------------ (0 rows) In this case the select query was showing 95 results before the update "/> <meta 
name="generator" content="Hugo 0.20.2" /> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "July, 2016", "url": "https://alanorth.github.io/cgspace-notes/2016-07/", "wordCount": "866", "datePublished": "2016-07-01T10:53:00+03:00", "dateModified": "2017-01-09T16:18:07+02:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2016-07/"> <title>July, 2016 | CGSpace Notes</title> <!-- combined, minified CSS --> <link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-CBHEXFKdMsTRFhEu0HSP9oETZoVpnz1mozAPqhfpxMQkda7lNJlqsQdYB30287Ka" crossorigin="anonymous"> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-07/">July, 2016</a></h2> <p class="blog-post-meta"><time datetime="2016-07-01T10:53:00+03:00">Fri Jul 01, 2016</time> by Alan Orth in <i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a> </p> </header> <h2 id="2016-07-01">2016-07-01</h2> <ul> <li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li> <li>I think this query should find and replace all 
authors that have “,” at the end of their names:</li> </ul> <pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
 text_value
------------
(0 rows)
</code></pre> <ul> <li>In this case the select query was showing 95 results before the update</li> </ul> <p></p> <h2 id="2016-07-02">2016-07-02</h2> <ul> <li>Comment on DSpace Jira ticket about author lookup search text (<a href="https://jira.duraspace.org/browse/DS-2329">DS-2329</a>)</li> </ul> <h2 id="2016-07-04">2016-07-04</h2> <ul> <li>Seems the database’s author authority values mean nothing without the <code>authority</code> Solr core from the host where they were created!</li> </ul> <h2 id="2016-07-05">2016-07-05</h2> <ul> <li>Amend <code>backup-solr.sh</code> script so it backs up the entire Solr folder</li> <li>We <em>really</em> only need <code>statistics</code> and <code>authority</code> but meh</li> <li>Fix metadata for species on DSpace Test:</li> </ul> <pre><code>$ ./fix-metadata-values.py -i /tmp/Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 94 -d dspacetest -u dspacetest -p 'fuuu'
</code></pre> <ul> <li>Will run later on CGSpace</li> <li>A user is still having problems with Sherpa/Romeo causing crashes during the submission process when the journal is “ungraded”</li> <li>I tested the <a href="https://jira.duraspace.org/browse/DS-2740">patch for DS-2740</a> that I had found last month and it seems to work</li> <li>I will merge it to <code>5_x-prod</code></li> </ul> <h2 id="2016-07-06">2016-07-06</h2> <ul> <li>Delete 23 blank metadata values from CGSpace:</li> </ul> <pre><code>cgspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 23
</code></pre> <ul> <li>Complete phase three of metadata migration, for the 
following fields: <ul> <li>dc.title.jtitle → dc.source</li> <li>dc.crsubject.crpsubject → cg.contributor.crp</li> <li>dc.contributor.affiliation → cg.contributor.affiliation</li> <li>dc.Species → cg.species</li> <li>dc.srplace.subregion → cg.coverage.subregion</li> <li>dc.contributor.corporate → dc.contributor.author</li> <li>dc.identifier.url → cg.identifier.url</li> <li>dc.identifier.doi → cg.identifier.doi</li> <li>dc.identifier.googleurl → cg.identifier.googleurl</li> <li>dc.identifier.dataurl → cg.identifier.dataurl</li> </ul></li> <li>Also, run fixes and deletes for species and author affiliations (over 1000 corrections!)</li> </ul> <pre><code>$ ./fix-metadata-values.py -i Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 212 -d dspace -u dspace -p 'fuuu'
$ ./fix-metadata-values.py -i Affiliations-Fix-1045-Peter-Abenet.csv -f dc.contributor.affiliation -t Correct -m 211 -d dspace -u dspace -p 'fuuu'
$ ./delete-metadata-values.py -f dc.contributor.affiliation -i Affiliations-Delete-Peter-Abenet.csv -m 211 -u dspace -d dspace -p 'fuuu'
</code></pre> <ul> <li>I then ran all server updates and rebooted the server</li> </ul> <h2 id="2016-07-11">2016-07-11</h2> <ul> <li>Doing some author cleanups from Peter and Abenet:</li> </ul> <pre><code>$ ./fix-metadata-values.py -i /tmp/Authors-Fix-205-UTF8.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
$ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UTF8.csv -m 3 -u dspacetest -d dspacetest -p fuuu
</code></pre> <h2 id="2016-07-13">2016-07-13</h2> <ul> <li>Run the author cleanups on CGSpace and start a full Discovery re-index</li> </ul> <h2 id="2016-07-14">2016-07-14</h2> <ul> <li>Test LDAP settings for new root LDAP</li> <li>Seems to work when binding as a top-level user</li> </ul> <h2 id="2016-07-18">2016-07-18</h2> <ul> <li>Adjust identifiers in XMLUI item display to be more prominent</li> <li>Add species and breed to the XMLUI item display</li> <li>CGSpace 
crashed late at night and the DSpace logs were showing:</li> </ul> <pre><code>2016-07-18 20:26:30,941 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
...
</code></pre> <ul> <li>I suspect it’s someone hitting REST too much:</li> </ul> <pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort -n | uniq -c | sort -h | tail -n 3
    710 66.249.78.38
   1781 181.118.144.29
  24904 70.32.99.142
</code></pre> <ul> <li>I just blocked access to <code>/rest</code> for that last IP for now:</li> </ul> <pre><code>    # log rest requests
    location /rest {
        access_log /var/log/nginx/rest.log;
        proxy_pass http://127.0.0.1:8443;
        deny 70.32.99.142;
    }
</code></pre> <h2 id="2016-07-21">2016-07-21</h2> <ul> <li>Mitigate the <a href="https://httpoxy.org">HTTPoxy</a> vulnerability for Tomcat etc in nginx: <a href="https://github.com/ilri/rmg-ansible-public/pull/38">https://github.com/ilri/rmg-ansible-public/pull/38</a></li> <li>Unblock 70.32.99.142 from <code>/rest</code> as it has been blocked for a few days</li> </ul> <h2 id="2016-07-22">2016-07-22</h2> <ul> <li>Help Paola from CCAFS with thumbnails for batch uploads</li> <li>She has been struggling to get the dimensions right, and manually enlarging smaller thumbnails, renaming PNGs to JPG, etc</li> <li>Altmetric reports having an issue with some of our authors being doubled…</li> <li>This is related to authority and confidence!</li> <li>We might need to use <code>index.authority.ignore-prefered=true</code> to tell the Discovery index to prefer the variation that exists in the metadatavalue rather than what it finds in the authority cache.</li> <li>Trying these on DSpace Test after a discussion by Daniel Scharon on the dspace-tech mailing list:</li> </ul> <pre><code>index.authority.ignore-prefered.dc.contributor.author=true
index.authority.ignore-variants.dc.contributor.author=false
</code></pre> <ul> <li>After 
reindexing I don’t see any change in Discovery’s display of authors, and still have entries like:</li> </ul> <pre><code>Grace, D. (464)
Grace, D. (62)
</code></pre> <ul> <li>I asked for clarification of the following options on the DSpace mailing list:</li> </ul> <pre><code>index.authority.ignore
index.authority.ignore-prefered
index.authority.ignore-variants
</code></pre> <ul> <li>In the meantime, I will try these on DSpace Test (plus a reindex):</li> </ul> <pre><code>index.authority.ignore=true
index.authority.ignore-prefered=true
index.authority.ignore-variants=true
</code></pre> <ul> <li>Enabled usage of <code>X-Forwarded-For</code> in DSpace admin control panel (<a href="https://github.com/ilri/DSpace/pull/255">#255</a>)</li> <li>It was misconfigured and disabled, but already working for some reason <em>sigh</em></li> <li>… no luck. Trying with just:</li> </ul> <pre><code>index.authority.ignore=true
</code></pre> <ul> <li>After re-indexing and clearing the XMLUI cache nothing has changed</li> </ul> <h2 id="2016-07-25">2016-07-25</h2> <ul> <li>Trying a few more settings (plus reindex) for Discovery on DSpace Test:</li> </ul> <pre><code>index.authority.ignore-prefered.dc.contributor.author=true
index.authority.ignore-variants=true
</code></pre> <ul> <li>Run all OS updates and reboot DSpace Test server</li> <li>No changes to Discovery after reindexing… hmm.</li> <li>Integrate and massively clean up About page (<a href="https://github.com/ilri/DSpace/pull/256">#256</a>)</li> </ul> <p><img src="/cgspace-notes/2016/07/cgspace-about-page.png" alt="About page" /></p> <ul> <li>The DSpace source code mentions the configuration key <code>discovery.index.authority.ignore-prefered.*</code> (with prefix of discovery, despite the docs saying otherwise), so I’m trying the following on DSpace Test:</li> </ul> <pre><code>discovery.index.authority.ignore-prefered.dc.contributor.author=true
discovery.index.authority.ignore-variants=true
</code></pre> <ul> <li>Still no 
change!</li> <li>Deploy species, breed, and identifier changes to CGSpace, as well as About page</li> <li>Run Linode RAM upgrade (8→12GB)</li> <li>Re-sync DSpace Test with CGSpace</li> <li>I noticed that our backup scripts don’t send Solr cores to S3 so I amended the script</li> </ul> <h2 id="2016-07-31">2016-07-31</h2> <ul> <li>Work on removing Dryland Systems and Humidtropics subjects from Discovery sidebar and Browse by</li> <li>Also change “Subjects” to “AGROVOC keywords” in Discovery sidebar/search and Browse by (<a href="https://github.com/ilri/DSpace/issues/257">#257</a>)</li> </ul> </article> </div> <!-- /.blog-main --> <aside class="col-sm-3 offset-sm-1 blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2017-04/">April, 2017</a></li> <li><a href="/cgspace-notes/2017-03/">March, 2017</a></li> <li><a href="/cgspace-notes/2017-02/">February, 2017</a></li> <li><a href="/cgspace-notes/2017-01/">January, 2017</a></li> <li><a href="/cgspace-notes/2016-12/">December, 2016</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> <p> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>. </p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>