<!DOCTYPE html>
|
|
<html lang="en">
|
|
|
|
<head>
<meta charset="utf-8">
|
|
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
|
|
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
|
|
|
|
<meta name="description" content="">
|
|
<meta name="author" content="Alan Orth">
|
|
|
|
<!-- OpenGraph Metadata: http://ogp.me/ -->
|
|
<meta property="og:title" content="October, 2016">
|
|
<meta property="og:description" content="">
|
|
|
|
|
|
<meta property="og:type" content="article">
|
|
<meta property="article:published_time" content="2016-10-03T15:53:00+03:00">
|
|
<meta property="article:author" content="Alan Orth">
|
|
|
|
|
|
|
|
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-10/">
|
|
|
|
<!-- Metadata for Twitter: https://dev.twitter.com/cards/markup -->
|
|
|
|
<meta property="twitter:card" content="summary">
|
|
|
|
|
|
<meta property="twitter:title" content="October, 2016">
|
|
<meta property="twitter:description" content="">
<meta name="generator" content="Hugo 0.17" />
|
|
|
|
|
|
<base href="https://alanorth.github.io/cgspace-notes/">
|
|
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2016-10/">
|
|
|
|
<title>October, 2016 | CGSpace Notes</title>
|
|
|
|
<!-- combined, minified CSS -->
|
|
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet">
|
|
|
|
<!-- RSS 2.0 feed -->
|
|
<link href="https://alanorth.github.io/cgspace-notes/index.xml" type="application/rss+xml" rel="alternate">
|
|
</head>
|
|
|
|
<body>
|
|
|
|
<div class="blog-masthead">
|
|
<div class="container">
|
|
<nav class="nav blog-nav">
|
|
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
|
|
|
|
</nav>
|
|
</div>
|
|
</div>
|
|
|
|
<header class="blog-header">
|
|
<div class="container">
|
|
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
|
|
|
|
</div>
|
|
</header>
|
|
|
|
<div class="container">
|
|
<div class="row">
|
|
<div class="col-sm-8 blog-main">
|
|
|
|
|
|
<article class="blog-post">
|
|
<header>
|
|
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-10/">October, 2016</a></h2>
|
|
<p class="blog-post-meta"><time datetime="2016-10-03T15:53:00+03:00">Mon Oct 03, 2016</time> by Alan Orth in
|
|
|
|
<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
|
|
|
|
</p>
|
|
</header>
|
|
|
|
|
|
<h2 id="2016-10-03">2016-10-03</h2>
|
|
|
|
<ul>
|
|
<li>Test adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author order gets messed up</li>
|
|
<li>Need to test the following scenarios to see how author order is affected:
|
|
|
|
<ul>
|
|
<li>ORCIDs only</li>
|
|
<li>ORCIDs plus normal authors</li>
|
|
</ul></li>
|
|
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
|
|
</ul>
|
|
|
|
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
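<p>A minimal Python sketch of building such a test CSV follows. Only the <code>ORCID:dc.contributor.author</code> column name and the <code>||</code> separator come from the notes above; the item id, the collection handle, and the helper name <code>build_orcid_csv</code> are hypothetical placeholders for illustration.</p>

```python
import csv
import io

# Separator between multiple values in one cell, as used in the example above
VALUE_SEPARATOR = "||"

def build_orcid_csv(item_id, collection, orcids):
    """Return CSV text with id, collection, and an ORCID author column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "collection", "ORCID:dc.contributor.author"])
    writer.writerow([item_id, collection, VALUE_SEPARATOR.join(orcids)])
    return buffer.getvalue()

orcids = ["0000-0002-6115-0956", "0000-0002-3812-8793", "0000-0001-7462-405X"]
# "12345" and "10568/1" are hypothetical placeholders, not real identifiers
csv_text = build_orcid_csv("12345", "10568/1", orcids)
print(csv_text)
```

<p>Adding or dropping a <code>dc.contributor.author</code> column in the header row would be the way to reproduce the two scenarios described in these notes.</p>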
|
|
|
|
<ul>
|
|
<li>Hmm, with the <code>dc.contributor.author</code> column removed, DSpace doesn’t detect any changes</li>
|
|
<li>With a blank <code>dc.contributor.author</code> column, DSpace wants to remove all non-ORCID authors and add the new ORCID authors</li>
|
|
<li>I added the <a href="https://github.com/ilri/DSpace/issues/234">disclaimer text</a> to the About page, then added a footer link to the disclaimer’s ID, but there is a Bootstrap issue that causes the page content to disappear when using in-page anchors: <a href="https://github.com/twbs/bootstrap/issues/1768">https://github.com/twbs/bootstrap/issues/1768</a></li>
|
|
</ul>
|
|
|
|
<p><img src="2016/10/bootstrap-issue.png" alt="Bootstrap issue with in-page anchors" /></p>
|
|
|
|
<ul>
|
|
<li>Looks like we’ll just have to add the text to the About page (without a link) or add a separate page</li>
|
|
</ul>
|
|
|
|
<h2 id="2016-10-04">2016-10-04</h2>
|
|
|
|
<ul>
|
|
<li>Start testing cleanups of authors that Peter sent last week</li>
|
|
<li>Out of 40,000+ rows, Peter had indicated corrections for ~3,200 of them—too many to look through carefully, so I did some basic quality checking:
|
|
|
|
<ul>
|
|
<li>Trim leading/trailing whitespace</li>
|
|
<li>Find invalid characters</li>
|
|
<li>Cluster values to merge obvious authors</li>
|
|
</ul></li>
|
|
<li>That left us with 3,180 valid corrections and 3 deletions:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ ./fix-metadata-values.py -i authors-fix-3180.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
$ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -m 3 -d dspacetest -u dspacetest -p fuuu
</code></pre>
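<p>The basic quality checks listed above (trimming whitespace, finding invalid characters, clustering similar values) could be sketched in Python roughly as follows. This is an assumption about one way to do it, not the tool actually used for Peter’s corrections: the clustering here is a simple fingerprint key, and the sample names are made up.</p>

```python
import re
import unicodedata
from collections import defaultdict

def has_invalid_characters(value):
    """Flag control characters and the Unicode replacement character."""
    return any(unicodedata.category(ch) == "Cc" or ch == "\ufffd" for ch in value)

def fingerprint(value):
    """Normalize a name into a key so that near-duplicates collide."""
    # Strip accents, lowercase, drop punctuation, then sort the unique tokens
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    tokens = re.sub(r"[^\w\s]", " ", ascii_only.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(values):
    """Group trimmed values by fingerprint; keep groups with 2+ variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value.strip())].add(value.strip())
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}

# Made-up sample names for illustration only
sample = ["  Orth, Alan", "Orth, Alan.", "orth, alan", "Doe, Jane", "Bad\x00Name"]
clusters = cluster(sample)
invalid = [v for v in sample if has_invalid_characters(v)]
print(clusters)
print(invalid)
```

<p>Each cluster with more than one variant is a candidate for merging into a single corrected name, which still needs a human decision about which spelling is right.</p>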
|
|
|
|
<ul>
|
|
<li>Remove old about page (<a href="https://github.com/ilri/DSpace/pull/284">#284</a>)</li>
|
|
<li>CGSpace crashed a few times today</li>
|
|
<li>Generate list of unique authors in CCAFS collections:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspacetest=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/32729', '10568/5472', '10568/5473', '10568/10288', '10568/70974', '10568/3547', '10568/3549', '10568/3531','10568/16890','10568/5470','10568/3546', '10568/36024', '10568/66581', '10568/21789', '10568/5469', '10568/5468', '10568/3548', '10568/71053', '10568/25167'))) group by text_value order by count desc) to /tmp/ccafs-authors.csv with csv;
</code></pre>
|
|
|
|
<h2 id="2016-10-05">2016-10-05</h2>
|
|
|
|
<ul>
|
|
<li>Work on more infrastructure cleanups for Ansible DSpace role</li>
|
|
<li>Clean up Let’s Encrypt plumbing and submit pull request for rmg-ansible-public (<a href="https://github.com/ilri/rmg-ansible-public/pull/60">#60</a>)</li>
|
|
</ul>
|
|
|
|
<h2 id="2016-10-06">2016-10-06</h2>
|
|
|
|
<ul>
|
|
<li>Nice! DSpace Test (linode02) is now having <code>java.lang.OutOfMemoryError: Java heap space</code> errors…</li>
|
|
<li>Heap space is 2048m, and we have 5GB of RAM being used for OS cache (Solr!) so let’s just bump the memory to 3072m</li>
|
|
<li>Magdalena from CCAFS asked why the colors in the thumbnails for these <a href="https://cgspace.cgiar.org/handle/10568/71249">two</a> <a href="https://cgspace.cgiar.org/handle/10568/71259">items</a> look different, even though they are the same in the PDF itself</li>
|
|
</ul>
|
|
|
|
<p><img src="2016/10/cmyk-vs-srgb.jpg" alt="CMYK vs sRGB colors" /></p>
|
|
|
|
<ul>
|
|
<li>Turns out the first PDF was exported from InDesign using CMYK and the second one was using sRGB</li>
|
|
<li>Run all system updates on DSpace Test and reboot it</li>
|
|
</ul>
|
|
|
|
<h2 id="2016-10-08">2016-10-08</h2>
|
|
|
|
<ul>
|
|
<li>Re-deploy CGSpace with latest changes from late September and early October</li>
|
|
<li>Run fixes for ILRI subjects and delete blank metadata values:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 11
</code></pre>
|
|
|
|
<ul>
|
|
<li>Run all system updates and reboot CGSpace</li>
|
|
<li>Delete ten gigs of old 2015 Tomcat logs that never got rotated (WTF?):</li>
|
|
</ul>
|
|
|
|
<pre><code>root@linode01:~# ls -lh /var/log/tomcat7/localhost_access_log.2015* | wc -l
47
</code></pre>
|
|
|
|
<ul>
|
|
<li>Delete the 2GB <code>cron-filter-media.log</code> file, as it is just a log from a cron job and it doesn’t get rotated like normal log files (it had been growing for almost a year, maybe)</li>
|
|
</ul>
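<p>To see how much space unrotated logs take up before deleting them, a small Python sketch like the following could help. The <code>localhost_access_log.2015*</code> pattern comes from the command above; the helper name <code>summarize_logs</code>, the <code>.txt</code> suffix, and the temporary directory are assumptions so that the example runs anywhere without touching <code>/var/log</code>.</p>

```python
import tempfile
from pathlib import Path

def summarize_logs(directory, pattern):
    """Return (file count, total bytes) for files matching a glob pattern."""
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)

# Demonstrate on a throwaway directory rather than a real /var/log path
with tempfile.TemporaryDirectory() as tmp:
    for day in ("2015-01-01", "2015-01-02", "2016-10-08"):
        Path(tmp, f"localhost_access_log.{day}.txt").write_text("x" * 100)
    count, total_bytes = summarize_logs(tmp, "localhost_access_log.2015*")
    print(count, total_bytes)
```

<p>The sketch only reports counts and sizes; it deliberately deletes nothing.</p>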
|
|
|
|
<h2 id="2016-10-14">2016-10-14</h2>
|
|
|
|
<ul>
|
|
<li>Run all system updates on DSpace Test and reboot server</li>
|
|
<li>Looking into some issues with Discovery filters in Atmire’s content and usage analysis module after adjusting the filter class</li>
|
|
<li>Looks like changing the filters from <code>configuration.DiscoverySearchFilterFacet</code> to <code>configuration.DiscoverySearchFilter</code> breaks them in Atmire CUA module</li>
|
|
</ul>
|
|
|
|
<h2 id="2016-10-17">2016-10-17</h2>
|
|
|
|
<ul>
|
|
<li>A bit more cleanup on the CCAFS authors, and run the corrections on DSpace Test:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ ./fix-metadata-values.py -i ccafs-authors-oct-16.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
</code></pre>
|
|
|
|
<ul>
|
|
<li>One observation is that some old versions of names still show up in the author lookup, because those authors also appear in other communities (we only corrected authors from CCAFS in this round)</li>
|
|
</ul>
|
|
</article>
|
|
|
|
|
|
</div> <!-- /.blog-main -->
|
|
|
|
|
|
<aside class="col-sm-3 offset-sm-1 blog-sidebar">
|
|
|
|
|
|
|
|
|
|
<section class="sidebar-module">
|
|
<h4>Recent Posts</h4>
|
|
<ol class="list-unstyled">
|
|
|
|
<li><a href="/cgspace-notes/2016-10/">October, 2016</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2016-09/">September, 2016</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2016-08/">August, 2016</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2016-07/">July, 2016</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2016-06/">June, 2016</a></li>
|
|
|
|
</ol>
|
|
</section>
|
|
|
|
|
|
<section class="sidebar-module">
|
|
<h4>Links</h4>
|
|
<ol class="list-unstyled">
|
|
|
|
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
|
|
|
|
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
|
|
|
|
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
|
|
|
|
</ol>
|
|
</section>
|
|
|
|
</aside>
|
|
|
|
|
|
|
|
</div> <!-- /.row -->
|
|
</div> <!-- /.container -->
|
|
|
|
<footer class="blog-footer">
|
|
<p>
|
|
|
|
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
|
|
|
|
</p>
|
|
<p>
|
|
<a href="#">Back to top</a>
|
|
</p>
|
|
</footer>
|
|
|
|
</body>
|
|
|
|
</html>
|