cgspace-notes/public/2017-10/index.html

364 lines
14 KiB
HTML
Raw Normal View History

2017-10-01 07:13:31 +02:00
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="October, 2017" />
<meta property="og:description" content="2017-10-01
Peter emailed to point out that many items in the ILRI archive collection have multiple handles:
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
2017-10-01 12:44:37 +02:00
Add Katherine Lutz to the groups for content sumission and edit steps of the CGIAR System collections
2017-10-01 07:13:31 +02:00
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-10/" />
<meta property="article:published_time" content="2017-10-01T08:07:54&#43;03:00"/>
2017-10-11 12:58:31 +02:00
<meta property="article:modified_time" content="2017-10-11T02:21:51&#43;03:00"/>
2017-10-01 07:13:31 +02:00
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="October, 2017"/>
<meta name="twitter:description" content="2017-10-01
Peter emailed to point out that many items in the ILRI archive collection have multiple handles:
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
2017-10-01 12:44:37 +02:00
Add Katherine Lutz to the groups for content sumission and edit steps of the CGIAR System collections
2017-10-01 07:13:31 +02:00
"/>
<meta name="generator" content="Hugo 0.29" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "October, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-10/",
2017-10-11 12:58:31 +02:00
"wordCount": "1031",
2017-10-01 07:13:31 +02:00
"datePublished": "2017-10-01T08:07:54&#43;03:00",
2017-10-11 12:58:31 +02:00
"dateModified": "2017-10-11T02:21:51&#43;03:00",
2017-10-01 07:13:31 +02:00
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2017-10/">
<title>October, 2017 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-zYRhIy0/Yl1e5lW9cimY1AugfdkHChXyCbs2NKFaLTgeQjVfj/CMPIUdjXm/JPWV" crossorigin="anonymous">
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2017-10/">October, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-10-01T08:07:54&#43;03:00">Sun Oct 01, 2017</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre>
<ul>
<li>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
2017-10-01 12:44:37 +02:00
<li>Add Katherine Lutz to the groups for content sumission and edit steps of the CGIAR System collections</li>
2017-10-01 07:13:31 +02:00
</ul>
<p></p>
2017-10-02 07:14:44 +02:00
<h2 id="2017-10-02">2017-10-02</h2>
<ul>
<li>Peter Ballantyne said he was having problems logging into CGSpace with &ldquo;both&rdquo; of his accounts (CGIAR LDAP and personal, apparently)</li>
<li>I looked in the logs and saw some LDAP lookup failures due to timeout but also strangely a &ldquo;no DN found&rdquo; error:</li>
</ul>
<pre><code>2017-10-01 20:24:57,928 WARN org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:ldap_attribute_lookup:type=failed_search javax.naming.CommunicationException\colon; svcgroot2.cgiarad.org\colon;3269 [Root exception is java.net.ConnectException\colon; Connection timed out (Connection timed out)]
2017-10-01 20:22:37,982 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:failed_login:no DN found for user pballantyne
</code></pre>
<ul>
2017-10-02 07:31:19 +02:00
<li>I thought maybe his account had expired (seeing as it&rsquo;s was the first of the month) but he says he was finally able to log in today</li>
<li>The logs for yesterday show fourteen errors related to LDAP auth failures:</li>
</ul>
<pre><code>$ grep -c &quot;ldap_authentication:type=failed_auth&quot; dspace.log.2017-10-01
14
</code></pre>
<ul>
<li>For what it&rsquo;s worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET&rsquo;s LDAP server</li>
2017-10-02 16:29:23 +02:00
<li>Linode emailed to say that linode578611 (DSpace Test) needs to migrate to a new host for a security update so I initiated the migration immediately rather than waiting for the scheduled time in two weeks</li>
2017-10-02 07:14:44 +02:00
</ul>
2017-10-04 10:29:41 +02:00
<h2 id="2017-10-04">2017-10-04</h2>
<ul>
<li>Twice in the last twenty-four hours Linode has alerted about high CPU usage on CGSpace (linode2533629)</li>
<li>Communicate with Sam from the CGIAR System Organization about some broken links coming from their CGIAR Library domain to CGSpace</li>
<li>The first is a link to a browse page that should be handled better in nginx:</li>
</ul>
<pre><code>http://library.cgiar.org/browse?value=Intellectual%20Assets%20Reports&amp;type=subject → https://cgspace.cgiar.org/browse?value=Intellectual%20Assets%20Reports&amp;type=subject
</code></pre>
<ul>
<li>We&rsquo;ll need to check for browse links and handle them properly, including swapping the <code>subject</code> parameter for <code>systemsubject</code> (which doesn&rsquo;t exist in Discovery yet, but we&rsquo;ll need to add it) as we have moved their poorly curated subjects from <code>dc.subject</code> to <code>cg.subject.system</code></li>
<li>The second link was a direct link to a bitstream which has broken due to the sequence being updated, so I told him he should link to the handle of the item instead</li>
2017-10-04 14:56:39 +02:00
<li>Help Sisay proof sixty-two IITA records on DSpace Test</li>
<li>Lots of inconsistencies and errors in subjects, dc.format.extent, regions, countries</li>
2017-10-04 16:06:10 +02:00
<li>Merge the Discovery search changes for ISI Journal (<a href="https://github.com/ilri/DSpace/pull/341">#341</a>)</li>
2017-10-04 10:29:41 +02:00
</ul>
2017-10-05 17:36:49 +02:00
<h2 id="2017-10-05">2017-10-05</h2>
<ul>
<li>Twice in the past twenty-four hours Linode has warned that CGSpace&rsquo;s outbound traffic rate was exceeding the notification threshold</li>
<li>I had a look at yesterday&rsquo;s OAI and REST logs in <code>/var/log/nginx</code> but didn&rsquo;t see anything unusual:</li>
</ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 10
141 157.55.39.240
145 40.77.167.85
162 66.249.66.92
181 66.249.66.95
211 66.249.66.91
312 66.249.66.94
384 66.249.66.90
1495 50.116.102.77
3904 70.32.83.92
9904 45.5.184.196
# awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
5 66.249.66.71
6 66.249.66.67
6 68.180.229.31
8 41.84.227.85
8 66.249.66.92
17 66.249.66.65
24 66.249.66.91
38 66.249.66.95
69 66.249.66.90
148 66.249.66.94
</code></pre>
<ul>
<li>Working on the nginx redirects for CGIAR Library</li>
<li>We should start using 301 redirects and also allow for <code>/sitemap</code> to work on the library.cgiar.org domain so the CGIAR System Organization people can update their Google Search Console and allow Google to find their content in a structured way</li>
<li>Remove eleven occurrences of <code>ACP</code> in IITA&rsquo;s <code>cg.coverage.region</code> using the Atmire batch edit module from Discovery</li>
<li>Need to investigate how we can verify the library.cgiar.org using the HTML or DNS methods</li>
2017-10-06 02:10:53 +02:00
<li>Run corrections on 143 ILRI Archive items that had two <code>dc.identifier.uri</code> values (Handle) that Peter had pointed out earlier this week</li>
<li>I used OpenRefine to isolate them and then fixed and re-imported them into CGSpace</li>
2017-10-06 02:11:56 +02:00
<li>I manually checked a dozen of them and it appeared that the correct handle was always the second one, so I just deleted the first one</li>
2017-10-05 17:36:49 +02:00
</ul>
2017-10-06 11:22:46 +02:00
<h2 id="2017-10-06">2017-10-06</h2>
<ul>
<li>I saw a nice tweak to thumbnail presentation on the Cardiff Metropolitan University DSpace: <a href="https://repository.cardiffmet.ac.uk/handle/10369/8780">https://repository.cardiffmet.ac.uk/handle/10369/8780</a></li>
<li>It adds a subtle border and box shadow, before and after:</li>
</ul>
<p><img src="/cgspace-notes/2017/10/dspace-thumbnail-original.png" alt="Original flat thumbnails" />
<img src="/cgspace-notes/2017/10/dspace-thumbnail-box-shadow.png" alt="Tweaked with border and box shadow" /></p>
<ul>
<li>I&rsquo;ll post it to the Yammer group to see what people think</li>
2017-10-06 18:27:58 +02:00
<li>I figured out at way to do the HTML verification for Google Search console for library.cgiar.org</li>
<li>We can drop the HTML file in their XMLUI theme folder and it will get copied to the webapps directory during build/install</li>
<li>Then we add an nginx alias for that URL in the library.cgiar.org vhost</li>
<li>This method is kinda a hack but at least we can put all the pieces into git to be reproducible</li>
<li>I will tell Tunji to send me the verification file</li>
2017-10-06 11:22:46 +02:00
</ul>
2017-10-10 13:22:54 +02:00
<h2 id="2017-10-10">2017-10-10</h2>
<ul>
<li>Deploy logic to allow verification of the library.cgiar.org domain in the Google Search Console (<a href="https://github.com/ilri/DSpace/pull/343">#343</a>)</li>
<li>After verifying both the HTTP and HTTPS domains and submitting a sitemap it will be interesting to see how the stats in the console as well as the search results change (currently 28,500 results):</li>
</ul>
<p><img src="/cgspace-notes/2017/10/google-search-console.png" alt="Google Search Console" />
<img src="/cgspace-notes/2017/10/google-search-console-2.png" alt="Google Search Console 2" />
<img src="/cgspace-notes/2017/10/google-search-results.png" alt="Google Search results" /></p>
<ul>
<li>I tried to submit a &ldquo;Change of Address&rdquo; request in the Google Search Console but I need to be an owner on CGSpace&rsquo;s console (currently I&rsquo;m just a user) in order to do that</li>
2017-10-11 01:21:51 +02:00
<li>Manually clean up some communities and collections that Peter had requested a few weeks ago</li>
<li>Delete Community <sup>10568</sup>&frasl;<sub>102</sub> (ILRI Research and Development Issues)</li>
<li>Move five collections to <sup>10568</sup>&frasl;<sub>27629</sub> (ILRI Projects) using <code>move-collections.sh</code> with the following configuration:</li>
</ul>
<pre><code>10568/1637 10568/174 10568/27629
10568/1642 10568/174 10568/27629
10568/1614 10568/174 10568/27629
10568/75561 10568/150 10568/27629
10568/183 10568/230 10568/27629
</code></pre>
<ul>
<li>Delete community <sup>10568</sup>&frasl;<sub>174</sub> (Sustainable livestock futures)</li>
<li>Delete collections in <sup>10568</sup>&frasl;<sub>27629</sub> that have zero items (33 of them!)</li>
2017-10-10 13:22:54 +02:00
</ul>
2017-10-11 12:58:31 +02:00
<h2 id="2017-10-11">2017-10-11</h2>
<ul>
<li>Peter added me as an owner on the CGSpace property on Google Search Console and I tried to submit a &ldquo;Change of Address&rdquo; request for the CGIAR Library but got an error:</li>
</ul>
<p><img src="/cgspace-notes/2017/10/search-console-change-address-error.png" alt="Change of Address error" /></p>
<ul>
<li>We are sending top-level CGIAR Library traffic to their specific community hierarchy in CGSpace so this type of change of address won&rsquo;t work—we&rsquo;ll just need to wait for Google to slowly index everything and take note of the HTTP 301 redirects</li>
<li>Also the Google Search Console doesn&rsquo;t work very well with Google Analytics being blocked, so I had to turn off my ad blocker to get the &ldquo;Change of Address&rdquo; tool to work!</li>
</ul>
2017-10-01 07:13:31 +02:00
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2017-10/">October, 2017</a></li>
<li><a href="/cgspace-notes/cgiar-library-migration/">CGIAR Library Migration</a></li>
<li><a href="/cgspace-notes/2017-09/">September, 2017</a></li>
<li><a href="/cgspace-notes/2017-08/">August, 2017</a></li>
<li><a href="/cgspace-notes/2017-07/">July, 2017</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p>
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>