mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-27 00:48:19 +01:00
356 lines
19 KiB
HTML
356 lines
19 KiB
HTML
<!DOCTYPE html>
|
|
<html lang="en">
|
|
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
|
|
|
|
<meta property="og:title" content="January, 2017" />
|
|
<meta property="og:description" content="2017-01-02 I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error I tested on DSpace Test as well and it doesn’t work there either I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years 2017-01-04 I tried to shard my local dev instance and it fails the same way: $ JAVA_OPTS="-Xms768m -Xmx768m -Dfile." />
|
|
<meta property="og:type" content="article" />
|
|
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-01/" />
|
|
|
|
|
|
<meta property="og:updated_time" content="2017-01-02T10:43:00+03:00"/>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<meta itemprop="name" content="January, 2017">
|
|
<meta itemprop="description" content="2017-01-02 I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error I tested on DSpace Test as well and it doesn’t work there either I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years 2017-01-04 I tried to shard my local dev instance and it fails the same way: $ JAVA_OPTS="-Xms768m -Xmx768m -Dfile.">
|
|
|
|
|
|
<meta itemprop="dateModified" content="2017-01-02T10:43:00+03:00" />
|
|
<meta itemprop="wordCount" content="826">
|
|
|
|
|
|
|
|
<meta itemprop="keywords" content="notes," />
|
|
|
|
|
|
|
|
<meta name="twitter:card" content="summary"/>
|
|
|
|
|
|
|
|
<meta name="twitter:title" content="January, 2017"/>
|
|
<meta name="twitter:description" content="2017-01-02 I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error I tested on DSpace Test as well and it doesn’t work there either I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years 2017-01-04 I tried to shard my local dev instance and it fails the same way: $ JAVA_OPTS="-Xms768m -Xmx768m -Dfile."/>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<meta name="generator" content="Hugo 0.18.1" />
|
|
|
|
|
|
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2017-01/">
|
|
|
|
<title>January, 2017 | CGSpace Notes</title>
|
|
|
|
<!-- combined, minified CSS -->
|
|
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-qRVpIj9hSzsBhmO8Y7YEKF2UFra2sJQtl9V/uFKKDvy+Wjh9zgTku6VRgT8YdPoD" crossorigin="anonymous">
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
</head>
|
|
|
|
<body>
|
|
|
|
<div class="blog-masthead">
|
|
<div class="container">
|
|
<nav class="nav blog-nav">
|
|
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
|
|
|
|
|
|
|
|
</nav>
|
|
</div>
|
|
</div>
|
|
|
|
<header class="blog-header">
|
|
<div class="container">
|
|
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
|
|
|
|
</div>
|
|
</header>
|
|
|
|
<div class="container">
|
|
<div class="row">
|
|
<div class="col-sm-8 blog-main">
|
|
|
|
|
|
|
|
|
|
<article class="blog-post">
|
|
<header>
|
|
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2017-01/">January, 2017</a></h2>
|
|
<p class="blog-post-meta"><time datetime="2017-01-02T10:43:00+03:00">Mon Jan 02, 2017</time> by Alan Orth in
|
|
|
|
<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
|
|
|
|
</p>
|
|
</header>
|
|
|
|
|
|
<h2 id="2017-01-02">2017-01-02</h2>
|
|
|
|
<ul>
|
|
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
|
|
<li>I tested on DSpace Test as well and it doesn’t work there either</li>
|
|
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years</li>
|
|
</ul>
|
|
|
|
<h2 id="2017-01-04">2017-01-04</h2>
|
|
|
|
<ul>
|
|
<li>I tried to shard my local dev instance and it fails the same way:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace stats-util -s
|
|
Moving: 9318 into core statistics-2016
|
|
Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2016
|
|
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2016
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
|
|
at org.dspace.statistics.SolrLogger.shardSolrIndex(SourceFile:2291)
|
|
at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:106)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
|
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
|
at java.lang.reflect.Method.invoke(Method.java:498)
|
|
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
|
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
|
Caused by: org.apache.http.client.ClientProtocolException
|
|
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867)
|
|
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
|
|
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
|
|
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
|
|
... 10 more
|
|
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
|
|
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:659)
|
|
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
|
|
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
|
|
... 14 more
|
|
Caused by: java.net.SocketException: Broken pipe (Write failed)
|
|
at java.net.SocketOutputStream.socketWrite0(Native Method)
|
|
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
|
|
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
|
|
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
|
|
at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:124)
|
|
at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:181)
|
|
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:132)
|
|
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
|
|
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
|
|
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
|
|
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
|
|
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
|
|
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
|
|
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
|
|
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
|
|
... 16 more
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>And the DSpace log shows:</li>
|
|
</ul>
|
|
|
|
<pre><code>2017-01-04 22:39:05,412 INFO org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
|
|
2017-01-04 22:39:05,412 INFO org.dspace.statistics.SolrLogger @ Moving: 9318 records into core statistics-2016
|
|
2017-01-04 22:39:07,310 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ I/O exception (java.net.SocketException) caught when processing request to {}->http://localhost:8081: Broken pipe (Write failed)
|
|
2017-01-04 22:39:07,310 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ Retrying request to {}->http://localhost:8081
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Despite failing instantly, a <code>statistics-2016</code> directory was created, but it only has a data dir (no conf)</li>
|
|
<li>The Tomcat access logs show more:</li>
|
|
</ul>
|
|
|
|
<pre><code>127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] "GET /solr/statistics/select?q=type%3A2+AND+id%3A1&wt=javabin&version=2 HTTP/1.1" 200 107
|
|
127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] "GET /solr/statistics/select?q=*%3A*&rows=0&facet=true&facet.range=time&facet.range.start=NOW%2FYEAR-17YEARS&facet.range.end=NOW%2FYEAR%2B0YEARS&facet.range.gap=%2B1YEAR&facet.mincount=1&wt=javabin&version=2 HTTP/1.1" 200 423
|
|
127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] "GET /solr/admin/cores?action=STATUS&core=statistics-2016&indexInfo=true&wt=javabin&version=2 HTTP/1.1" 200 77
|
|
127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] "GET /solr/admin/cores?action=CREATE&name=statistics-2016&instanceDir=statistics&dataDir=%2FUsers%2Faorth%2Fdspace%2Fsolr%2Fstatistics-2016%2Fdata&wt=javabin&version=2 HTTP/1.1" 200 63
|
|
127.0.0.1 - - [04/Jan/2017:22:39:07 +0200] "GET /solr/statistics/select?csv.mv.separator=%7C&q=*%3A*&fq=time%3A%28%5B2016%5C-01%5C-01T00%5C%3A00%5C%3A00Z+TO+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%5D+NOT+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%29&rows=10000&wt=csv HTTP/1.1" 200 4359517
|
|
127.0.0.1 - - [04/Jan/2017:22:39:07 +0200] "GET /solr/statistics/admin/luke?show=schema&wt=javabin&version=2 HTTP/1.1" 200 16248
|
|
127.0.0.1 - - [04/Jan/2017:22:39:07 +0200] "POST /solr//statistics-2016/update/csv?commit=true&softCommit=false&waitSearcher=true&f.previousWorkflowStep.split=true&f.previousWorkflowStep.separator=%7C&f.previousWorkflowStep.encapsulator=%22&f.actingGroupId.split=true&f.actingGroupId.separator=%7C&f.actingGroupId.encapsulator=%22&f.containerCommunity.split=true&f.containerCommunity.separator=%7C&f.containerCommunity.encapsulator=%22&f.range.split=true&f.range.separator=%7C&f.range.encapsulator=%22&f.containerItem.split=true&f.containerItem.separator=%7C&f.containerItem.encapsulator=%22&f.p_communities_map.split=true&f.p_communities_map.separator=%7C&f.p_communities_map.encapsulator=%22&f.ngram_query_search.split=true&f.ngram_query_search.separator=%7C&f.ngram_query_search.encapsulator=%22&f.containerBitstream.split=true&f.containerBitstream.separator=%7C&f.containerBitstream.encapsulator=%22&f.owningItem.split=true&f.owningItem.separator=%7C&f.owningItem.encapsulator=%22&f.actingGroupParentId.split=true&f.actingGroupParentId.separator=%7C&f.actingGroupParentId.encapsulator=%22&f.text.split=true&f.text.separator=%7C&f.text.encapsulator=%22&f.simple_query_search.split=true&f.simple_query_search.separator=%7C&f.simple_query_search.encapsulator=%22&f.owningComm.split=true&f.owningComm.separator=%7C&f.owningComm.encapsulator=%22&f.owner.split=true&f.owner.separator=%7C&f.owner.encapsulator=%22&f.filterquery.split=true&f.filterquery.separator=%7C&f.filterquery.encapsulator=%22&f.p_group_map.split=true&f.p_group_map.separator=%7C&f.p_group_map.encapsulator=%22&f.actorMemberGroupId.split=true&f.actorMemberGroupId.separator=%7C&f.actorMemberGroupId.encapsulator=%22&f.bitstreamId.split=true&f.bitstreamId.separator=%7C&f.bitstreamId.encapsulator=%22&f.group_name.split=true&f.group_name.separator=%7C&f.group_name.encapsulator=%22&f.p_communities_name.split=true&f.p_communities_name.separator=%7C&f.p_communities_name.encapsulator=%22&f.query.split=true&f.query.separator=%7C&f.query.encapsulator=%22&f.workflowStep.split=true&f.workflowStep.separator=%7C&f.workflowStep.encapsulator=%22&f.containerCollection.split=true&f.containerCollection.separator=%7C&f.containerCollection.encapsulator=%22&f.complete_query_search.split=true&f.complete_query_search.separator=%7C&f.complete_query_search.encapsulator=%22&f.p_communities_id.split=true&f.p_communities_id.separator=%7C&f.p_communities_id.encapsulator=%22&f.rangeDescription.split=true&f.rangeDescription.separator=%7C&f.rangeDescription.encapsulator=%22&f.group_id.split=true&f.group_id.separator=%7C&f.group_id.encapsulator=%22&f.bundleName.split=true&f.bundleName.separator=%7C&f.bundleName.encapsulator=%22&f.ngram_simplequery_search.split=true&f.ngram_simplequery_search.separator=%7C&f.ngram_simplequery_search.encapsulator=%22&f.group_map.split=true&f.group_map.separator=%7C&f.group_map.encapsulator=%22&f.owningColl.split=true&f.owningColl.separator=%7C&f.owningColl.encapsulator=%22&f.p_group_id.split=true&f.p_group_id.separator=%7C&f.p_group_id.encapsulator=%22&f.p_group_name.split=true&f.p_group_name.separator=%7C&f.p_group_name.encapsulator=%22&wt=javabin&version=2 HTTP/1.1" 409 156
|
|
127.0.0.1 - - [04/Jan/2017:22:44:00 +0200] "POST /solr/datatables/update?wt=javabin&version=2 HTTP/1.1" 200 41
|
|
127.0.0.1 - - [04/Jan/2017:22:44:00 +0200] "POST /solr/datatables/update HTTP/1.1" 200 40
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Very interesting… it creates the core and then fails somehow</li>
|
|
</ul>
|
|
|
|
<h2 id="2017-01-08">2017-01-08</h2>
|
|
|
|
<ul>
|
|
<li>Put Sisay’s <code>item-view.xsl</code> code to show mapped collections on CGSpace (<a href="https://github.com/ilri/DSpace/pull/295">#295</a>)</li>
|
|
</ul>
|
|
|
|
<h2 id="2017-01-09">2017-01-09</h2>
|
|
|
|
<ul>
|
|
<li>A user wrote to tell me that the new display of an item’s mappings had a crazy bug for at least one item: <a href="https://cgspace.cgiar.org/handle/10568/78596">https://cgspace.cgiar.org/handle/10568/78596</a></li>
|
|
<li>She said she only mapped it once, but it appears to be mapped 184 times</li>
|
|
</ul>
|
|
|
|
<p><img src="/cgspace-notes/2017/01/mapping-crazy-duplicate.png" alt="Crazy item mapping" /></p>
|
|
|
|
<h2 id="2017-01-10">2017-01-10</h2>
|
|
|
|
<ul>
|
|
<li>I tried to clean up the duplicate mappings by exporting the item’s metadata to CSV, editing, and re-importing, but DSpace said “no changes were detected”</li>
|
|
<li>I’ve asked on the dspace-tech mailing list to see if anyone can help</li>
|
|
<li>I found an old post on the mailing list discussing a similar issue, and listing some SQL commands that might help</li>
|
|
<li>For example, this shows 186 mappings for the item, the first three of which are real:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# select * from collection2item where item_id = '80596';
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Then I deleted the others:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>And in the item view it now shows the correct mappings</li>
|
|
<li>I will have to ask the DSpace people if this is a valid approach</li>
|
|
<li>Finish looking at the Journal Title corrections of the top 500 Journal Titles so we can make a controlled vocabulary from it</li>
|
|
</ul>
|
|
|
|
<h2 id="2017-01-11">2017-01-11</h2>
|
|
|
|
<ul>
|
|
<li>Maria found another item with duplicate mappings: <a href="https://cgspace.cgiar.org/handle/10568/78658">https://cgspace.cgiar.org/handle/10568/78658</a></li>
|
|
<li>Error in <code>fix-metadata-values.py</code> when it tries to print the value for Entwicklung & Ländlicher Raum:</li>
|
|
</ul>
|
|
|
|
<pre><code>Traceback (most recent call last):
|
|
File "./fix-metadata-values.py", line 80, in <module>
|
|
print("Fixing {} occurences of: {}".format(records_to_fix, record[0]))
|
|
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Seems we need to encode as UTF-8 before printing to screen, ie:</li>
|
|
</ul>
|
|
|
|
<pre><code>print("Fixing {} occurences of: {}".format(records_to_fix, record[0].encode('utf-8')))
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>See: <a href="http://stackoverflow.com/a/36427358/487333">http://stackoverflow.com/a/36427358/487333</a></li>
|
|
<li>I’m actually not sure if we need to encode() the strings to UTF-8 before writing them to the database… I’ve never had this issue before</li>
|
|
<li>Now back to cleaning up some journal titles so we can make the controlled vocabulary:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ ./fix-metadata-values.py -i /tmp/fix-27-journal-titles.csv -f dc.source -t correct -m 55 -d dspace-u dspace-p 'fuuu'
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Now get the top 500 journal titles:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by cou
|
|
nt desc limit 500) to /tmp/journal-titles.csv with csv;
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November</li>
|
|
<li>I will have to go through these and fix some more before making the controlled vocabulary</li>
|
|
<li>Added 30 more corrections or so, now there are 49 total and I’ll have to get the top 500 after applying them</li>
|
|
</ul>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
</article>
|
|
|
|
|
|
|
|
</div> <!-- /.blog-main -->
|
|
|
|
<aside class="col-sm-3 offset-sm-1 blog-sidebar">
|
|
|
|
|
|
|
|
|
|
|
|
<section class="sidebar-module">
|
|
<h4>Recent Posts</h4>
|
|
<ol class="list-unstyled">
|
|
|
|
<li><a href="/cgspace-notes/2017-01/">January, 2017</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2016-12/">December, 2016</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2016-11/">November, 2016</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2016-10/">October, 2016</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2016-09/">September, 2016</a></li>
|
|
|
|
</ol>
|
|
</section>
|
|
|
|
|
|
|
|
|
|
<section class="sidebar-module">
|
|
<h4>Links</h4>
|
|
<ol class="list-unstyled">
|
|
|
|
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
|
|
|
|
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
|
|
|
|
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
|
|
|
|
</ol>
|
|
</section>
|
|
|
|
</aside>
|
|
|
|
|
|
</div> <!-- /.row -->
|
|
</div> <!-- /.container -->
|
|
|
|
<footer class="blog-footer">
|
|
<p>
|
|
|
|
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
|
|
|
|
</p>
|
|
<p>
|
|
<a href="#">Back to top</a>
|
|
</p>
|
|
</footer>
|
|
|
|
</body>
|
|
|
|
</html>
|