
<!DOCTYPE html>
<html lang="en">

<head>
    

    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->

    <meta name="description" content="">
    <meta name="author" content="Alan Orth">

    <!-- OpenGraph Metadata: http://ogp.me/ -->
    <meta property="og:title" content="November, 2016">
    <meta property="og:description" content="">

    
    <meta property="og:type" content="article">
    <meta property="article:published_time" content="2016-11-01T09:21:00&#43;03:00">
    <meta property="article:author" content="Alan Orth">
      
    
    <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-11/">

    <!-- Metadata for Twitter: https://dev.twitter.com/cards/markup -->
    
    <meta property="twitter:card" content="summary">
    
    
    <meta property="twitter:title" content="November, 2016">
    <meta property="twitter:description" content="">

    
    <meta name="generator" content="Hugo 0.17" />


    <base href="https://alanorth.github.io/cgspace-notes/">
    <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2016-11/">

    <title>November, 2016 | CGSpace Notes</title>

    <!-- combined, minified CSS -->
    <link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet">

    <!-- RSS 2.0 feed -->
    <link href="https://alanorth.github.io/cgspace-notes/index.xml" type="application/rss+xml" rel="alternate">
  </head>

<body>

  <div class="blog-masthead">
    <div class="container">
      <nav class="nav blog-nav">
        <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
        
      </nav>
    </div>
  </div>

  <header class="blog-header">
    <div class="container">
      <h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
      
    </div>
  </header>

  <div class="container">
    <div class="row">
      <div class="col-sm-8 blog-main">

        
      <article class="blog-post">
        <header>
          <h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-11/">November, 2016</a></h2>
          <p class="blog-post-meta"><time datetime="2016-11-01T09:21:00&#43;03:00">Tue Nov 01, 2016</time> by Alan Orth in 

            <i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
        </header>
        

<h2 id="2016-11-01">2016-11-01</h2>

<ul>
<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>

<p><img src="2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p>

<h2 id="2016-11-02">2016-11-02</h2>

<ul>
<li>Migrate DSpace Test to DSpace 5.5 (<a href="https://gist.github.com/alanorth/61013895c6efe7095d7f81000953d1cf">notes</a>)</li>
<li>Run all updates on DSpace Test and reboot the server</li>
<li>Looks like the OAI bug from DSpace 5.1 that caused validation at Base Search to fail is now fixed and DSpace Test passes validation! (<a href="https://github.com/ilri/DSpace/issues/63">#63</a>)</li>
<li>Indexing Discovery on DSpace Test took 332 minutes, which is roughly four times as long as it usually takes</li>
<li>At the end it appeared to finish correctly but there were lots of errors right after it finished:</li>
</ul>

<pre><code>2016-11-02 15:09:48,578 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76454 to Index
2016-11-02 15:09:48,584 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Community: 10568/3202 to Index
2016-11-02 15:09:48,589 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76455 to Index
2016-11-02 15:09:48,590 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Community: 10568/51693 to Index
2016-11-02 15:09:48,590 INFO  org.dspace.discovery.IndexClient @ Done with indexing
2016-11-02 15:09:48,600 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76456 to Index
2016-11-02 15:09:48,613 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/55536 to Index
2016-11-02 15:09:48,616 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76457 to Index
2016-11-02 15:09:48,634 ERROR com.atmire.dspace.discovery.AtmireSolrService @
java.lang.NullPointerException
        at org.dspace.discovery.SearchUtils.getDiscoveryConfiguration(SourceFile:57)
        at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:824)
        at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:821)
        at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:898)
        at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
        at org.dspace.storage.rdbms.DatabaseUtils$ReindexerThread.run(DatabaseUtils.java:945)
</code></pre>

<ul>
<li>DSpace is still up, and a few minutes later I see the default DSpace indexer is still running</li>
<li>Sure enough, looking back before the first one finished, I see output from both indexers interleaved in the log:</li>
</ul>

<pre><code>2016-11-02 15:09:28,545 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/47242 to Index
2016-11-02 15:09:28,633 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/60785 to Index
2016-11-02 15:09:28,678 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (55695 of 55722): 43557
2016-11-02 15:09:28,688 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (55703 of 55722): 34476
</code></pre>

<ul>
<li>I will raise a ticket with Atmire to ask them about this</li>
</ul>
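
<p>A minimal sketch of how the interleaving above could be spotted automatically: count the log lines per logger class and see whether more than one indexer class shows up in the same stretch of log. The regular expression and the function name <code>count_loggers</code> are my own assumptions based on the line format shown above, not something from DSpace or Atmire.</p>

<pre><code>import re
from collections import Counter

# Assumed line format: date, time, level, logger class, then ' @'
LOG_LINE = re.compile(r'^\S+ \S+ +(?:DEBUG|INFO|WARN|ERROR) +(\S+) @')

def count_loggers(lines):
    '''Count log lines per logger class, skipping lines that do not match.'''
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The four lines from the log excerpt above
sample = [
    '2016-11-02 15:09:28,545 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/47242 to Index',
    '2016-11-02 15:09:28,633 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/60785 to Index',
    '2016-11-02 15:09:28,678 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (55695 of 55722): 43557',
    '2016-11-02 15:09:28,688 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (55703 of 55722): 34476',
]

counts = count_loggers(sample)
# Two different indexer classes in the same window suggests two indexers running
print(sorted(counts))
</code></pre>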

<h2 id="2016-11-06">2016-11-06</h2>

<ul>
<li>After re-deploying and re-indexing I didn&rsquo;t see the same issue, and the indexing completed in 85 minutes, which is about how long it is supposed to take</li>
</ul>

<h2 id="2016-11-07">2016-11-07</h2>

<ul>
<li>Horrible one-liner to get the Linode IDs from certain Ansible host vars:</li>
</ul>

<pre><code>$ grep -A 3 contact_info * | grep -E &quot;(Orth|Sisay|Peter|Daniel|Tsega)&quot; | awk -F'-' '{print $1}' | grep linode | uniq | xargs grep linode_id
</code></pre>

<ul>
<li>I noticed some weird CRPs in the database, and they don&rsquo;t show up in Discovery for some reason, perhaps because of the <code>:</code> character</li>
<li>I&rsquo;ll export these and fix them in batch:</li>
</ul>

<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=230 group by text_value order by count desc) to /tmp/crp.csv with csv;
COPY 22
</code></pre>

<ul>
<li>Test running the replacements:</li>
</ul>

<pre><code>$ ./fix-metadata-values.py -i /tmp/CRPs.csv -f cg.contributor.crp -t correct -m 230 -d dspace -u dspace -p 'fuuu'
</code></pre>

<ul>
<li>Add <code>AMR</code> to ILRI subjects and remove one duplicate instance of IITA in author affiliations controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/288">#288</a>)</li>
</ul>

<h2 id="2016-11-08">2016-11-08</h2>

<ul>
<li>Atmire&rsquo;s Listings and Reports module seems to be broken on DSpace 5.5</li>
</ul>

<p><img src="2016/11/listings-and-reports-55.png" alt="Listings and Reports broken in DSpace 5.5" /></p>

<ul>
<li>I&rsquo;ve filed a ticket with Atmire</li>
<li>Thinking about batch updates for ORCIDs and authors</li>
<li>Playing with <a href="https://github.com/moonlitesolutions/SolrClient">SolrClient</a> in Python to query Solr</li>
<li>All records in the authority core are either <code>authority_type:orcid</code> or <code>authority_type:person</code></li>
<li>There is a <code>deleted</code> field and it seems to be <code>false</code> for all records, but it might be an important sanity check to remember</li>
<li>The way to go is probably to have a CSV of author names and authority IDs, then to batch update them in PostgreSQL</li>
<li>Dump of the top ~200 authors in CGSpace:</li>
</ul>

<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=3 group by text_value order by count desc limit 210) to /tmp/210-authors.csv with csv;
</code></pre>
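
<p>A rough sketch of the CSV-driven batch update idea mentioned above, assuming a CSV with hypothetical <code>name</code> and <code>authority_id</code> columns and assuming the <code>metadatavalue</code> table has an <code>authority</code> column to set. It only builds parameterized statements and does not touch a database; the sample row is made up for illustration.</p>

<pre><code>import csv
import io

# Assumption: authority IDs live in metadatavalue.authority, and
# metadata_field_id=3 is the author field, as in the dump above
UPDATE_SQL = (
    'UPDATE metadatavalue SET authority = %s '
    'WHERE resource_type_id = 2 AND metadata_field_id = 3 AND text_value = %s'
)

def build_updates(csv_text):
    '''Return a list of (sql, params) pairs, one per CSV row.'''
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(UPDATE_SQL, (row['authority_id'], row['name'])) for row in reader]

# Hypothetical input, not real CGSpace data
sample_csv = 'name,authority_id\n"Doe, Jane",hypothetical-authority-id-1\n'

for sql, params in build_updates(sample_csv):
    print(sql, params)
</code></pre>

<p>Each pair could then be passed to a PostgreSQL driver&rsquo;s <code>execute()</code> inside one transaction, so the whole batch can be rolled back if the row counts look wrong.</p>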

<h2 id="2016-11-09">2016-11-09</h2>

<ul>
<li>CGSpace crashed so I quickly ran system updates, applied one or two of the waiting changes from the <code>5_x-prod</code> branch, and rebooted the server</li>
<li>The error was <code>Timeout waiting for idle object</code> but I haven&rsquo;t looked into the Tomcat logs to see what happened</li>
<li>Also, I ran the corrections for CRPs from earlier this week</li>
</ul>

<h2 id="2016-11-10">2016-11-10</h2>

<ul>
<li>Helping Megan Zandstra and CIAT with some questions about the REST API</li>
<li>Playing with <code>find-by-metadata-field</code>, this works:</li>
</ul>

<pre><code>$ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;http://localhost:8080/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;: &quot;cg.subject.ilri&quot;,&quot;value&quot;: &quot;SEEDS&quot;}'
</code></pre>

<ul>
<li>But the results are misleading, because metadata values can carry a text language and your query must match it exactly!</li>
</ul>

<pre><code>dspace=# select distinct text_value, text_lang from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS';
 text_value | text_lang
------------+-----------
 SEEDS      |
 SEEDS      |
 SEEDS      | en_US
(3 rows)
</code></pre>

<ul>
<li>So basically, the text language here could be null, blank, or en_US</li>
<li>To query metadata with these properties, you can do:</li>
</ul>

<pre><code>$ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;http://localhost:8080/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;: &quot;cg.subject.ilri&quot;,&quot;value&quot;: &quot;SEEDS&quot;}' | jq length
55
$ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;http://localhost:8080/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;: &quot;cg.subject.ilri&quot;,&quot;value&quot;: &quot;SEEDS&quot;, &quot;language&quot;:&quot;&quot;}' | jq length
34
$ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;http://localhost:8080/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;: &quot;cg.subject.ilri&quot;,&quot;value&quot;: &quot;SEEDS&quot;, &quot;language&quot;:&quot;en_US&quot;}' | jq length
</code></pre>
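
<p>The same three queries could be built from Python with only the standard library. This is a sketch under the assumption that the endpoint behaves as in the <code>curl</code> calls above; <code>build_request</code>, <code>count_items</code> and the <code>OMIT</code> sentinel are names I made up. The loop only prints the request bodies, so nothing is sent unless <code>count_items</code> is called against a running REST API.</p>

<pre><code>import json
import urllib.request

URL = 'http://localhost:8080/rest/items/find-by-metadata-field'
OMIT = object()  # sentinel meaning: leave the language key out of the body

def build_request(key, value, language=OMIT):
    '''Build a POST request for one key/value/language combination.'''
    body = {'key': key, 'value': value}
    if language is not OMIT:
        body['language'] = language
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode('utf-8'),
        headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
        method='POST',
    )

def count_items(request):
    '''Send the request and count the items returned (like piping to jq length).'''
    with urllib.request.urlopen(request) as response:
        return len(json.load(response))

# No language key, blank language, and en_US, as in the curl examples
for language in (OMIT, '', 'en_US'):
    request = build_request('cg.subject.ilri', 'SEEDS', language)
    print(request.data.decode('utf-8'))
</code></pre>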

<ul>
<li>The results (55+34=89) don&rsquo;t seem to match those from the database:</li>
</ul>

<pre><code>dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang is null;
 count
-------
    15
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang='';
 count
-------
     4
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang='en_US';
 count
-------
    66
</code></pre>

<ul>
<li>So, querying from the API I get 55 + 34 = 89 results, but the database actually only has 85&hellip;</li>
<li>And the <code>find-by-metadata-field</code> endpoint doesn&rsquo;t seem to have a way to get all items with the field, or a wildcard value</li>
<li>I&rsquo;ll ask a question on the dspace-tech mailing list</li>
<li>And speaking of <code>text_lang</code>, this is interesting:</li>
</ul>

<pre><code>dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
 text_lang
-----------

 ethnob
 en
 spa
 EN
 es
 frn
 en_
 en_US

 EN_US
 eng
 en_U
 fr
(14 rows)
</code></pre>

<ul>
<li>Generate a list of all these so I can fix them in batch:</li>
</ul>

<pre><code>dspace=# \copy (select distinct text_lang, count(*) from metadatavalue where resource_type_id=2 group by text_lang order by count desc) to /tmp/text-langs.csv with csv;
COPY 14
</code></pre>

<ul>
<li>Perhaps we need to fix them all in batch, or experiment with fixing only certain metadatavalues:</li>
</ul>

<pre><code>dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS';
UPDATE 85
</code></pre>
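
<p>For the broader batch fix, one possible approach is a lookup table from the messy <code>text_lang</code> values listed above to a canonical code. This is only a sketch: the target codes (<code>en_US</code>, <code>es</code>, <code>fr</code>) are my assumption about what the cleaned-up values should be, and anything not in the table, such as <code>ethnob</code>, is returned as <code>None</code> for manual review.</p>

<pre><code># Assumed mapping from lowercased variants to canonical codes
CANONICAL = {
    'en': 'en_US',
    'en_': 'en_US',
    'en_u': 'en_US',
    'en_us': 'en_US',
    'eng': 'en_US',
    'es': 'es',
    'spa': 'es',
    'fr': 'fr',
    'frn': 'fr',
}

def normalize_lang(value):
    '''Return the assumed canonical code, or None if the value needs review.'''
    if value is None or value.strip() == '':
        return None
    return CANONICAL.get(value.strip().lower())

# The non-blank values from the query output above
for raw in ['ethnob', 'en', 'spa', 'EN', 'es', 'frn', 'en_', 'en_US', 'EN_US', 'eng', 'en_U', 'fr']:
    print(repr(raw), normalize_lang(raw))
</code></pre>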


      </article> 


      </div> <!-- /.blog-main -->

      
        <aside class="col-sm-3 offset-sm-1 blog-sidebar">
  

  <section class="sidebar-module">
    <h4>Recent Posts</h4>
    <ol class="list-unstyled">
      
      <li><a href="/cgspace-notes/2016-11/">November, 2016</a></li>
      
      <li><a href="/cgspace-notes/2016-10/">October, 2016</a></li>
      
      <li><a href="/cgspace-notes/2016-09/">September, 2016</a></li>
      
      <li><a href="/cgspace-notes/2016-08/">August, 2016</a></li>
      
      <li><a href="/cgspace-notes/2016-07/">July, 2016</a></li>
      
    </ol>
  </section>

  
  <section class="sidebar-module">
    <h4>Links</h4>
    <ol class="list-unstyled">
      
      <li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
      
      <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
      
      <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
      
    </ol>
  </section>
  
</aside>

      
    </div> <!-- /.row -->
  </div> <!-- /.container -->

  <footer class="blog-footer">
    <p>
      
      Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
      
    </p>
    <p>
      <a href="#">Back to top</a>
    </p>
  </footer>

</body>

</html>