<!DOCTYPE html>
<html lang="en-us">
<head prefix="og: http://ogp.me/ns#">
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1" />
  <meta property="og:title" content=" September, 2016 &middot;  CGSpace Notes" />
  
  <meta property="og:site_name" content="CGSpace Notes" />
  <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-09/" />
  
  
  <meta property="og:type" content="article" />
  
  <meta property="og:article:published_time" content="2016-09-01T15:53:00&#43;03:00" />
  
  <meta property="og:article:tag" content="notes" />
  
  
  <title>
     September, 2016 &middot;  CGSpace Notes
  </title>

  <link rel="stylesheet" href="https://alanorth.github.io/cgspace-notes/css/bootstrap.min.css" />
  <link rel="stylesheet" href="https://alanorth.github.io/cgspace-notes/css/main.css" />
  <link rel="stylesheet" href="https://alanorth.github.io/cgspace-notes/css/font-awesome.min.css" />
  <link rel="stylesheet" href="https://alanorth.github.io/cgspace-notes/css/github.css" />
  <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Source+Sans+Pro:200,300,400" type="text/css">
  <link rel="shortcut icon" href="https://alanorth.github.io/cgspace-notes/images/favicon.ico" />
  <link rel="apple-touch-icon" href="https://alanorth.github.io/cgspace-notes/images/apple-touch-icon.png" />
  
</head>
<body>
    <header class="global-header"  style="background-image:url(../images/bg.jpg )">
    <section class="header-text">
      <h1><a href="https://alanorth.github.io/cgspace-notes/">CGSpace Notes</a></h1>
      
      <div class="sns-links hidden-print">
  
  
</div>

      
      <a href="https://alanorth.github.io/cgspace-notes/" class="btn-header btn-back hidden-xs">
        <i class="fa fa-angle-left" aria-hidden="true"></i>
        &nbsp;Home
      </a>
      
      
    </section>
  </header>
  <main class="container">


<article>
  <header>
    <h1 class="text-primary">September, 2016</h1>
    <div class="post-meta clearfix">
      <div class="post-date pull-left">
        Posted on
        <time datetime="2016-09-01T15:53:00&#43;03:00">
          Sep 1, 2016
        </time>
      </div>
      <div class="pull-right">
        
        <span class="post-tag small"><a href="https://alanorth.github.io/cgspace-notes//tags/notes">#notes</a></span>
        
      </div>
    </div>
  </header>
  <section>
    

<h2 id="2016-09-01">2016-09-01</h2>

<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
</ul>

<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre>

<ul>
<li>A user who has been migrated to the root vs. a user still in the hierarchical structure:</li>
</ul>

<pre><code>distinguishedName: CN=Last\, First (ILRI),OU=ILRI Kenya Employees,OU=ILRI Kenya,OU=ILRIHUB,DC=CGIARAD,DC=ORG
distinguishedName: CN=Last\, First (ILRI),OU=ILRI Ethiopia Employees,OU=ILRI Ethiopia,DC=ILRI,DC=CGIARAD,DC=ORG
</code></pre>

<ul>
<li>Changing the DSpace LDAP config to use <code>OU=ILRIHUB</code> seems to work:</li>
</ul>

<p><img src="../images/2016/09/ilri-ldap-users.png" alt="DSpace groups based on LDAP DN" /></p>
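
<p>The OU-versus-DC check above can be sketched in Python. This is only an illustration of the idea, not how DSpace matches LDAP groups internally; <code>split_dn</code> and <code>is_ilri</code> are hypothetical helper names, and the third DN in the usage note is made up for contrast.</p>

```python
def split_dn(dn):
    """Split a distinguished name into components, honouring backslash escapes."""
    parts, current, escaped = [], [], False
    for ch in dn:
        if escaped:
            # Character after a backslash is kept literally (e.g. "\,")
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ",":
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def is_ilri(dn):
    """True if the DN sits under OU=ILRIHUB (migrated) or DC=ILRI (not yet migrated)."""
    components = {part.upper() for part in split_dn(dn)}
    return "OU=ILRIHUB" in components or "DC=ILRI" in components


migrated = r"CN=Last\, First (ILRI),OU=ILRI Kenya Employees,OU=ILRI Kenya,OU=ILRIHUB,DC=CGIARAD,DC=ORG"
legacy = r"CN=Last\, First (ILRI),OU=ILRI Ethiopia Employees,OU=ILRI Ethiopia,DC=ILRI,DC=CGIARAD,DC=ORG"
```

<p>With these definitions, <code>is_ilri(migrated)</code> and <code>is_ilri(legacy)</code> are both true, while a DN containing neither component is rejected. Accepting both forms would keep users recognised during the migration.</p>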

<ul>
<li>Notes for local PostgreSQL database recreation from production snapshot:</li>
</ul>

<pre><code>$ dropdb dspacetest
$ createdb -O dspacetest --encoding=UNICODE dspacetest
$ psql dspacetest -c 'alter user dspacetest createuser;'
$ pg_restore -O -U dspacetest -d dspacetest ~/Downloads/cgspace_2016-09-01.backup
$ psql dspacetest -c 'alter user dspacetest nocreateuser;'
$ psql -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest -h localhost
$ vacuumdb dspacetest
</code></pre>

<ul>
<li>Some names that I thought I had fixed in July still have several different authorities:</li>
</ul>

<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
      text_value       |              authority               | confidence
-----------------------+--------------------------------------+------------
 Poole, Elizabeth Jane | b6efa27f-8829-4b92-80fe-bc63e03e3ccb |        600
 Poole, Elizabeth Jane | 41628f42-fc38-4b38-b473-93aec9196326 |        600
 Poole, Elizabeth Jane | 83b82da0-f652-4ebc-babc-591af1697919 |        600
 Poole, Elizabeth Jane | c3a22456-8d6a-41f9-bba0-de51ef564d45 |        600
 Poole, E.J.           | c3a22456-8d6a-41f9-bba0-de51ef564d45 |        600
 Poole, E.J.           | 0fbd91b9-1b71-4504-8828-e26885bf8b84 |        600
(6 rows)
</code></pre>

<ul>
<li>At least a few of these actually have the correct ORCID, but I will unify them under the authority <code>c3a22456-8d6a-41f9-bba0-de51ef564d45</code></li>
</ul>

<pre><code>dspacetest=# update metadatavalue set authority='c3a22456-8d6a-41f9-bba0-de51ef564d45', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
UPDATE 69
</code></pre>

<ul>
<li>And for Peter Ballantyne:</li>
</ul>

<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
    text_value     |              authority               | confidence
-------------------+--------------------------------------+------------
 Ballantyne, Peter | 2dcbcc7b-47b0-4fd7-bef9-39d554494081 |        600
 Ballantyne, Peter | 4f04ca06-9a76-4206-bd9c-917ca75d278e |        600
 Ballantyne, P.G.  | 4f04ca06-9a76-4206-bd9c-917ca75d278e |        600
 Ballantyne, Peter | ba5f205b-b78b-43e5-8e80-0c9a1e1ad2ca |        600
 Ballantyne, Peter | 20f21160-414c-4ecf-89ca-5f2cb64e75c1 |        600
(5 rows)
</code></pre>

<ul>
<li>Again, a few have the correct ORCID, but there should only be one authority&hellip;</li>
</ul>

<pre><code>dspacetest=# update metadatavalue set authority='4f04ca06-9a76-4206-bd9c-917ca75d278e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
UPDATE 58
</code></pre>

<ul>
<li>And for me:</li>
</ul>

<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, A%';
 text_value |              authority               | confidence
------------+--------------------------------------+------------
 Orth, Alan | 4884def0-4d7e-4256-9dd4-018cd60a5871 |        600
 Orth, A.   | 4884def0-4d7e-4256-9dd4-018cd60a5871 |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
(3 rows)
dspacetest=# update metadatavalue set authority='1a1943a0-3f87-402f-9afe-e52fb46a513e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, %';
UPDATE 11
</code></pre>

<ul>
<li>And for CCAFS author Bruce Campbell, whom I had discussed with CCAFS earlier this week:</li>
</ul>

<pre><code>dspacetest=# update metadatavalue set authority='0e414b4c-4671-4a23-b570-6077aca647d8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
UPDATE 166
dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
       text_value       |              authority               | confidence
------------------------+--------------------------------------+------------
 Campbell, Bruce        | 0e414b4c-4671-4a23-b570-6077aca647d8 |        600
 Campbell, Bruce Morgan | 0e414b4c-4671-4a23-b570-6077aca647d8 |        600
 Campbell, B.           | 0e414b4c-4671-4a23-b570-6077aca647d8 |        600
 Campbell, B.M.         | 0e414b4c-4671-4a23-b570-6077aca647d8 |        600
(4 rows)
</code></pre>

<ul>
<li>After updating the Authority indexes (<code>bin/dspace index-authority</code>) everything looks good</li>
<li>Run authority updates on CGSpace</li>
</ul>
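
<p>To find names that still need this kind of cleanup, the <code>select distinct</code> output can be grouped by name. The following is a minimal sketch using the rows from the Poole query above; the function names are hypothetical, and in practice the rows would come from the database rather than a hard-coded list.</p>

```python
from collections import defaultdict

# (text_value, authority) pairs copied from the "Poole" query output above
rows = [
    ("Poole, Elizabeth Jane", "b6efa27f-8829-4b92-80fe-bc63e03e3ccb"),
    ("Poole, Elizabeth Jane", "41628f42-fc38-4b38-b473-93aec9196326"),
    ("Poole, Elizabeth Jane", "83b82da0-f652-4ebc-babc-591af1697919"),
    ("Poole, Elizabeth Jane", "c3a22456-8d6a-41f9-bba0-de51ef564d45"),
    ("Poole, E.J.", "c3a22456-8d6a-41f9-bba0-de51ef564d45"),
    ("Poole, E.J.", "0fbd91b9-1b71-4504-8828-e26885bf8b84"),
]


def authorities_by_name(rows):
    """Map each name variant to the set of authority IDs attached to it."""
    grouped = defaultdict(set)
    for text_value, authority in rows:
        grouped[text_value].add(authority)
    return grouped


def names_needing_cleanup(rows):
    """Return the name variants that carry more than one authority ID."""
    grouped = authorities_by_name(rows)
    return sorted(name for name, ids in grouped.items() if len(ids) > 1)


for name in names_needing_cleanup(rows):
    print(name, len(authorities_by_name(rows)[name]))
```

<p>After the <code>update</code> statements above, every name variant should map to exactly one authority, so the same check would return an empty list.</p>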

<h2 id="2016-09-05">2016-09-05</h2>

<ul>
<li>After one week of logging TLS connections on CGSpace:</li>
</ul>

<pre><code># zgrep &quot;DES-CBC3&quot; /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
217
# zcat -f -- /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
1164376
# zgrep &quot;DES-CBC3&quot; /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq
TLSv1/DES-CBC3-SHA
TLSv1/EDH-RSA-DES-CBC3-SHA
</code></pre>
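
<p>The share of 3DES traffic follows directly from the two counts above. A quick check of the arithmetic, with variable names chosen here for illustration:</p>

```python
des_cbc3_lines = 217      # log lines matching "DES-CBC3"
total_lines = 1164376     # all lines in the same log files


def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


share = percentage(des_cbc3_lines, total_lines)
print(f"{share:.2f}%")  # prints 0.02%
```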

<ul>
<li>So this represents <code>0.02%</code> of 1.16M connections over a one-week period</li>
<li>Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:</li>
</ul>

<pre><code>value + &quot;__description:&quot; + cells[&quot;dc.type&quot;].value
</code></pre>

<ul>
<li>This gives you, for example: <code>Mainstreaming gender in agricultural R&amp;D.pdf__description:Brief</code></li>
</ul>
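
<p>For reference, the same transformation outside OpenRefine might look like the sketch below. The function name <code>with_description</code> is hypothetical; only the <code>__description:</code> suffix comes from the expression above.</p>

```python
def with_description(filename, doc_type):
    """Append the document type to the filename as a __description: suffix."""
    return f"{filename}__description:{doc_type}"


print(with_description("Mainstreaming gender in agricultural R&D.pdf", "Brief"))
```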

<h2 id="2016-09-06">2016-09-06</h2>

<ul>
<li>Trying to import the records for CIAT from yesterday, but having filename encoding issues from their zip file</li>
<li>Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF:

<ul>
<li>Filename: Complementing Farmers Genetic Knowledge Farmer Breeding Workshop in Turipaná, Colombia.pdf</li>
<li>Imports fine on DSpace running on Mac OS X</li>
<li>Fails to import on DSpace running on Linux with error <code>No such file or directory</code></li>
</ul></li>
<li>Change diacritic in file name from á to a and re-create SAF bundle and zip

<ul>
<li>Success on both Mac OS X and Linux&hellip;</li>
</ul></li>
<li>Looks like on the Mac OS X file system the file names represent á in decomposed form, as a (U+0061) +  ́ (U+0301), rather than as the single code point U+00E1</li>
<li>See: <a href="http://www.fileformat.info/info/unicode/char/e1/index.htm">http://www.fileformat.info/info/unicode/char/e1/index.htm</a></li>
<li>See: <a href="http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&amp;s=&amp;uv=0">http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&amp;s=&amp;uv=0</a></li>
<li>If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8</li>
<li>We should definitely clean filenames so they don&rsquo;t use characters that are tricky to process in CSV and shell scripts, like: <code>,</code>, <code>'</code>, and <code>&quot;</code></li>
</ul>

<pre><code>value.replace(&quot;'&quot;,&quot;&quot;).replace(&quot;,&quot;,&quot;&quot;).replace('&quot;','')
</code></pre>
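
<p>A Python counterpart of the expression above, for renaming the files on disk, could look roughly like this. It is a sketch under assumptions: <code>clean_filename</code> and <code>rename_in_directory</code> are hypothetical names, only the three characters listed above are stripped, and the directory is assumed to be flat.</p>

```python
import os

# Characters that are awkward in CSV files and shell scripts
TRICKY_CHARACTERS = ",'\""


def clean_filename(name):
    """Remove commas, single quotes, and double quotes from a file name."""
    return name.translate({ord(ch): None for ch in TRICKY_CHARACTERS})


def rename_in_directory(directory):
    """Rename every entry in directory whose cleaned name differs from its current one."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        cleaned = clean_filename(name)
        if cleaned == name:
            continue
        target = os.path.join(directory, cleaned)
        if os.path.exists(target):
            # Refuse to overwrite a file that already has the cleaned name
            raise FileExistsError(target)
        os.rename(os.path.join(directory, name), target)
        renamed.append((name, cleaned))
    return renamed
```

<p>Calling <code>rename_in_directory</code> on the folder holding the PDFs returns the list of (old, new) pairs, which can be compared against the cleaned column in OpenRefine.</p>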

<ul>
<li>I need to write a Python script to match that for renaming files in the file system</li>
<li>When importing SAF bundles it seems you can specify the target collection on the command line using <code>-c 10568/4003</code> or in the <code>collections</code> file inside each item in the bundle</li>
<li>Seems that the latter method causes a null pointer exception, so I will just have to use the former method</li>
<li>In the end I was able to import the files only after unzipping them directly on Linux

<ul>
<li>The CSV file was giving file names in UTF-8 with precomposed characters, while unzipping the zip on Mac OS X and transferring it was converting the file names to the decomposed Unicode form I saw above</li>
</ul></li>
<li>Import CIAT Gender Network records to CGSpace, first creating the SAF bundles as my user, then importing as the <code>tomcat7</code> user, and deleting the bundle, for each collection&rsquo;s items:</li>
</ul>

<pre><code>$ ./safbuilder.sh -c /home/aorth/ciat-gender-2016-09-06/66601.csv
$ JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; /home/cgspace.cgiar.org/bin/dspace import -a -e aorth@mjanja.ch -c 10568/66601 -s /home/aorth/ciat-gender-2016-09-06/SimpleArchiveFormat -m 66601.map
$ rm -rf ~/ciat-gender-2016-09-06/SimpleArchiveFormat/
</code></pre>
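
<p>The filename mismatch seen during this import comes down to Unicode normalization: the precomposed á (NFC) and the decomposed a + combining accent (NFD) look identical but are different byte sequences. A small illustration using Python's standard <code>unicodedata</code> module; the helper name <code>to_nfc</code> is hypothetical.</p>

```python
import unicodedata

composed = "Turipan\u00e1"      # á as one code point, U+00E1 (NFC)
decomposed = "Turipana\u0301"   # a (U+0061) + combining acute accent (U+0301) (NFD)


def to_nfc(name):
    """Normalize a file name to the precomposed (NFC) form."""
    return unicodedata.normalize("NFC", name)


print(composed == decomposed)           # False: different code points
print(to_nfc(decomposed) == composed)   # True: equal after normalization
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 9 10
```

<p>Normalizing both the file names on disk and the names in the CSV to the same form before importing should avoid the <code>No such file or directory</code> error, though I have not tested that here.</p>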

<h2 id="2016-09-07">2016-09-07</h2>

<ul>
<li>Erase and rebuild DSpace Test based on the latest Ubuntu 16.04, PostgreSQL 9.5, and Java 8</li>
<li>Reading about PostgreSQL maintenance, it seems manual vacuuming is only needed for certain workloads, such as heavy update/write loads</li>
<li>I suggest we disable our nightly manual vacuum task, as ours is a mostly read workload, and I&rsquo;d rather stick as close to the documentation as possible since we haven&rsquo;t done any testing/observation of PostgreSQL</li>
<li>See: <a href="https://www.postgresql.org/docs/9.3/static/routine-vacuuming.html">https://www.postgresql.org/docs/9.3/static/routine-vacuuming.html</a></li>
</ul>

  </section>
  <footer>
    
    <section class="author-info row">
      <div class="author-avatar col-md-2">
        
      </div>
      <div class="author-meta col-md-6">
        
        <h1 class="author-name text-primary">Alan Orth</h1>
        
        
      </div>
      
    </section>
    <ul class="pager">
      
      <li class="previous"><a href="https://alanorth.github.io/cgspace-notes/2016-08/"><span aria-hidden="true">&larr;</span> Older</a></li>
      
      
      <li class="next disabled"><a href="#">Newer <span aria-hidden="true">&rarr;</span></a></li>
      
    </ul>
  </footer>
</article>

  </main>
  <footer class="container global-footer">
    <div class="copyright-note pull-left">
      
    </div>
    <div class="sns-links hidden-print">
  
  
</div>

  </footer>

  <script src="https://alanorth.github.io/cgspace-notes/js/highlight.pack.js"></script>
  <script>
    hljs.initHighlightingOnLoad();
  </script>
  
  
</body>
</html>