Add notes for 2018-05-15

2018-05-15 13:25:03 +03:00
parent 700f15e01b
commit 837d07d3a7
3 changed files with 73 additions and 10 deletions

@@ -27,7 +27,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
<meta property="article:published_time" content="2018-05-01T16:43:54&#43;03:00"/>
<meta property="article:modified_time" content="2018-05-10T14:41:37&#43;03:00"/>
<meta property="article:modified_time" content="2018-05-13T18:30:25&#43;03:00"/>
@@ -65,9 +65,9 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
"@type": "BlogPosting",
"headline": "May, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-05/",
"wordCount": "1263",
"wordCount": "1441",
"datePublished": "2018-05-01T16:43:54&#43;03:00",
"dateModified": "2018-05-10T14:41:37&#43;03:00",
"dateModified": "2018-05-13T18:30:25&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -322,7 +322,7 @@ Livestock and Fish
<ul>
<li>It turns out there was a space in my &ldquo;country&rdquo; header that was causing reconcile-csv to crash</li>
<li>After removing that it works fine!</li>
<li>Looking at Sisay&rsquo;s 2,000 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904"><sup>10568</sup>&frasl;<sub>92904</sub></a>)
<li>Looking at Sisay&rsquo;s 2,640 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904"><sup>10568</sup>&frasl;<sub>92904</sub></a>)
<ul>
<li>Trimmed all leading / trailing white space and condensed multiple spaces into one</li>
@@ -336,6 +336,40 @@ Livestock and Fish
</ul></li>
</ul>
<h2 id="2018-05-14">2018-05-14</h2>
<ul>
<li>Send a message to the OpenRefine mailing list about the bug with reconciling multi-value cells</li>
</ul>
<h2 id="2018-05-15">2018-05-15</h2>
<ul>
<li>Turns out I was doing the OpenRefine reconciliation wrong: I needed to copy the matched values to a new column!</li>
<li>Also, I learned how to do something cool with Jython expressions in OpenRefine</li>
<li>This will fetch a URL and return its HTTP response code:</li>
</ul>
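<pre><code>import urllib2
import re

# Only check values that contain the 10.1016 DOI prefix
pattern = re.compile(r'.*10\.1016.*')
if value and pattern.match(value):
  try:
    get = urllib2.urlopen(value)
    return get.getcode()
  except urllib2.HTTPError, e:
    # urlopen raises on 4xx/5xx responses; the status code is on the exception
    return e.code

return &quot;blank&quot;
</code></pre>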
<ul>
<li>I used a regex to limit it to just some of the DOIs in this case because there were thousands of URLs</li>
<li>Here the response code would be 200, 404, etc, or &ldquo;blank&rdquo; if there is no URL for that item</li>
<li>You could use this in a facet or in a new column</li>
<li>More information and good examples here: <a href="https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine">https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine</a></li>
<li>Finish looking at the 2,640 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904"><sup>10568</sup>&frasl;<sub>92904</sub></a>), cleaning up authors and adding collection mappings</li>
<li>They can now be moved to CGSpace as far as I&rsquo;m concerned, but I don&rsquo;t know whether Sisay or I will do it</li>
</ul>
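<p>The URL check described above can also be sketched outside OpenRefine as plain Python 3. This is only a rough sketch under stated assumptions, not the expression used in these notes: <code>http_status</code>, <code>response_code</code>, and the <code>fetch</code> parameter are hypothetical names, and <code>fetch</code> exists only so the logic can be tried without network access.</p>

```python
import re
import urllib.error
import urllib.request

# Same filter as the OpenRefine expression: only values containing 10.1016
DOI_PATTERN = re.compile(r".*10\.1016.*")


def http_status(url):
    """Return the HTTP status code for url, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for error statuses; the code is on the exception
        return error.code


def response_code(value, fetch=http_status):
    """Return the status code for matching values, or "blank" otherwise.

    fetch is a hypothetical hook (defaulting to http_status) so a stub can
    stand in for the real network call.
    """
    if value and DOI_PATTERN.match(value):
        return fetch(value)
    return "blank"
```

<p>Passing a stub such as <code>fetch=lambda url: 200</code> makes it possible to check the matching logic on a handful of values before running it against thousands of real URLs.</p>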