mirror of
https://github.com/alanorth/cgspace-notes.git
Add notes for 2018-05-15
parent
700f15e01b
commit
837d07d3a7
@ -162,7 +162,7 @@ $ lein run /tmp/crps.csv id
|
|||||||
|
|
||||||
- It turns out there was a space in my "country" header that was causing reconcile-csv to crash
|
- It turns out there was a space in my "country" header that was causing reconcile-csv to crash
|
||||||
- After removing that it works fine!
|
- After removing that it works fine!
|
||||||
- Looking at Sisay's 2,000 CIFOR records on DSpace Test ([10568/92904](https://dspacetest.cgiar.org/handle/10568/92904))
|
- Looking at Sisay's 2,640 CIFOR records on DSpace Test ([10568/92904](https://dspacetest.cgiar.org/handle/10568/92904))
|
||||||
- Trimmed all leading / trailing white space and condensed multiple spaces into one
|
- Trimmed all leading / trailing white space and condensed multiple spaces into one
|
||||||
- Corrected DOIs to use HTTPS and "doi.org" instead of "dx.doi.org"
|
- Corrected DOIs to use HTTPS and "doi.org" instead of "dx.doi.org"
|
||||||
- There are eight items in `cg.identifier.doi` that are not DOIs
|
- There are eight items in `cg.identifier.doi` that are not DOIs
|
||||||
@ -171,3 +171,32 @@ $ lein run /tmp/crps.csv id
|
|||||||
- Corrected affiliations to not use acronyms
|
- Corrected affiliations to not use acronyms
|
||||||
- Reconcile countries against our countries list (removing terms like LATIN AMERICA, CENTRAL AFRICA, etc that are not countries)
|
- Reconcile countries against our countries list (removing terms like LATIN AMERICA, CENTRAL AFRICA, etc that are not countries)
|
||||||
- Reconcile regions against our list of regions
|
- Reconcile regions against our list of regions
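The whitespace and DOI cleanups listed above could be sketched in plain Python roughly as follows. This is only an illustration, not what was actually run in OpenRefine: the function names `clean_whitespace` and `normalize_doi_url` are hypothetical, and it assumes that all whitespace runs (not only spaces) should collapse to a single space.

```python
import re


def clean_whitespace(text):
    """Trim leading/trailing whitespace and condense internal runs to one space."""
    # Assumption: tabs and newlines are treated like spaces
    return re.sub(r"\s+", " ", text).strip()


def normalize_doi_url(url):
    """Rewrite http(s)://dx.doi.org/... and http://doi.org/... to https://doi.org/..."""
    return re.sub(
        r"^https?://(dx\.)?doi\.org/",
        "https://doi.org/",
        url.strip(),
        flags=re.IGNORECASE,
    )
```

Values that are not DOI URLs pass through `normalize_doi_url` unchanged, so the non-DOI items in `cg.identifier.doi` would still need to be reviewed by hand.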
|
||||||
|
|
||||||
|
## 2018-05-14
|
||||||
|
|
||||||
|
- Send a message to the OpenRefine mailing list about the bug with reconciling multi-value cells
|
||||||
|
|
||||||
|
## 2018-05-15
|
||||||
|
|
||||||
|
- Turns out I was doing the OpenRefine reconciliation wrong: I needed to copy the matched values to a new column!
|
||||||
|
- Also, I learned how to do something cool with Jython expressions in OpenRefine
|
||||||
|
- This will fetch a URL and return its HTTP response code:
|
||||||
|
|
||||||
|
```
|
||||||
|
import urllib2
|
||||||
|
import re
|
||||||
|
|
||||||
|
pattern = re.compile('.*10.1016.*')
|
||||||
|
if pattern.match(value):
|
||||||
|
  get = urllib2.urlopen(value)
|
||||||
|
  return get.getcode()
|
||||||
|
|
||||||
|
return "blank"
|
||||||
|
```
|
||||||
|
|
||||||
|
- I used a regex to limit it to just some of the DOIs in this case because there were thousands of URLs
|
||||||
|
- Here the response code would be 200, 404, etc, or "blank" if there is no URL for that item
|
||||||
|
- You could use this in a facet or in a new column
|
||||||
|
- More information and good examples here: https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine
|
||||||
|
- Finish looking at the 2,640 CIFOR records on DSpace Test ([10568/92904](https://dspacetest.cgiar.org/handle/10568/92904)), cleaning up authors and adding collection mappings
|
||||||
|
- They can now be moved to CGSpace as far as I'm concerned, but I don't know whether Sisay or I will do it
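For checking the same thing outside OpenRefine, a standalone Python 3 version of the status check might look like the sketch below. It is not the Jython expression from the notes: `http_status`, `DOI_PATTERN` and the injectable `fetch` parameter are names made up for this example. It escapes the dots so that `10.1016` matches literally, and it catches `HTTPError`, because Python 3's `urllib.request.urlopen` raises for 4xx/5xx responses instead of returning them.

```python
import re
import urllib.error
import urllib.request

# Assumption: only check values containing the literal prefix "10.1016"
DOI_PATTERN = re.compile(r"10\.1016")


def http_status(value, fetch=urllib.request.urlopen):
    """Return the HTTP status code for value, or "blank" if empty or unmatched."""
    if not value or not DOI_PATTERN.search(value):
        return "blank"
    try:
        response = fetch(value)
        return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; report their code anyway
        return err.code
```

Passing `fetch` as a parameter is only there so the function can be tried without network access; in normal use the default `urlopen` is enough. Network failures such as DNS errors or timeouts are not handled here and would still raise.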
|
||||||
|
@ -27,7 +27,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
|
|||||||
|
|
||||||
<meta property="article:published_time" content="2018-05-01T16:43:54+03:00"/>
|
<meta property="article:published_time" content="2018-05-01T16:43:54+03:00"/>
|
||||||
|
|
||||||
<meta property="article:modified_time" content="2018-05-10T14:41:37+03:00"/>
|
<meta property="article:modified_time" content="2018-05-13T18:30:25+03:00"/>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@ -65,9 +65,9 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
|
|||||||
"@type": "BlogPosting",
|
"@type": "BlogPosting",
|
||||||
"headline": "May, 2018",
|
"headline": "May, 2018",
|
||||||
"url": "https://alanorth.github.io/cgspace-notes/2018-05/",
|
"url": "https://alanorth.github.io/cgspace-notes/2018-05/",
|
||||||
"wordCount": "1263",
|
"wordCount": "1441",
|
||||||
"datePublished": "2018-05-01T16:43:54+03:00",
|
"datePublished": "2018-05-01T16:43:54+03:00",
|
||||||
"dateModified": "2018-05-10T14:41:37+03:00",
|
"dateModified": "2018-05-13T18:30:25+03:00",
|
||||||
"author": {
|
"author": {
|
||||||
"@type": "Person",
|
"@type": "Person",
|
||||||
"name": "Alan Orth"
|
"name": "Alan Orth"
|
||||||
@ -322,7 +322,7 @@ Livestock and Fish
|
|||||||
<ul>
|
<ul>
|
||||||
<li>It turns out there was a space in my “country” header that was causing reconcile-csv to crash</li>
|
<li>It turns out there was a space in my “country” header that was causing reconcile-csv to crash</li>
|
||||||
<li>After removing that it works fine!</li>
|
<li>After removing that it works fine!</li>
|
||||||
<li>Looking at Sisay’s 2,000 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904"><sup>10568</sup>⁄<sub>92904</sub></a>)
|
<li>Looking at Sisay’s 2,640 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904"><sup>10568</sup>⁄<sub>92904</sub></a>)
|
||||||
|
|
||||||
<ul>
|
<ul>
|
||||||
<li>Trimmed all leading / trailing white space and condensed multiple spaces into one</li>
|
<li>Trimmed all leading / trailing white space and condensed multiple spaces into one</li>
|
||||||
@ -336,6 +336,40 @@ Livestock and Fish
|
|||||||
</ul></li>
|
</ul></li>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
|
<h2 id="2018-05-14">2018-05-14</h2>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>Send a message to the OpenRefine mailing list about the bug with reconciling multi-value cells</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<h2 id="2018-05-15">2018-05-15</h2>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>Turns out I was doing the OpenRefine reconciliation wrong: I needed to copy the matched values to a new column!</li>
|
||||||
|
<li>Also, I learned how to do something cool with Jython expressions in OpenRefine</li>
|
||||||
|
<li>This will fetch a URL and return its HTTP response code:</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code>import urllib2
|
||||||
|
import re
|
||||||
|
|
||||||
|
pattern = re.compile('.*10.1016.*')
|
||||||
|
if pattern.match(value):
|
||||||
|
  get = urllib2.urlopen(value)
|
||||||
|
  return get.getcode()
|
||||||
|
|
||||||
|
return "blank"
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>I used a regex to limit it to just some of the DOIs in this case because there were thousands of URLs</li>
|
||||||
|
<li>Here the response code would be 200, 404, etc, or “blank” if there is no URL for that item</li>
|
||||||
|
<li>You could use this in a facet or in a new column</li>
|
||||||
|
<li>More information and good examples here: <a href="https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine">https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine</a></li>
|
||||||
|
<li>Finish looking at the 2,640 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904"><sup>10568</sup>⁄<sub>92904</sub></a>), cleaning up authors and adding collection mappings</li>
|
||||||
|
<li>They can now be moved to CGSpace as far as I’m concerned, but I don’t know whether Sisay or I will do it</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@ -4,7 +4,7 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/2018-05/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/2018-05/</loc>
|
||||||
<lastmod>2018-05-10T14:41:37+03:00</lastmod>
|
<lastmod>2018-05-13T18:30:25+03:00</lastmod>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
@ -164,7 +164,7 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
||||||
<lastmod>2018-05-10T14:41:37+03:00</lastmod>
|
<lastmod>2018-05-13T18:30:25+03:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
@ -175,7 +175,7 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
|
||||||
<lastmod>2018-05-10T14:41:37+03:00</lastmod>
|
<lastmod>2018-05-13T18:30:25+03:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
@ -187,13 +187,13 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
||||||
<lastmod>2018-05-10T14:41:37+03:00</lastmod>
|
<lastmod>2018-05-13T18:30:25+03:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
|
||||||
<lastmod>2018-05-10T14:41:37+03:00</lastmod>
|
<lastmod>2018-05-13T18:30:25+03:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
|