mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2018-05-15
@@ -162,7 +162,7 @@ $ lein run /tmp/crps.csv id
- It turns out there was a space in my "country" header that was causing reconcile-csv to crash
- After removing that it works fine!
- Looking at Sisay's 2,000 CIFOR records on DSpace Test ([10568/92904](https://dspacetest.cgiar.org/handle/10568/92904))
- Looking at Sisay's 2,640 CIFOR records on DSpace Test ([10568/92904](https://dspacetest.cgiar.org/handle/10568/92904))
- Trimmed all leading / trailing white space and condensed multiple spaces into one
- Corrected DOIs to use HTTPS and "doi.org" instead of "dx.doi.org"
- There are eight items in `cg.identifier.doi` that are not DOIs
@@ -171,3 +171,32 @@ $ lein run /tmp/crps.csv id
- Corrected affiliations to not use acronyms
- Reconcile countries against our countries list (removing terms like LATIN AMERICA, CENTRAL AFRICA, etc that are not countries)
- Reconcile regions against our list of regions
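
The whitespace and DOI cleanups listed above were done in OpenRefine; as a rough illustration only, the same transformations might look like this in plain Python. The function names and the example DOI are hypothetical, and the `10.NNNN/suffix` check is a loose heuristic for spotting values that are not DOIs, not a full validator.

```python
import re


def clean_whitespace(text):
    """Trim leading/trailing whitespace and condense internal runs into one space."""
    return re.sub(r"\s+", " ", text).strip()


def normalize_doi_url(url):
    """Rewrite http(s)://dx.doi.org/... or http://doi.org/... to https://doi.org/..."""
    return re.sub(
        r"^https?://(?:dx\.)?doi\.org/",
        "https://doi.org/",
        url.strip(),
        flags=re.IGNORECASE,
    )


def looks_like_doi(url):
    """Heuristic (an assumption): a DOI URL is https://doi.org/10.<digits>/<suffix>."""
    return re.match(r"^https://doi\.org/10\.\d{4,9}/\S+$", url) is not None


if __name__ == "__main__":
    # Hypothetical value, not taken from the CIFOR records.
    raw = "  http://dx.doi.org/10.1016/example "
    fixed = normalize_doi_url(raw)
    print(fixed, looks_like_doi(fixed))
```

Values that fail `looks_like_doi` after normalization would be candidates for manual review, like the eight non-DOI items mentioned in the list.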

## 2018-05-14

- Send a message to the OpenRefine mailing list about the bug with reconciling multi-value cells

## 2018-05-15

- Turns out I was doing the OpenRefine reconciliation wrong: I needed to copy the matched values to a new column!
- Also, I learned how to do something cool with Jython expressions in OpenRefine
- This will fetch a URL and return its HTTP response code:

```
import urllib2
import re

# Only check cells whose value contains "10.1016"
pattern = re.compile('.*10.1016.*')
if pattern.match(value):
    # "value" is the current cell's content in OpenRefine
    get = urllib2.urlopen(value)
    return get.getcode()

return "blank"
```

- I used a regex to limit it to just some of the DOIs in this case because there were thousands of URLs
- Here the response code would be 200, 404, etc, or "blank" if there is no URL for that item
- You could use this in a facet or in a new column
- More information and good examples here: https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine
- Finish looking at the 2,640 CIFOR records on DSpace Test ([10568/92904](https://dspacetest.cgiar.org/handle/10568/92904)), cleaning up authors and adding collection mappings
- They can now be moved to CGSpace as far as I'm concerned, but I don't know if Sisay will do it or me
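
For checking URLs outside OpenRefine, the HTTP status lookup from the Jython notes above could be sketched in Python 3 roughly as follows. This is not the expression that was run in OpenRefine: `http_status`, its `opener` parameter, and the timeout are assumptions added for illustration. One difference worth noting is that Python's `urlopen` raises `HTTPError` for 4xx and 5xx responses, so the sketch catches it in order to return codes such as 404 as values.

```python
import re
import urllib.error
import urllib.request

# Same idea as the notes: only check values containing "10.1016"
DOI_PATTERN = re.compile(r".*10\.1016.*")


def http_status(value, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for a matching URL, or "blank" otherwise.

    `opener` is injectable (an assumption for testability) and defaults to
    urllib.request.urlopen.
    """
    if not value or not DOI_PATTERN.match(value):
        return "blank"
    try:
        response = opener(value, timeout=timeout)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; report their code instead
        return err.code
    try:
        return response.status
    finally:
        response.close()
```

Network failures such as DNS errors raise `urllib.error.URLError`, which this sketch deliberately leaves uncaught so they are not mistaken for an HTTP status.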