diff --git a/content/posts/2019-08.md b/content/posts/2019-08.md
index 6522c3f4c..8dfa573e7 100644
--- a/content/posts/2019-08.md
+++ b/content/posts/2019-08.md
@@ -358,12 +358,5 @@ sys 2m27.496s
 - After reading the code I see that XSLT is reading the community titles from the DIM representation (stored in the `$dim` variable) created from METS
 - I modified the patterns in my sed script so that those lines are not replaced and then the community list works again
 - This is actually not a problem at all because this metadata is only used in the HTML meta tags in XMLUI community lists and has nothing to do with item metadata
-- Get a list of institutions from CCAFS's Clarisa API and try to parse it with `jq` and pass it through `csvcut` to add line numbers:
-
-```
-$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed 's/"//g' | csvcut -l > /tmp/investors.csv
-```
-
-- I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against...
diff --git a/content/posts/2019-09.md b/content/posts/2019-09.md
index f85ccf356..05c5cbe6e 100644
--- a/content/posts/2019-09.md
+++ b/content/posts/2019-09.md
@@ -319,5 +319,37 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
 - Give more feedback to Bosede about the [IITA Sept 6 (20196th.xls) records on DSpace Test](https://dspacetest.cgiar.org/handle/10568/105116)
 - I told her to delete one item that appears to be a duplicate, or to fix its citation to be correct if she thinks it is not a duplicate
 - I deleted another item that I had previously identified as a duplicate that she had fixed by incorrectly deleting the original (ugh)
+- Get a list of institutions from CCAFS's Clarisa API and try to parse it with `jq`, do some small cleanups and add a header in `sed`, and then pass it through `csvcut` to add line numbers:
+
+```
+$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed -e 's/"//g' -e 's/^ //' -e '1iname' | csvcut -l | sed '1s/line_number/id/' > /tmp/clarisa-institutions.csv
+$ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institutions-cleaned.csv -u
+```
+
+- The csv-metadata-quality tool caught a few records with excessive spacing and unnecessary Unicode
+- I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against...
+ +## 2019-09-27 + +- Skype with Peter and Abenet about CGSpace actions + - Peter will respond to ICARDA's request to deposit items in to CGSpace, with a caveat that we agree on some vocabulary standards for institutions, countries, regions, etc + - We discussed using ISO 3166 for countries, though Peter doesn't like the formal names like "Moldova, Republic of" and "Tanzania, United Republic of" + - The Debian `iso-codes` package has ISO 3166-1 with "common name", "name", and "official name" representations, for example: + - common_name: Tanzania + - name: Tanzania, United Republic of + - official_name: United Republic of Tanzania + - There are still some unfortunate ones there, though: + - name: Korea, Democratic People's Republic of + - official_name: Democratic People's Republic of Korea + - And this, which isn't even in English... + - name: Côte d'Ivoire + - official_name: Republic of Côte d'Ivoire + - The other alternative is to just keep using the names we have, which are mostly compliant with AGROVOC + - Peter said that a new server for DSpace Test is fine, so I can proceed with the normal process of getting approval from Michael Victor and ICT when I have time (recommend moving from $40 to $80/month Linode, with 16GB RAM) + - I need to ask Atmire for a quote to upgrade CGSpace to DSpace 6 with all current modules so we can see how many more credits we need +- A little bit more work on the Sept 6 IITA batch records + - Bosede deleted the one item that I told her was a duplicate + - I checked the AGROVOC subjects and fixed one incorrect one + - Then I told her that I think the items are ready to go to CGSpace and asked Abenet for a final comment diff --git a/docs/2019-08/index.html b/docs/2019-08/index.html index c1bc10d59..65cf32e65 100644 --- a/docs/2019-08/index.html +++ b/docs/2019-08/index.html @@ -27,7 +27,7 @@ Run system updates on DSpace Test (linode19) and reboot it - + @@ -59,9 +59,9 @@ Run system updates on DSpace Test (linode19) and reboot it 
 "@type": "BlogPosting",
 "headline": "August, 2019",
 "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-08\/",
- "wordCount": "2770",
+ "wordCount": "2703",
 "datePublished": "2019-08-03T12:39:51\x2b03:00",
- "dateModified": "2019-09-01T01:54:55\x2b03:00",
+ "dateModified": "2019-09-27T01:20:09\x2b03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -603,13 +603,6 @@ sys 2m27.496s
Get a list of institutions from CCAFS’s Clarisa API and try to parse it with jq and pass it through csvcut to add line numbers:
$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed 's/"//g' | csvcut -l > /tmp/investors.csv
-
I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against…
Get a list of institutions from CCAFS’s Clarisa API and try to parse it with jq, do some small cleanups and add a header in sed, and then pass it through csvcut to add line numbers:
$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed -e 's/"//g' -e 's/^ //' -e '1iname' | csvcut -l | sed '1s/line_number/id/' > /tmp/clarisa-institutions.csv
+$ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institutions-cleaned.csv -u
+
The csv-metadata-quality tool caught a few records with excessive spacing and unnecessary Unicode
I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against…
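A possibly more robust variant of the Clarisa pipeline would do the extraction, header, and numbering in a single `jq` call. This is only a minimal sketch: it assumes the Clarisa export is a JSON array of objects that each have a `name` key (which is what the `.[] | {name: .name}` filter implies) and that jq 1.5 or newer is installed; `clarisa_to_csv` is a hypothetical helper name. Unlike the `awk -F:` step, it does not cut off names that contain a colon, and `@csv` quotes names containing commas or double quotes instead of deleting the quotes.

```shell
# Hypothetical helper: convert a Clarisa institutions export (assumed to be a
# JSON array of objects that each have a "name" key) into an id,name CSV.
# Requires jq 1.5 or newer.
clarisa_to_csv() {
    # to_entries turns the array into {key: <index>, value: <object>} pairs,
    # so .key + 1 is a 1-based row number like the one csvcut -l adds.
    # The header row and the data rows both go through @csv, which quotes
    # every string and doubles any embedded double quotes.
    jq -r '["id", "name"], (to_entries[] | [.key + 1, .value.name]) | @csv' "$1"
}

# Usage:
# clarisa_to_csv ~/Downloads/institutions.json > /tmp/clarisa-institutions.csv
```

The header comes out quoted as `"id","name"`, which CSV readers like csvkit and OpenRefine should accept. Extra whitespace inside the names themselves is left alone, so running csv-metadata-quality afterwards would still be useful.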
iso-codes package has ISO 3166-1 with “common name”, “name”, and “official name” representations, for example:
+
+
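If we do adopt ISO 3166, a country list that prefers the common names could probably be pulled straight from the `iso-codes` JSON data. This sketch assumes the package installs `/usr/share/iso-codes/json/iso_3166-1.json` with a top-level `"3166-1"` array whose entries always carry `name` and only sometimes `common_name` (the layout on recent Debian releases as far as I know, so adjust the path if it differs); `iso3166_names` is a hypothetical helper. jq's `//` operator falls back to `name` when `common_name` is missing.

```shell
# Hypothetical helper: print one ISO 3166-1 country name per line, using the
# "common_name" field when an entry has one and "name" otherwise.
# The default path assumes Debian's iso-codes package layout.
iso3166_names() {
    local json="${1:-/usr/share/iso-codes/json/iso_3166-1.json}"
    jq -r '."3166-1"[] | .common_name // .name' "$json"
}

# Usage:
# iso3166_names > /tmp/iso3166-countries.txt
```

Judging by the examples above, entries like Côte d'Ivoire and Korea, Democratic People's Republic of have no common name, so they would still come out in their formal form and would need manual overrides.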