diff --git a/content/posts/2019-08.md b/content/posts/2019-08.md
index 6522c3f4c..8dfa573e7 100644
--- a/content/posts/2019-08.md
+++ b/content/posts/2019-08.md
@@ -358,12 +358,5 @@ sys 2m27.496s
 - After reading the code I see that XSLT is reading the community titles from the DIM representation (stored in the `$dim` variable) created from METS
 - I modified the patterns in my sed script so that those lines are not replaced and then the community list works again
 - This is actually not a problem at all because this metadata is only used in the HTML meta tags in XMLUI community lists and has nothing to do with item metadata
-- Get a list of institutions from CCAFS's Clarisa API and try to parse it with `jq` and pass it through `csvcut` to add line numbers:
-
-```
-$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed 's/"//g' | csvcut -l > /tmp/investors.csv
-```
-
-- I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against...
diff --git a/content/posts/2019-09.md b/content/posts/2019-09.md
index f85ccf356..05c5cbe6e 100644
--- a/content/posts/2019-09.md
+++ b/content/posts/2019-09.md
@@ -319,5 +319,37 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
 - Give more feedback to Bosede about the [IITA Sept 6 (20196th.xls) records on DSpace Test](https://dspacetest.cgiar.org/handle/10568/105116)
   - I told her to delete one item that appears to be a duplicate, or to fix its citation to be correct if she thinks it is not a duplicate
   - I deleted another item that I had previously identified as a duplicate that she had fixed by incorrectly deleting the original (ugh)
+- Get a list of institutions from CCAFS's Clarisa API and try to parse it with `jq`, do some small cleanups and add a header in `sed`, and then pass it through `csvcut` to add line numbers:
+
+```
+$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed -e 's/"//g' -e 's/^ //' -e '1iname' | csvcut -l | sed '1s/line_number/id/' > /tmp/clarisa-institutions.csv
+$ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institutions-cleaned.csv -u
+```
+
+- The csv-metadata-quality tool caught a few records with excessive spacing and unnecessary Unicode
+- I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against...
+
+## 2019-09-27
+
+- Skype with Peter and Abenet about CGSpace actions
+  - Peter will respond to ICARDA's request to deposit items in to CGSpace, with a caveat that we agree on some vocabulary standards for institutions, countries, regions, etc
+  - We discussed using ISO 3166 for countries, though Peter doesn't like the formal names like "Moldova, Republic of" and "Tanzania, United Republic of"
+  - The Debian `iso-codes` package has ISO 3166-1 with "common name", "name", and "official name" representations, for example:
+    - common_name: Tanzania
+    - name: Tanzania, United Republic of
+    - official_name: United Republic of Tanzania
+  - There are still some unfortunate ones there, though:
+    - name: Korea, Democratic People's Republic of
+    - official_name: Democratic People's Republic of Korea
+  - And this, which isn't even in English...
+    - name: Côte d'Ivoire
+    - official_name: Republic of Côte d'Ivoire
+  - The other alternative is to just keep using the names we have, which are mostly compliant with AGROVOC
+  - Peter said that a new server for DSpace Test is fine, so I can proceed with the normal process of getting approval from Michael Victor and ICT when I have time (recommend moving from $40 to $80/month Linode, with 16GB RAM)
+  - I need to ask Atmire for a quote to upgrade CGSpace to DSpace 6 with all current modules so we can see how many more credits we need
+- A little bit more work on the Sept 6 IITA batch records
+  - Bosede deleted the one item that I told her was a duplicate
+  - I checked the AGROVOC subjects and fixed one incorrect one
+  - Then I told her that I think the items are ready to go to CGSpace and asked Abenet for a final comment
diff --git a/docs/2019-08/index.html b/docs/2019-08/index.html
index c1bc10d59..65cf32e65 100644
--- a/docs/2019-08/index.html
+++ b/docs/2019-08/index.html
@@ -27,7 +27,7 @@ Run system updates on DSpace Test (linode19) and reboot it
-
+
@@ -59,9 +59,9 @@ Run system updates on DSpace Test (linode19) and reboot it
   "@type": "BlogPosting",
   "headline": "August, 2019",
   "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-08\/",
-  "wordCount": "2770",
+  "wordCount": "2703",
   "datePublished": "2019-08-03T12:39:51\x2b03:00",
-  "dateModified": "2019-09-01T01:54:55\x2b03:00",
+  "dateModified": "2019-09-27T01:20:09\x2b03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -603,13 +603,6 @@ sys 2m27.496s
 I modified the patterns in my sed script so that those lines are not replaced and then the community list works again
 This is actually not a problem at all because this metadata is only used in the HTML meta tags in XMLUI community lists and has nothing to do with item metadata
-Get a list of institutions from CCAFS’s Clarisa API and try to parse it with jq and pass it through csvcut to add line numbers:
-$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed 's/"//g' | csvcut -l > /tmp/investors.csv
-I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against…
diff --git a/docs/2019-09/index.html b/docs/2019-09/index.html
index 1b443a6dc..d5e1dad64 100644
--- a/docs/2019-09/index.html
+++ b/docs/2019-09/index.html
@@ -40,7 +40,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
-
+
@@ -85,9 +85,9 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
   "@type": "BlogPosting",
   "headline": "September, 2019",
   "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-09\/",
-  "wordCount": "2497",
+  "wordCount": "2870",
   "datePublished": "2019-09-01T10:17:51\x2b03:00",
-  "dateModified": "2019-09-26T14:21:41\x2b03:00",
+  "dateModified": "2019-09-27T01:20:09\x2b03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -561,6 +561,56 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
 I told her to delete one item that appears to be a duplicate, or to fix its citation to be correct if she thinks it is not a duplicate
 I deleted another item that I had previously identified as a duplicate that she had fixed by incorrectly deleting the original (ugh)
+Get a list of institutions from CCAFS’s Clarisa API and try to parse it with jq, do some small cleanups and add a header in sed, and then pass it through csvcut to add line numbers:
+$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed -e 's/"//g' -e 's/^ //' -e '1iname' | csvcut -l | sed '1s/line_number/id/' > /tmp/clarisa-institutions.csv
+$ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institutions-cleaned.csv -u
+The csv-metadata-quality tool caught a few records with excessive spacing and unnecessary Unicode
+I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against…
+2019-09-27
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index cb676705f..6a984516a 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,32 +4,32 @@
 https://alanorth.github.io/cgspace-notes/
-2019-09-26T14:21:41+03:00
+2019-09-27T01:20:09+03:00
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2019-09-26T14:21:41+03:00
+2019-09-27T01:20:09+03:00
 https://alanorth.github.io/cgspace-notes/posts/
-2019-09-26T14:21:41+03:00
+2019-09-27T01:20:09+03:00
 https://alanorth.github.io/cgspace-notes/2019-09/
-2019-09-26T14:21:41+03:00
+2019-09-27T01:20:09+03:00
 https://alanorth.github.io/cgspace-notes/tags/
-2019-09-26T14:21:41+03:00
+2019-09-27T01:20:09+03:00
 https://alanorth.github.io/cgspace-notes/2019-08/
-2019-09-01T01:54:55+03:00
+2019-09-27T01:20:09+03:00