diff --git a/content/posts/2018-05.md b/content/posts/2018-05.md
index 6bf6142c7..bd88cf360 100644
--- a/content/posts/2018-05.md
+++ b/content/posts/2018-05.md
@@ -111,3 +111,49 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
 
 - Udana asked about the Book Chapters we had been proofing on DSpace Test in 2018-04
 - I told him that there were still some TODO items for him on that data, for example to update the `dc.language.iso` field for the Spanish items
+- I was trying to remember how I parsed the `input-forms.xml` using `xmllint` to extract subjects neatly
+- I could use it with [reconcile-csv](https://github.com/okfn/reconcile-csv) or to populate a Solr instance for reconciliation
+- This XPath expression gets close, but outputs all items on one line:
+
+```
+$ xmllint --xpath '//value-pairs[@value-pairs-name="crpsubject"]/pair/stored-value/node()' dspace/config/input-forms.xml
+Agriculture for Nutrition and HealthBig DataClimate Change, Agriculture and Food SecurityExcellence in BreedingFishForests, Trees and AgroforestryGenebanksGrain Legumes and Dryland CerealsLivestockMaizePolicies, Institutions and MarketsRiceRoots, Tubers and BananasWater, Land and EcosystemsWheatAquatic Agricultural SystemsDryland CerealsDryland SystemsGrain LegumesIntegrated Systems for the Humid TropicsLivestock and Fish
+```
+
+- Maybe `xmlstarlet` is better:
+
+```
+$ xmlstarlet sel -t -v '//value-pairs[@value-pairs-name="crpsubject"]/pair/stored-value/text()' dspace/config/input-forms.xml
+Agriculture for Nutrition and Health
+Big Data
+Climate Change, Agriculture and Food Security
+Excellence in Breeding
+Fish
+Forests, Trees and Agroforestry
+Genebanks
+Grain Legumes and Dryland Cereals
+Livestock
+Maize
+Policies, Institutions and Markets
+Rice
+Roots, Tubers and Bananas
+Water, Land and Ecosystems
+Wheat
+Aquatic Agricultural Systems
+Dryland Cereals
+Dryland Systems
+Grain Legumes
+Integrated Systems for the Humid Tropics
+Livestock and Fish
+```
+
+- Discuss Colombian BNARS harvesting the CIAT data from CGSpace
+- They are using a system called Primo and the only options for data harvesting in that system are via FTP and OAI
+- I told them to get all [CIAT records via OAI](https://cgspace.cgiar.org/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_10568_35697)
+- Just a note to myself, I figured out how to get reconcile-csv to run from source rather than running the old pre-compiled JAR file:
+
+```
+$ lein run /tmp/crps.csv id
+```
+
+- I tried to reconcile against a CSV of our countries but reconcile-csv crashes
diff --git a/docs/2018-05/index.html b/docs/2018-05/index.html
index 884832b21..859e84c17 100644
--- a/docs/2018-05/index.html
+++ b/docs/2018-05/index.html
@@ -27,7 +27,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
-
+
@@ -65,9 +65,9 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
 "@type": "BlogPosting",
 "headline": "May, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-05/",
- "wordCount": "907",
+ "wordCount": "1150",
 "datePublished": "2018-05-01T16:43:54+03:00",
- "dateModified": "2018-05-07T17:50:32+03:00",
+ "dateModified": "2018-05-09T18:32:14+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -266,6 +266,55 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
+
+
$ xmllint --xpath '//value-pairs[@value-pairs-name="crpsubject"]/pair/stored-value/node()' dspace/config/input-forms.xml        
+Agriculture for Nutrition and HealthBig DataClimate Change, Agriculture and Food SecurityExcellence in BreedingFishForests, Trees and AgroforestryGenebanksGrain Legumes and Dryland CerealsLivestockMaizePolicies, Institutions and MarketsRiceRoots, Tubers and BananasWater, Land and EcosystemsWheatAquatic Agricultural SystemsDryland CerealsDryland SystemsGrain LegumesIntegrated Systems for the Humid TropicsLivestock and Fish
+
+ + + +
$ xmlstarlet sel -t -v '//value-pairs[@value-pairs-name="crpsubject"]/pair/stored-value/text()' dspace/config/input-forms.xml
+Agriculture for Nutrition and Health
+Big Data
+Climate Change, Agriculture and Food Security
+Excellence in Breeding
+Fish
+Forests, Trees and Agroforestry
+Genebanks
+Grain Legumes and Dryland Cereals
+Livestock
+Maize
+Policies, Institutions and Markets
+Rice
+Roots, Tubers and Bananas
+Water, Land and Ecosystems
+Wheat
+Aquatic Agricultural Systems
+Dryland Cereals
+Dryland Systems
+Grain Legumes
+Integrated Systems for the Humid Tropics
+Livestock and Fish
+
+ + + +
$ lein run /tmp/crps.csv id
+
+
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index bc7830792..155ad6678 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2018-05/
- 2018-05-07T17:50:32+03:00
+ 2018-05-09T18:32:14+03:00
@@ -164,7 +164,7 @@
 https://alanorth.github.io/cgspace-notes/
- 2018-05-07T17:50:32+03:00
+ 2018-05-09T18:32:14+03:00
 0
@@ -175,7 +175,7 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
- 2018-05-07T17:50:32+03:00
+ 2018-05-09T18:32:14+03:00
 0
@@ -187,13 +187,13 @@
 https://alanorth.github.io/cgspace-notes/posts/
- 2018-05-07T17:50:32+03:00
+ 2018-05-09T18:32:14+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
- 2018-05-07T17:50:32+03:00
+ 2018-05-09T18:32:14+03:00
 0
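
A note on the `crpsubject` extraction in the notes above: if neither `xmllint` nor `xmlstarlet` is at hand, the same one-value-per-line listing can probably be produced with Python's standard library. This is only a minimal sketch. The inline `SAMPLE` document is an assumption that mimics the `value-pairs` / `pair` / `stored-value` nesting implied by the XPath expression, and `stored_values` is a hypothetical helper name, not something from DSpace or reconcile-csv.

```python
import xml.etree.ElementTree as ET

# Assumed, trimmed-down stand-in for dspace/config/input-forms.xml.
# Only the nesting implied by the XPath in the notes is reproduced here.
SAMPLE = """\
<input-forms>
  <form-value-pairs>
    <value-pairs value-pairs-name="crpsubject">
      <pair>
        <displayed-value>Big Data</displayed-value>
        <stored-value>Big Data</stored-value>
      </pair>
      <pair>
        <displayed-value>Fish</displayed-value>
        <stored-value>Fish</stored-value>
      </pair>
    </value-pairs>
    <value-pairs value-pairs-name="othersubject">
      <pair>
        <displayed-value>Ignored</displayed-value>
        <stored-value>Ignored</stored-value>
      </pair>
    </value-pairs>
  </form-value-pairs>
</input-forms>
"""


def stored_values(root, name):
    """Return the text of every stored-value under the named value-pairs list."""
    # ElementTree supports this limited XPath subset, including the
    # [@attribute='value'] predicate used to pick one value-pairs block.
    path = f".//value-pairs[@value-pairs-name='{name}']/pair/stored-value"
    return [(el.text or "").strip() for el in root.findall(path)]


root = ET.fromstring(SAMPLE)
subjects = stored_values(root, "crpsubject")

# Print one subject per line, like the xmlstarlet output in the notes.
print("\n".join(subjects))
```

To run it against the real file, `ET.parse("dspace/config/input-forms.xml").getroot()` would replace the `ET.fromstring(SAMPLE)` call. The resulting list could then be written out as a CSV column for reconcile-csv.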