From 1fb9e61d7d66f1f4350c80eac0136e1813d94542 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Wed, 17 Feb 2016 23:16:29 +0200
Subject: [PATCH] Update notes for 2016-02

---
 content/2016-02.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/content/2016-02.md b/content/2016-02.md
index efd17e85f..b37fdbab8 100644
--- a/content/2016-02.md
+++ b/content/2016-02.md
@@ -206,3 +206,25 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
 
 - Merge pull requests for submission form theming ([#178](https://github.com/ilri/DSpace/pull/178)) and missing center subjects in XMLUI item views ([#176](https://github.com/ilri/DSpace/pull/176))
 - They will be deployed on CGSpace the next time I re-deploy
+
+## 2016-02-16
+
+- Turns out OpenRefine has an unescape function!
+
+```
+value.unescape("url")
+```
+
+- This turns the URLs into human-readable versions that we can use as proper filenames
+- Run web server and system updates on DSpace Test and reboot
+- To merge `dc.identifier.url` and `dc.identifier.url[]`, rename the second column so it doesn't have the brackets, like `dc.identifier.url2`
+- Then you create a facet for blank values on each column, show the rows that have values for one and not the other, then transform each independently to have the contents of the other, with "||" in between
+- Work on Python script for parsing and downloading PDF records from `dc.identifier.url`
+- To turn `dc.identifier.url` into filenames, create a new column based o
+- To get filenames from `dc.identifier.url`, create a new column based on this transform: `forEach(value.split('||'), v, v.split('/')[-1]).join('||')`
+- This also works for records that have multiple URLs (separated by "||")
+
+## 2016-02-17
+
+- Re-deploy CGSpace, run all system updates, and reboot
+- More work on CIAT data, cleaning and doing a last metadata-only import into DSpace Test
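
The two OpenRefine transforms in the 2016-02-16 notes (unescape the URL, then keep the last `/`-separated segment of each `||`-separated value) could be approximated in Python roughly as below. This is only a sketch, not the script mentioned in the notes: the function name `filenames_from_urls` and the `example.org` URLs are hypothetical, and `urllib.parse.unquote` is assumed to be a close enough stand-in for GREL's `unescape("url")`.

```python
from urllib.parse import unquote

SEPARATOR = "||"  # multi-value separator used in the notes


def filenames_from_urls(cell: str) -> str:
    """Return the unescaped last path segment of each URL in a cell.

    Hypothetical helper mirroring the GREL expression
    forEach(value.split('||'), v, v.split('/')[-1]).join('||'),
    followed by percent-decoding of each segment.
    """
    names = []
    for url in cell.split(SEPARATOR):
        last_segment = url.split("/")[-1]   # same as v.split('/')[-1]
        names.append(unquote(last_segment))  # assumed stand-in for unescape("url")
    return SEPARATOR.join(names)


if __name__ == "__main__":
    # Made-up value in the shape of a dc.identifier.url cell
    cell = "http://example.org/docs/T%C3%A9cnicas.pdf||http://example.org/a/b.pdf"
    print(filenames_from_urls(cell))
```

Like the GREL version, this splits on `/` without parsing the URL, so any query string after the last slash would end up in the filename.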