From ad891890c7a9d43e61f32f82baaaf243a5e3c283 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Mon, 15 May 2017 07:52:09 +0300
Subject: [PATCH] Add notes for 2017-05-15

---
 content/post/2017-05.md   |  5 +++++
 public/2017-05/index.html | 13 ++++++++++---
 public/sitemap.xml        | 10 +++++-----
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/content/post/2017-05.md b/content/post/2017-05.md
index 8de99f28f..8bee6fde7 100644
--- a/content/post/2017-05.md
+++ b/content/post/2017-05.md
@@ -132,3 +132,8 @@ dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
 - After quite a bit of troubleshooting with importing cleaned up data as CSV, it seems that there are actually [NUL](https://en.wikipedia.org/wiki/Null_character) characters in the `dc.description.abstract` field (at least) on the lines where CSV importing was failing
 - I tried to find a way to remove the characters in vim or Open Refine, but decided it was quicker to just remove the column temporarily and import it
 - The import was successful and detected 2022 changes, which should likely be the rest that were failing to import before
+
+## 2017-05-15
+
+- To delete the blank lines that cause issues during import we need to use a regex in vim `g/^$/d`
+- After that I started looking in the `dc.subject` field to try to pull countries and regions out, but there are too many values in there
diff --git a/public/2017-05/index.html b/public/2017-05/index.html
index 461dbf830..23e362413 100644
--- a/public/2017-05/index.html
+++ b/public/2017-05/index.html
@@ -13,7 +13,7 @@
-
+
@@ -45,9 +45,9 @@
     "@type": "BlogPosting",
     "headline": "May, 2017",
     "url": "https://alanorth.github.io/cgspace-notes/2017-05/",
-    "wordCount": "1122",
+    "wordCount": "1167",
     "datePublished": "2017-05-01T16:21:52+02:00",
-    "dateModified": "2017-05-10T23:44:44+03:00",
+    "dateModified": "2017-05-13T13:48:40+03:00",
     "author": {
       "@type": "Person",
       "name": "Alan Orth"
@@ -271,6 +271,13 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
 The import was successful and detected 2022 changes, which should likely be the rest that were failing to import before
+2017-05-15
+
+
+
diff --git a/public/sitemap.xml b/public/sitemap.xml
index 3f5b8089f..9824d523b 100644
--- a/public/sitemap.xml
+++ b/public/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2017-05/
-2017-05-10T23:44:44+03:00
+2017-05-13T13:48:40+03:00
@@ -99,7 +99,7 @@
 https://alanorth.github.io/cgspace-notes/
-2017-05-10T23:44:44+03:00
+2017-05-13T13:48:40+03:00
 0
@@ -110,19 +110,19 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2017-05-10T23:44:44+03:00
+2017-05-13T13:48:40+03:00
 0
 https://alanorth.github.io/cgspace-notes/post/
-2017-05-10T23:44:44+03:00
+2017-05-13T13:48:40+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2017-05-10T23:44:44+03:00
+2017-05-13T13:48:40+03:00
 0
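
Reviewer note on the 2017-05-15 entry: the vim command `g/^$/d` deletes every line that is completely empty. For anyone who wants to script the same cleanup before a CSV import, the following is a minimal sketch in Python, not the method used in the notes. The function names `drop_empty_lines` and `strip_nul` are hypothetical, and stripping NUL characters is an assumption about how the earlier `dc.description.abstract` problem could be handled; the notes themselves worked around it by removing the column temporarily.

```python
def drop_empty_lines(text: str) -> str:
    """Drop lines that are completely empty, mirroring vim's :g/^$/d.

    Whitespace-only lines are kept, because ^$ does not match them either.
    """
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if line.strip("\r\n") != ""  # keep anything with content before the line ending
    ]
    return "".join(kept)


def strip_nul(text: str) -> str:
    """Remove NUL (U+0000) characters (hypothetical helper, not from the notes)."""
    return text.replace("\x00", "")


if __name__ == "__main__":
    sample = "id,title\n\n1,First\x00 item\n\n2,Second item\n"
    print(strip_nul(drop_empty_lines(sample)), end="")
```

Like the vim command, this works on raw lines, so an empty line inside a quoted multi-line CSV field would also be removed.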