diff --git a/content/post/2017-07.md b/content/post/2017-07.md
index c8d88c114..f92af03e4 100644
--- a/content/post/2017-07.md
+++ b/content/post/2017-07.md
@@ -90,3 +90,6 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
 - September 10/11: go live?
 - Talk to Tsega and Danny about exporting/injesting the blog posts from Drupal into DSpace?
 - Followup meeting on August 8/9?
+- Sent Abenet the 2415 records from CGIAR Library's Historical Archive (10947/1) after cleaning up the author authorities and HTML entities in `dc.contributor.author` and `dc.description.abstract` using OpenRefine:
+  - Authors: `value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")`
+  - Abstracts: `replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')`
diff --git a/public/2017-07/index.html b/public/2017-07/index.html
index c634d08f2..493dc4333 100644
--- a/public/2017-07/index.html
+++ b/public/2017-07/index.html
@@ -27,7 +27,7 @@ We can use PostgreSQL’s extended output format (-x) plus sed to format the
-
+
@@ -73,9 +73,9 @@ We can use PostgreSQL’s extended output format (-x) plus sed to format the
 "@type": "BlogPosting",
 "headline": "July, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-07/",
- "wordCount": "694",
+ "wordCount": "724",
 "datePublished": "2017-07-01T18:03:52+03:00",
- "dateModified": "2017-07-17T23:50:33+04:00",
+ "dateModified": "2017-07-20T12:53:02+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -247,6 +247,12 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
dc.contributor.author
and dc.description.abstract
using OpenRefine:
+
+value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")
replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')
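
The two OpenRefine expressions above can be tried outside OpenRefine. The following is a minimal Python sketch, not the workflow used in the post: it assumes the same patterns behave equivalently under Python's `re` module, and the function names and sample values (including the UUID) are hypothetical. Note that the second pattern strips HTML tags such as `<p>` or `<a href="...">`; it does not decode entities like `&amp;`.

```python
import re

# Authority suffix of the form "::<UUID>::600" appended to an author name.
AUTHORITY_SUFFIX = re.compile(r"::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600")

# Opening, closing, or self-closing HTML tag, with optional attributes.
HTML_TAG = re.compile(
    r"""<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>"""
)


def strip_authority(value: str) -> str:
    """Remove every authority suffix from an author value (hypothetical helper)."""
    return AUTHORITY_SUFFIX.sub("", value)


def strip_tags(value: str) -> str:
    """Remove every HTML tag from an abstract value (hypothetical helper)."""
    return HTML_TAG.sub("", value)


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(strip_authority("Orth, Alan::0a1b2c3d-1111-2222-3333-444455556666::600"))
    print(strip_tags('<p>Some <b>bold</b> text with a <a href="x">link</a>.<br/></p>'))
```

Running both helpers over a small sample of the exported records before applying the expressions in bulk is a cheap way to confirm that nothing other than the suffix or the tags is removed.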