diff --git a/content/post/2017-04.md b/content/post/2017-04.md
index 32f7c6230..2cacc968e 100644
--- a/content/post/2017-04.md
+++ b/content/post/2017-04.md
@@ -251,3 +251,27 @@ $ bundle binstubs puma --path ./sbin
 - So it seems he did it in the crosswalk!
 - Keep working on Ansible stuff for deploying the CKM REST API
 - We can use systemd's `Environment` stuff to pass the database parameters to Rails
+- Abenet noticed that the "Workflow Statistics" option is missing now, but we have screenshots from a presentation in 2016 when it was there
+- I filed a ticket with Atmire
+- Looking at 933 CIAT records from Sisay, he's having problems creating a SAF bundle to import to DSpace Test
+- I started by looking at his CSV in OpenRefine, and I see there are a _bunch_ of fields with whitespace issues that I cleaned up:
+
+```
+value.replace(" ||","||").replace("|| ","||").replace(" || ","||")
+```
+
+- Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:
+
+```
+unescape(value,"url")
+```
+
+- Then create the filename column using the following transform from URL:
+
+```
+value.split('/')[-1].replace(/#.*$/,"")
+```
+
+- The `replace` part is because some URLs have an anchor like `#page=14` which we obviously don't want on the filename
+- Also, we need to only use the PDF on the item corresponding with page 1, so we don't end up with literally hundreds of duplicate PDFs
+- Alternatively, I could export each page to a standalone PDF...
diff --git a/public/2017-04/index.html b/public/2017-04/index.html
index 52c39b8c3..23ae52366 100644
--- a/public/2017-04/index.html
+++ b/public/2017-04/index.html
@@ -30,7 +30,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
-
+
@@ -79,9 +79,9 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
   "@type": "BlogPosting",
   "headline": "April, 2017",
   "url": "https://alanorth.github.io/cgspace-notes/2017-04/",
-  "wordCount": "1712",
+  "wordCount": "1877",
   "datePublished": "2017-04-02T17:08:52+02:00",
-  "dateModified": "2017-04-18T16:58:55+03:00",
+  "dateModified": "2017-04-19T15:39:19+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -448,6 +448,33 @@ $ rails -s
Environment stuff to pass the database parameters to Rails
+value.replace(" ||","||").replace("|| ","||").replace(" || ","||")
+
+
+unescape(value,"url")
+
+
+value.split('/')[-1].replace(/#.*$/,"")
+
+
+The replace part is because some URLs have an anchor like #page=14 which we obviously don’t want on the filename
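
Note on the transforms added in this commit: for checking them outside OpenRefine, the three GREL expressions can be approximated in Python. This is only a rough sketch and not what the notes actually ran. The function names and the sample URL are hypothetical, and the whitespace regex is slightly broader than the chained `replace` calls because it also collapses repeated spaces around `||`.

```python
import re
from urllib.parse import unquote


def clean_multivalue(value: str) -> str:
    """Remove whitespace around the "||" multi-value separator.

    Broader than the GREL version: also handles runs of several spaces.
    """
    return re.sub(r"\s*\|\|\s*", "||", value)


def filename_from_url(url: str) -> str:
    """Decode URL encoding, keep the last path segment, drop any #anchor.

    Mirrors the order used in the notes: decode first, then split and strip.
    """
    decoded = unquote(url)
    last_segment = decoded.split("/")[-1]
    return re.sub(r"#.*$", "", last_segment)


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the CIAT records
    print(clean_multivalue("Maize || Beans ||Cassava"))
    print(filename_from_url("http://example.org/docs/My%20Report.pdf#page=14"))
```

Decoding before stripping the anchor follows the order in the notes, but it means a filename containing an encoded `%23` would be truncated at that point; if any of the URLs have one, stripping the anchor before decoding would be safer.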