diff --git a/content/post/2017-04.md b/content/post/2017-04.md
index 32f7c6230..2cacc968e 100644
--- a/content/post/2017-04.md
+++ b/content/post/2017-04.md
@@ -251,3 +251,27 @@ $ bundle binstubs puma --path ./sbin
 - So it seems he did it in the crosswalk!
 - Keep working on Ansible stuff for deploying the CKM REST API
 - We can use systemd's `Environment` stuff to pass the database parameters to Rails
+- Abenet noticed that the "Workflow Statistics" option is missing now, but we have screenshots from a presentation in 2016 when it was there
+- I filed a ticket with Atmire
+- Looking at 933 CIAT records from Sisay, he's having problems creating a SAF bundle to import to DSpace Test
+- I started by looking at his CSV in OpenRefine, and I see there are a _bunch_ of fields with whitespace issues that I cleaned up:
+
+```
+value.replace(" ||","||").replace("|| ","||").replace(" || ","||")
+```
+
+- Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:
+
+```
+unescape(value,"url")
+```
+
+- Then create the filename column using the following transform from URL:
+
+```
+value.split('/')[-1].replace(/#.*$/,"")
+```
+
+- The `replace` part is because some URLs have an anchor like `#page=14` which we obviously don't want on the filename
+- Also, we need to only use the PDF on the item corresponding with page 1, so we don't end up with literally hundreds of duplicate PDFs
+- Alternatively, I could export each page to a standalone PDF...
diff --git a/public/2017-04/index.html b/public/2017-04/index.html
index 52c39b8c3..23ae52366 100644
--- a/public/2017-04/index.html
+++ b/public/2017-04/index.html
@@ -30,7 +30,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
-
+
@@ -79,9 +79,9 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
  "@type": "BlogPosting",
  "headline": "April, 2017",
  "url": "https://alanorth.github.io/cgspace-notes/2017-04/",
- "wordCount": "1712",
+ "wordCount": "1877",
  "datePublished": "2017-04-02T17:08:52+02:00",
- "dateModified": "2017-04-18T16:58:55+03:00",
+ "dateModified": "2017-04-19T15:39:19+03:00",
  "author": {
  "@type": "Person",
  "name": "Alan Orth"
@@ -448,6 +448,33 @@ $ rails -s
  • So it seems he did it in the crosswalk!
  • Keep working on Ansible stuff for deploying the CKM REST API
  • We can use systemd’s Environment stuff to pass the database parameters to Rails
  • +
  • Abenet noticed that the “Workflow Statistics” option is missing now, but we have screenshots from a presentation in 2016 when it was there
  • +
  • I filed a ticket with Atmire
  • +
  • Looking at 933 CIAT records from Sisay, he’s having problems creating a SAF bundle to import to DSpace Test
  • +
  • I started by looking at his CSV in OpenRefine, and I see there are a bunch of fields with whitespace issues that I cleaned up:
  • + + +
    value.replace(" ||","||").replace("|| ","||").replace(" || ","||")
    +
    + + + +
    unescape(value,"url")
    +
    + + + +
    value.split('/')[-1].replace(/#.*$/,"")
    +
    + +
diff --git a/public/sitemap.xml b/public/sitemap.xml
index 0ac6176c4..b60b60b01 100644
--- a/public/sitemap.xml
+++ b/public/sitemap.xml
@@ -3,7 +3,7 @@
 https://alanorth.github.io/cgspace-notes/2017-04/
-2017-04-18T16:58:55+03:00
+2017-04-19T15:39:19+03:00
@@ -93,7 +93,7 @@
 https://alanorth.github.io/cgspace-notes/
-2017-04-18T16:58:55+03:00
+2017-04-19T15:39:19+03:00
 0
@@ -104,19 +104,19 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2017-04-18T16:58:55+03:00
+2017-04-19T15:39:19+03:00
 0
 https://alanorth.github.io/cgspace-notes/post/
-2017-04-18T16:58:55+03:00
+2017-04-19T15:39:19+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2017-04-18T16:58:55+03:00
+2017-04-19T15:39:19+03:00
 0
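
For anyone who wants to try the OpenRefine clean-up steps from the 2017-04 notes outside OpenRefine, here is a rough Python sketch of the three GREL transforms. It is only an approximation under stated assumptions: the function names `clean_separators` and `filename_from_url` and the sample values are hypothetical, and Python's `unquote` may not match GREL's `unescape(value,"url")` in every edge case (for example, how `+` is handled). The third `.replace(" || ","||")` from the GREL expression is left out because the first two replacements already cover that case.

```python
import re
from urllib.parse import unquote


def clean_separators(value: str) -> str:
    """Remove a single space on either side of the "||" multi-value separator."""
    return value.replace(" ||", "||").replace("|| ", "||")


def filename_from_url(url: str) -> str:
    """Decode URL escapes, keep the last path segment, and drop any "#..." anchor."""
    decoded = unquote(url)                     # roughly unescape(value, "url")
    last_segment = decoded.split("/")[-1]      # value.split('/')[-1]
    return re.sub(r"#.*$", "", last_segment)   # .replace(/#.*$/, "")


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the CIAT records
    print(clean_separators("Maize || Beans ||Cassava"))
    print(filename_from_url("https://example.org/docs/Annual%20Report.pdf#page=14"))
```

As in the notes, decoding happens before splitting, so an encoded `%23` would become `#` and be stripped along with everything after it.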