mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2017-04-19
@ -30,7 +30,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
<meta property="article:published_time" content="2017-04-02T17:08:52+02:00"/>
<meta property="article:modified_time" content="2017-04-18T16:58:55+03:00"/>
<meta property="article:modified_time" content="2017-04-19T15:39:19+03:00"/>
@ -79,9 +79,9 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
"@type": "BlogPosting",
"headline": "April, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-04/",
"wordCount": "1712",
"wordCount": "1877",
"datePublished": "2017-04-02T17:08:52+02:00",
"dateModified": "2017-04-18T16:58:55+03:00",
"dateModified": "2017-04-19T15:39:19+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -448,6 +448,33 @@ $ rails -s
<li>So it seems he did it in the crosswalk!</li>
<li>Keep working on Ansible stuff for deploying the CKM REST API</li>
<li>We can use systemd’s <code>Environment</code> stuff to pass the database parameters to Rails</li>
<li>Abenet noticed that the “Workflow Statistics” option is missing now, but we have screenshots from a presentation in 2016 when it was there</li>
<li>I filed a ticket with Atmire</li>
<li>Looking at 933 CIAT records from Sisay, he’s having problems creating a SAF bundle to import to DSpace Test</li>
<li>I started by looking at his CSV in OpenRefine, and I see there are a <em>bunch</em> of fields with whitespace issues that I cleaned up:</li>
</ul>

<pre><code>value.replace(" ||","||").replace("|| ","||").replace(" || ","||")
</code></pre>
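<p>For anyone doing the same cleanup outside OpenRefine, here is a minimal Python sketch of the idea. It is not the GREL above: it swaps in a regular expression, so it also catches tabs and runs of several spaces around the <code>||</code> separator. The function name <code>clean_separators</code> and the sample value are made up for illustration.</p>

```python
import re


def clean_separators(value: str) -> str:
    """Strip whitespace on both sides of every "||" multi-value separator.

    Hypothetical helper: unlike the chained GREL replace() calls, this
    removes any amount of whitespace, not just a single space.
    """
    return re.sub(r"\s*\|\|\s*", "||", value)


if __name__ == "__main__":
    # Made-up sample value, not taken from the CIAT records
    print(clean_separators("Kenya || Uganda ||Ethiopia"))
```

<p>Whitespace inside a value (for example between two words of a country name) is left alone, since the pattern only matches whitespace that touches a separator.</p>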

<ul>
<li>Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:</li>
</ul>

<pre><code>unescape(value,"url")
</code></pre>
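<p>A rough Python equivalent of the decoding step, assuming the values are ordinary percent-encoded strings. The wrapper name <code>decode_url_value</code> and the sample filename are hypothetical; <code>urllib.parse.unquote</code> is the standard library call doing the work.</p>

```python
from urllib.parse import unquote


def decode_url_value(value: str) -> str:
    """Decode percent-encoded characters such as %20 (hypothetical helper)."""
    return unquote(value)


if __name__ == "__main__":
    # Made-up filename for illustration
    print(decode_url_value("Annual%20Report%202016.pdf"))
```

<p>Note that <code>unquote</code> leaves a literal <code>+</code> untouched. If the data uses <code>+</code> for spaces, <code>urllib.parse.unquote_plus</code> is the standard library alternative.</p>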

<ul>
<li>Then create the filename column using the following transform from URL:</li>
</ul>

<pre><code>value.split('/')[-1].replace(/#.*$/,"")
</code></pre>

<ul>
<li>The <code>replace</code> part is because some URLs have an anchor like <code>#page=14</code> which we obviously don’t want on the filename</li>
<li>Also, we need to only use the PDF on the item corresponding with page 1, so we don’t end up with literally hundreds of duplicate PDFs</li>
<li>Alternatively, I could export each page to a standalone PDF…</li>
</ul>
</ul>
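<p>The filename transform above can be sketched in Python as follows. This mirrors the GREL step by step (take the last path segment, then drop everything from <code>#</code> onward); the function name <code>filename_from_url</code> and the example URL are assumptions for illustration only.</p>

```python
import re


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL without any #anchor.

    Hypothetical helper mirroring: value.split('/')[-1].replace(/#.*$/,"")
    """
    last_segment = url.split("/")[-1]
    # Drop an anchor such as "#page=14" so it does not end up in the filename
    return re.sub(r"#.*$", "", last_segment)


if __name__ == "__main__":
    # Made-up URL, not one of the actual CIAT records
    print(filename_from_url("http://example.org/docs/report.pdf#page=14"))
```

<p>Like the GREL version, this assumes the anchor itself contains no <code>/</code>; if it did, the split would cut inside the anchor and the result would be wrong.</p>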