Update notes for 2017-04-19

Alan Orth 2017-04-19 18:37:04 +03:00
parent beac46e9db
commit 96298cc5bf
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 59 additions and 8 deletions


@@ -251,3 +251,27 @@ $ bundle binstubs puma --path ./sbin
- So it seems he did it in the crosswalk!
- Keep working on Ansible stuff for deploying the CKM REST API
- We can use systemd's `Environment` stuff to pass the database parameters to Rails
- Abenet noticed that the "Workflow Statistics" option is missing now, but we have screenshots from a presentation in 2016 when it was there
- I filed a ticket with Atmire
- Looking at 933 CIAT records from Sisay, he's having problems creating a SAF bundle to import to DSpace Test
- I started by looking at his CSV in OpenRefine, and I see there are a _bunch_ of fields with whitespace issues that I cleaned up:
```
value.replace(" ||","||").replace("|| ","||").replace(" || ","||")
```
- Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:
```
unescape(value,"url")
```
- Then create the filename column using the following transform from URL:
```
value.split('/')[-1].replace(/#.*$/,"")
```
- The `replace` part is because some URLs have an anchor like `#page=14` which we obviously don't want on the filename
- Also, we need to only use the PDF on the item corresponding with page 1, so we don't end up with literally hundreds of duplicate PDFs
- Alternatively, I could export each page to a standalone PDF...
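
For anyone wanting to repeat the cleanup outside OpenRefine, the transforms above can be roughly approximated in Python. This is a minimal sketch and not what was actually run on the CIAT records: the function names `clean_separators` and `filename_from_url` and the example URL are hypothetical, and the separator regex is slightly broader than the GREL expression because it strips any run of whitespace around `||` rather than a single space.

```python
import re
from urllib.parse import unquote


def clean_separators(value: str) -> str:
    """Remove whitespace on either side of the "||" multi-value separator."""
    return re.sub(r"\s*\|\|\s*", "||", value)


def filename_from_url(url: str) -> str:
    """Derive a file name from a URL.

    Drops any "#anchor" (for example "#page=14"), keeps the last path
    segment, and only then decodes URL-encoded characters such as "%20".
    """
    without_anchor = re.sub(r"#.*$", "", url)
    last_segment = without_anchor.split("/")[-1]
    return unquote(last_segment)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    print(clean_separators("a || b ||c|| d"))
    print(filename_from_url("https://example.org/docs/My%20Report.pdf#page=14"))
```

Decoding after the anchor is stripped and the path is split is a deliberate choice in this sketch: an encoded `%23` or `%2F` inside the file name would otherwise be mistaken for an anchor or a path separator.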


@@ -30,7 +30,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
<meta property="article:published_time" content="2017-04-02T17:08:52&#43;02:00"/>
-<meta property="article:modified_time" content="2017-04-18T16:58:55&#43;03:00"/>
+<meta property="article:modified_time" content="2017-04-19T15:39:19&#43;03:00"/>
@@ -79,9 +79,9 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Th
"@type": "BlogPosting",
"headline": "April, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-04/",
-"wordCount": "1712",
+"wordCount": "1877",
"datePublished": "2017-04-02T17:08:52&#43;02:00",
-"dateModified": "2017-04-18T16:58:55&#43;03:00",
+"dateModified": "2017-04-19T15:39:19&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -448,6 +448,33 @@ $ rails -s
<li>So it seems he did it in the crosswalk!</li>
<li>Keep working on Ansible stuff for deploying the CKM REST API</li>
<li>We can use systemd&rsquo;s <code>Environment</code> stuff to pass the database parameters to Rails</li>
<li>Abenet noticed that the &ldquo;Workflow Statistics&rdquo; option is missing now, but we have screenshots from a presentation in 2016 when it was there</li>
<li>I filed a ticket with Atmire</li>
<li>Looking at 933 CIAT records from Sisay, he&rsquo;s having problems creating a SAF bundle to import to DSpace Test</li>
<li>I started by looking at his CSV in OpenRefine, and I see there are a <em>bunch</em> of fields with whitespace issues that I cleaned up:</li>
</ul>
<pre><code>value.replace(&quot; ||&quot;,&quot;||&quot;).replace(&quot;|| &quot;,&quot;||&quot;).replace(&quot; || &quot;,&quot;||&quot;)
</code></pre>
<ul>
<li>Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:</li>
</ul>
<pre><code>unescape(value,&quot;url&quot;)
</code></pre>
<ul>
<li>Then create the filename column using the following transform from URL:</li>
</ul>
<pre><code>value.split('/')[-1].replace(/#.*$/,&quot;&quot;)
</code></pre>
<ul>
<li>The <code>replace</code> part is because some URLs have an anchor like <code>#page=14</code> which we obviously don&rsquo;t want on the filename</li>
<li>Also, we need to only use the PDF on the item corresponding with page 1, so we don&rsquo;t end up with literally hundreds of duplicate PDFs</li>
<li>Alternatively, I could export each page to a standalone PDF&hellip;</li>
</ul>


@@ -3,7 +3,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2017-04/</loc>
-<lastmod>2017-04-18T16:58:55+03:00</lastmod>
+<lastmod>2017-04-19T15:39:19+03:00</lastmod>
</url>
<url>
@@ -93,7 +93,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2017-04-18T16:58:55+03:00</lastmod>
+<lastmod>2017-04-19T15:39:19+03:00</lastmod>
<priority>0</priority>
</url>
@@ -104,19 +104,19 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2017-04-18T16:58:55+03:00</lastmod>
+<lastmod>2017-04-19T15:39:19+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
-<lastmod>2017-04-18T16:58:55+03:00</lastmod>
+<lastmod>2017-04-19T15:39:19+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2017-04-18T16:58:55+03:00</lastmod>
+<lastmod>2017-04-19T15:39:19+03:00</lastmod>
<priority>0</priority>
</url>