mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-22 14:45:03 +01:00)
Update notes for 2017-04-19
parent beac46e9db
commit 96298cc5bf
@@ -251,3 +251,27 @@ $ bundle binstubs puma --path ./sbin
- So it seems he did it in the crosswalk!
- Keep working on Ansible stuff for deploying the CKM REST API
- We can use systemd's `Environment` stuff to pass the database parameters to Rails
- Abenet noticed that the "Workflow Statistics" option is missing now, but we have screenshots from a presentation in 2016 when it was there
- I filed a ticket with Atmire
- Looking at 933 CIAT records from Sisay, he's having problems creating a SAF bundle to import to DSpace Test
- I started by looking at his CSV in OpenRefine, and I see there a _bunch_ of fields with whitespace issues that I cleaned up:

```
value.replace(" ||","||").replace("|| ","||").replace(" || ","||")
```
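
For readers working outside OpenRefine, the same separator cleanup can be sketched in Python. This is only an illustration, not part of the original notes: the function names and the sample value are hypothetical, and the regex variant is slightly broader than the GREL expression because it also collapses tabs and repeated spaces around `||`.

```python
import re


def normalize_separators(value: str) -> str:
    """Remove any whitespace on either side of the "||" multi-value separator."""
    return re.sub(r"\s*\|\|\s*", "||", value)


def normalize_separators_literal(value: str) -> str:
    """Direct port of the chained replace() calls from the GREL expression."""
    return value.replace(" ||", "||").replace("|| ", "||").replace(" || ", "||")


# Hypothetical sample value with inconsistent spacing around the separator
sample = "Maize || Beans ||Cassava"
print(normalize_separators(sample))
print(normalize_separators_literal(sample))
```

In the literal port the third `replace` never matches, because the first two calls have already removed every space adjacent to `||`.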

- Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:

```
unescape(value,"url")
```
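
A rough Python counterpart of the URL decoding step, assuming the values are ordinary percent-encoded strings. The helper name and sample filename are hypothetical. Whether a literal `+` should become a space (as `urllib.parse.unquote_plus` does) depends on how the source data was encoded, which these notes do not say, so this sketch uses the plain `unquote`.

```python
from urllib.parse import unquote


def decode_url_encoding(value: str) -> str:
    """Decode percent-encoded sequences such as %20 into literal characters."""
    return unquote(value)


# Hypothetical filename as it might appear in the CSV
print(decode_url_encoding("Annual%20Report%202016.pdf"))
```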

- Then create the filename column using the following transform from URL:

```
value.split('/')[-1].replace(/#.*$/,"")
```

- The `replace` part is because some URLs have an anchor like `#page=14` which we obviously don't want on the filename
- Also, we need to only use the PDF on the item corresponding with page 1, so we don't end up with literally hundreds of duplicate PDFs
- Alternatively, I could export each page to a standalone PDF...
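
The filename transform above can be sketched in Python as follows. This is a minimal port under stated assumptions, not the method used in the notes: the function name and the example URL are hypothetical, and like the GREL expression it takes the last `/`-separated segment first and then strips everything from the first `#` onward.

```python
import re


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL without any "#fragment" suffix."""
    last_segment = url.split("/")[-1]
    return re.sub(r"#.*$", "", last_segment)


# Hypothetical URL with a page anchor like the ones described above
print(filename_from_url("http://example.org/docs/report.pdf#page=14"))  # prints report.pdf
```

If a fragment could itself contain a `/`, stripping the fragment before splitting would be the safer order.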
@@ -30,7 +30,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
<meta property="article:published_time" content="2017-04-02T17:08:52+02:00"/>
<meta property="article:modified_time" content="2017-04-18T16:58:55+03:00"/>
<meta property="article:modified_time" content="2017-04-19T15:39:19+03:00"/>
@@ -79,9 +79,9 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
"@type": "BlogPosting",
"headline": "April, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-04/",
"wordCount": "1712",
"wordCount": "1877",
"datePublished": "2017-04-02T17:08:52+02:00",
"dateModified": "2017-04-18T16:58:55+03:00",
"dateModified": "2017-04-19T15:39:19+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -448,6 +448,33 @@ $ rails -s
<li>So it seems he did it in the crosswalk!</li>
<li>Keep working on Ansible stuff for deploying the CKM REST API</li>
<li>We can use systemd’s <code>Environment</code> stuff to pass the database parameters to Rails</li>
<li>Abenet noticed that the “Workflow Statistics” option is missing now, but we have screenshots from a presentation in 2016 when it was there</li>
<li>I filed a ticket with Atmire</li>
<li>Looking at 933 CIAT records from Sisay, he’s having problems creating a SAF bundle to import to DSpace Test</li>
<li>I started by looking at his CSV in OpenRefine, and I see there a <em>bunch</em> of fields with whitespace issues that I cleaned up:</li>
</ul>

<pre><code>value.replace(" ||","||").replace("|| ","||").replace(" || ","||")
</code></pre>

<ul>
<li>Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:</li>
</ul>

<pre><code>unescape(value,"url")
</code></pre>

<ul>
<li>Then create the filename column using the following transform from URL:</li>
</ul>

<pre><code>value.split('/')[-1].replace(/#.*$/,"")
</code></pre>

<ul>
<li>The <code>replace</code> part is because some URLs have an anchor like <code>#page=14</code> which we obviously don’t want on the filename</li>
<li>Also, we need to only use the PDF on the item corresponding with page 1, so we don’t end up with literally hundreds of duplicate PDFs</li>
<li>Alternatively, I could export each page to a standalone PDF…</li>
</ul>
@@ -3,7 +3,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2017-04/</loc>
<lastmod>2017-04-18T16:58:55+03:00</lastmod>
<lastmod>2017-04-19T15:39:19+03:00</lastmod>
</url>

<url>
@@ -93,7 +93,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2017-04-18T16:58:55+03:00</lastmod>
<lastmod>2017-04-19T15:39:19+03:00</lastmod>
<priority>0</priority>
</url>
@@ -104,19 +104,19 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2017-04-18T16:58:55+03:00</lastmod>
<lastmod>2017-04-19T15:39:19+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2017-04-18T16:58:55+03:00</lastmod>
<lastmod>2017-04-19T15:39:19+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2017-04-18T16:58:55+03:00</lastmod>
<lastmod>2017-04-19T15:39:19+03:00</lastmod>
<priority>0</priority>
</url>