From 49329b6c7fe70fe89ce36e31db720fb280062941 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Mon, 22 Feb 2016 19:14:39 +0200
Subject: [PATCH] Add notes for 2016-02-22

Signed-off-by: Alan Orth
---
 content/2016-02.md          | 21 +++++++++++++++++++++
 public/2016-02/index.html   | 25 +++++++++++++++++++++++++
 public/index.xml            | 25 +++++++++++++++++++++++++
 public/tags/notes/index.xml | 25 +++++++++++++++++++++++++
 4 files changed, 96 insertions(+)

diff --git a/content/2016-02.md b/content/2016-02.md
index 3bc32ce01..9931a8635 100644
--- a/content/2016-02.md
+++ b/content/2016-02.md
@@ -242,3 +242,24 @@ java.io.FileNotFoundException: /usr/share/tomcat7/SimpleArchiveFormat/item_1021/
 - Need to rename files to have no accents or umlauts, etc...
 - Useful custom text facet for URLs ending with ".pdf": `value.endsWith(".pdf")`
+
+## 2016-02-22
+
+- To change Spanish accents to ASCII in OpenRefine:
+
+```
+value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
+```
+
+- But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac
+- On closer inspection, I can import files with the following names on Linux (DSpace Test):
+
+```
+Bitstream: tést.pdf
+Bitstream: tést señora.pdf
+Bitstream: tést señora alimentación.pdf
+```
+
+- Seems it could be something with the HFS+ filesystem actually, as it's not UTF-8 ([it's something like UCS-2](http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html))
+- HFS+ stores filenames as a string, and filenames with accents get stored as [character+accent](https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/) whereas Linux's ext4 stores them as an array of bytes
+- Running the SAFBuilder on Mac OS X works if you're going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem's encoding matches
diff --git a/public/2016-02/index.html b/public/2016-02/index.html
index 8052018e4..913b2859f 100644
--- a/public/2016-02/index.html
+++ b/public/2016-02/index.html
@@ -352,6 +352,31 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
+

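+<h2 id="2016-02-22:124a59adbaa8ef13e1518d003fc03981">2016-02-22</h2>
+
+<ul>
+<li>To change Spanish accents to ASCII in OpenRefine:</li>
+</ul>
+
+<pre><code>value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
+</code></pre>
+
+<ul>
+<li>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</li>
+<li>On closer inspection, I can import files with the following names on Linux (DSpace Test):</li>
+</ul>
+
+<pre><code>Bitstream: tést.pdf
+Bitstream: tést señora.pdf
+Bitstream: tést señora alimentación.pdf
+</code></pre>
+
+<ul>
+<li>Seems it could be something with the HFS+ filesystem actually, as it&rsquo;s not UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it&rsquo;s something like UCS-2</a>)</li>
+<li>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux&rsquo;s ext4 stores them as an array of bytes</li>
+<li>Running the SAFBuilder on Mac OS X works if you&rsquo;re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem&rsquo;s encoding matches</li>
+</ul>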
+
+
diff --git a/public/index.xml b/public/index.xml
index 0b631ba1e..80fa65dc8 100644
--- a/public/index.xml
+++ b/public/index.xml
@@ -291,6 +291,31 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
 <li>Need to rename files to have no accents or umlauts, etc&hellip;</li>
 <li>Useful custom text facet for URLs ending with &ldquo;.pdf&rdquo;: <code>value.endsWith(&quot;.pdf&quot;)</code></li>
 </ul>
+
+<h2 id="2016-02-22:124a59adbaa8ef13e1518d003fc03981">2016-02-22</h2>
+
+<ul>
+<li>To change Spanish accents to ASCII in OpenRefine:</li>
+</ul>
+
+<pre><code>value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
+</code></pre>
+
+<ul>
+<li>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</li>
+<li>On closer inspection, I can import files with the following names on Linux (DSpace Test):</li>
+</ul>
+
+<pre><code>Bitstream: tést.pdf
+Bitstream: tést señora.pdf
+Bitstream: tést señora alimentación.pdf
+</code></pre>
+
+<ul>
+<li>Seems it could be something with the HFS+ filesystem actually, as it&rsquo;s not UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it&rsquo;s something like UCS-2</a>)</li>
+<li>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux&rsquo;s ext4 stores them as an array of bytes</li>
+<li>Running the SAFBuilder on Mac OS X works if you&rsquo;re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem&rsquo;s encoding matches</li>
+</ul>
diff --git a/public/tags/notes/index.xml b/public/tags/notes/index.xml
index 52422b38c..011944cc7 100644
--- a/public/tags/notes/index.xml
+++ b/public/tags/notes/index.xml
@@ -291,6 +291,31 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
 <li>Need to rename files to have no accents or umlauts, etc&hellip;</li>
 <li>Useful custom text facet for URLs ending with &ldquo;.pdf&rdquo;: <code>value.endsWith(&quot;.pdf&quot;)</code></li>
 </ul>
+
+<h2 id="2016-02-22:124a59adbaa8ef13e1518d003fc03981">2016-02-22</h2>
+
+<ul>
+<li>To change Spanish accents to ASCII in OpenRefine:</li>
+</ul>
+
+<pre><code>value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
+</code></pre>
+
+<ul>
+<li>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</li>
+<li>On closer inspection, I can import files with the following names on Linux (DSpace Test):</li>
+</ul>
+
+<pre><code>Bitstream: tést.pdf
+Bitstream: tést señora.pdf
+Bitstream: tést señora alimentación.pdf
+</code></pre>
+
+<ul>
+<li>Seems it could be something with the HFS+ filesystem actually, as it&rsquo;s not UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it&rsquo;s something like UCS-2</a>)</li>
+<li>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux&rsquo;s ext4 stores them as an array of bytes</li>
+<li>Running the SAFBuilder on Mac OS X works if you&rsquo;re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem&rsquo;s encoding matches</li>
+</ul>
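
The "character+accent" behaviour in the notes above is probably a Unicode normalization difference: HFS+ keeps filenames in a decomposed form (close to NFD), while names typed on Linux are usually composed (NFC), and ext4 compares raw bytes. A rough Python sketch of that mismatch follows. It is not part of the SAFBuilder or DSpace, and the helper names `to_nfc`, `to_nfd` and `strip_accents` are made up for illustration. `strip_accents` is a more general alternative to the chained OpenRefine `replace()` calls, not the same expression.

```python
import unicodedata


def to_nfc(name: str) -> str:
    """Return the composed (NFC) form: 'é' is a single code point."""
    return unicodedata.normalize("NFC", name)


def to_nfd(name: str) -> str:
    """Return the decomposed (NFD) form: 'e' followed by a combining accent."""
    return unicodedata.normalize("NFD", name)


def strip_accents(name: str) -> str:
    """Fold accented letters to their base letters by dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # Escapes are used so the demo does not depend on how this file was saved.
    composed = to_nfc("t\u00e9st se\u00f1ora alimentaci\u00f3n.pdf")
    decomposed = to_nfd(composed)

    # The two names look identical on screen but differ as strings and as bytes,
    # so a byte-oriented filesystem treats them as different filenames.
    print("equal as strings:", composed == decomposed)
    print("UTF-8 lengths:", len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))

    # Normalizing both sides to one form makes the comparison succeed.
    print("equal after NFC:", to_nfc(decomposed) == composed)

    # ASCII-folded name, similar in spirit to the OpenRefine replace() chain.
    print("folded:", strip_accents(decomposed))
```

If this is indeed the cause, normalizing the filenames in the metadata and on disk to the same form before building the bundle may be an alternative to renaming everything to ASCII.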