Add notes for 2016-02-22

Signed-off-by: Alan Orth <alan.orth@gmail.com>
Alan Orth 2016-02-22 19:14:39 +02:00
parent e364344126
commit 49329b6c7f
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
4 changed files with 96 additions and 0 deletions


@@ -242,3 +242,24 @@ java.io.FileNotFoundException: /usr/share/tomcat7/SimpleArchiveFormat/item_1021/
- Need to rename files to have no accents or umlauts, etc...
- Useful custom text facet for URLs ending with ".pdf": `value.endsWith(".pdf")`
## 2016-02-22
- To change Spanish accents to ASCII in OpenRefine:
```
value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
```
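- A more general alternative outside OpenRefine, as a minimal sketch only: assuming Python 3 is available, decompose each character and drop the combining marks instead of listing replacements one by one (`strip_accents` is a hypothetical helper name, not part of OpenRefine or DSpace):

```python
import unicodedata

def strip_accents(text):
    # Decompose each character into base letter + combining marks (NFD),
    # then keep only the characters that are not combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("tést señora alimentación.pdf"))
# test senora alimentacion.pdf
```

- Unlike the `replace()` chain above, this would also catch characters like 'ú' and 'ü' and uppercase accented letters, since it doesn't depend on a fixed list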
- But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac
- On closer inspection, I can import files with the following names on Linux (DSpace Test):
```
Bitstream: tést.pdf
Bitstream: tést señora.pdf
Bitstream: tést señora alimentación.pdf
```
- The problem seems to be the HFS+ filesystem rather than the accents themselves, as it doesn't store filenames as UTF-8 ([it's something like UCS-2](http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html))
- HFS+ treats filenames as Unicode strings and stores accented characters in decomposed form ([character+accent](https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/)), whereas Linux's ext4 treats filenames as opaque arrays of bytes
- Running the SAFBuilder on Mac OS X works if you're going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there, where the filesystem's encoding matches
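- To illustrate the character+accent issue, a small sketch (assuming Python 3; `same_name` is a hypothetical helper, and this is not what SAFBuilder or DSpace actually do): the same visible filename has two different byte sequences depending on whether it is precomposed (NFC) or decomposed (NFD, roughly the form HFS+ stores), so a byte-for-byte comparison fails unless both sides are normalized first:

```python
import unicodedata

def same_name(a, b):
    # Compare two filenames after normalizing both to NFC, so that
    # precomposed and decomposed spellings are treated as equal.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Precomposed form: "é" is the single code point U+00E9
name_nfc = unicodedata.normalize("NFC", "tést.pdf")
# Decomposed form: "e" followed by the combining acute accent U+0301
name_nfd = unicodedata.normalize("NFD", "tést.pdf")

print(name_nfc == name_nfd)            # False
print(len(name_nfc.encode("utf-8")))   # 9
print(len(name_nfd.encode("utf-8")))   # 10
print(same_name(name_nfc, name_nfd))   # True
```

- If this is indeed the cause, normalizing the filenames in the metadata and on disk to the same form before importing might be an alternative to renaming them to ASCII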


@@ -352,6 +352,31 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
<ul> <ul>
<li>Need to rename files to have no accents or umlauts, etc&hellip;</li>
<li>Useful custom text facet for URLs ending with &ldquo;.pdf&rdquo;: <code>value.endsWith(&quot;.pdf&quot;)</code></li>
</ul>
<h2 id="2016-02-22:124a59adbaa8ef13e1518d003fc03981">2016-02-22</h2>
<ul>
<li>To change Spanish accents to ASCII in OpenRefine:</li>
</ul>
<pre><code>value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
</code></pre>
<ul>
<li>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</li>
<li>On closer inspection, I can import files with the following names on Linux (DSpace Test):</li>
</ul>
<pre><code>Bitstream: tést.pdf
Bitstream: tést señora.pdf
Bitstream: tést señora alimentación.pdf
</code></pre>
<ul>
<li>The problem seems to be the HFS+ filesystem rather than the accents themselves, as it doesn&rsquo;t store filenames as UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it&rsquo;s something like UCS-2</a>)</li>
<li>HFS+ treats filenames as Unicode strings and stores accented characters in decomposed form (<a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a>), whereas Linux&rsquo;s ext4 treats filenames as opaque arrays of bytes</li>
<li>Running the SAFBuilder on Mac OS X works if you&rsquo;re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there, where the filesystem&rsquo;s encoding matches</li>
</ul> </ul>
</section> </section>


@@ -291,6 +291,31 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
&lt;li&gt;Need to rename files to have no accents or umlauts, etc&amp;hellip;&lt;/li&gt;
&lt;li&gt;Useful custom text facet for URLs ending with &amp;ldquo;.pdf&amp;rdquo;: &lt;code&gt;value.endsWith(&amp;quot;.pdf&amp;quot;)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-02-22:124a59adbaa8ef13e1518d003fc03981&#34;&gt;2016-02-22&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;To change Spanish accents to ASCII in OpenRefine:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;value.replace(&#39;ó&#39;,&#39;o&#39;).replace(&#39;í&#39;,&#39;i&#39;).replace(&#39;á&#39;,&#39;a&#39;).replace(&#39;é&#39;,&#39;e&#39;).replace(&#39;ñ&#39;,&#39;n&#39;)
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac&lt;/li&gt;
&lt;li&gt;On closer inspection, I can import files with the following names on Linux (DSpace Test):&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Bitstream: tést.pdf
Bitstream: tést señora.pdf
Bitstream: tést señora alimentación.pdf
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The problem seems to be the HFS+ filesystem rather than the accents themselves, as it doesn&amp;rsquo;t store filenames as UTF-8 (&lt;a href=&#34;http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html&#34;&gt;it&amp;rsquo;s something like UCS-2&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;HFS+ treats filenames as Unicode strings and stores accented characters in decomposed form (&lt;a href=&#34;https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/&#34;&gt;character+accent&lt;/a&gt;), whereas Linux&amp;rsquo;s ext4 treats filenames as opaque arrays of bytes&lt;/li&gt;
&lt;li&gt;Running the SAFBuilder on Mac OS X works if you&amp;rsquo;re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there, where the filesystem&amp;rsquo;s encoding matches&lt;/li&gt;
&lt;/ul&gt;
</description> </description>
</item> </item>


@@ -291,6 +291,31 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
&lt;li&gt;Need to rename files to have no accents or umlauts, etc&amp;hellip;&lt;/li&gt;
&lt;li&gt;Useful custom text facet for URLs ending with &amp;ldquo;.pdf&amp;rdquo;: &lt;code&gt;value.endsWith(&amp;quot;.pdf&amp;quot;)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-02-22:124a59adbaa8ef13e1518d003fc03981&#34;&gt;2016-02-22&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;To change Spanish accents to ASCII in OpenRefine:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;value.replace(&#39;ó&#39;,&#39;o&#39;).replace(&#39;í&#39;,&#39;i&#39;).replace(&#39;á&#39;,&#39;a&#39;).replace(&#39;é&#39;,&#39;e&#39;).replace(&#39;ñ&#39;,&#39;n&#39;)
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac&lt;/li&gt;
&lt;li&gt;On closer inspection, I can import files with the following names on Linux (DSpace Test):&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;Bitstream: tést.pdf
Bitstream: tést señora.pdf
Bitstream: tést señora alimentación.pdf
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The problem seems to be the HFS+ filesystem rather than the accents themselves, as it doesn&amp;rsquo;t store filenames as UTF-8 (&lt;a href=&#34;http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html&#34;&gt;it&amp;rsquo;s something like UCS-2&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;HFS+ treats filenames as Unicode strings and stores accented characters in decomposed form (&lt;a href=&#34;https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/&#34;&gt;character+accent&lt;/a&gt;), whereas Linux&amp;rsquo;s ext4 treats filenames as opaque arrays of bytes&lt;/li&gt;
&lt;li&gt;Running the SAFBuilder on Mac OS X works if you&amp;rsquo;re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there, where the filesystem&amp;rsquo;s encoding matches&lt;/li&gt;
&lt;/ul&gt;
</description> </description>
</item> </item>