mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-26 00:18:21 +01:00)
Add notes for 2016-02-22
Signed-off-by: Alan Orth <alan.orth@gmail.com>
parent e364344126
commit 49329b6c7f
@ -242,3 +242,24 @@ java.io.FileNotFoundException: /usr/share/tomcat7/SimpleArchiveFormat/item_1021/
- Need to rename files to have no accents or umlauts, etc...
- Useful custom text facet for URLs ending with ".pdf": `value.endsWith(".pdf")`

## 2016-02-22

- To change Spanish accents to ASCII in OpenRefine:

```
value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
```

- But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac
- On closer inspection, I can import files with the following names on Linux (DSpace Test):

```
Bitstream: tést.pdf
Bitstream: tést señora.pdf
Bitstream: tést señora alimentación.pdf
```

- Seems it could be something with the HFS+ filesystem actually, as it's not UTF-8 ([it's something like UCS-2](http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html))
- HFS+ stores filenames as a string, and filenames with accents get stored as [character+accent](https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/) whereas Linux's ext4 stores them as an array of bytes
- Running the SAFBuilder on Mac OS X works if you're going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem's encoding matches
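
The "character+accent" behaviour noted above can be illustrated outside OpenRefine. The following is a minimal sketch, assuming the mismatch is between composed (NFC) and decomposed (NFD) Unicode forms of the same filename. Python's standard `unicodedata` module and the helper names `to_nfc` and `strip_accents` are assumptions made for illustration; they are not part of the original notes or of the SAFBuilder.

```python
import unicodedata


def to_nfc(name: str) -> str:
    """Return the composed form (one code point per accented letter where possible)."""
    return unicodedata.normalize("NFC", name)


def strip_accents(name: str) -> str:
    """Drop accents by decomposing and then removing combining marks.

    Hypothetical helper: a general alternative to chaining replace() calls.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same visible filename in two different encodings of the accents.
composed = "t\u00e9st se\u00f1ora alimentaci\u00f3n.pdf"  # 'é', 'ñ', 'ó' as single code points
decomposed = unicodedata.normalize("NFD", composed)        # letter followed by a combining accent

# The strings look identical when printed but differ code point by code point,
# so a byte-wise filename comparison would not match.
print(composed == decomposed)
print(composed.encode("utf-8") == decomposed.encode("utf-8"))

# Normalizing both sides to one form makes them comparable again.
print(to_nfc(decomposed) == composed)

# Accent-free ASCII name, similar in effect to the OpenRefine expression.
print(strip_accents(composed))
```

If the filenames listed in a SimpleArchiveFormat `contents` file and the names on disk are both normalized to the same form before comparison, a lookup like this should no longer depend on which filesystem produced the bundle, though that has not been verified against DSpace itself.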
@ -352,6 +352,31 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
<ul>
<li>Need to rename files to have no accents or umlauts, etc…</li>
<li>Useful custom text facet for URLs ending with “.pdf”: <code>value.endsWith(".pdf")</code></li>
</ul>

<h2 id="2016-02-22:124a59adbaa8ef13e1518d003fc03981">2016-02-22</h2>

<ul>
<li>To change Spanish accents to ASCII in OpenRefine:</li>
</ul>

<pre><code>value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
</code></pre>

<ul>
<li>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</li>
<li>On closer inspection, I can import files with the following names on Linux (DSpace Test):</li>
</ul>

<pre><code>Bitstream: tést.pdf
Bitstream: tést señora.pdf
Bitstream: tést señora alimentación.pdf
</code></pre>

<ul>
<li>Seems it could be something with the HFS+ filesystem actually, as it’s not UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it’s something like UCS-2</a>)</li>
<li>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux’s ext4 stores them as an array of bytes</li>
<li>Running the SAFBuilder on Mac OS X works if you’re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem’s encoding matches</li>
</ul>

</section>
@ -291,6 +291,31 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
<li>Need to rename files to have no accents or umlauts, etc&hellip;</li>
<li>Useful custom text facet for URLs ending with &ldquo;.pdf&rdquo;: <code>value.endsWith(&quot;.pdf&quot;)</code></li>
</ul>

<h2 id="2016-02-22:124a59adbaa8ef13e1518d003fc03981">2016-02-22</h2>

<ul>
<li>To change Spanish accents to ASCII in OpenRefine:</li>
</ul>

<pre><code>value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
</code></pre>

<ul>
<li>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</li>
<li>On closer inspection, I can import files with the following names on Linux (DSpace Test):</li>
</ul>

<pre><code>Bitstream: tést.pdf
Bitstream: tést señora.pdf
Bitstream: tést señora alimentación.pdf
</code></pre>

<ul>
<li>Seems it could be something with the HFS+ filesystem actually, as it&rsquo;s not UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it&rsquo;s something like UCS-2</a>)</li>
<li>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux&rsquo;s ext4 stores them as an array of bytes</li>
<li>Running the SAFBuilder on Mac OS X works if you&rsquo;re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem&rsquo;s encoding matches</li>
</ul>
</description>
</item>

@ -291,6 +291,31 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
<li>Need to rename files to have no accents or umlauts, etc&hellip;</li>
<li>Useful custom text facet for URLs ending with &ldquo;.pdf&rdquo;: <code>value.endsWith(&quot;.pdf&quot;)</code></li>
</ul>

<h2 id="2016-02-22:124a59adbaa8ef13e1518d003fc03981">2016-02-22</h2>

<ul>
<li>To change Spanish accents to ASCII in OpenRefine:</li>
</ul>

<pre><code>value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
</code></pre>

<ul>
<li>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</li>
<li>On closer inspection, I can import files with the following names on Linux (DSpace Test):</li>
</ul>

<pre><code>Bitstream: tést.pdf
Bitstream: tést señora.pdf
Bitstream: tést señora alimentación.pdf
</code></pre>

<ul>
<li>Seems it could be something with the HFS+ filesystem actually, as it&rsquo;s not UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it&rsquo;s something like UCS-2</a>)</li>
<li>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux&rsquo;s ext4 stores them as an array of bytes</li>
<li>Running the SAFBuilder on Mac OS X works if you&rsquo;re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem&rsquo;s encoding matches</li>
</ul>
</description>
</item>