Mirror of https://github.com/alanorth/cgspace-notes.git, synced 2025-01-26 21:39:10 +01:00
Add notes for 2016-09-06
parent ca94154759
commit 31f36b37b8
@@ -113,3 +113,50 @@ dspacetest=# select distinct text_value, authority, confidence from metadatavalu
- After updating the Authority indexes (`bin/dspace index-authority`) everything looks good
- Run authority updates on CGSpace

## 2016-09-05

- After one week of logging TLS connections on CGSpace:

```
# zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
217
# zcat -f -- /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
1164376
# zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq
TLSv1/DES-CBC3-SHA
TLSv1/EDH-RSA-DES-CBC3-SHA
```
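
The counts above can also be reproduced in Python, which makes the percentage explicit. This is only a sketch, not the commands actually used: it assumes, as the `awk '{print $6}'` above suggests, that the custom nginx log format puts `protocol/cipher` in the sixth whitespace-separated field, and, like `zcat -f`, it reads both gzipped rotations and plain logs.

```python
import glob
import gzip
from collections import Counter


def open_log(path):
    """Open a log as text; gzipped rotations are decompressed, like `zcat -f`."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def cipher_stats(paths, needle="DES-CBC3"):
    """Return (matching lines, total lines, Counter of field 6 for matching lines)."""
    matches = 0
    total = 0
    ciphers = Counter()
    for path in paths:
        with open_log(path) as log:
            for line in log:
                total += 1
                if needle in line:
                    matches += 1
                    fields = line.split()  # whitespace split, like awk's default
                    # Assumption: field 6 (index 5) holds "$ssl_protocol/$ssl_cipher"
                    if len(fields) > 5:
                        ciphers[fields[5]] += 1
    return matches, total, ciphers


def percentage(part, whole):
    """Share of `part` in `whole`, in percent (0.0 when there is no data)."""
    return 100.0 * part / whole if whole else 0.0


if __name__ == "__main__":
    paths = glob.glob("/var/log/nginx/cgspace.cgiar.org-access-ssl.log*")
    matches, total, ciphers = cipher_stats(paths)
    print(f"{matches} of {total} requests ({percentage(matches, total):.2f}%)")
    for cipher in sorted(ciphers):
        print(cipher)
```

With the numbers above, `percentage(217, 1164376)` is roughly 0.0186, which rounds to `0.02%`.
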
- So 3DES (`DES-CBC3`) ciphers were used in about `0.02%` (217 of 1,164,376) of the requests logged over a one-week period
- Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:

```
value + "__description:" + cells["dc.type"].value
```

- This gives you, for example: `Mainstreaming gender in agricultural R&D.pdf__description:Brief`
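
The GREL expression above just appends SAFBuilder's `__description:` suffix plus the row's `dc.type` value to the file name. If the same column ever has to be generated outside OpenRefine, a rough Python equivalent could look like this; the helper names and the `filename` column are hypothetical (only `dc.type` comes from the expression above):

```python
import csv
import io


def with_description(filename: str, dc_type: str) -> str:
    """Same as the GREL: value + "__description:" + cells["dc.type"].value."""
    return filename + "__description:" + dc_type


def add_descriptions(csv_text: str, file_column: str = "filename") -> list:
    """Return CSV rows with the file column rewritten using each row's dc.type.

    Assumption: the column holding file names is called "filename" (hypothetical).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row[file_column] = with_description(row[file_column], row["dc.type"])
    return rows


if __name__ == "__main__":
    sample = 'filename,dc.type\n"Mainstreaming gender in agricultural R&D.pdf",Brief\n'
    for row in add_descriptions(sample):
        print(row["filename"])
```
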

## 2016-09-06

- Trying to import the CIAT records from yesterday, but having filename encoding issues with their zip file
- Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF:
  - Filename: Complementing Farmers Genetic Knowledge Farmer Breeding Workshop in Turipaná, Colombia.pdf
  - Imports fine on DSpace running on Mac OS X
  - Fails to import on DSpace running on Linux with error `No such file or directory`
- Change the diacritic in the file name from á to a and re-create the SAF bundle and zip
  - Success on both Mac OS X and Linux...
- Looks like the Mac OS X file system stores á in file names in decomposed form: a (U+0061) followed by a combining acute accent (U+0301), rather than the single precomposed character á (U+00E1)
- See: http://www.fileformat.info/info/unicode/char/e1/index.htm
- See: http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&s=&uv=0
- If I unzip the original zip from CIAT on Windows, re-zip it with 7-Zip on Windows, and then unzip it directly on Linux, the file names seem to be proper UTF-8
- We should definitely clean filenames so they don't use characters that are tricky to process in CSV and shell scripts, such as `,`, `'`, and `"`

```
value.replace("'","").replace(",","").replace('"','')
```
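
To check which form the file names inside a zip actually use, as with the CIAT zip above, a small Python sketch can list every entry and flag names that are not in NFC (precomposed) form, printing their non-ASCII code points. It is an illustration with hypothetical helper names, using only the standard `zipfile` and `unicodedata` modules. One detail worth knowing: `zipfile` decodes entry names as UTF-8 only when the entry's UTF-8 flag (bit 11) is set and as CP437 otherwise, so the sketch re-decodes unflagged names as UTF-8.

```python
import sys
import unicodedata
import zipfile


def entry_name(info: zipfile.ZipInfo) -> str:
    """Decode a zip entry name as UTF-8, even if the archive did not set the UTF-8 flag."""
    if info.flag_bits & 0x800:
        # Bit 11 set: zipfile has already decoded the name as UTF-8
        return info.filename
    # Without the flag, zipfile decodes names as CP437; CP437 covers all 256 byte
    # values, so re-encoding recovers the raw bytes, which are then decoded as UTF-8
    return info.filename.encode("cp437").decode("utf-8", errors="replace")


def non_nfc_entries(zip_path: str) -> list:
    """Return (name, non-ASCII code points) for entries whose names are not in NFC form."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = entry_name(info)
            if unicodedata.normalize("NFC", name) != name:
                codepoints = " ".join(f"U+{ord(ch):04X}" for ch in name if ord(ch) > 127)
                problems.append((name, codepoints))
    return problems


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for name, codepoints in non_nfc_entries(path):
            print(f"{path}: {name!r} uses {codepoints}")
```

Run it with one or more zip files as arguments; a decomposed á shows up as `U+0301` in the output.
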

- I need to write a Python script that applies the same cleanup as the OpenRefine expression above when renaming files on the file system
- When importing SAF bundles, it seems you can specify the target collection either on the command line using `-c 10568/4003` or in the `collections` file inside each item in the bundle
- The latter method seems to cause a null pointer exception, so I will just have to use the former (`-c`)
- In the end I was able to import the files after unzipping them on Linux only (not on Mac OS X)
  - The CSV file listed the file names in UTF-8, but unzipping the zip on Mac OS X and transferring the files converted the file names to the decomposed (canonically equivalent) form I saw above, so they no longer matched byte-for-byte
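
The Python rename script mentioned above could start out like this minimal sketch. It removes the same three characters as the GREL expression (`'`, `,`, `"`) and, as an extra step not in the GREL, normalizes names to NFC so that decomposed names like the Mac OS X ones above become precomposed. The function names are hypothetical, it assumes all files of a bundle sit in one directory, and it is meant for Linux, where the NFC and NFD spellings are distinct names. The file names in the CSV would need the same treatment to keep matching.

```python
import os
import sys
import unicodedata

# Characters removed by the OpenRefine expression:
# value.replace("'","").replace(",","").replace('"','')
UNSAFE_CHARS = ("'", ",", '"')


def clean_filename(name: str) -> str:
    """Normalize to NFC (precomposed) and drop the characters the GREL removes."""
    name = unicodedata.normalize("NFC", name)
    for ch in UNSAFE_CHARS:
        name = name.replace(ch, "")
    return name


def rename_files(directory: str, dry_run: bool = True) -> list:
    """Rename files in `directory` to their cleaned names; return (old, new) pairs."""
    renames = []
    for old in sorted(os.listdir(directory)):
        new = clean_filename(old)
        if new == old:
            continue
        new_path = os.path.join(directory, new)
        if os.path.exists(new_path):
            # Refuse to overwrite when two names clean up to the same result
            print(f"skipping {old!r}: {new!r} already exists", file=sys.stderr)
            continue
        if not dry_run:
            os.rename(os.path.join(directory, old), new_path)
        renames.append((old, new))
    return renames


if __name__ == "__main__":
    for directory in sys.argv[1:]:
        for old, new in rename_files(directory, dry_run=False):
            print(f"{old!r} -> {new!r}")
```

`rename_files()` defaults to a dry run that only reports the `(old, new)` pairs it would apply.
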
@@ -199,6 +199,70 @@ dspacetest=# select distinct text_value, authority, confidence from metadatavalu
<ul>
<li>After updating the Authority indexes (<code>bin/dspace index-authority</code>) everything looks good</li>
<li>Run authority updates on CGSpace</li>
</ul>

<h2 id="2016-09-05">2016-09-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>After one week of logging TLS connections on CGSpace:</li>
|
||||
</ul>
|
||||
|
||||
<pre><code># zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
|
||||
217
|
||||
# zcat -f -- /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
|
||||
1164376
|
||||
# zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq
|
||||
TLSv1/DES-CBC3-SHA
|
||||
TLSv1/EDH-RSA-DES-CBC3-SHA
|
||||
</code></pre>
|
||||
|
||||
<ul>
<li>So 3DES (<code>DES-CBC3</code>) ciphers were used in about <code>0.02%</code> (217 of 1,164,376) of the requests logged over a one-week period</li>
<li>Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:</li>
</ul>

<pre><code>value + "__description:" + cells["dc.type"].value
</code></pre>

<ul>
<li>This gives you, for example: <code>Mainstreaming gender in agricultural R&D.pdf__description:Brief</code></li>
</ul>

<h2 id="2016-09-06">2016-09-06</h2>
|
||||
|
||||
<ul>
|
||||
<li>Trying to import the records for CIAT from yesterday, but having filename encoding issues from their zip file</li>
|
||||
<li>Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF:
|
||||
|
||||
<ul>
|
||||
<li>Filename: Complementing Farmers Genetic Knowledge Farmer Breeding Workshop in Turipaná, Colombia.pdf</li>
|
||||
<li>Imports fine on DSpace running on Mac OS X</li>
|
||||
<li>Fails to import on DSpace running on Linux with error <code>No such file or directory</code></li>
|
||||
</ul></li>
|
||||
<li>Change diacritic in file name from á to a and re-create SAF bundle and zip
|
||||
|
||||
<ul>
|
||||
<li>Success on both Mac OS X and Linux…</li>
|
||||
</ul></li>
|
||||
<li>Looks like on the Mac OS X file system the file names represent á as: a (U+0061) + ́ (U+0301)</li>
|
||||
<li>See: <a href="http://www.fileformat.info/info/unicode/char/e1/index.htm">http://www.fileformat.info/info/unicode/char/e1/index.htm</a></li>
|
||||
<li>See: <a href="http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&s=&uv=0">http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&s=&uv=0</a></li>
|
||||
<li>If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8</li>
|
||||
<li>We should definitely clean filenames so they don’t use characters that are tricky to process in CSV and shell scripts, like: <code>,</code>, <code>'</code>, and <code>"</code></li>
|
||||
</ul>

<pre><code>value.replace("'","").replace(",","").replace('"','')
</code></pre>

<ul>
<li>I need to write a Python script that applies the same cleanup as the OpenRefine expression above when renaming files on the file system</li>
<li>When importing SAF bundles, it seems you can specify the target collection either on the command line using <code>-c 10568/4003</code> or in the <code>collections</code> file inside each item in the bundle</li>
<li>The latter method seems to cause a null pointer exception, so I will just have to use the former (<code>-c</code>)</li>
<li>In the end I was able to import the files after unzipping them on Linux only (not on Mac OS X)

<ul>
<li>The CSV file listed the file names in UTF-8, but unzipping the zip on Mac OS X and transferring the files converted the file names to the decomposed (canonically equivalent) form I saw above, so they no longer matched byte-for-byte</li>
</ul></li>
</ul>

</section>

@@ -138,6 +138,70 @@ dspacetest=# select distinct text_value, authority, confidence from metadatavalu
<li>After updating the Authority indexes (<code>bin/dspace index-authority</code>) everything looks good</li>
<li>Run authority updates on CGSpace</li>
</ul>

<h2 id="2016-09-05">2016-09-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>After one week of logging TLS connections on CGSpace:</li>
|
||||
</ul>
|
||||
|
||||
<pre><code># zgrep &quot;DES-CBC3&quot; /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
|
||||
217
|
||||
# zcat -f -- /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
|
||||
1164376
|
||||
# zgrep &quot;DES-CBC3&quot; /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq
|
||||
TLSv1/DES-CBC3-SHA
|
||||
TLSv1/EDH-RSA-DES-CBC3-SHA
|
||||
</code></pre>
|
||||
|
||||
<ul>
<li>So 3DES (<code>DES-CBC3</code>) ciphers were used in about <code>0.02%</code> (217 of 1,164,376) of the requests logged over a one-week period</li>
<li>Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:</li>
</ul>

<pre><code>value + &quot;__description:&quot; + cells[&quot;dc.type&quot;].value
</code></pre>

<ul>
<li>This gives you, for example: <code>Mainstreaming gender in agricultural R&amp;D.pdf__description:Brief</code></li>
</ul>

<h2 id="2016-09-06">2016-09-06</h2>
|
||||
|
||||
<ul>
|
||||
<li>Trying to import the records for CIAT from yesterday, but having filename encoding issues from their zip file</li>
|
||||
<li>Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF:
|
||||
|
||||
<ul>
|
||||
<li>Filename: Complementing Farmers Genetic Knowledge Farmer Breeding Workshop in Turipaná, Colombia.pdf</li>
|
||||
<li>Imports fine on DSpace running on Mac OS X</li>
|
||||
<li>Fails to import on DSpace running on Linux with error <code>No such file or directory</code></li>
|
||||
</ul></li>
|
||||
<li>Change diacritic in file name from á to a and re-create SAF bundle and zip
|
||||
|
||||
<ul>
|
||||
<li>Success on both Mac OS X and Linux&hellip;</li>
|
||||
</ul></li>
|
||||
<li>Looks like on the Mac OS X file system the file names represent á as: a (U+0061) + ́ (U+0301)</li>
|
||||
<li>See: <a href="http://www.fileformat.info/info/unicode/char/e1/index.htm">http://www.fileformat.info/info/unicode/char/e1/index.htm</a></li>
|
||||
<li>See: <a href="http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&amp;s=&amp;uv=0">http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&amp;s=&amp;uv=0</a></li>
|
||||
<li>If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8</li>
|
||||
<li>We should definitely clean filenames so they don&rsquo;t use characters that are tricky to process in CSV and shell scripts, like: <code>,</code>, <code>'</code>, and <code>&quot;</code></li>
|
||||
</ul>

<pre><code>value.replace(&quot;'&quot;,&quot;&quot;).replace(&quot;,&quot;,&quot;&quot;).replace('&quot;','')
</code></pre>

<ul>
<li>I need to write a Python script that applies the same cleanup as the OpenRefine expression above when renaming files on the file system</li>
<li>When importing SAF bundles, it seems you can specify the target collection either on the command line using <code>-c 10568/4003</code> or in the <code>collections</code> file inside each item in the bundle</li>
<li>The latter method seems to cause a null pointer exception, so I will just have to use the former (<code>-c</code>)</li>
<li>In the end I was able to import the files after unzipping them on Linux only (not on Mac OS X)

<ul>
<li>The CSV file listed the file names in UTF-8, but unzipping the zip on Mac OS X and transferring the files converted the file names to the decomposed (canonically equivalent) form I saw above, so they no longer matched byte-for-byte</li>
</ul></li>
</ul>
</description>
</item>

@@ -138,6 +138,70 @@ dspacetest=# select distinct text_value, authority, confidence from metadatavalu
<li>After updating the Authority indexes (<code>bin/dspace index-authority</code>) everything looks good</li>
<li>Run authority updates on CGSpace</li>
</ul>

<h2 id="2016-09-05">2016-09-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>After one week of logging TLS connections on CGSpace:</li>
|
||||
</ul>
|
||||
|
||||
<pre><code># zgrep &quot;DES-CBC3&quot; /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
|
||||
217
|
||||
# zcat -f -- /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
|
||||
1164376
|
||||
# zgrep &quot;DES-CBC3&quot; /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq
|
||||
TLSv1/DES-CBC3-SHA
|
||||
TLSv1/EDH-RSA-DES-CBC3-SHA
|
||||
</code></pre>
|
||||
|
||||
<ul>
<li>So 3DES (<code>DES-CBC3</code>) ciphers were used in about <code>0.02%</code> (217 of 1,164,376) of the requests logged over a one-week period</li>
<li>Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:</li>
</ul>

<pre><code>value + &quot;__description:&quot; + cells[&quot;dc.type&quot;].value
</code></pre>

<ul>
<li>This gives you, for example: <code>Mainstreaming gender in agricultural R&amp;D.pdf__description:Brief</code></li>
</ul>

<h2 id="2016-09-06">2016-09-06</h2>
|
||||
|
||||
<ul>
|
||||
<li>Trying to import the records for CIAT from yesterday, but having filename encoding issues from their zip file</li>
|
||||
<li>Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF:
|
||||
|
||||
<ul>
|
||||
<li>Filename: Complementing Farmers Genetic Knowledge Farmer Breeding Workshop in Turipaná, Colombia.pdf</li>
|
||||
<li>Imports fine on DSpace running on Mac OS X</li>
|
||||
<li>Fails to import on DSpace running on Linux with error <code>No such file or directory</code></li>
|
||||
</ul></li>
|
||||
<li>Change diacritic in file name from á to a and re-create SAF bundle and zip
|
||||
|
||||
<ul>
|
||||
<li>Success on both Mac OS X and Linux&hellip;</li>
|
||||
</ul></li>
|
||||
<li>Looks like on the Mac OS X file system the file names represent á as: a (U+0061) + ́ (U+0301)</li>
|
||||
<li>See: <a href="http://www.fileformat.info/info/unicode/char/e1/index.htm">http://www.fileformat.info/info/unicode/char/e1/index.htm</a></li>
|
||||
<li>See: <a href="http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&amp;s=&amp;uv=0">http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&amp;s=&amp;uv=0</a></li>
|
||||
<li>If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8</li>
|
||||
<li>We should definitely clean filenames so they don&rsquo;t use characters that are tricky to process in CSV and shell scripts, like: <code>,</code>, <code>'</code>, and <code>&quot;</code></li>
|
||||
</ul>

<pre><code>value.replace(&quot;'&quot;,&quot;&quot;).replace(&quot;,&quot;,&quot;&quot;).replace('&quot;','')
</code></pre>

<ul>
<li>I need to write a Python script that applies the same cleanup as the OpenRefine expression above when renaming files on the file system</li>
<li>When importing SAF bundles, it seems you can specify the target collection either on the command line using <code>-c 10568/4003</code> or in the <code>collections</code> file inside each item in the bundle</li>
<li>The latter method seems to cause a null pointer exception, so I will just have to use the former (<code>-c</code>)</li>
<li>In the end I was able to import the files after unzipping them on Linux only (not on Mac OS X)

<ul>
<li>The CSV file listed the file names in UTF-8, but unzipping the zip on Mac OS X and transferring the files converted the file names to the decomposed (canonically equivalent) form I saw above, so they no longer matched byte-for-byte</li>
</ul></li>
</ul>
</description>
</item>