Add notes for 2016-09-06

2016-09-06 15:17:40 +03:00
parent ca94154759
commit 31f36b37b8
4 changed files with 239 additions and 0 deletions


- After updating the Authority indexes (`bin/dspace index-authority`) everything looks good
- Run authority updates on CGSpace
## 2016-09-05
- After one week of logging TLS connections on CGSpace:
```
# zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
217
# zcat -f -- /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
1164376
# zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq
TLSv1/DES-CBC3-SHA
TLSv1/EDH-RSA-DES-CBC3-SHA
```
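The same counts can be reproduced in Python. This is a minimal sketch, not the commands actually used. It assumes, as the `awk '{print $6}'` does, that the sixth whitespace-separated field of this nginx log format holds the protocol/cipher pair. The helper names (`open_log`, `iter_logs`, `cipher_stats`) are hypothetical.

```python
import glob
import gzip


def open_log(path):
    """Open a log as text, decompressing .gz files transparently (like `zcat -f`)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def iter_logs(pattern):
    """Yield every line from every file matching the glob pattern."""
    for path in sorted(glob.glob(pattern)):
        with open_log(path) as f:
            yield from f


def cipher_stats(lines, needle="DES-CBC3", field=6):
    """Return (matching lines, total lines, sorted distinct values of `field`).

    `field` is 1-based and split on whitespace, like awk's $6. It is assumed
    to hold the protocol/cipher pair in this nginx log format.
    """
    total = 0
    matches = 0
    values = set()
    for line in lines:
        total += 1
        if needle in line:
            matches += 1
            parts = line.split()
            if len(parts) >= field:
                values.add(parts[field - 1])
    return matches, total, sorted(values)


if __name__ == "__main__":
    matches, total, values = cipher_stats(
        iter_logs("/var/log/nginx/cgspace.cgiar.org-access-ssl.log*")
    )
    if total:  # avoid dividing by zero when no logs matched the pattern
        print(f"{matches} of {total} lines ({100 * matches / total:.2f}%)")
    for value in values:
        print(value)
```

With the counts above, `100 * 217 / 1164376` is about 0.0186, which this sketch prints rounded as `0.02%`.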
- So 217 of about 1.16M connections (roughly `0.02%`) over a one-week period used a 3DES (`DES-CBC3`) cipher, all of them over TLSv1
- Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:
```
value + "__description:" + cells["dc.type"].value
```
- This gives you, for example: `Mainstreaming gender in agricultural R&D.pdf__description:Brief`
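The same transformation can also be done outside OpenRefine. The following is a minimal Python sketch. It assumes a SAFBuilder CSV with hypothetical `filename` and `dc.type` columns. The names `with_description` and `add_descriptions` are illustrative and are not part of SAFBuilder.

```python
import csv
import io


def with_description(filename, dc_type):
    """Append a SAFBuilder-style description suffix, mirroring the GREL
    expression: value + "__description:" + cells["dc.type"].value"""
    # Unlike the GREL expression, leave the name unchanged when dc.type is empty.
    if not dc_type:
        return filename
    return f"{filename}__description:{dc_type}"


def add_descriptions(rows, file_column="filename", type_column="dc.type"):
    """Rewrite the file column of each CSV row (a dict) in place and return the rows.

    The column names are assumptions; adjust them to match the actual CSV.
    """
    for row in rows:
        row[file_column] = with_description(row[file_column], row.get(type_column, ""))
    return rows


if __name__ == "__main__":
    sample = "filename,dc.type\nMainstreaming gender in agricultural R&D.pdf,Brief\n"
    rows = add_descriptions(list(csv.DictReader(io.StringIO(sample))))
    # prints: Mainstreaming gender in agricultural R&D.pdf__description:Brief
    print(rows[0]["filename"])
```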
## 2016-09-06
- Trying to import yesterday's CIAT records, but their zip file has filename encoding issues
- Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF:
  - Filename: Complementing Farmers Genetic Knowledge Farmer Breeding Workshop in Turipaná, Colombia.pdf
  - Imports fine on DSpace running on Mac OS X
  - Fails to import on DSpace running on Linux with the error `No such file or directory`
  - Change the diacritic in the file name from á to a, then re-create the SAF bundle and zip
  - Success on both Mac OS X and Linux
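- It looks like the Mac OS X file system stores á in file names in decomposed form: a (U+0061) followed by the combining acute accent (U+0301), instead of the single precomposed character á (U+00E1)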
- See: http://www.fileformat.info/info/unicode/char/e1/index.htm
- See: http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&s=&uv=0
- If I unzip the original zip from CIAT on Windows, re-zip it with 7-Zip on Windows, and then unzip it directly on Linux, the file names seem to be proper UTF-8
- We should definitely clean filenames of characters that are tricky to process in CSV and shell scripts, like `,`, `'`, and `"`, for example with this GREL expression in OpenRefine:
```
value.replace("'","").replace(",","").replace('"','')
```
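The following is a minimal Python sketch of that cleaning rule. It renames files on disk so they keep matching a CSV that was cleaned with the GREL expression above. `clean_name` and `rename_files` are hypothetical helpers, not an existing script. The sketch walks the whole directory tree because SAF items keep their files in per-item subdirectories. By default it only reports what it would rename.

```python
import os
import sys

# The characters removed by the GREL expression above.
TRICKY_CHARS = ("'", ",", '"')


def clean_name(name):
    """Mirror value.replace("'","").replace(",","").replace('"','')."""
    for char in TRICKY_CHARS:
        name = name.replace(char, "")
    return name


def rename_files(top, dry_run=True):
    """Rename every file under `top` to its cleaned name.

    Returns (old_path, new_path) pairs. With dry_run=True (the default) nothing
    is renamed. Existing targets are skipped rather than overwritten.
    """
    renamed = []
    for root, _dirs, files in os.walk(top):
        for old in sorted(files):
            new = clean_name(old)
            if new == old or not new:
                continue  # nothing to clean, or nothing left after cleaning
            old_path = os.path.join(root, old)
            new_path = os.path.join(root, new)
            if os.path.exists(new_path):
                print(f"skipping {old_path!r}: {new_path!r} already exists")
                continue
            if not dry_run:
                os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python clean_names.py DIRECTORY [--rename]
    for old_path, new_path in rename_files(sys.argv[1], dry_run="--rename" not in sys.argv):
        print(f"{old_path} -> {new_path}")
```

In a SAF bundle, each item's `contents` file also lists its file names. The files therefore need to be cleaned before the bundle is generated (e.g. from the cleaned CSV), not after.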
- I need to write a Python script that applies the same rule when renaming files on the file system
- When importing SAF bundles, it seems you can specify the target collection either on the command line using `-c 10568/4003` or in a `collections` file inside each item in the bundle
- The `collections` file method seems to cause a null pointer exception, so I will have to use `-c` on the command line
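- In the end I was able to import the files, but ONLY after unzipping them on Linux
- The CSV file gave the file names in UTF-8 with precomposed characters (á as U+00E1). Unzipping the zip on Mac OS X and transferring the files converted the names to the decomposed form described above. The two forms are canonically equivalent in Unicode but are different code point sequences, so DSpace could not find the files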
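A mismatch like this can be diagnosed with a short script. The following is a minimal Python sketch, assuming the expected names come from the CSV. `find_missing` reports names that only match after Unicode NFC normalization, and `rename_to_nfc` renames such files to the precomposed form. All function names are hypothetical. The rename only helps on Linux, because HFS+ on Mac OS X decomposes names again when it writes them.

```python
import os
import unicodedata


def code_points(text):
    """Return code points as strings, e.g. á is ['U+00E1'] precomposed (NFC)
    but ['U+0061', 'U+0301'] decomposed (NFD)."""
    return [f"U+{ord(ch):04X}" for ch in text]


def find_missing(expected_names, directory):
    """Compare expected file names (e.g. from the CSV) with the names on disk.

    Returns (missing, normalization_only): names not found at all, and
    (expected, on_disk) pairs that only match after NFC normalization.
    """
    on_disk = set(os.listdir(directory))
    by_nfc = {unicodedata.normalize("NFC", name): name for name in on_disk}
    missing, normalization_only = [], []
    for name in expected_names:
        if name in on_disk:
            continue  # same code points, so the importer will find it
        match = by_nfc.get(unicodedata.normalize("NFC", name))
        if match is not None:
            normalization_only.append((name, match))
        else:
            missing.append(name)
    return missing, normalization_only


def rename_to_nfc(directory, dry_run=True):
    """Rename entries in `directory` to their NFC form; returns (old, new) pairs.

    Existing targets are never overwritten. With dry_run=True nothing is renamed.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        nfc = unicodedata.normalize("NFC", name)
        if nfc == name or os.path.exists(os.path.join(directory, nfc)):
            continue
        if not dry_run:
            os.rename(os.path.join(directory, name), os.path.join(directory, nfc))
        renamed.append((name, nfc))
    return renamed
```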