mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2016-09-06
@ -113,3 +113,50 @@ dspacetest=# select distinct text_value, authority, confidence from metadatavalu
- After updating the Authority indexes (`bin/dspace index-authority`) everything looks good
- Run authority updates on CGSpace

## 2016-09-05

- After one week of logging TLS connections on CGSpace:

```
# zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
217
# zcat -f -- /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
1164376
# zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq
TLSv1/DES-CBC3-SHA
TLSv1/EDH-RSA-DES-CBC3-SHA
```
- So this represents `0.02%` of 1.16M connections over a one-week period
- Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:

```
value + "__description:" + cells["dc.type"].value
```

- This gives you, for example: `Mainstreaming gender in agricultural R&D.pdf__description:Brief`

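For batches handled outside OpenRefine, the same filename transformation can be sketched in Python. This is only an illustration: the function names and the `filename` column name are assumptions, while `dc.type` and the `__description:` marker come from the expression above.

```python
import csv
import io


def add_description(filename, dc_type):
    """Mirror the GREL expression: value + "__description:" + dc.type."""
    return f"{filename}__description:{dc_type}"


def describe_rows(csv_text, filename_column="filename", type_column="dc.type"):
    """Return one described filename per CSV row.

    The column names are assumptions; adjust them to the real metadata CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [add_description(row[filename_column], row[type_column]) for row in reader]


# Hypothetical one-row CSV built from the example in these notes
sample = 'filename,dc.type\n"Mainstreaming gender in agricultural R&D.pdf",Brief\n'
print(describe_rows(sample))
```

Using `csv.DictReader` rather than splitting on commas keeps quoted filenames that contain commas intact.
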
## 2016-09-06

- Trying to import the records for CIAT from yesterday, but having filename encoding issues from their zip file
- Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF:
  - Filename: Complementing Farmers Genetic Knowledge Farmer Breeding Workshop in Turipaná, Colombia.pdf
  - Imports fine on DSpace running on Mac OS X
  - Fails to import on DSpace running on Linux with error `No such file or directory`
- Change diacritic in file name from á to a and re-create SAF bundle and zip
  - Success on both Mac OS X and Linux...
- Looks like on the Mac OS X file system the file names represent á as: a (U+0061) + ́ (U+0301)
- See: http://www.fileformat.info/info/unicode/char/e1/index.htm
- See: http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&s=&uv=0
- If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8
- We should definitely clean filenames so they don't use characters that are tricky to process in CSV and shell scripts, like: `,`, `'`, and `"`

```
value.replace("'","").replace(",","").replace('"','')
```

- I need to write a Python script to match that for renaming files in the file system
- When importing SAF bundles it seems you can specify the target collection on the command line using `-c 10568/4003` or in the `collections` file inside each item in the bundle
- Seems that the latter method causes a null pointer exception, so I will just have to use the former method
- In the end I was able to import the files after unzipping them ONLY on Linux
- The CSV file was giving file names in UTF-8, but unzipping the zip on Mac OS X and transferring it converted the file names to the decomposed Unicode form (canonically equivalent, but different bytes) that I saw above

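The two filename problems in this entry (decomposed diacritics and characters that are awkward in CSV and shell scripts) could be handled by one small Python helper. The following is a minimal sketch, not the script that was eventually written: `clean_filename` and `rename_tree` are hypothetical names, and it assumes the precomposed (NFC) form is what the CSV contains. It uses only `unicodedata.normalize`, `os.walk`, and `os.rename` from the standard library.

```python
import os
import unicodedata

# Same characters the GREL expression above removes: comma, single and double quote
TRICKY = ",'\""


def clean_filename(name):
    """Return name in composed (NFC) form with the tricky characters removed."""
    composed = unicodedata.normalize("NFC", name)
    return "".join(ch for ch in composed if ch not in TRICKY)


def rename_tree(root, dry_run=True):
    """Collect (old_path, new_path) pairs under root; rename only if dry_run is False.

    Caveat: os.rename can silently replace an existing file on POSIX systems,
    so review the dry-run output for collisions first.
    """
    changes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for old in filenames:
            new = clean_filename(old)
            if new != old:
                old_path = os.path.join(dirpath, old)
                new_path = os.path.join(dirpath, new)
                changes.append((old_path, new_path))
                if not dry_run:
                    os.rename(old_path, new_path)
    return changes


# "a" + U+0301 is the decomposed spelling of "á" observed on Mac OS X
decomposed = "Turipana\u0301, Colombia.pdf"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # the composed string is one code point shorter
print(clean_filename(decomposed))
```

A dry run such as `rename_tree("/path/to/saf-bundle")` (hypothetical path) only lists what would change. Any `contents` files or CSV columns that reference the old names would need the same cleaning, and the sketch is meant to be run on Linux, since the Mac OS X file system may store the names in decomposed form again.
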