diff --git a/content/posts/2019-08.md b/content/posts/2019-08.md
index 1723309e9..f5bbfe294 100644
--- a/content/posts/2019-08.md
+++ b/content/posts/2019-08.md
@@ -210,4 +210,19 @@ statistics-2015: org.apache.solr.common.SolrException:org.apache.solr.common.Sol
 - After reboot the statistics-2018 core failed to load so I restarted `tomcat7` again
 - After this last restart all Solr cores seem to be up and running
 
+## 2019-08-20
+
+- Francesco sent me a new CSV with the raw filenames and paths for the Bioversity migration
+  - All file paths are relative to the Typo3 upload path of `/fileadmin` on the Bioversity website
+  - I create a new column with the derived URL that I can use to download the PDFs with my `generate-thumbnails.py` script
+  - Unfortunately now the filename column has paths too, so I have to use a simple Python/Jython script in OpenRefine to get the basename of the files in the filename column:
+
+```
+import os
+
+return os.path.basename(value)
+```
+
+- Then I can try to download all the files again with the script
+
diff --git a/docs/2019-08/index.html b/docs/2019-08/index.html
index 2385603cd..0654e9051 100644
--- a/docs/2019-08/index.html
+++ b/docs/2019-08/index.html
@@ -27,7 +27,7 @@ Run system updates on DSpace Test (linode19) and reboot it
-
+
@@ -59,9 +59,9 @@ Run system updates on DSpace Test (linode19) and reboot it
 "@type": "BlogPosting",
 "headline": "August, 2019",
 "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-08\/",
-"wordCount": "1536",
+"wordCount": "1637",
 "datePublished": "2019-08-03T12:39:51\x2b03:00",
-"dateModified": "2019-08-16T17:10:11\x2b03:00",
+"dateModified": "2019-08-18T23:07:48\x2b03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -400,6 +400,26 @@ $ dspace metadata-import -f /tmp/bioversity2.csv -e blah@blah.com
After this last restart all Solr cores seem to be up and running
+Francesco sent me a new CSV with the raw filenames and paths for the Bioversity migration
+All file paths are relative to the Typo3 upload path of /fileadmin on the Bioversity website
+I create a new column with the derived URL that I can use to download the PDFs with my generate-thumbnails.py script
+Unfortunately now the filename column has paths too, so I have to use a simple Python/Jython script in OpenRefine to get the basename of the files in the filename column:
+
+import os
+
+return os.path.basename(value)
+
+Then I can try to download all the files again with the script
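
The two column transformations described in the notes (deriving a download URL from a path relative to `/fileadmin`, and reducing the filename column to its basename) can be sketched outside OpenRefine in plain Python. This is only an illustration under stated assumptions: the base URL `https://example.org/fileadmin` is a placeholder because the notes do not give the real site address, and the function names `derive_url` and `derive_filename` and the sample path are hypothetical.

```python
import posixpath
from urllib.parse import quote

# Placeholder base URL (assumption): the notes only say that paths are
# relative to the Typo3 upload path /fileadmin on the website.
BASE_URL = "https://example.org/fileadmin"


def derive_url(relative_path, base_url=BASE_URL):
    """Join a /fileadmin-relative path onto the base URL.

    Characters that are unsafe in a URL (such as spaces) are
    percent-encoded; slashes are kept as path separators.
    """
    cleaned = relative_path.strip().lstrip("/")
    return base_url.rstrip("/") + "/" + quote(cleaned)


def derive_filename(relative_path):
    """Return only the last path component, like os.path.basename."""
    return posixpath.basename(relative_path.strip())


if __name__ == "__main__":
    # Made-up example row, not taken from the real CSV
    sample = "user_upload/reports/Annual Report 2018.pdf"
    print(derive_url(sample))
    print(derive_filename(sample))
```

`posixpath` is used instead of `os.path` so that forward-slash paths are split the same way on any operating system.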