diff --git a/content/posts/2019-08.md b/content/posts/2019-08.md
index 00d2f7b5f..d973ac1db 100644
--- a/content/posts/2019-08.md
+++ b/content/posts/2019-08.md
@@ -19,4 +19,46 @@ tags: ["Notes"]
+## 2019-08-05
+
+- Update Tomcat to 7.0.96 in the [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public)
+- Update PostgreSQL JDBC driver to 42.2.6 in the [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public)
+- Deploy both on DSpace Test (linode19)
+- Looking at the 1429 records for Bioversity migration again
+  - The following items use the same exact PDF and seem to be duplicates:
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=10191
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=342
+  - The following items use the same exact PDF, but one seems to be incorrect:
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=5347
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=5340
+  - The following PDFs are used by several items incorrectly:
+    - `Report_of_a_Working_Group_on_Allium_7.pdf`
+    - `Report_of_a_Working_Group_on_Allium_Fourth_meeting_1696.pdf`
+  - The following items use the same PDF with a different name, but seem to be duplicates (pick one?):
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=433
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=10189
+  - The following items use the same PDF with a different name, but seem to be duplicates (pick one?):
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=332
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=10187
+  - There are about thirty PDFs that have French or Spanish filenames and there seems to be an encoding issue
+  - I asked Francesco if he can give me a PDF URL column instead of a "filename" column so I can download the files myself
+  - At *least* the ~50 filenames identified by the following GREL will have issues:
+
+```
+or(
+  isNotNull(value.match(/^.*’.*$/)),
+  isNotNull(value.match(/^.*é.*$/)),
+  isNotNull(value.match(/^.*á.*$/)),
+  isNotNull(value.match(/^.*è.*$/)),
+  isNotNull(value.match(/^.*í.*$/)),
+  isNotNull(value.match(/^.*ó.*$/)),
+  isNotNull(value.match(/^.*ú.*$/)),
+  isNotNull(value.match(/^.*à.*$/)),
+  isNotNull(value.match(/^.*û.*$/))
+).toString()
+```
+
+- I tried to extract the filenames and construct a URL to download the PDFs with my `generate-thumbnails.py` script, but there seem to be several paths for PDFs so I can't guess it properly
+- I will have to wait for Francesco to respond about the PDFs, or perhaps proceed with a metadata-only upload so we can do other checks on DSpace Test
+
diff --git a/docs/2019-08/index.html b/docs/2019-08/index.html
index 9b431216e..de6938e61 100644
--- a/docs/2019-08/index.html
+++ b/docs/2019-08/index.html
@@ -27,7 +27,7 @@ Run system updates on DSpace Test (linode19) and reboot it
-
+
@@ -59,9 +59,9 @@ Run system updates on DSpace Test (linode19) and reboot it
   "@type": "BlogPosting",
   "headline": "August, 2019",
   "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-08\/",
-  "wordCount": "95",
+  "wordCount": "341",
   "datePublished": "2019-08-03T12:39:51\x2b03:00",
-  "dateModified": "2019-08-03T12:39:51\x2b03:00",
+  "dateModified": "2019-08-04T22:49:04\x2b03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -147,6 +147,55 @@ Run system updates on DSpace Test (linode19) and reboot it
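Review note: the GREL check in the 2019-08-05 notes could also be run outside OpenRefine, for example on an exported filename column. The following is a minimal Python sketch of the same test, not the author's method. The function names and the sample filenames (other than the Allium report) are hypothetical, and the NFC normalization step is an added assumption to catch accents stored as combining characters.

```python
import unicodedata

# Characters copied from the GREL expression in the notes.
SUSPECT_CHARS = set("’éáèíóúàû")


def has_suspect_chars(filename: str) -> bool:
    """Return True if the filename contains any character from the GREL list."""
    # Assumption: normalize to NFC so that "e" + combining acute accent
    # is compared as the single precomposed character "é".
    normalized = unicodedata.normalize("NFC", filename)
    return any(ch in SUSPECT_CHARS for ch in normalized)


def flag_filenames(filenames):
    """Return the filenames that match, preserving their original order."""
    return [name for name in filenames if has_suspect_chars(name)]


if __name__ == "__main__":
    # Hypothetical sample values; only the first appears in the notes.
    sample = [
        "Report_of_a_Working_Group_on_Allium_7.pdf",
        "Informe_técnico.pdf",
        "Rapport_d’activité.pdf",
    ]
    for name in flag_filenames(sample):
        print(name)
```

Like the GREL, this only covers the nine listed characters, so it gives a lower bound on the affected filenames rather than a complete list.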