Add notes for 2019-08-05

2025-01-27 05:49:12 +01:00 · 2019-08-05 16:49:31 +03:00
parent 36afd4c077
commit 833aa58bac
3 changed files with 99 additions and 8 deletions
--- a/content/posts/2019-08.md
+++ b/content/posts/2019-08.md
@@ -19,4 +19,46 @@ tags: ["Notes"]

 <!--more-->

+## 2019-08-05
+
+- Update Tomcat to 7.0.96 in the [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public)
+- Update PostgreSQL JDBC driver to 42.2.6 in the [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public)
+- Deploy both on DSpace Test (linode19)
+- Looking at the 1429 records for Bioversity migration again
+  - The following items use the same exact PDF and seem to be duplicates:
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=10191
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=342
+  - The following items use the same exact PDF, but one seems to be incorrect:
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=5347
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=5340
+  - The following PDFs are used by several items incorrectly:
+    - `Report_of_a_Working_Group_on_Allium_7.pdf`
+    - `Report_of_a_Working_Group_on_Allium_Fourth_meeting_1696.pdf`
+  - The following items use the same PDF with a different name, but seem to be duplicates (pick one?):
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=433
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=10189
+  - The following items use the same PDF with a different name, but seem to be duplicates (pick one?):
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=332
+    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=10187
+  - About thirty PDFs have French or Spanish filenames, and there seems to be an encoding issue with them
+    - I asked Francesco if he can give me a PDF URL column instead of a "filename" column so I can download the files myself
+    - At *least* the ~50 filenames identified by the following GREL will have issues:
+
+```
+or(
+  isNotNull(value.match(/^.*’.*$/)),
+  isNotNull(value.match(/^.*é.*$/)),
+  isNotNull(value.match(/^.*á.*$/)),
+  isNotNull(value.match(/^.*è.*$/)),
+  isNotNull(value.match(/^.*í.*$/)),
+  isNotNull(value.match(/^.*ó.*$/)),
+  isNotNull(value.match(/^.*ú.*$/)),
+  isNotNull(value.match(/^.*à.*$/)),
+  isNotNull(value.match(/^.*û.*$/))
+).toString()
+```
+
+- I tried to extract the filenames and construct a URL to download the PDFs with my `generate-thumbnails.py` script, but the PDFs seem to live under several different paths, so I can't reliably guess the right URL
+- I will have to wait for Francesco to respond about the PDFs, or perhaps proceed with a metadata-only upload so we can do other checks on DSpace Test
+
 <!-- vim: set sw=2 ts=2: -->
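
The GREL expression in the 2019-08-05 notes above flags filenames containing any of a fixed set of accented characters or a curly apostrophe. A minimal Python sketch of the same check, for use outside OpenRefine, might look like the following. The function names and the sample filenames are hypothetical; the NFC normalization step and the broader non-ASCII check are assumptions that are not part of the original expression.

```python
import unicodedata

# Characters copied from the GREL expression in the notes.
SUSPECT_CHARS = "’éáèíóúàû"


def has_suspect_char(filename: str) -> bool:
    """Return True if the filename contains any character the GREL checks for.

    The name is normalized to NFC first (an assumption, not in the GREL) so that
    a decomposed "e" + combining accent is treated the same as a precomposed "é".
    """
    normalized = unicodedata.normalize("NFC", filename)
    return any(ch in normalized for ch in SUSPECT_CHARS)


def has_non_ascii(filename: str) -> bool:
    """Broader check (an assumption): flag any character outside ASCII."""
    return not filename.isascii()


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    examples = [
        "Report_of_a_Working_Group_on_Allium_7.pdf",
        "Rapport_d’activité.pdf",
        "España.pdf",
    ]
    for name in examples:
        print(name, has_suspect_char(name), has_non_ascii(name))
```

The fixed character list mirrors the notes, which describe it as catching "at *least*" the problem filenames; the non-ASCII check would also catch characters such as "ñ" that the list does not cover.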