Add notes for 2019-08-08

2019-08-08 18:10:44 +03:00
parent 34e488a327
commit 0beed6b6df
76 changed files with 307 additions and 217 deletions


- After removing the two duplicates, there are now 1427 records
- Fix one invalid ISSN: 1020-2002→1020-3362

## 2019-08-07

- Daniel Haile-Michael asked about using a logical OR in a DSpace OpenSearch query, but I looked in the DSpace manual and it does not seem to be possible

## 2019-08-08

- Moayad noticed that the HTTPS certificate had expired on the AReS dev server (linode20)
- The first problem was that there is a Docker container listening on port 80, which conflicts with certbot's standalone ACME http-01 validation
- The second problem was that we only allow access to port 80 from localhost
- I adjusted the `renew-letsencrypt` systemd service so it stops the Docker container and firewall before renewal and starts them again afterwards:
```
# /opt/certbot-auto renew --standalone --pre-hook "/usr/bin/docker stop angular_nginx; /bin/systemctl stop firewalld" --post-hook "/bin/systemctl start firewalld; /usr/bin/docker start angular_nginx"
```
- It is important that the firewall starts back up before the Docker container, or else Docker will complain about missing iptables chains
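- For reference, a minimal sketch of what the adjusted `renew-letsencrypt` service unit might look like (the unit path and everything apart from the certbot command are assumptions, not the actual unit on linode20):

```
# cat /etc/systemd/system/renew-letsencrypt.service
[Unit]
Description=Renew Let's Encrypt certificates

[Service]
Type=oneshot
# Hypothetical reconstruction: stop the Docker nginx container and firewalld so
# certbot's standalone server can bind port 80, then bring them back up in the
# right order (firewalld first, so Docker finds its iptables chains)
ExecStart=/opt/certbot-auto renew --standalone --pre-hook "/usr/bin/docker stop angular_nginx; /bin/systemctl stop firewalld" --post-hook "/bin/systemctl start firewalld; /usr/bin/docker start angular_nginx"
```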
- Also, I updated the nginx configuration to the latest Mozilla intermediate TLS settings, as appropriate for Ubuntu 18.04's [OpenSSL 1.1.0g with nginx 1.16.0](https://ssl-config.mozilla.org/#server=nginx&server-version=1.16.0&config=intermediate&openssl-version=1.1.0g&hsts=false&ocsp=false)
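- For reference, the key nginx directives from that profile look roughly like this (OpenSSL 1.1.0g has no TLS 1.3 support, and the exact cipher list should be taken from the generator linked above):

```
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:10m;
ssl_session_tickets off;
# TLS 1.2 only, since OpenSSL 1.1.0g cannot do TLS 1.3
ssl_protocols TLSv1.2;
# ssl_ciphers omitted here; use the list produced by the Mozilla generator
ssl_prefer_server_ciphers off;
```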
- Run all system updates on AReS dev server (linode20) and reboot it
- Get a list of all the PDFs from the Bioversity migration that failed to download, and save them so I can try again with a different path in the URL:
```
$ ./generate-thumbnails.py -i /tmp/2019-08-05-Bioversity-Migration.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs.txt
$ grep -B1 "Download failed" /tmp/2019-08-08-download-pdfs.txt | grep "Downloading" | sed -e 's/> Downloading //' -e 's/\.\.\.//' | sed -r 's/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g' | csvcut -H -c 1,1 > /tmp/user-upload.csv
$ ./generate-thumbnails.py -i /tmp/user-upload.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs2.txt
$ grep -B1 "Download failed" /tmp/2019-08-08-download-pdfs2.txt | grep "Downloading" | sed -e 's/> Downloading //' -e 's/\.\.\.//' | sed -r 's/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g' | csvcut -H -c 1,1 > /tmp/user-upload2.csv
$ ./generate-thumbnails.py -i /tmp/user-upload2.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs3.txt
```
- (the weird sed regex removes ANSI color codes, because my generate-thumbnails script prints pretty colors)
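- For example, a made-up colored line like this gets cleaned back to plain text:

```
$ printf '\x1b[0;32m> Downloading foo.pdf...\x1b[0m\n' | sed -r 's/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g'
> Downloading foo.pdf...
```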
- Some PDFs are uploaded under different paths, so I have to try a few times to get them all (see the sketch after this list):
- `/fileadmin/_migrated/uploads/tx_news/`
- `/fileadmin/user_upload/online_library/publications/pdfs/`
- `/fileadmin/user_upload/`
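- A rough sketch of how one might probe those three paths for each missing filename (this helper script and its input file are hypothetical, not part of my actual workflow):

```
$ cat /tmp/check-bioversity-paths.sh
#!/usr/bin/env bash
# Hypothetical helper: for each missing PDF filename, try the three known
# upload paths and print the first URL that returns HTTP 200
paths=(/fileadmin/_migrated/uploads/tx_news
       /fileadmin/user_upload/online_library/publications/pdfs
       /fileadmin/user_upload)
while read -r filename; do
  for path in "${paths[@]}"; do
    url="https://www.bioversityinternational.org${path}/${filename}"
    if [[ $(curl -s -o /dev/null -w '%{http_code}' "$url") == '200' ]]; then
      echo "$url"
      break
    fi
  done
done < /tmp/missing-pdf-filenames.txt
```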
- Even so, there are still 52 items with incorrect filenames, so I can't derive their PDF URLs...
- For example, `Wild_cherry_Prunus_avium_859.pdf` is actually at this URL (note the double underscores): https://www.bioversityinternational.org/fileadmin/_migrated/uploads/tx_news/Wild_cherry__Prunus_avium__859.pdf
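- One could spot-check a suspected filename variant with a plain HTTP `HEAD` request, e.g.:

```
$ curl -sI "https://www.bioversityinternational.org/fileadmin/_migrated/uploads/tx_news/Wild_cherry__Prunus_avium__859.pdf" | head -n 1
```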
- I will proceed with a metadata-only upload first and then let them know about the missing PDFs
<!-- vim: set sw=2 ts=2: -->