<li>Look at Bioversity’s latest migration CSV: Francesco has cleaned up the extra columns and the trailing newline at the end of the file, but many of the column headers have an extra space in the name…</li>
</ul>
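<p>A minimal sketch of cleaning those headers before import, assuming the CSV is UTF-8 and that trimming leading/trailing whitespace from each header name is all that is needed. The sample column names and values below are hypothetical, not taken from Bioversity’s file:</p>

```python
import csv
import io


def strip_header_whitespace(csv_text: str) -> str:
    """Return the CSV text with leading/trailing whitespace removed from each header.

    Data rows are passed through unchanged.
    """
    reader = csv.reader(io.StringIO(csv_text))
    output = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings
    writer = csv.writer(output, lineterminator="\n")

    header = next(reader, None)
    if header is None:
        return ""  # empty input, nothing to clean
    writer.writerow([name.strip() for name in header])
    for row in reader:
        writer.writerow(row)
    return output.getvalue()


if __name__ == "__main__":
    # Hypothetical sample with padded column names
    sample = "dc.title ,dc.date.issued , cg.identifier.url\nExample title,2004,\n"
    print(strip_header_whitespace(sample), end="")
```

<p>Using the <code>csv</code> module rather than splitting on commas keeps quoted headers that contain commas intact.</p>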
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
<ul>
<li>Before updating it, I checked Solr and verified that all statistics cores were loaded properly…</li>
<li>After rebooting, all statistics cores were loaded… wow, that’s lucky.</li>
</ul></li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li>
<li>About thirty PDFs have French or Spanish filenames, and there seems to be an encoding issue with them</li>
<li>I asked Francesco if he could give me a PDF URL column instead of a “filename” column so I can download the files myself</li>
<li><p>At <em>least</em> the ~50 filenames identified by the following GREL will have issues:</p>
<pre><code>or(
isNotNull(value.match(/^.*’.*$/)),
isNotNull(value.match(/^.*é.*$/)),
isNotNull(value.match(/^.*á.*$/)),
isNotNull(value.match(/^.*è.*$/)),
isNotNull(value.match(/^.*í.*$/)),
isNotNull(value.match(/^.*ó.*$/)),
isNotNull(value.match(/^.*ú.*$/)),
isNotNull(value.match(/^.*à.*$/)),
isNotNull(value.match(/^.*û.*$/))
).toString()
</code></pre></li>
</ul></li>
<li><p>I tried to extract the filenames and construct a URL to download the PDFs with my <code>generate-thumbnails.py</code> script, but the PDFs seem to live under several different paths, so I can’t reliably guess the URL</p></li>
<li><p>I will have to wait for Francesco to respond about the PDFs, or perhaps proceed with a metadata-only upload so we can do other checks on DSpace Test</p></li>
<li>Daniel Haile-Michael asked about using a logical OR in DSpace’s OpenSearch queries, but I looked in the DSpace manual and it does not seem to be possible</li>
</ul>
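<p>The pre-reboot Solr check above could also be scripted against Solr’s CoreAdmin <code>STATUS</code> action. This is a sketch, assuming the JSON response has already been fetched (for example from <code>/solr/admin/cores?action=STATUS&amp;wt=json</code> on whatever host and port Solr listens on) and that the yearly statistics shards follow a <code>statistics-YYYY</code> naming scheme; both are assumptions about this server, and the sample response is hypothetical and trimmed down:</p>

```python
def expected_statistics_cores(first_year: int, last_year: int) -> list:
    """Expected cores: the main "statistics" core plus one shard per year.

    Assumes the yearly shards are named "statistics-YYYY".
    """
    return ["statistics"] + [f"statistics-{year}" for year in range(first_year, last_year + 1)]


def check_statistics_cores(status_response: dict, expected_cores: list):
    """Compare a parsed CoreAdmin STATUS response against the expected cores.

    Returns (not_loaded, init_failures):
      not_loaded    - expected cores with no status entry, or that failed to initialize
      init_failures - every core Solr lists under "initFailures", sorted
    """
    failed = set(status_response.get("initFailures", {}))
    loaded = {
        name
        for name, info in status_response.get("status", {}).items()
        if info and name not in failed
    }
    not_loaded = [core for core in expected_cores if core not in loaded]
    return not_loaded, sorted(failed)


if __name__ == "__main__":
    # Hypothetical STATUS response in which the 2016 shard failed to load
    response = {
        "initFailures": {"statistics-2016": "hypothetical error message"},
        "status": {
            "statistics": {"name": "statistics"},
            "statistics-2017": {"name": "statistics-2017"},
            "statistics-2018": {"name": "statistics-2018"},
        },
    }
    not_loaded, failures = check_statistics_cores(response, expected_statistics_cores(2016, 2018))
    print("not loaded:", not_loaded)
    print("init failures:", failures)
```

<p>Running it before and after a reboot gives a quick comparison without eyeballing the Solr admin UI.</p>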
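<p>To check the filenames outside of OpenRefine, the GREL filter above can be approximated in Python with a single character class. This sketch flags the same characters the GREL lists; the NFC normalization step is an addition of mine (the GREL does not do it), and the sample filenames are hypothetical:</p>

```python
import re
import unicodedata

# Same characters as the GREL expression: a typographic apostrophe and a few
# accented vowels that are common in French and Spanish filenames.
SUSPECT_CHARS = re.compile(r"[’éáèíóúàû]")


def has_suspect_chars(filename: str) -> bool:
    """Return True if the filename contains any of the characters the GREL flags.

    The name is NFC-normalized first so that a decomposed "e" plus combining
    acute accent is treated the same as a precomposed "é".
    """
    return SUSPECT_CHARS.search(unicodedata.normalize("NFC", filename)) is not None


def find_suspect_filenames(filenames):
    """Return only the filenames likely to run into the encoding problem."""
    return [name for name in filenames if has_suspect_chars(name)]


if __name__ == "__main__":
    # Hypothetical filenames for illustration
    samples = ["report_2019.pdf", "guía_técnica.pdf", "l’agriculture.pdf"]
    print(find_suspect_filenames(samples))
```

<p>A broader net is <code>not filename.isascii()</code> (Python 3.7+), which would also catch characters such as <code>ñ</code> or <code>ç</code> that the GREL list does not include.</p>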
<h2 id="2019-08-08">2019-08-08</h2>
<ul>
<li><p>Moayad noticed that the HTTPS certificate expired on the AReS dev server (linode20)</p>
<ul>
<li>The first problem was that a Docker container was listening on port 80, which conflicts with the ACME http-01 validation</li>
<li>The second problem was that we only allow access to port 80 from localhost</li>
<li><p>I adjusted the <code>renew-letsencrypt</code> systemd service so it stops the Docker container and the firewall before renewing and starts them again afterwards</p></li>
<li><p>It is important that the firewall starts back up before the Docker container, or else Docker will complain about missing iptables chains</p></li>
<li><p>Also, I updated the nginx TLS configuration to Mozilla’s latest Intermediate settings as appropriate for Ubuntu 18.04’s <a href="https://ssl-config.mozilla.org/#server=nginx&server-version=1.16.0&config=intermediate&openssl-version=1.1.0g&hsts=false&ocsp=false">OpenSSL 1.1.0g with nginx 1.16.0</a></p></li>
<li><p>Run all system updates on AReS dev server (linode20) and reboot it</p></li>
<li><p>Get a list of all the PDFs from the Bioversity migration that fail to download, and save it so I can retry them with a different path in the URL</p></li>
<li><p>Even so, there are still 52 items with incorrect filenames, so I can’t derive their PDF URLs…</p>
<ul>
<li>For example, <code>Wild_cherry_Prunus_avium_859.pdf</code> is actually here (with double underscores): <a href="https://www.bioversityinternational.org/fileadmin/_migrated/uploads/tx_news/Wild_cherry__Prunus_avium__859.pdf">https://www.bioversityinternational.org/fileadmin/_migrated/uploads/tx_news/Wild_cherry__Prunus_avium__859.pdf</a></li>
</ul></li>
<li><p>I will proceed with a metadata-only upload first and then let them know about the missing PDFs</p></li>
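<li><p>Because the PDFs seem to live under several paths, one way to retry the failed downloads is to test each filename against a list of candidate base URLs. A minimal sketch: only the <code>tx_news</code> prefix comes from the Wild cherry example above, the second prefix is a hypothetical placeholder, and the existence check is a plain HEAD request via the standard library:</p>

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

CANDIDATE_PREFIXES = [
    # Known to host at least some of the PDFs (from the example above)
    "https://www.bioversityinternational.org/fileadmin/_migrated/uploads/tx_news/",
    # Hypothetical placeholder for any other path that turns up
    "https://www.bioversityinternational.org/fileadmin/HYPOTHETICAL_OTHER_PATH/",
]


def candidate_urls(filename, prefixes=CANDIDATE_PREFIXES):
    """Build one candidate URL per prefix, percent-encoding the filename.

    quote() encodes non-ASCII characters as UTF-8, which matters for the
    French and Spanish filenames.
    """
    return [prefix + quote(filename) for prefix in prefixes]


def url_exists(url, timeout=10):
    """Return True if a HEAD request for the URL succeeds.

    HTTP errors (404 etc.) and network errors are OSError subclasses in
    urllib; some servers reject HEAD, in which case a GET fallback may be needed.
    """
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout):
            return True
    except (OSError, ValueError):
        return False


def first_working_url(filename, exists=url_exists, prefixes=CANDIDATE_PREFIXES):
    """Return the first candidate URL accepted by the exists() check, or None."""
    for url in candidate_urls(filename, prefixes):
        if exists(url):
            return url
    return None
```

<p>This only helps when the filename is right and the path is wrong; it cannot recover the items whose filenames differ, such as the double-underscore case.</p></li>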