---
title: "August, 2019"
date: 2019-08-03T12:39:51+03:00
author: "Alan Orth"
tags: ["Notes"]
---
## 2019-08-03
- Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name...
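
A minimal sketch of how the stray spaces in the column headers could be stripped before further processing, using only Python's standard `csv` module. The column names and the sample row are hypothetical and are not taken from the Bioversity file; only the header row is touched.

```python
import csv
import io


def strip_header_whitespace(csv_text: str) -> str:
    """Return the CSV text with whitespace trimmed from each column name."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if rows:
        # Only the first row (the header) is modified; data rows are left as-is
        rows[0] = [name.strip() for name in rows[0]]
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical sample with a trailing or leading space in each header name
sample = 'dc.title ,dc.contributor.author , filename\nSome title,"Doe, Jane",report.pdf\n'
cleaned = strip_header_whitespace(sample)
print(cleaned)
```

Parsing with `csv` rather than splitting on commas keeps quoted values that contain commas intact.
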
## 2019-08-04
- Deploy ORCID identifier updates requested by Bioversity to CGSpace
- Run system updates on CGSpace (linode18) and reboot it
  - Before updating it I checked Solr and verified that all statistics cores were loaded properly...
  - After rebooting, all statistics cores were loaded... wow, that's lucky.
- Run system updates on DSpace Test (linode19) and reboot it
<!--more-->
## 2019-08-05
- Update Tomcat to 7.0.96 in the [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public)
- Update PostgreSQL JDBC driver to 42.2.6 in the [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public)
- Deploy both on DSpace Test (linode19)
- Looking at the 1429 records for Bioversity migration again
  - The following items use the same exact PDF and seem to be duplicates:
    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=10191
    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=342
  - The following items use the same exact PDF, but one seems to be incorrect:
    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=5347
    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=5340
  - The following PDFs are used by several items incorrectly:
    - `Report_of_a_Working_Group_on_Allium_7.pdf`
    - `Report_of_a_Working_Group_on_Allium_Fourth_meeting_1696.pdf`
  - The following items use the same PDF with a different name, but seem to be duplicates (pick one?):
    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=433
    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=10189
  - The following items use the same PDF with a different name, but seem to be duplicates (pick one?):
    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=332
    - https://www.bioversityinternational.org/index.php?id=244&tx_news_pi1[news]=10187
  - There are about thirty PDFs that have French or Spanish filenames and there seems to be an encoding issue
  - I asked Francesco if he can give me a PDF URL column instead of a "filename" column so I can download the files myself
  - At *least* the ~50 filenames identified by the following GREL will have issues:
```
or(
isNotNull(value.match(/^.*.*$/)),
isNotNull(value.match(/^.*é.*$/)),
isNotNull(value.match(/^.*á.*$/)),
isNotNull(value.match(/^.*è.*$/)),
isNotNull(value.match(/^.*í.*$/)),
isNotNull(value.match(/^.*ó.*$/)),
isNotNull(value.match(/^.*ú.*$/)),
isNotNull(value.match(/^.*à.*$/)),
isNotNull(value.match(/^.*û.*$/))
).toString()
```
  - I tried to extract the filenames and construct a URL to download the PDFs with my `generate-thumbnails.py` script, but there seem to be several paths for PDFs so I can't guess it properly
  - I will have to wait for Francesco to respond about the PDFs, or perhaps proceed with a metadata-only upload so we can do other checks on DSpace Test
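
For reference, a rough Python equivalent of the GREL filename check above, in case the same test is useful outside OpenRefine. This is only a sketch: the function names and sample filenames are hypothetical (apart from the Allium report named earlier), and the broader `has_non_ascii` check is my own addition rather than something the GREL does. Filenames are normalized to NFC first so that accents stored as a base letter plus a combining mark are still caught.

```python
import unicodedata

# The accented characters tested individually in the GREL expression
SUSPECT_CHARS = set("éáèíóúàû")


def has_suspect_char(filename: str) -> bool:
    """True if the filename contains one of the accented characters listed above."""
    normalized = unicodedata.normalize("NFC", filename)
    return any(ch in SUSPECT_CHARS for ch in normalized)


def has_non_ascii(filename: str) -> bool:
    """Broader check (an assumption, not in the GREL): any non-ASCII character at all."""
    return not filename.isascii()


# Hypothetical filenames for illustration
filenames = [
    "Report_of_a_Working_Group_on_Allium_7.pdf",
    "Informe_técnico.pdf",
    "Rapport_français.pdf",      # "ç" is not in SUSPECT_CHARS
    "Guia_me\u0301todos.pdf",    # "é" written as "e" plus a combining acute accent
]

for name in filenames:
    print(name, has_suspect_char(name), has_non_ascii(name))
```

The broader check would also flag characters the GREL does not list, which may be closer to the real number of problem filenames given the "at least" caveat.
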
<!-- vim: set sw=2 ts=2: -->