title: August, 2019
date: 2019-08-03T12:39:51+03:00
author: Alan Orth
tags: Notes

2019-08-03

  • Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name...
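One way to deal with the stray spaces in the column headers is to strip them before doing any other checks. This is only a minimal sketch using Python's standard `csv` module; the function name `strip_header_whitespace` and the sample data are hypothetical and not taken from the real migration file.

```python
import csv
import io


def strip_header_whitespace(csv_text: str) -> str:
    """Return the CSV text with whitespace trimmed from each column header."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        return csv_text
    # Only the first row (the header) is touched; data rows are kept as-is
    rows[0] = [name.strip() for name in rows[0]]
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical sample with an extra space in two of the header names
sample = "dc.title ,dc.language.iso, dc.identifier.issn\nExample title,en,1020-3362\n"
cleaned = strip_header_whitespace(sample)
print(cleaned.splitlines()[0])
```

Trimming only the header row leaves the metadata values alone, so any whitespace problems in the values can still be reviewed separately.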

2019-08-04

  • Deploy ORCID identifier updates requested by Bioversity to CGSpace
  • Run system updates on CGSpace (linode18) and reboot it
    • Before updating it I checked Solr and verified that all statistics cores were loaded properly...
    • After rebooting, all statistics cores were loaded... wow, that's lucky.
  • Run system updates on DSpace Test (linode19) and reboot it

2019-08-05

or(
  isNotNull(value.match(/^.*.*$/)),
  isNotNull(value.match(/^.*é.*$/)),
  isNotNull(value.match(/^.*á.*$/)),
  isNotNull(value.match(/^.*è.*$/)),
  isNotNull(value.match(/^.*í.*$/)),
  isNotNull(value.match(/^.*ó.*$/)),
  isNotNull(value.match(/^.*ú.*$/)),
  isNotNull(value.match(/^.*à.*$/)),
  isNotNull(value.match(/^.*û.*$/))
).toString()
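For checking the same thing outside of OpenRefine, the facet expression above could be approximated in Python. This is a rough sketch and not the method used in these notes: it assumes the goal is to flag values containing any of the listed accented characters for manual review, and the names `FLAGGED_CHARACTERS` and `contains_flagged_character` are hypothetical.

```python
# Characters copied from the facet expression above. The character in the
# first pattern did not survive in these notes, so it is left out here.
FLAGGED_CHARACTERS = "éáèíóúàû"


def contains_flagged_character(value: str) -> bool:
    """Return True if the value contains any of the flagged characters."""
    return any(ch in value for ch in FLAGGED_CHARACTERS)


# Hypothetical sample values, not taken from the real CSV
values = ["Producción de café", "Plain ASCII title"]
flags = [contains_flagged_character(v) for v in values]
print(flags)
```

Like the original expression, this compares precomposed characters only, so values stored in a decomposed Unicode form would need normalizing first (for example with `unicodedata.normalize("NFC", value)`).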
  • I tried to extract the filenames and construct a URL to download the PDFs with my generate-thumbnails.py script, but there seem to be several paths for PDFs so I can't guess it properly
  • I will have to wait for Francesco to respond about the PDFs, or perhaps proceed with a metadata-only upload so we can do other checks on DSpace Test

2019-08-06

  • Francesco responded to my feedback from yesterday
    • I made some changes to the CSV based on his feedback (remove two duplicates, change one PDF file name, change two titles)
    • Then I found some items that have PDFs in multiple languages that only list one language in dc.language.iso so I changed them
    • Strangely, one item was referring to a 7zip file...
    • After removing the two duplicates there are now 1427 records
    • Fix one invalid ISSN: 1020-2002→1020-3362
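
An ISSN can be checked mechanically because its last character is a check digit: the first seven digits are weighted 8 down to 2, summed, and reduced modulo 11. The sketch below applies that standard calculation to the two values above. The notes do not say how the invalid ISSN was actually found, and the function names here are hypothetical.

```python
def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character for the first seven digits."""
    # Weights run from 8 for the first digit down to 2 for the seventh
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    # A check value of 10 is written as "X"
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """Return True if the string is a well-formed ISSN with a correct check digit."""
    compact = issn.replace("-", "").upper()
    body = compact[:7]
    if len(compact) != 8 or not (body.isascii() and body.isdigit()):
        return False
    return issn_check_digit(body) == compact[7]


print(is_valid_issn("1020-2002"))  # the value that was replaced
print(is_valid_issn("1020-3362"))  # the corrected value
```

For 1020-2002 the weighted sum is 28, which gives an expected check digit of 5 rather than 2, so it fails. For 1020-3362 the weighted sum is 53, which gives 2, so it passes.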