mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2018-09-04
@@ -54,7 +54,15 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
- I'm looking over the latest round of IITA records from Sisay: [Mercy1806_August_29](https://dspacetest.cgiar.org/handle/10568/104230)
- All fields are split across multiple columns, like `cg.authorship.types` and `cg.authorship.types[]`
- This makes it super annoying to do the checks and cleanup, so I will merge them (which is also time-consuming)
- Five issue dates had values like `2013-5` so I corrected them to be `2013-05`
- Five items had `dc.date.issued` values like `2013-5` so I corrected them to be `2013-05`
- Several metadata fields had values with newlines in them (even in some titles!), which I fixed by collapsing consecutive whitespace in OpenRefine
- Many (196!) items from before 2011 are indicated as having a CRP, but CRPs didn't exist then so this is impossible
- I isolated the items from 2011 onwards using a custom facet with this GREL on the `dc.date.issued` column: `isNotNull(value.match(/201[1-8].*/))`, then blanked the CRPs of the items that did not match (those from before 2011)
- Some affiliations used only a single separator (`|`) between multiple values instead of the `||` that DSpace expects
- I replaced smart quotes like `’` with plain ones
- Some inconsistencies in `cg.subject.iita` like COWPEA and COWPEAS, and YAM and YAMS, etc., as well as some spelling mistakes like IMPACT ASSESSMENTN
- Some values in the `dc.identifier.isbn` column are actually ISSNs, so I moved them to the `dc.identifier.issn` column
- I found one invalid ISSN using a custom text facet with the regex from the [ISSN page on Wikipedia](https://en.wikipedia.org/wiki/International_Standard_Serial_Number#Code_format): `isNotBlank(value.match(/^\d{4}-\d{3}[\dxX]$/))`
- One invalid value for `dc.type`
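
The `dc.date.issued` fix above (`2013-5` to `2013-05`) was done by hand in OpenRefine. As a rough sketch of the same correction outside OpenRefine, something like the following Python could zero-pad the month and day. The function name `normalize_issue_date` is made up for illustration, and it assumes dates are already in `YYYY`, `YYYY-M`, or `YYYY-M-D` form; anything else is returned untouched for manual review.

```python
import re


def normalize_issue_date(value: str) -> str:
    """Zero-pad single-digit month/day parts of a YYYY[-M[-D]] date string.

    Hypothetical helper, not part of DSpace or OpenRefine.
    """
    match = re.fullmatch(r"(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?", value.strip())
    if match is None:
        # Not a date shape we understand, so leave it for a human to check
        return value
    year, month, day = match.groups()
    parts = [year]
    if month:
        parts.append(month.zfill(2))
    if day:
        parts.append(day.zfill(2))
    return "-".join(parts)


if __name__ == "__main__":
    for raw in ["2013-5", "2013-05", "2013", "2013-5-7"]:
        print(raw, "->", normalize_issue_date(raw))
```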
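
For the CRP cleanup, the GREL facet `isNotNull(value.match(/201[1-8].*/))` separates items issued in 2011 to 2018 from everything else. A minimal Python sketch of that idea follows, assuming the metadata is a list of dicts keyed by column name. The function name `blank_pre_2011_crps` and the default CRP column `cg.contributor.crp` are assumptions for illustration, not something taken from the CSV itself. Rows with an empty issue date are left alone here, which is a choice of this sketch.

```python
import re

# Same pattern as the GREL facet; fullmatch mirrors GREL's whole-string match()
CRP_ERA = re.compile(r"201[1-8].*")


def blank_pre_2011_crps(rows, date_column="dc.date.issued",
                        crp_column="cg.contributor.crp"):
    """Return copies of rows with the CRP emptied when the date is not 2011-2018.

    Hypothetical helper; crp_column is an assumed column name.
    """
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        issued = row.get(date_column) or ""
        if issued and not CRP_ERA.fullmatch(issued):
            row[crp_column] = ""
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"dc.date.issued": "2009-03", "cg.contributor.crp": "Some CRP"},
        {"dc.date.issued": "2015", "cg.contributor.crp": "Some CRP"},
    ]
    for row in blank_pre_2011_crps(sample):
        print(row)
```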
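
The ISSN facet above only checks the format (`NNNN-NNNC`). A small Python sketch of the same format test is below, using the same regular expression. As an extra step that the notes do not describe, it also verifies the check digit with the standard ISSN weighting (weights 8 down to 2, modulo 11, with `X` standing for 10). The function names are hypothetical.

```python
import re

# Same format pattern as the GREL facet; fullmatch makes the ^ and $ anchors unnecessary
ISSN_FORMAT = re.compile(r"\d{4}-\d{3}[\dxX]")


def has_issn_format(value: str) -> bool:
    """True if value looks like NNNN-NNNC, where C is a digit or X."""
    return ISSN_FORMAT.fullmatch(value) is not None


def has_valid_check_digit(issn: str) -> bool:
    """True if the last character matches the ISSN check digit.

    Extra check beyond the format test; call only on format-valid values.
    """
    digits = issn.replace("-", "")
    # Weights 8, 7, ..., 2 apply to the first seven digits
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    expected_char = "X" if expected == 10 else str(expected)
    return digits[7].upper() == expected_char


if __name__ == "__main__":
    for candidate in ["0378-5955", "0378-5954", "03785955"]:
        ok = has_issn_format(candidate) and has_valid_check_digit(candidate)
        print(candidate, "valid" if ok else "invalid")
```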
<!-- vim: set sw=2 ts=2: -->