diff --git a/content/posts/2018-09.md b/content/posts/2018-09.md
index aa7d1ec86..beb0221b0 100644
--- a/content/posts/2018-09.md
+++ b/content/posts/2018-09.md
@@ -56,11 +56,11 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
  - This makes it super annoying to do the checks and cleanup, so I will merge them (also time consuming)
  - Five items had `dc.date.issued` values like `2013-5` so I corrected them to be `2013-05`
  - Several metadata fields had values with newlines in them (even in some titles!), which I fixed by trimming the consecutive whitespaces in Open Refine
- - Many (196!) items from before 2011 are indicated as having a CRP, but CRPs didn't exist then so this is impossible
+ - Many (91!) items from before 2011 are indicated as having a CRP, but CRPs didn't exist then so this is impossible
  - I got all items that were from 2011 and onwards using a custom facet with this GREL on the `dc.date.issued` column: `isNotNull(value.match(/201[1-8].*/))` and then blanking their CRPs
  - Some affiliations with only one separator (|) for multiple values
  - I replaced smart quotes like `’` with plain ones
- - Some inconsitencies in `cg.subject.iita` like COWPEA and COWPEAS, and YAM and YAMS, etc, as well as some spelling mistakes like IMPACT ASSESSMENTN
+ - Some inconsistencies in `cg.subject.iita` like COWPEA and COWPEAS, and YAM and YAMS, etc, as well as some spelling mistakes like IMPACT ASSESSMENTN
  - Some values in the `dc.identifier.isbn` are actually ISSNs so I moved them to the `dc.identifier.issn` column
  - I found one invalid ISSN using a custom text facet with the regex from the [ISSN page on Wikipedia](https://en.wikipedia.org/wiki/International_Standard_Serial_Number#Code_format): `isNotBlank(value.match(/^\d{4}-\d{3}[\dxX]$/))`
  - One invalid value for `dc.type`
diff --git a/docs/2018-09/index.html b/docs/2018-09/index.html
index d52e532df..2eddb951f 100644
--- a/docs/2018-09/index.html
+++ b/docs/2018-09/index.html
@@ -18,7 +18,7 @@ I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I