diff --git a/content/posts/2018-09.md b/content/posts/2018-09.md
index 9e81e2581..aa7d1ec86 100644
--- a/content/posts/2018-09.md
+++ b/content/posts/2018-09.md
@@ -54,7 +54,15 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
 - I'm looking over the latest round of IITA records from Sisay: [Mercy1806_August_29](https://dspacetest.cgiar.org/handle/10568/104230)
   - All fields are split with multiple columns like `cg.authorship.types` and `cg.authorship.types[]`
   - This makes it super annoying to do the checks and cleanup, so I will merge them (also time consuming)
-  - Five issue dates had values like `2013-5` so I corrected them to be `2013-05`
+  - Five items had `dc.date.issued` values like `2013-5` so I corrected them to be `2013-05`
   - Several metadata fields had values with newlines in them (even in some titles!), which I fixed by trimming the consecutive whitespaces in Open Refine
+  - Many (196!) items from before 2011 are indicated as having a CRP, but CRPs didn't exist then so this is impossible
+  - I got all items that were from 2011 and onwards using a custom facet with this GREL on the `dc.date.issued` column: `isNotNull(value.match(/201[1-8].*/))` and then blanking their CRPs
+  - Some affiliations with only one separator (|) for multiple values
+  - I replaced smart quotes like `’` with plain ones
+  - Some inconsistencies in `cg.subject.iita` like COWPEA and COWPEAS, and YAM and YAMS, etc, as well as some spelling mistakes like IMPACT ASSESSMENTN
+  - Some values in the `dc.identifier.isbn` are actually ISSNs so I moved them to the `dc.identifier.issn` column
+  - I found one invalid ISSN using a custom text facet with the regex from the [ISSN page on Wikipedia](https://en.wikipedia.org/wiki/International_Standard_Serial_Number#Code_format): `isNotBlank(value.match(/^\d{4}-\d{3}[\dxX]$/))`
+  - One invalid value for `dc.type`
diff --git a/docs/2018-05/index.html b/docs/2018-05/index.html
index 8438c60d4..71376ee77 100644
--- a/docs/2018-05/index.html
+++ b/docs/2018-05/index.html
@@ -22,7 +22,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
 " />
-
+
 Just a note to myself, I figured out how to get reconcile-csv to run from source rather than running the old pre-compiled JAR file:
-$ lein run /tmp/crps.csv id
+$ lein run /tmp/crps.csv name id
 
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index b71ab02d5..496215727 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2018-09/
-2018-09-03T16:47:24+03:00
+2018-09-04T13:25:13+03:00
@@ -24,7 +24,7 @@
 https://alanorth.github.io/cgspace-notes/2018-05/
-2018-05-31T15:53:12-07:00
+2018-09-04T16:15:26+03:00
@@ -184,7 +184,7 @@
 https://alanorth.github.io/cgspace-notes/
-2018-09-03T16:47:24+03:00
+2018-09-04T13:25:13+03:00
 0
@@ -195,7 +195,7 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2018-09-03T16:47:24+03:00
+2018-09-04T13:25:13+03:00
 0
@@ -207,13 +207,13 @@
 https://alanorth.github.io/cgspace-notes/posts/
-2018-09-03T16:47:24+03:00
+2018-09-04T13:25:13+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2018-09-03T16:47:24+03:00
+2018-09-04T13:25:13+03:00
 0