diff --git a/content/posts/2019-09.md b/content/posts/2019-09.md
index 15c2baea1..3608a5de8 100644
--- a/content/posts/2019-09.md
+++ b/content/posts/2019-09.md
@@ -303,6 +303,6 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
 - langid seems to be the best considering the above experiences
 - I added very experimental language detection to the [csv-metadata-quality](https://github.com/ilri/csv-metadata-quality) module
 - It works by checking the predicted language of the `dc.title` field against the item's `dc.language.iso` field
- - I tested it on the Bioversity migration data set and actually managed to correct about eight incorrect language fields in their records!
+ - I tested it on the Bioversity migration data set and it actually helped me correct eleven language fields in their records!
diff --git a/docs/2019-09/index.html b/docs/2019-09/index.html
index fc9eef51b..1a833e1e4 100644
--- a/docs/2019-09/index.html
+++ b/docs/2019-09/index.html
@@ -40,7 +40,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
-
+
@@ -85,9 +85,9 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
 "@type": "BlogPosting",
 "headline": "September, 2019",
 "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-09\/",
- "wordCount": "2325",
+ "wordCount": "2324",
 "datePublished": "2019-09-01T10:17:51\x2b03:00",
- "dateModified": "2019-09-21T02:25:19\x2b03:00",
+ "dateModified": "2019-09-22T01:36:39\x2b03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -531,7 +531,7 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
 dc.title field against the item’s dc.language.iso field
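
The check described in the changed note (comparing the predicted language of `dc.title` with the item's `dc.language.iso`) can be sketched roughly as follows. This is a minimal illustration, not the actual csv-metadata-quality code: `toy_detect`, `check_language`, and the stopword lists are hypothetical names and data invented here, and the toy detector only stands in for a real classifier such as langid.

```python
# Hypothetical sketch: NOT the actual csv-metadata-quality implementation.
from typing import Callable, Optional

# Toy stand-in for a real language detector such as langid. The stopword
# lists are illustrative assumptions, not real training data.
STOPWORDS = {
    "en": {"the", "of", "and", "in", "for", "to"},
    "es": {"el", "la", "de", "y", "en", "los", "para"},
    "fr": {"le", "la", "de", "et", "les", "des", "pour"},
}


def toy_detect(text: str) -> Optional[str]:
    """Return the ISO 639-1 code whose stopwords occur most often, or None."""
    words = text.lower().split()
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No stopword matched at all: refuse to guess.
    return best if scores[best] > 0 else None


def check_language(
    row: dict,
    detect: Callable[[str], Optional[str]] = toy_detect,
) -> Optional[str]:
    """Return the predicted language if it disagrees with dc.language.iso.

    Returns None when the detector has no opinion or when the prediction
    matches the declared value.
    """
    title = row.get("dc.title", "")
    declared = row.get("dc.language.iso", "").strip().lower()
    predicted = detect(title)
    if predicted is None or predicted == declared:
        return None
    return predicted


if __name__ == "__main__":
    row = {
        "dc.title": "Conservación de la diversidad de los cultivos",
        "dc.language.iso": "en",
    }
    suggestion = check_language(row)
    if suggestion is not None:
        print(f"Possibly wrong language: declared en, predicted {suggestion}")
```

Passing the detector in as a parameter keeps the comparison logic separate from the classifier, so a real model could be swapped in for the toy one. Any flagged mismatch would still need a manual look before the metadata is changed.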