mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2019-09-21
@ -40,7 +40,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-09/" />
<meta property="article:published_time" content="2019-09-01T10:17:51+03:00" />
<meta property="article:modified_time" content="2019-09-20T13:25:59+03:00" />
<meta property="article:modified_time" content="2019-09-21T02:25:19+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="September, 2019"/>
@ -75,7 +75,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
9124 45.5.186.2
"/>
<meta name="generator" content="Hugo 0.58.2" />
<meta name="generator" content="Hugo 0.58.3" />
@ -85,9 +85,9 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
"@type": "BlogPosting",
"headline": "September, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-09\/",
"wordCount": "2166",
"wordCount": "2325",
"datePublished": "2019-09-01T10:17:51\x2b03:00",
"dateModified": "2019-09-20T13:25:59\x2b03:00",
"dateModified": "2019-09-21T02:25:19\x2b03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -510,6 +510,31 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
</ul></li>
</ul>
<h2 id="2019-09-21">2019-09-21</h2>
<ul>
<li>Re-upload the <a href="https://dspacetest.cgiar.org/handle/10568/105116">IITA Sept 6 (20196th.xls) records to DSpace Test</a> after I did the re-sync yesterday
<ul>
<li>Then I looked at the records again and sent some feedback about three duplicates to Bosede</li>
<li>Also I noticed that many journal articles have the journal and page information in the citation, but are missing <code>dc.source</code> and <code>dc.format.extent</code> fields</li>
</ul></li>
<li>Play with language identification using the langdetect, fasttext, polyglot, and langid libraries
<ul>
<li>polyglot requires too many system dependencies to compile</li>
<li>langdetect didn’t seem as accurate as the others</li>
<li>fasttext is likely the best, but <a href="https://github.com/facebookresearch/fastText/issues/909">prints a blank line to the console when loading a model</a></li>
<li>langid seems to be the best considering the above experiences</li>
</ul></li>
<li>I added very experimental language detection to the <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> module
<ul>
<li>It works by checking the predicted language of the <code>dc.title</code> field against the item’s <code>dc.language.iso</code> field</li>
<li>I tested it on the Bioversity migration data set and actually managed to correct about eight incorrect language fields in their records!</li>
</ul></li>
</ul>
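<p>The title-versus-language check described above could be sketched roughly as follows. This is only an illustration, not the actual csv-metadata-quality code: the function names <code>detect_with_langid</code> and <code>check_language</code> are hypothetical, and it assumes <code>dc.language.iso</code> holds two-letter ISO 639-1 codes like the ones <code>langid.classify()</code> returns.</p>

```python
from typing import Callable, Optional


def detect_with_langid(text: str) -> str:
    """Return the language code that langid predicts for the text."""
    # Third-party dependency, imported lazily so the comparison logic
    # below can be exercised with any other detector.
    import langid

    code, _score = langid.classify(text)
    return code


def check_language(
    title: str,
    declared: str,
    detect: Callable[[str], str] = detect_with_langid,
) -> Optional[str]:
    """Compare the predicted language of a title with the declared one.

    Returns a warning string on mismatch, or None when the values agree
    or when either field is empty (nothing to compare).
    """
    if not title or not declared:
        return None

    predicted = detect(title).strip().lower()
    expected = declared.strip().lower()

    if predicted != expected:
        return (
            f"Possibly incorrect language {expected} "
            f"(detected {predicted}): {title}"
        )
    return None


if __name__ == "__main__":
    # Hypothetical row; the stub detector stands in for a real model.
    warning = check_language(
        "Diversidad genética del banano", "en", detect=lambda _t: "es"
    )
    print(warning)
```

<p>Passing the detector in as a parameter makes it easy to swap langid for fasttext or another library later, since only the comparison logic is fixed. Short titles can be misclassified, so a mismatch is probably best treated as a hint for manual review rather than an automatic correction.</p>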
<!-- vim: set sw=2 ts=2: -->