Update notes for 2018-05-17

2018-05-17 13:14:29 +03:00
parent 232693d9d3
commit 00dc8241fc
3 changed files with 10 additions and 8 deletions


@@ -27,7 +27,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
<meta property="article:published_time" content="2018-05-01T16:43:54&#43;03:00"/>
-<meta property="article:modified_time" content="2018-05-17T10:51:46&#43;03:00"/>
+<meta property="article:modified_time" content="2018-05-17T12:37:21&#43;03:00"/>
@@ -65,9 +65,9 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
"@type": "BlogPosting",
"headline": "May, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-05/",
-"wordCount": "2267",
+"wordCount": "2313",
"datePublished": "2018-05-01T16:43:54&#43;03:00",
-"dateModified": "2018-05-17T10:51:46&#43;03:00",
+"dateModified": "2018-05-17T12:37:21&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -469,6 +469,7 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
<li>I&rsquo;m not sure which method is better, perhaps the <code>solr.ASCIIFoldingFilterFactory</code> filter because it doesn&rsquo;t require copying the <code>mapping-FoldToASCII.txt</code> file</li>
<li>And actually I&rsquo;m not entirely sure about the order of filtering before tokenizing, etc&hellip;</li>
<li>Ah, I see that <code>charFilter</code> must be before the tokenizer because it works on a stream, whereas <code>filter</code> operates on tokenized input so it must come after the tokenizer</li>
+<li>Regarding <code>charFilter</code> (before the tokenizer) versus <code>filter</code> (after it), I think it&rsquo;s better to use the <code>charFilter</code> to normalize the input stream before tokenizing, because I don&rsquo;t know what characters the tokenizer might remove</li>
</ul>
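<p>The ordering concern in the notes above can be illustrated with a small Python sketch. This is not Solr: <code>fold_to_ascii</code> and <code>naive_tokenize</code> are hypothetical stand-ins, and the ASCII-only tokenizer is a deliberately pessimistic assumption (Solr tokenizers may well keep accented characters). It only shows why folding the character stream first is the safer order when the tokenizer is not fully understood.</p>

```python
import re
import unicodedata


def fold_to_ascii(text):
    """Stand-in for a char-level fold: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_tokenize(text):
    """Hypothetical tokenizer that keeps only runs of ASCII letters.

    Assumption for illustration only; it is not how any Solr tokenizer works.
    """
    return re.findall(r"[A-Za-z]+", text)


def char_filter_then_tokenize(text):
    # Analogous to charFilter -> tokenizer: normalize the raw stream first.
    return naive_tokenize(fold_to_ascii(text))


def tokenize_then_token_filter(text):
    # Analogous to tokenizer -> filter: fold each token after splitting.
    return [fold_to_ascii(token) for token in naive_tokenize(text)]


if __name__ == "__main__":
    country = "C\u00f4te d'Ivoire"
    print(char_filter_then_tokenize(country))
    print(tokenize_then_token_filter(country))
```

<p>With this pessimistic tokenizer, folding first keeps the first word intact, while tokenizing first splits it at the accented character before the token-level fold ever sees it.</p>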