mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2018-05-17
@ -286,3 +286,4 @@ ga('send', 'pageview', {
- I'm not sure which method is better; perhaps the `solr.ASCIIFoldingFilterFactory` filter, because it doesn't require copying the `mapping-FoldToASCII.txt` file
- And actually I'm not entirely sure about the order of filtering before tokenizing, etc...
- Ah, I see that `charFilter` must come before the tokenizer because it works on the raw character stream, whereas `filter` operates on tokenized input so it must come after the tokenizer
- As for choosing between the `charFilter` (before the tokenizer) and the `filter` (after it), I think it's better to use the `charFilter` to normalize the input stream before tokenizing, because I don't know what the tokenizer might strip out
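
The ordering concern above can be illustrated with a small Python sketch. This is only an analogy and not Solr code: `fold_to_ascii` is a rough stand-in for ASCII folding, and `tokenize` is a hypothetical, deliberately naive ASCII-only tokenizer chosen to show what a lossy tokenizer could discard. Solr's real tokenizers may well behave differently.

```python
import re
import unicodedata


def fold_to_ascii(text):
    """Rough stand-in for ASCII folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Hypothetical ASCII-only tokenizer; it splits on anything non-ASCII."""
    return re.findall(r"[A-Za-z0-9]+", text)


def char_filter_then_tokenize(text):
    # Analogous to a charFilter: normalize the character stream first.
    return tokenize(fold_to_ascii(text))


def tokenize_then_token_filter(text):
    # Analogous to a token filter: fold each token after tokenizing.
    return [fold_to_ascii(token) for token in tokenize(text)]


sample = "Orl\u00e9ans caf\u00e9"  # "Orléans café"
print(char_filter_then_tokenize(sample))   # ['Orleans', 'cafe']
print(tokenize_then_token_filter(sample))  # ['Orl', 'ans', 'caf']
```

With this toy tokenizer, folding after tokenizing comes too late: the accented characters are already gone, so the tokens are broken. Normalizing the stream first avoids depending on how the tokenizer treats such characters.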