Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-22 22:55:04 +01:00)
Update notes for 2018-05-17
parent 232693d9d3
commit 00dc8241fc
@@ -286,3 +286,4 @@ ga('send', 'pageview', {
 - I'm not sure which method is better, perhaps the `solr.ASCIIFoldingFilterFactory` filter because it doesn't require copying the `mapping-FoldToASCII.txt` file
 - And actually I'm not entirely sure about the order of filtering before tokenizing, etc...
 - Ah, I see that `charFilter` must be before the tokenizer because it works on a stream, whereas `filter` operates on tokenized input so it must come after the tokenizer
+- Regarding the use of the `charFilter` vs the `filter` class before and after the tokenizer, respectively, I think it's better to use the `charFilter` to normalize the input stream before tokenizing it as I have no idea what kinda stuff might get removed by the tokenizer
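The charFilter/filter ordering discussed in the notes can be sketched as two Solr `fieldType` definitions. This is a hedged illustration, not the configuration actually used for the `countries` core: the field type names are hypothetical, and the choice of `solr.StandardTokenizerFactory` and `solr.LowerCaseFilterFactory` is an assumption, since the notes do not name a tokenizer or any other filters. The first variant folds characters on the raw stream before tokenizing (and needs `mapping-FoldToASCII.txt` copied into the core's `conf` directory); the second folds each token after tokenizing with `solr.ASCIIFoldingFilterFactory`.

```xml
<!-- Sketch only: field type names are hypothetical; tokenizer and lowercase filter are assumptions. -->

<!-- Variant 1: the charFilter rewrites the raw character stream, so the tokenizer
     only ever sees folded text. Requires mapping-FoldToASCII.txt in the core's conf/ dir. -->
<fieldType name="text_folded_charfilter" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Variant 2: the tokenizer runs first, then the filter folds each token.
     No mapping file has to be copied. -->
<fieldType name="text_folded_filter" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Either way, Solr processes the analyzer in the order charFilter, then tokenizer, then filter. Variant 1 matches the reasoning in the notes (normalize before the tokenizer can drop anything), while variant 2 avoids copying the mapping file.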
@@ -27,7 +27,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
 
 <meta property="article:published_time" content="2018-05-01T16:43:54+03:00"/>
 
-<meta property="article:modified_time" content="2018-05-17T10:51:46+03:00"/>
+<meta property="article:modified_time" content="2018-05-17T12:37:21+03:00"/>
 
 
 
@@ -65,9 +65,9 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
 "@type": "BlogPosting",
 "headline": "May, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-05/",
-"wordCount": "2267",
+"wordCount": "2313",
 "datePublished": "2018-05-01T16:43:54+03:00",
-"dateModified": "2018-05-17T10:51:46+03:00",
+"dateModified": "2018-05-17T12:37:21+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -469,6 +469,7 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
 <li>I’m not sure which method is better, perhaps the <code>solr.ASCIIFoldingFilterFactory</code> filter because it doesn’t require copying the <code>mapping-FoldToASCII.txt</code> file</li>
 <li>And actually I’m not entirely sure about the order of filtering before tokenizing, etc…</li>
 <li>Ah, I see that <code>charFilter</code> must be before the tokenizer because it works on a stream, whereas <code>filter</code> operates on tokenized input so it must come after the tokenizer</li>
+<li>Regarding the use of the <code>charFilter</code> vs the <code>filter</code> class before and after the tokenizer, respectively, I think it’s better to use the <code>charFilter</code> to normalize the input stream before tokenizing it as I have no idea what kinda stuff might get removed by the tokenizer</li>
 </ul>
 
 
@@ -4,7 +4,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2018-05/</loc>
-<lastmod>2018-05-17T10:51:46+03:00</lastmod>
+<lastmod>2018-05-17T12:37:21+03:00</lastmod>
 </url>
 
 <url>
@@ -164,7 +164,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2018-05-17T10:51:46+03:00</lastmod>
+<lastmod>2018-05-17T12:37:21+03:00</lastmod>
 <priority>0</priority>
 </url>
 
@@ -175,7 +175,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2018-05-17T10:51:46+03:00</lastmod>
+<lastmod>2018-05-17T12:37:21+03:00</lastmod>
 <priority>0</priority>
 </url>
 
@@ -187,13 +187,13 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2018-05-17T10:51:46+03:00</lastmod>
+<lastmod>2018-05-17T12:37:21+03:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2018-05-17T10:51:46+03:00</lastmod>
+<lastmod>2018-05-17T12:37:21+03:00</lastmod>
 <priority>0</priority>
 </url>
 