From da88f0e7a95db9f42d79ebf00c6c4752435f62d6 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Sat, 24 Oct 2020 22:23:06 +0300
Subject: [PATCH] content/posts/2020-10.md: Fix typo

---
 content/posts/2020-10.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/posts/2020-10.md b/content/posts/2020-10.md
index 4c02450c1..60f5909a1 100644
--- a/content/posts/2020-10.md
+++ b/content/posts/2020-10.md
@@ -261,7 +261,7 @@ $ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-438
 - I re-factored the `check-spider-hits.sh` script to read patterns from a text file rather than sed's stdout, and to properly search for spaces in patterns that use `\s` because Lucene's search syntax doesn't support it (and spaces work just fine)
   - Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html
   - Reference: https://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Regexp_Searches
-- I added `[Ss]pider` to the Tomcat Crawler Sessions Manager Valve regex because this can catch a few more generic bots and force them to use the same Tomcat JSESSIONID
+- I added `[Ss]pider` to the Tomcat Crawler Session Manager Valve regex because this can catch a few more generic bots and force them to use the same Tomcat JSESSIONID
 - I added a few of the patterns from above to our local agents list and ran the `check-spider-hits.sh` on CGSpace:
 
 ```