diff --git a/content/posts/2020-10.md b/content/posts/2020-10.md
index 4c02450c1..60f5909a1 100644
--- a/content/posts/2020-10.md
+++ b/content/posts/2020-10.md
@@ -261,7 +261,7 @@ $ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-438
 - I re-factored the `check-spider-hits.sh` script to read patterns from a text file rather than sed's stdout, and to properly search for spaces in patterns that use `\s` because Lucene's search syntax doesn't support it (and spaces work just fine)
   - Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html
   - Reference: https://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Regexp_Searches
-- I added `[Ss]pider` to the Tomcat Crawler Sessions Manager Valve regex because this can catch a few more generic bots and force them to use the same Tomcat JSESSIONID
+- I added `[Ss]pider` to the Tomcat Crawler Session Manager Valve regex because this can catch a few more generic bots and force them to use the same Tomcat JSESSIONID
 - I added a few of the patterns from above to our local agents list and ran the `check-spider-hits.sh` on CGSpace:
 
 ```