Update notes for 2020-10-19

Alan Orth 2020-10-19 17:22:49 +03:00
parent 28d25cdac0
commit 7cdb9f31e6
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
22 changed files with 76 additions and 28 deletions

@ -589,4 +589,24 @@ Purging 1282 hits from curl in statistics
Total number of bot hits purged: 8174 Total number of bot hits purged: 8174
``` ```
- Add "Infographic" to types in input form
- Looking into the spider agent issue from last week, where hits seem to be logged regardless of whether ANY spider agent patterns are loaded
- I changed the following two options:
- `usage-statistics.logBots = false`
- `usage-statistics.bots.case-insensitive = true`
- Then I made several requests with a bot user agent:
```
$ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-4380-a9d3-4c8cbbed2d21/retrieve User-Agent:"RTB website BOT"
$ curl -s 'http://localhost:8083/solr/statistics/update?softCommit=true'
```
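- A sketch of how the flagged hits could be counted per user agent, reading `numFound` in the JSON response. It assumes the Solr address used above and that the DSpace statistics core has `userAgent` and `isBot` fields; `count_hits` is just a hypothetical helper name:

```shell
# Hypothetical helper: count hits in the statistics core for one user agent,
# filtered on the isBot flag. Assumes Solr listens on localhost:8083 (as above)
# and that the core has `userAgent` and `isBot` fields.
SOLR_STATS='http://localhost:8083/solr/statistics'

count_hits() {
  # $1 = exact user agent string, $2 = true or false for the isBot flag
  # -G sends the --data-urlencode pairs as a URL-encoded GET query string
  curl -s -G "$SOLR_STATS/select" \
    --data-urlencode "q=userAgent:\"$1\"" \
    --data-urlencode "fq=isBot:$2" \
    --data-urlencode 'rows=0' \
    --data-urlencode 'wt=json'
}

# Usage (after the softCommit above): count_hits 'RTB website BOT' true
```

- `rows=0` means Solr only returns the count (`response.numFound`), not the documents themselves.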
- And I saw three hits in Solr with `isBot: true`!!!
- I made a few more requests with user agent "fumanchu" and it logged them with `isBot: false`...
- I made a request with user agent "Delphi 2009" which is in the ilri pattern file, and it was logged with `isBot: true`
- I made a few more requests and confirmed that if a pattern is in the list it gets logged with `isBot: true` despite the fact that `usage-statistics.logBots` is false...
- So WTF, this means that it *knows* they are from a bot, but it logs them anyway
- Is this an issue with Atmire's modules?
- I sent them feedback on the ticket
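- For comparison, a rough shell sketch of what I *expected* the two settings to do: match the user agent case-insensitively against a pattern file, and drop the hit entirely when `logBots` is false. The helper names, `LOG_BOTS`, and the grep-based matching are hypothetical stand-ins, not how DSpace or the Atmire modules actually implement it:

```shell
# Hypothetical stand-ins, not DSpace/Atmire code.
# LOG_BOTS mirrors usage-statistics.logBots = false
LOG_BOTS=false

is_bot() {
  # $1 = user agent, $2 = pattern file with one extended regex per line
  # -i mirrors usage-statistics.bots.case-insensitive = true
  printf '%s\n' "$1" | grep -qiE -f "$2"
}

should_log() {
  # Known bots are only recorded when LOG_BOTS is true; other hits are always logged
  if is_bot "$1" "$2"; then
    [ "$LOG_BOTS" = true ]
  else
    return 0
  fi
}
```

- With `LOG_BOTS=false` and a pattern file containing "Delphi 2009", `should_log 'Delphi 2009' patterns.txt` returns non-zero (the hit should never reach Solr), whereas what I actually see is the hit stored with `isBot: true`.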
<!-- vim: set sw=2 ts=2: --> <!-- vim: set sw=2 ts=2: -->

@ -23,7 +23,7 @@ During the FlywayDB migration I got an error:
<meta property="og:type" content="article" /> <meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-10/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-10/" />
<meta property="article:published_time" content="2020-10-06T16:55:54+03:00" /> <meta property="article:published_time" content="2020-10-06T16:55:54+03:00" />
<meta property="article:modified_time" content="2020-10-15T18:11:00+03:00" /> <meta property="article:modified_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2020"/> <meta name="twitter:title" content="October, 2020"/>
@ -51,9 +51,9 @@ During the FlywayDB migration I got an error:
"@type": "BlogPosting", "@type": "BlogPosting",
"headline": "October, 2020", "headline": "October, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-10/", "url": "https://alanorth.github.io/cgspace-notes/2020-10/",
"wordCount": "3789", "wordCount": "3963",
"datePublished": "2020-10-06T16:55:54+03:00", "datePublished": "2020-10-06T16:55:54+03:00",
"dateModified": "2020-10-15T18:11:00+03:00", "dateModified": "2020-10-19T15:47:59+03:00",
"author": { "author": {
"@type": "Person", "@type": "Person",
"name": "Alan Orth" "name": "Alan Orth"
@ -776,7 +776,35 @@ Purging 1851 hits from ILRI Livestock Website Publications importer BOT in stati
Purging 1282 hits from curl in statistics Purging 1282 hits from curl in statistics
Total number of bot hits purged: 8174 Total number of bot hits purged: 8174
</code></pre><!-- raw HTML omitted --> </code></pre><ul>
<li>Add &ldquo;Infographic&rdquo; to types in input form</li>
<li>Looking into the spider agent issue from last week, where hits seem to be logged regardless of whether ANY spider agent patterns are loaded
<ul>
<li>I changed the following two options:
<ul>
<li><code>usage-statistics.logBots = false</code></li>
<li><code>usage-statistics.bots.case-insensitive = true</code></li>
</ul>
</li>
<li>Then I made several requests with a bot user agent:</li>
</ul>
</li>
</ul>
<pre><code>$ http --print Hh https://dspacetest.cgiar.org/rest/bitstreams/dfa1d9c3-75d3-4380-a9d3-4c8cbbed2d21/retrieve User-Agent:&quot;RTB website BOT&quot;
$ curl -s 'http://localhost:8083/solr/statistics/update?softCommit=true'
</code></pre><ul>
<li>And I saw three hits in Solr with <code>isBot: true</code>!!!
<ul>
<li>I made a few more requests with user agent &ldquo;fumanchu&rdquo; and it logged them with <code>isBot: false</code>&hellip;</li>
<li>I made a request with user agent &ldquo;Delphi 2009&rdquo; which is in the ilri pattern file, and it was logged with <code>isBot: true</code></li>
<li>I made a few more requests and confirmed that if a pattern is in the list it gets logged with <code>isBot: true</code> despite the fact that <code>usage-statistics.logBots</code> is false&hellip;</li>
<li>So WTF, this means that it <em>knows</em> they are from a bot, but it logs them anyway</li>
<li>Is this an issue with Atmire&rsquo;s modules?</li>
<li>I sent them feedback on the ticket</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/> <meta name="twitter:title" content="Categories"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/> <meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/> <meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/> <meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/> <meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/> <meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." /> <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" /> <meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:23:30+03:00" /> <meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta name="twitter:card" content="summary"/> <meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/> <meta name="twitter:title" content="Posts"/>

@ -4,27 +4,27 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc> <loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2020-10-19T15:23:30+03:00</lastmod> <lastmod>2020-10-19T15:47:59+03:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2020-10-19T15:23:30+03:00</lastmod> <lastmod>2020-10-19T15:47:59+03:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2020-10-19T15:23:30+03:00</lastmod> <lastmod>2020-10-19T15:47:59+03:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/2020-10/</loc> <loc>https://alanorth.github.io/cgspace-notes/2020-10/</loc>
<lastmod>2020-10-15T18:11:00+03:00</lastmod> <lastmod>2020-10-19T15:47:59+03:00</lastmod>
</url> </url>
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc> <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2020-10-19T15:23:30+03:00</lastmod> <lastmod>2020-10-19T15:47:59+03:00</lastmod>
</url> </url>
<url> <url>