diff --git a/content/posts/2019-11.md b/content/posts/2019-11.md
index b0c7119e2..041b8a0b0 100644
--- a/content/posts/2019-11.md
+++ b/content/posts/2019-11.md
@@ -211,5 +211,28 @@ $ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&f
 - I wrote a quick bash script to check all these user agents against the CGSpace Solr statistics cores
   - For years 2010 until 2019 there are 1.6 million hits from these spider user agents
   - For 2019 alone there are 740,000, over half of which come from Unpaywall!
+  - Looking at the facets I see there were about 200,000 hits from Unpaywall in 2019-10:
+
+```
+$ curl -s 'http://localhost:8081/solr/statistics/select?facet=true&facet.field=dateYearMonth&facet.mincount=1&facet.offset=0&facet.limit=
+12&q=userAgent:*Unpaywall*' | xmllint --format - | less
+...
+
+
+
+
+ 198624
+ 88422
+ 79911
+ 67065
+ 39026
+ 36889
+ 36512
+ 760
+
+
+```
+
+- That answers Peter's question about why the stats jumped in October...
diff --git a/docs/2019-11/index.html b/docs/2019-11/index.html
index 3875c4f60..1cc9a6849 100644
--- a/docs/2019-11/index.html
+++ b/docs/2019-11/index.html
@@ -34,7 +34,7 @@ Let’s see how many of the REST API requests were for bitstreams (because t
-
+
@@ -73,9 +73,9 @@ Let’s see how many of the REST API requests were for bitstreams (because t
   "@type": "BlogPosting",
   "headline": "November, 2019",
   "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-11\/",
-  "wordCount": "1293",
+  "wordCount": "1343",
   "datePublished": "2019-11-04T12:20:30+02:00",
-  "dateModified": "2019-11-07T12:40:25+02:00",
+  "dateModified": "2019-11-07T18:22:19+02:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -364,16 +364,37 @@ $ http –print b ‘http://localhost:8081/solr/statistics-2018/select?facet=true&facet.field=ip&facet.mincount=1&type:0&q=userAgent:*ltx71*' | xmllint –format - | grep numFound
-
+

+ +

+- I wrote a quick bash script to check all these user agents against the CGSpace Solr statistics cores
+  - For years 2010 until 2019 there are 1.6 million hits from these spider user agents
+  - For 2019 alone there are 740,000, over half of which come from Unpaywall!
+  - Looking at the facets I see there were about 200,000 hits from Unpaywall in 2019-10:
+
+
+ +

$ curl -s ‘http://localhost:8081/solr/statistics/select?facet=true&facet.field=dateYearMonth&facet.mincount=1&facet.offset=0&facet.limit=
+12&q=userAgent:Unpaywall’ | xmllint –format - | less
+…
+
+
+
+
+ 198624
+ 88422
+ 79911
+ 67065
+ 39026
+ 36889
+ 36512
+ 760
+
+ ```

    -
  • I wrote a quick bash script to check all these user agents against the CGSpace Solr statistics cores - -
      -
    • For years 2010 until 2019 there are 1.6 million hits from these spider user agents
    • -
    • For 2019 alone there are 740,000, over half of which come from Unpaywall!
    • -
  • +
  • That answers Peter’s question about why the stats jumped in October…
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 544aeff63..e60e1810d 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
 https://alanorth.github.io/cgspace-notes/categories/
- 2019-11-07T12:40:25+02:00
+ 2019-11-07T18:22:19+02:00
 https://alanorth.github.io/cgspace-notes/
- 2019-11-07T12:40:25+02:00
+ 2019-11-07T18:22:19+02:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2019-11-07T12:40:25+02:00
+ 2019-11-07T18:22:19+02:00
 https://alanorth.github.io/cgspace-notes/2019-11/
- 2019-11-07T12:40:25+02:00
+ 2019-11-07T18:22:19+02:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2019-11-07T12:40:25+02:00
+ 2019-11-07T18:22:19+02:00
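
The notes in this change mention a quick bash script that checks user agents against the Solr statistics cores, but the script itself is not part of the diff. A minimal sketch of that kind of check might look like the following. The `SOLR_BASE` variable, the function names, and the `rows=0` parameter are assumptions for illustration; only the host and port, the `statistics` and `statistics-2018` core names, the `userAgent` field, and the `numFound` attribute come from the commands in the notes.

```shell
#!/usr/bin/env bash
# Sketch only: count Solr hits per statistics core for one user agent pattern.
# SOLR_BASE and the helper names below are hypothetical, not from the notes.

SOLR_BASE="${SOLR_BASE:-http://localhost:8081/solr}"

# Build the select URL for one core and one user agent substring.
# rows=0 asks Solr for the hit count only, without returning documents.
build_url() {
    local core="$1" agent="$2"
    printf '%s/%s/select?q=userAgent:*%s*&rows=0' "$SOLR_BASE" "$core" "$agent"
}

# Read a Solr XML response on stdin and print the first numFound value.
extract_num_found() {
    grep -oE 'numFound="[0-9]+"' | head -n 1 | grep -oE '[0-9]+'
}

# Print "core<TAB>hits" for each core given after the agent name.
# This step needs a reachable Solr instance.
count_hits() {
    local agent="$1"
    shift
    local core
    for core in "$@"; do
        printf '%s\t%s\n' "$core" \
            "$(curl -s "$(build_url "$core" "$agent")" | extract_num_found)"
    done
}
```

Given a reachable Solr instance, `count_hits Unpaywall statistics statistics-2018` would print one tab-separated line per core. Keeping URL construction and response parsing in separate functions makes each piece easy to check without a live server.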