From c451b22f2c37c44e6f2dfac3822f3518d474abd9 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Wed, 18 Jul 2018 17:47:36 +0300
Subject: [PATCH] Update notes for 2018-07-18

---
 content/posts/2018-07.md | 37 ++++++++++++++++++++++++++++++++++
 docs/2018-07/index.html | 43 +++++++++++++++++++++++++++++++++++++---
 docs/sitemap.xml | 10 +++++-----
 3 files changed, 82 insertions(+), 8 deletions(-)

diff --git a/content/posts/2018-07.md b/content/posts/2018-07.md
index 51a0634ff..c1951c71f 100644
--- a/content/posts/2018-07.md
+++ b/content/posts/2018-07.md
@@ -393,5 +393,42 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
 - Participate in call with IWMI and WLE to discuss Altmetric, CGSpace, and social media
 - I told them that they should try to be including the Handle link on their social media shares because that's the only way to get Altmetric to notice them and associate them with their DOIs
 - I suggested that we should have a wider meeting about this, and that I would post that on Yammer
+- I was curious about how and when Altmetric harvests the OAI, so I looked in nginx's OAI log
+- For every day in the past week I only see about 50 to 100 requests per day, but then about nine days ago I see 1500 requests
+- In there I see two bots making about 750 requests each, and this one is probably Altmetric:
+
+```
+178.33.237.157 - - [09/Jul/2018:17:00:46 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////100 HTTP/1.1" 200 58653 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
+178.33.237.157 - - [09/Jul/2018:17:01:11 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////200 HTTP/1.1" 200 67950 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
+...
+178.33.237.157 - - [09/Jul/2018:22:10:39 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////73900 HTTP/1.1" 200 25049 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
+```
+
+- So if they are getting 100 records per OAI request it would take them 739 requests
+- I wonder if I should add this user agent to the Tomcat Crawler Session Manager valve... does OAI use Tomcat sessions?
+- Appears not:
+
+```
+$ http --print Hh 'https://cgspace.cgiar.org/oai/request?verb=ListRecords&resumptionToken=oai_dc////100'
+GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////100 HTTP/1.1
+Accept: */*
+Accept-Encoding: gzip, deflate
+Connection: keep-alive
+Host: cgspace.cgiar.org
+User-Agent: HTTPie/0.9.9
+
+HTTP/1.1 200 OK
+Connection: keep-alive
+Content-Encoding: gzip
+Content-Type: application/xml;charset=UTF-8
+Date: Wed, 18 Jul 2018 14:46:37 GMT
+Server: nginx
+Strict-Transport-Security: max-age=15768000
+Transfer-Encoding: chunked
+Vary: Accept-Encoding
+X-Content-Type-Options: nosniff
+X-Frame-Options: SAMEORIGIN
+X-XSS-Protection: 1; mode=block
+```
diff --git a/docs/2018-07/index.html b/docs/2018-07/index.html
index fe967ff41..3c0f21b83 100644
--- a/docs/2018-07/index.html
+++ b/docs/2018-07/index.html
@@ -30,7 +30,7 @@ There is insufficient memory for the Java Runtime Environment to continue.
-
+
@@ -71,9 +71,9 @@ There is insufficient memory for the Java Runtime Environment to continue.
 "@type": "BlogPosting",
 "headline": "July, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-07/",
- "wordCount": "2704",
+ "wordCount": "2896",
 "datePublished": "2018-07-01T12:56:54+03:00",
- "dateModified": "2018-07-18T13:16:53+03:00",
+ "dateModified": "2018-07-18T13:25:02+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -582,8 +582,45 @@ $ ./resolve-orcids.py -i /tmp/2018-07-15-orcid-ids.txt -o /tmp/2018-07-15-resolv
  • Participate in call with IWMI and WLE to discuss Altmetric, CGSpace, and social media
  • I told them that they should try to be including the Handle link on their social media shares because that’s the only way to get Altmetric to notice them and associate them with their DOIs
  • I suggested that we should have a wider meeting about this, and that I would post that on Yammer
  • I was curious about how and when Altmetric harvests the OAI, so I looked in nginx’s OAI log
  • For every day in the past week I only see about 50 to 100 requests per day, but then about nine days ago I see 1500 requests
  • In there I see two bots making about 750 requests each, and this one is probably Altmetric:
    178.33.237.157 - - [09/Jul/2018:17:00:46 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////100 HTTP/1.1" 200 58653 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
    +178.33.237.157 - - [09/Jul/2018:17:01:11 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////200 HTTP/1.1" 200 67950 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
    +...
    +178.33.237.157 - - [09/Jul/2018:22:10:39 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////73900 HTTP/1.1" 200 25049 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
    +
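The per-day, per-client tally described in these notes could be reproduced with something like the following. This is only a minimal sketch, assuming the log uses nginx's standard "combined" format (which the sample lines appear to match); the function name `count_requests` and the regular expression are illustrative and not taken from the original notes.

```python
import re
from collections import Counter

# Assumed layout (nginx "combined" format):
# ip - user [time] "request" status bytes "referer" "user agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def count_requests(lines):
    """Count requests per (day, client IP, user agent), skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match:
            counts[(match["day"], match["ip"], match["agent"])] += 1
    return counts


# Two of the log lines quoted in the notes, used here as sample input.
sample = [
    '178.33.237.157 - - [09/Jul/2018:17:00:46 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////100 HTTP/1.1" 200 58653 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"',
    '178.33.237.157 - - [09/Jul/2018:17:01:11 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////200 HTTP/1.1" 200 67950 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"',
]

# Busiest (day, IP, user agent) combinations first.
for (day, ip, agent), total in count_requests(sample).most_common():
    print(day, ip, total, agent)
```

Run against the full OAI log instead of `sample`, the heaviest harvesters for a given day should appear at the top of the output.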
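  • So if they are getting 100 records per OAI request it would take them 739 requests
  • I wonder if I should add this user agent to the Tomcat Crawler Session Manager valve... does OAI use Tomcat sessions?
  • Appears not: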
    $ http --print Hh 'https://cgspace.cgiar.org/oai/request?verb=ListRecords&resumptionToken=oai_dc////100'
    +GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////100 HTTP/1.1
    +Accept: */*
    +Accept-Encoding: gzip, deflate
    +Connection: keep-alive
    +Host: cgspace.cgiar.org
    +User-Agent: HTTPie/0.9.9
    +
    +HTTP/1.1 200 OK
    +Connection: keep-alive
    +Content-Encoding: gzip
    +Content-Type: application/xml;charset=UTF-8
    +Date: Wed, 18 Jul 2018 14:46:37 GMT
    +Server: nginx
    +Strict-Transport-Security: max-age=15768000
    +Transfer-Encoding: chunked
    +Vary: Accept-Encoding
    +X-Content-Type-Options: nosniff
    +X-Frame-Options: SAMEORIGIN
    +X-XSS-Protection: 1; mode=block
    +
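The "appears not" conclusion rests on the response carrying no `Set-Cookie` header. A rough way to automate that check is sketched below, assuming Tomcat's default session cookie name `JSESSIONID` (a deployment can rename it, so treat that as an assumption); the helper `sets_tomcat_session` is hypothetical and not part of the original notes.

```python
def sets_tomcat_session(header_lines, cookie_name="JSESSIONID"):
    """Return True if any Set-Cookie header carries the given session cookie."""
    needle = cookie_name.lower() + "="
    for line in header_lines:
        # Split "Name: value"; lines without a colon (e.g. the status line) never match.
        name, _, value = line.partition(":")
        if name.strip().lower() == "set-cookie" and needle in value.lower():
            return True
    return False


# Response headers copied from the HTTPie output in the notes.
response_headers = [
    "HTTP/1.1 200 OK",
    "Connection: keep-alive",
    "Content-Encoding: gzip",
    "Content-Type: application/xml;charset=UTF-8",
    "Date: Wed, 18 Jul 2018 14:46:37 GMT",
    "Server: nginx",
    "Strict-Transport-Security: max-age=15768000",
    "Transfer-Encoding: chunked",
    "Vary: Accept-Encoding",
    "X-Content-Type-Options: nosniff",
    "X-Frame-Options: SAMEORIGIN",
    "X-XSS-Protection: 1; mode=block",
]

print(sets_tomcat_session(response_headers))
```

A single cookie-less response is suggestive rather than conclusive, since a proxy in front of Tomcat could in principle strip cookies before they reach the client.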
    +
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index c0781263f..0eddbf275 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2018-07/
- 2018-07-18T13:16:53+03:00
+ 2018-07-18T13:25:02+03:00
@@ -174,7 +174,7 @@
 https://alanorth.github.io/cgspace-notes/
- 2018-07-18T13:16:53+03:00
+ 2018-07-18T13:25:02+03:00
 0
@@ -185,7 +185,7 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
- 2018-07-18T13:16:53+03:00
+ 2018-07-18T13:25:02+03:00
 0
@@ -197,13 +197,13 @@
 https://alanorth.github.io/cgspace-notes/posts/
- 2018-07-18T13:16:53+03:00
+ 2018-07-18T13:25:02+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
- 2018-07-18T13:16:53+03:00
+ 2018-07-18T13:25:02+03:00
 0