Update notes for 2018-07-18

Alan Orth 2018-07-18 17:47:36 +03:00
parent b17330f157
commit c451b22f2c
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 82 additions and 8 deletions


@ -393,5 +393,42 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
- Participate in call with IWMI and WLE to discuss Altmetric, CGSpace, and social media
- I told them that they should try to include the Handle link in their social media shares because that's the only way to get Altmetric to notice them and associate them with their DOIs
- I suggested that we should have a wider meeting about this, and that I would post that on Yammer
- I was curious about how and when Altmetric harvests the OAI, so I looked in nginx's OAI log
- For each day in the past week I see only about 50 to 100 requests, but about nine days ago I see 1500 requests
- In there I see two bots making about 750 requests each, and this one is probably Altmetric:
```
178.33.237.157 - - [09/Jul/2018:17:00:46 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////100 HTTP/1.1" 200 58653 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
178.33.237.157 - - [09/Jul/2018:17:01:11 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////200 HTTP/1.1" 200 67950 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
...
178.33.237.157 - - [09/Jul/2018:22:10:39 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////73900 HTTP/1.1" 200 25049 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
```
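- A minimal sketch of how the tally from the log lines above could be scripted, assuming nginx's standard "combined" log format and that the number at the end of the `resumptionToken` is a record offset (the `summarize` helper and the `SAMPLE` text are hypothetical, not part of any existing script):

```python
import re
from collections import Counter

# Assumption: nginx "combined" log format, as in the sample lines above.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+) '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)
# Assumption: the trailing number in the resumption token is a record offset.
TOKEN_RE = re.compile(r'resumptionToken=oai_dc////(\d+)')

SAMPLE = """\
178.33.237.157 - - [09/Jul/2018:17:00:46 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////100 HTTP/1.1" 200 58653 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
178.33.237.157 - - [09/Jul/2018:17:01:11 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////200 HTTP/1.1" 200 67950 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
...
178.33.237.157 - - [09/Jul/2018:22:10:39 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////73900 HTTP/1.1" 200 25049 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
"""


def summarize(lines):
    """Count requests per (day, IP, user agent) and track the highest offset per IP."""
    counts = Counter()
    max_offset = {}
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the expected format
        counts[(m["day"], m["ip"], m["agent"])] += 1
        token = TOKEN_RE.search(m["request"])
        if token:
            ip = m["ip"]
            max_offset[ip] = max(max_offset.get(ip, 0), int(token.group(1)))
    return counts, max_offset


if __name__ == "__main__":
    counts, max_offset = summarize(SAMPLE.splitlines())
    for (day, ip, agent), n in counts.most_common():
        print(day, ip, n, agent)
    print(max_offset)
```

- On a real log, passing an open file handle to `summarize` instead of `SAMPLE.splitlines()` should give the per-day, per-client counts directly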
- So if they are getting 100 records per OAI request, reaching offset 73900 would take them 739 requests, which is consistent with the roughly 750 requests seen from this bot
- I wonder if I should add this user agent to the Tomcat Crawler Session Manager valve... does OAI use Tomcat sessions?
- Appears not:
```
$ http --print Hh 'https://cgspace.cgiar.org/oai/request?verb=ListRecords&resumptionToken=oai_dc////100'
GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////100 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: HTTPie/0.9.9
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/xml;charset=UTF-8
Date: Wed, 18 Jul 2018 14:46:37 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
```
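- A rough sketch of the same check done programmatically, assuming Tomcat's default session cookie name `JSESSIONID` and that a session would show up as a `Set-Cookie` response header (the `sets_session_cookie` helper is hypothetical, and sessions tracked via URL rewriting would not be caught):

```python
def sets_session_cookie(raw_headers, cookie_name="JSESSIONID"):
    """Return True if any Set-Cookie header in the raw header text sets cookie_name."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # status line or blank line, not a header
        if name.strip().lower() == "set-cookie" and value.strip().startswith(cookie_name + "="):
            return True
    return False


# Response headers copied from the HTTPie output above.
OAI_RESPONSE = """\
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/xml;charset=UTF-8
Date: Wed, 18 Jul 2018 14:46:37 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
"""

if __name__ == "__main__":
    print(sets_session_cookie(OAI_RESPONSE))
```

- The absence of a `Set-Cookie` header in one response only suggests that OAI requests do not create sessions; it is not proof for every verb or endpoint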
<!-- vim: set sw=2 ts=2: -->


@ -30,7 +30,7 @@ There is insufficient memory for the Java Runtime Environment to continue.
<meta property="article:published_time" content="2018-07-01T12:56:54&#43;03:00"/>
<meta property="article:modified_time" content="2018-07-18T13:16:53&#43;03:00"/>
<meta property="article:modified_time" content="2018-07-18T13:25:02&#43;03:00"/>
@ -71,9 +71,9 @@ There is insufficient memory for the Java Runtime Environment to continue.
"@type": "BlogPosting",
"headline": "July, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-07/",
"wordCount": "2704",
"wordCount": "2896",
"datePublished": "2018-07-01T12:56:54&#43;03:00",
"dateModified": "2018-07-18T13:16:53&#43;03:00",
"dateModified": "2018-07-18T13:25:02&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -582,8 +582,45 @@ $ ./resolve-orcids.py -i /tmp/2018-07-15-orcid-ids.txt -o /tmp/2018-07-15-resolv
<li>Participate in call with IWMI and WLE to discuss Altmetric, CGSpace, and social media</li>
<li>I told them that they should try to include the Handle link in their social media shares because that&rsquo;s the only way to get Altmetric to notice them and associate them with their DOIs</li>
<li>I suggested that we should have a wider meeting about this, and that I would post that on Yammer</li>
<li>I was curious about how and when Altmetric harvests the OAI, so I looked in nginx&rsquo;s OAI log</li>
<li>For each day in the past week I see only about 50 to 100 requests, but about nine days ago I see 1500 requests</li>
<li>In there I see two bots making about 750 requests each, and this one is probably Altmetric:</li>
</ul>
<pre><code>178.33.237.157 - - [09/Jul/2018:17:00:46 +0000] &quot;GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////100 HTTP/1.1&quot; 200 58653 &quot;-&quot; &quot;Apache-HttpClient/4.5.2 (Java/1.8.0_121)&quot;
178.33.237.157 - - [09/Jul/2018:17:01:11 +0000] &quot;GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////200 HTTP/1.1&quot; 200 67950 &quot;-&quot; &quot;Apache-HttpClient/4.5.2 (Java/1.8.0_121)&quot;
...
178.33.237.157 - - [09/Jul/2018:22:10:39 +0000] &quot;GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////73900 HTTP/1.1&quot; 200 25049 &quot;-&quot; &quot;Apache-HttpClient/4.5.2 (Java/1.8.0_121)&quot;
</code></pre>
<ul>
<li>So if they are getting 100 records per OAI request, reaching offset 73900 would take them 739 requests, which is consistent with the roughly 750 requests seen from this bot</li>
<li>I wonder if I should add this user agent to the Tomcat Crawler Session Manager valve&hellip; does OAI use Tomcat sessions?</li>
<li>Appears not:</li>
</ul>
<pre><code>$ http --print Hh 'https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////100'
GET /oai/request?verb=ListRecords&amp;resumptionToken=oai_dc////100 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: HTTPie/0.9.9
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/xml;charset=UTF-8
Date: Wed, 18 Jul 2018 14:46:37 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
</code></pre>
<!-- vim: set sw=2 ts=2: -->


@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-07/</loc>
<lastmod>2018-07-18T13:16:53+03:00</lastmod>
<lastmod>2018-07-18T13:25:02+03:00</lastmod>
</url>
<url>
@ -174,7 +174,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-07-18T13:16:53+03:00</lastmod>
<lastmod>2018-07-18T13:25:02+03:00</lastmod>
<priority>0</priority>
</url>
@ -185,7 +185,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-07-18T13:16:53+03:00</lastmod>
<lastmod>2018-07-18T13:25:02+03:00</lastmod>
<priority>0</priority>
</url>
@ -197,13 +197,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-07-18T13:16:53+03:00</lastmod>
<lastmod>2018-07-18T13:25:02+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-07-18T13:16:53+03:00</lastmod>
<lastmod>2018-07-18T13:25:02+03:00</lastmod>
<priority>0</priority>
</url>