Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-26 08:28:18 +01:00)

Add notes for 2019-04-07

Parent: aa1264aa42
Commit: 89a4212e2b
@@ -170,4 +170,128 @@ GET /handle/10568/72970/discover?filtertype_0=type&filtertype_1=author&filter_re
- Maria from Bioversity recommended that we use the phrase "AGROVOC subject" instead of "Subject" in Listings and Reports
- I made a pull request to update this and merged it to the `5_x-prod` branch ([#418](https://github.com/ilri/DSpace/pull/418))

## 2019-04-07

- Looking into the impact of harvesters like `45.5.184.72`, I see in Solr that this client is not categorized as a bot, so it definitely inflates the usage stats by some tens of thousands of hits *per day*
- Last week CTA switched their frontend code to use HEAD requests instead of GET requests for PDF bitstreams
  - I am trying to determine whether these HEAD requests are registered as downloads in Solr
  - I see 96,925 downloads from their AWS gateway IPs in 2019-03:

```
$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&fq=statistics_type%3Aview&fq=bundleName%3AORIGINAL&fq=dateYearMonth%3A2019-03&rows=0&wt=json&indent=true'
{
    "response": {
        "docs": [],
        "numFound": 96925,
        "start": 0
    },
    "responseHeader": {
        "QTime": 1,
        "params": {
            "fq": [
                "statistics_type:view",
                "bundleName:ORIGINAL",
                "dateYearMonth:2019-03"
            ],
            "indent": "true",
            "q": "type:0 AND (ip:18.196.196.108 OR ip:18.195.78.144 OR ip:18.195.218.6)",
            "rows": "0",
            "wt": "json"
        },
        "status": 0
    }
}
```

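The Solr query above is run again with only the month changed. As a minimal sketch, the URL can be rebuilt for any month and set of IPs instead of being edited by hand. The helper name `solr_downloads_url` is hypothetical, and it assumes the statistics core is still reachable on `localhost:8081`. `type:0` is DSpace's constant for bitstreams, so the query counts file views in the `ORIGINAL` bundle:

```shell
# Hypothetical helper: rebuild the Solr statistics query used above for a
# given month (YYYY-MM) and any number of client IPs. It only prints the URL;
# pass it to http(1) or curl(1) yourself.
solr_downloads_url() {
  month="$1"
  shift
  ips=""
  for ip in "$@"; do
    # Join the ip:... clauses with "+OR+" (URL-encoded spaces around OR)
    ips="${ips:+${ips}+OR+}ip%3A${ip}"
  done
  # type:0 = bitstreams; only views of files in the ORIGINAL bundle
  printf 'http://localhost:8081/solr/statistics/select?q=type%%3A0+AND+(%s)&fq=statistics_type%%3Aview&fq=bundleName%%3AORIGINAL&fq=dateYearMonth%%3A%s&rows=0&wt=json&indent=true\n' \
    "$ips" "$month"
}
```

For example, `http --print b "$(solr_downloads_url 2019-04 18.196.196.108 18.195.78.144 18.195.218.6)" | jq '.response.numFound'` should print just the 2019-04 count, assuming `jq` is installed.
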
- Strangely, I see only 38 hits from those IPs so far in 2019-04:

```
$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&fq=statistics_type%3Aview&fq=bundleName%3AORIGINAL&fq=dateYearMonth%3A2019-04&rows=0&wt=json&indent=true'
{
    "response": {
        "docs": [],
        "numFound": 38,
        "start": 0
    },
    "responseHeader": {
        "QTime": 1,
        "params": {
            "fq": [
                "statistics_type:view",
                "bundleName:ORIGINAL",
                "dateYearMonth:2019-04"
            ],
            "indent": "true",
            "q": "type:0 AND (ip:18.196.196.108 OR ip:18.195.78.144 OR ip:18.195.218.6)",
            "rows": "0",
            "wt": "json"
        },
        "status": 0
    }
}
```

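A monthly total cannot show *when* the counts dropped. One way to check whether the drop lines up with CTA's switch to HEAD is a Solr range facet with one bucket per day over the same filters. This is a sketch under assumptions: `solr_daily_url` is a hypothetical helper, and I am assuming the request timestamp is stored in the statistics core's `time` field:

```shell
# Hypothetical helper: same filters as the monthly query, but ask Solr for one
# count per day via a range facet on the (assumed) "time" field.
solr_daily_url() {
  start="$1"   # e.g. 2019-03-25T00:00:00Z
  end="$2"     # e.g. 2019-04-08T00:00:00Z
  shift 2
  ips=""
  for ip in "$@"; do
    # Join the ip:... clauses with "+OR+" (URL-encoded spaces around OR)
    ips="${ips:+${ips}+OR+}ip%3A${ip}"
  done
  # %2B1DAY is the URL-encoded gap "+1DAY"
  printf 'http://localhost:8081/solr/statistics/select?q=type%%3A0+AND+(%s)&fq=statistics_type%%3Aview&fq=bundleName%%3AORIGINAL&rows=0&wt=json&facet=true&facet.range=time&facet.range.start=%s&facet.range.end=%s&facet.range.gap=%%2B1DAY\n' \
    "$ips" "$start" "$end"
}
```

Solr returns range facet buckets under `facet_counts.facet_ranges.time.counts` as alternating date/count pairs, so piping the response through `jq '.facet_counts.facet_ranges.time.counts'` would show the daily numbers.
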
- Testing GET vs. HEAD requests on the [CTA Spore 192 item](https://dspacetest.cgiar.org/handle/10568/100289) on DSpace Test:

```
$ http --print Hh GET https://dspacetest.cgiar.org/bitstream/handle/10568/100289/Spore-192-EN-web.pdf
GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspacetest.cgiar.org
User-Agent: HTTPie/1.0.2

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:38:34 GMT
Expires: Sun, 07 Apr 2019 09:38:34 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Server: nginx
Set-Cookie: JSESSIONID=21A492CC31CA8845278DFA078BD2D9ED; Path=/; Secure; HttpOnly
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Robots-Tag: none
X-XSS-Protection: 1; mode=block

$ http --print Hh HEAD https://dspacetest.cgiar.org/bitstream/handle/10568/100289/Spore-192-EN-web.pdf
HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspacetest.cgiar.org
User-Agent: HTTPie/1.0.2

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:39:01 GMT
Expires: Sun, 07 Apr 2019 09:39:01 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Server: nginx
Set-Cookie: JSESSIONID=36C8502257CC6C72FD3BC9EBF91C4A0E; Path=/; Secure; HttpOnly
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Robots-Tag: none
X-XSS-Protection: 1; mode=block
```

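Apart from per-request values (`Date`, `Expires` and the `JSESSIONID` cookie), the two responses above carry identical headers, including `Content-Length: 2069158`. That matches HTTP's rule that a server SHOULD send the same header fields for HEAD as for the equivalent GET, just without a body. A minimal sketch for checking this mechanically, assuming each `http --print Hh` dump was saved to a file (the helper name `stable_response_headers` is hypothetical):

```shell
# Hypothetical helper: from a saved `http --print Hh` dump, drop the request
# headers and the per-request response headers, keeping only the response
# headers that should match between GET and HEAD.
stable_response_headers() {
  # Skip everything up to the blank line that separates request from response,
  # then print the rest
  awk 'seen { print; next } /^$/ { seen = 1 }' "$1" |
    # Drop headers that legitimately change on every request
    grep -v -E '^(Date|Expires|Set-Cookie):'
}
```

For example, after saving the two dumps as `get.txt` and `head.txt`, running `stable_response_headers get.txt > a.txt`, `stable_response_headers head.txt > b.txt` and `diff a.txt b.txt` should print nothing.
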
- On the server side, the nginx access log shows:

```
78.x.x.x - - [07/Apr/2019:01:38:35 -0700] "GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 68078 "-" "HTTPie/1.0.2"
78.x.x.x - - [07/Apr/2019:01:39:01 -0700] "HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 0 "-" "HTTPie/1.0.2"
```

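To see how a larger set of requests splits between GET and HEAD, here is a small awk sketch that tallies bitstream requests and bytes sent per method. It assumes nginx's default `combined` log format, where the method (with a leading quote) is field 6, the path is field 7 and the bytes sent are field 10, as in the two lines above; a custom `log_format` would shift these. The helper name `bitstream_bytes_by_method` is hypothetical:

```shell
# Hypothetical helper: read nginx "combined" log lines on stdin and print,
# per HTTP method, the number of /bitstream/ requests and total bytes sent.
bitstream_bytes_by_method() {
  awk '$7 ~ /^\/bitstream\// {
         method = substr($6, 2)   # strip the leading double quote
         count[method]++
         bytes[method] += $10
       }
       END {
         for (m in count) printf "%s %d %d\n", m, count[m], bytes[m]
       }' | sort
}
```

For example, filtering the access log for CTA's gateway IPs with `grep -E '^(18\.196\.196\.108|18\.195\.78\.144|18\.195\.218\.6) '` and piping the result into `bitstream_bytes_by_method` would show whether they still send any GET requests (the log location depends on the server's nginx configuration).
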
- So the HEAD request is definitely more efficient in terms of transfer *size* (nginx logged 68078 bytes for the GET vs. 0 for the HEAD), but I need to wait and see whether these requests show up in Solr

<!-- vim: set sw=2 ts=2: -->
@@ -38,7 +38,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-04/" />
<meta property="article:published_time" content="2019-04-01T09:00:43+03:00"/>
<meta property="article:modified_time" content="2019-04-06T12:01:09+03:00"/>
<meta property="article:modified_time" content="2019-04-06T12:06:14+03:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="April, 2019"/>
@@ -81,9 +81,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
"@type": "BlogPosting",
"headline": "April, 2019",
"url": "https://alanorth.github.io/cgspace-notes/2019-04/",
"wordCount": "1056",
"wordCount": "1457",
"datePublished": "2019-04-01T09:00:43+03:00",
"dateModified": "2019-04-06T12:01:09+03:00",
"dateModified": "2019-04-06T12:06:14+03:00",
"author": {
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -359,6 +359,139 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</ul></li>
</ul></li>
</ul>
</ul>

<h2 id="2019-04-07">2019-04-07</h2>

<ul>
<li>Looking into the impact of harvesters like <code>45.5.184.72</code>, I see in Solr that this client is not categorized as a bot, so it definitely inflates the usage stats by some tens of thousands of hits <em>per day</em></li>
<li>Last week CTA switched their frontend code to use HEAD requests instead of GET requests for PDF bitstreams

<ul>
<li>I am trying to determine whether these HEAD requests are registered as downloads in Solr</li>
<li>I see 96,925 downloads from their AWS gateway IPs in 2019-03:</li>
</ul></li>
</ul>

<pre><code>$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&fq=statistics_type%3Aview&fq=bundleName%3AORIGINAL&fq=dateYearMonth%3A2019-03&rows=0&wt=json&indent=true'
{
    "response": {
        "docs": [],
        "numFound": 96925,
        "start": 0
    },
    "responseHeader": {
        "QTime": 1,
        "params": {
            "fq": [
                "statistics_type:view",
                "bundleName:ORIGINAL",
                "dateYearMonth:2019-03"
            ],
            "indent": "true",
            "q": "type:0 AND (ip:18.196.196.108 OR ip:18.195.78.144 OR ip:18.195.218.6)",
            "rows": "0",
            "wt": "json"
        },
        "status": 0
    }
}
</code></pre>

<ul>
<li>Strangely, I see only 38 hits from those IPs so far in 2019-04:</li>
</ul>

<pre><code>$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&fq=statistics_type%3Aview&fq=bundleName%3AORIGINAL&fq=dateYearMonth%3A2019-04&rows=0&wt=json&indent=true'
{
    "response": {
        "docs": [],
        "numFound": 38,
        "start": 0
    },
    "responseHeader": {
        "QTime": 1,
        "params": {
            "fq": [
                "statistics_type:view",
                "bundleName:ORIGINAL",
                "dateYearMonth:2019-04"
            ],
            "indent": "true",
            "q": "type:0 AND (ip:18.196.196.108 OR ip:18.195.78.144 OR ip:18.195.218.6)",
            "rows": "0",
            "wt": "json"
        },
        "status": 0
    }
}
</code></pre>

<ul>
<li>Testing GET vs. HEAD requests on the <a href="https://dspacetest.cgiar.org/handle/10568/100289">CTA Spore 192 item</a> on DSpace Test:</li>
</ul>

<pre><code>$ http --print Hh GET https://dspacetest.cgiar.org/bitstream/handle/10568/100289/Spore-192-EN-web.pdf
GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspacetest.cgiar.org
User-Agent: HTTPie/1.0.2

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:38:34 GMT
Expires: Sun, 07 Apr 2019 09:38:34 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Server: nginx
Set-Cookie: JSESSIONID=21A492CC31CA8845278DFA078BD2D9ED; Path=/; Secure; HttpOnly
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Robots-Tag: none
X-XSS-Protection: 1; mode=block

$ http --print Hh HEAD https://dspacetest.cgiar.org/bitstream/handle/10568/100289/Spore-192-EN-web.pdf
HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspacetest.cgiar.org
User-Agent: HTTPie/1.0.2

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:39:01 GMT
Expires: Sun, 07 Apr 2019 09:39:01 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Server: nginx
Set-Cookie: JSESSIONID=36C8502257CC6C72FD3BC9EBF91C4A0E; Path=/; Secure; HttpOnly
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Robots-Tag: none
X-XSS-Protection: 1; mode=block
</code></pre>

<ul>
<li>On the server side, the nginx access log shows:</li>
</ul>

<pre><code>78.x.x.x - - [07/Apr/2019:01:38:35 -0700] "GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 68078 "-" "HTTPie/1.0.2"
78.x.x.x - - [07/Apr/2019:01:39:01 -0700] "HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 0 "-" "HTTPie/1.0.2"
</code></pre>

<ul>
<li>So the HEAD request is definitely more efficient in terms of transfer <em>size</em> (nginx logged 68078 bytes for the GET vs. 0 for the HEAD), but I need to wait and see whether these requests show up in Solr</li>
</ul>

<!-- vim: set sw=2 ts=2: -->


@@ -4,7 +4,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-04/</loc>
<lastmod>2019-04-06T12:01:09+03:00</lastmod>
<lastmod>2019-04-06T12:06:14+03:00</lastmod>
</url>

<url>
<url>
@@ -219,7 +219,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-04-06T12:01:09+03:00</lastmod>
<lastmod>2019-04-06T12:06:14+03:00</lastmod>
<priority>0</priority>
</url>

@@ -230,7 +230,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-04-06T12:01:09+03:00</lastmod>
<lastmod>2019-04-06T12:06:14+03:00</lastmod>
<priority>0</priority>
</url>

@@ -242,13 +242,13 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-04-06T12:01:09+03:00</lastmod>
<lastmod>2019-04-06T12:06:14+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-04-06T12:01:09+03:00</lastmod>
<lastmod>2019-04-06T12:06:14+03:00</lastmod>
<priority>0</priority>
</url>
