Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-22 14:45:03 +01:00)
Add notes for 2019-04-07
Parent: aa1264aa42
Commit: 89a4212e2b
@ -170,4 +170,128 @@ GET /handle/10568/72970/discover?filtertype_0=type&filtertype_1=author&filter_re
- Maria from Bioversity recommended that we use the phrase "AGROVOC subject" instead of "Subject" in Listings and Reports
  - I made a pull request to update this and merged it to the `5_x-prod` branch ([#418](https://github.com/ilri/DSpace/pull/418))

## 2019-04-07

- Looking into the impact of harvesters like `45.5.184.72`, I see in Solr that this user is not categorized as a bot so it definitely impacts the usage stats by some tens of thousands *per day*
- Last week CTA switched their frontend code to use HEAD requests instead of GET requests for PDF bitstreams
  - I am trying to see if these are registered as downloads in Solr or not
  - I see 96,925 downloads from their AWS gateway IPs in 2019-03:

```
$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&fq=statistics_type%3Aview&fq=bundleName%3AORIGINAL&fq=dateYearMonth%3A2019-03&rows=0&wt=json&indent=true'
{
    "response": {
        "docs": [],
        "numFound": 96925,
        "start": 0
    },
    "responseHeader": {
        "QTime": 1,
        "params": {
            "fq": [
                "statistics_type:view",
                "bundleName:ORIGINAL",
                "dateYearMonth:2019-03"
            ],
            "indent": "true",
            "q": "type:0 AND (ip:18.196.196.108 OR ip:18.195.78.144 OR ip:18.195.218.6)",
            "rows": "0",
            "wt": "json"
        },
        "status": 0
    }
}
```

- Strangely I don't see many hits in 2019-04:

```
$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&fq=statistics_type%3Aview&fq=bundleName%3AORIGINAL&fq=dateYearMonth%3A2019-04&rows=0&wt=json&indent=true'
{
    "response": {
        "docs": [],
        "numFound": 38,
        "start": 0
    },
    "responseHeader": {
        "QTime": 1,
        "params": {
            "fq": [
                "statistics_type:view",
                "bundleName:ORIGINAL",
                "dateYearMonth:2019-04"
            ],
            "indent": "true",
            "q": "type:0 AND (ip:18.196.196.108 OR ip:18.195.78.144 OR ip:18.195.218.6)",
            "rows": "0",
            "wt": "json"
        },
        "status": 0
    }
}
```
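
The two Solr requests above differ only in the `dateYearMonth` filter, so the URL could be generated instead of edited by hand. This is a minimal Python sketch, assuming the same local Solr endpoint and field names shown in the commands; `build_download_query` and `CTA_IPS` are hypothetical names used only for illustration. The generated URL is equivalent but not byte-identical to the hand-written one, because `urlencode` also percent-encodes the parentheses.

```python
from urllib.parse import urlencode

# Assumption: the same local Solr statistics core queried in the notes above.
SOLR_SELECT = "http://localhost:8081/solr/statistics/select"

# The three AWS gateway IPs used in the queries above.
CTA_IPS = ["18.196.196.108", "18.195.78.144", "18.195.218.6"]


def build_download_query(ips, year_month):
    """Build a count-only query URL for bitstream views from the given IPs."""
    ip_clause = " OR ".join(f"ip:{ip}" for ip in ips)
    params = [
        ("q", f"type:0 AND ({ip_clause})"),       # type:0 is a bitstream
        ("fq", "statistics_type:view"),
        ("fq", "bundleName:ORIGINAL"),
        ("fq", f"dateYearMonth:{year_month}"),
        ("rows", "0"),                            # only numFound is needed
        ("wt", "json"),
        ("indent", "true"),
    ]
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    for month in ("2019-03", "2019-04"):
        print(build_download_query(CTA_IPS, month))
```

Each printed URL can be passed to `http --print b '<url>'` as in the commands above, and the `numFound` values compared month by month.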

- Making some tests on GET vs HEAD requests on the [CTA Spore 192 item](https://dspacetest.cgiar.org/handle/10568/100289) on DSpace Test:

```
$ http --print Hh GET https://dspacetest.cgiar.org/bitstream/handle/10568/100289/Spore-192-EN-web.pdf
GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspacetest.cgiar.org
User-Agent: HTTPie/1.0.2

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:38:34 GMT
Expires: Sun, 07 Apr 2019 09:38:34 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Server: nginx
Set-Cookie: JSESSIONID=21A492CC31CA8845278DFA078BD2D9ED; Path=/; Secure; HttpOnly
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Robots-Tag: none
X-XSS-Protection: 1; mode=block

$ http --print Hh HEAD https://dspacetest.cgiar.org/bitstream/handle/10568/100289/Spore-192-EN-web.pdf
HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspacetest.cgiar.org
User-Agent: HTTPie/1.0.2

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:39:01 GMT
Expires: Sun, 07 Apr 2019 09:39:01 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Server: nginx
Set-Cookie: JSESSIONID=36C8502257CC6C72FD3BC9EBF91C4A0E; Path=/; Secure; HttpOnly
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Robots-Tag: none
X-XSS-Protection: 1; mode=block
```
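
Reading the two responses above side by side, the headers look the same apart from the time-dependent ones. A small Python sketch can make that comparison explicit. It assumes the headers have already been collected into dictionaries; `differing_headers` and `VOLATILE` are hypothetical names for illustration, and only a subset of the headers from the transcript is reproduced here.

```python
# Headers expected to change on every request, so they are ignored.
VOLATILE = {"date", "expires", "set-cookie"}


def differing_headers(first, second, ignore=VOLATILE):
    """Return {name: (first_value, second_value)} for headers that differ.

    Header names are compared case-insensitively; names in `ignore`
    are skipped.
    """
    a = {name.lower(): value for name, value in first.items()}
    b = {name.lower(): value for name, value in second.items()}
    diffs = {}
    for name in sorted(set(a) | set(b)):
        if name in ignore:
            continue
        if a.get(name) != b.get(name):
            diffs[name] = (a.get(name), b.get(name))
    return diffs


# Subset of the response headers from the GET and HEAD tests above.
GET_HEADERS = {
    "Content-Length": "2069158",
    "Content-Type": "application/pdf;charset=ISO-8859-1",
    "Date": "Sun, 07 Apr 2019 08:38:34 GMT",
    "Last-Modified": "Thu, 14 Mar 2019 11:20:05 GMT",
    "Set-Cookie": "JSESSIONID=21A492CC31CA8845278DFA078BD2D9ED; Path=/; Secure; HttpOnly",
}
HEAD_HEADERS = {
    "Content-Length": "2069158",
    "Content-Type": "application/pdf;charset=ISO-8859-1",
    "Date": "Sun, 07 Apr 2019 08:39:01 GMT",
    "Last-Modified": "Thu, 14 Mar 2019 11:20:05 GMT",
    "Set-Cookie": "JSESSIONID=36C8502257CC6C72FD3BC9EBF91C4A0E; Path=/; Secure; HttpOnly",
}

if __name__ == "__main__":
    print(differing_headers(GET_HEADERS, HEAD_HEADERS))
```

An empty result for these two dictionaries means the HEAD response advertises the same `Content-Length` and `Content-Type` as the GET.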

- And from the server side, the nginx logs show:

```
78.x.x.x - - [07/Apr/2019:01:38:35 -0700] "GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 68078 "-" "HTTPie/1.0.2"
78.x.x.x - - [07/Apr/2019:01:39:01 -0700] "HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 0 "-" "HTTPie/1.0.2"
```
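
To compare bytes sent per request method across a larger log, lines like the two above can be parsed and summed. This is a rough Python sketch, assuming the log uses the combined-style layout seen in these two lines; `LOG_PATTERN`, `parse_line`, and `bytes_by_method` are hypothetical names, and the pattern has only been checked against the sample lines shown here.

```python
import re

# Assumption: combined-style access log lines as shown above.
LOG_PATTERN = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_line(line):
    """Parse one access log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["bytes_sent"] = int(entry["bytes_sent"])
    return entry


def bytes_by_method(lines):
    """Sum the bytes sent for each HTTP method, skipping unparseable lines."""
    totals = {}
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        method = entry["method"]
        totals[method] = totals.get(method, 0) + entry["bytes_sent"]
    return totals


# The two log lines from the test above.
SAMPLE = [
    '78.x.x.x - - [07/Apr/2019:01:38:35 -0700] "GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 68078 "-" "HTTPie/1.0.2"',
    '78.x.x.x - - [07/Apr/2019:01:39:01 -0700] "HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 0 "-" "HTTPie/1.0.2"',
]

if __name__ == "__main__":
    print(bytes_by_method(SAMPLE))
```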

- So the *size* of the transfer is definitely smaller with a HEAD (0 bytes sent versus 68,078 for the GET in the nginx log), but I need to wait to see if these requests show up in Solr

<!-- vim: set sw=2 ts=2: -->

@ -38,7 +38,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-04/" />
<meta property="article:published_time" content="2019-04-01T09:00:43+03:00"/>
<meta property="article:modified_time" content="2019-04-06T12:01:09+03:00"/>
<meta property="article:modified_time" content="2019-04-06T12:06:14+03:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="April, 2019"/>
@ -81,9 +81,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
"@type": "BlogPosting",
"headline": "April, 2019",
"url": "https://alanorth.github.io/cgspace-notes/2019-04/",
"wordCount": "1056",
"wordCount": "1457",
"datePublished": "2019-04-01T09:00:43+03:00",
"dateModified": "2019-04-06T12:01:09+03:00",
"dateModified": "2019-04-06T12:06:14+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -359,6 +359,139 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</ul></li>
</ul>

<h2 id="2019-04-07">2019-04-07</h2>

<ul>
<li>Looking into the impact of harvesters like <code>45.5.184.72</code>, I see in Solr that this user is not categorized as a bot so it definitely impacts the usage stats by some tens of thousands <em>per day</em></li>
<li>Last week CTA switched their frontend code to use HEAD requests instead of GET requests for PDF bitstreams

<ul>
<li>I am trying to see if these are registered as downloads in Solr or not</li>
<li>I see 96,925 downloads from their AWS gateway IPs in 2019-03:</li>
</ul></li>
</ul>

<pre><code>$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&fq=statistics_type%3Aview&fq=bundleName%3AORIGINAL&fq=dateYearMonth%3A2019-03&rows=0&wt=json&indent=true'
{
    "response": {
        "docs": [],
        "numFound": 96925,
        "start": 0
    },
    "responseHeader": {
        "QTime": 1,
        "params": {
            "fq": [
                "statistics_type:view",
                "bundleName:ORIGINAL",
                "dateYearMonth:2019-03"
            ],
            "indent": "true",
            "q": "type:0 AND (ip:18.196.196.108 OR ip:18.195.78.144 OR ip:18.195.218.6)",
            "rows": "0",
            "wt": "json"
        },
        "status": 0
    }
}
</code></pre>

<ul>
<li>Strangely I don’t see many hits in 2019-04:</li>
</ul>

<pre><code>$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&fq=statistics_type%3Aview&fq=bundleName%3AORIGINAL&fq=dateYearMonth%3A2019-04&rows=0&wt=json&indent=true'
{
    "response": {
        "docs": [],
        "numFound": 38,
        "start": 0
    },
    "responseHeader": {
        "QTime": 1,
        "params": {
            "fq": [
                "statistics_type:view",
                "bundleName:ORIGINAL",
                "dateYearMonth:2019-04"
            ],
            "indent": "true",
            "q": "type:0 AND (ip:18.196.196.108 OR ip:18.195.78.144 OR ip:18.195.218.6)",
            "rows": "0",
            "wt": "json"
        },
        "status": 0
    }
}
</code></pre>

<ul>
<li>Making some tests on GET vs HEAD requests on the <a href="https://dspacetest.cgiar.org/handle/10568/100289">CTA Spore 192 item</a> on DSpace Test:</li>
</ul>

<pre><code>$ http --print Hh GET https://dspacetest.cgiar.org/bitstream/handle/10568/100289/Spore-192-EN-web.pdf
GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspacetest.cgiar.org
User-Agent: HTTPie/1.0.2

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:38:34 GMT
Expires: Sun, 07 Apr 2019 09:38:34 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Server: nginx
Set-Cookie: JSESSIONID=21A492CC31CA8845278DFA078BD2D9ED; Path=/; Secure; HttpOnly
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Robots-Tag: none
X-XSS-Protection: 1; mode=block

$ http --print Hh HEAD https://dspacetest.cgiar.org/bitstream/handle/10568/100289/Spore-192-EN-web.pdf
HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspacetest.cgiar.org
User-Agent: HTTPie/1.0.2

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:39:01 GMT
Expires: Sun, 07 Apr 2019 09:39:01 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Server: nginx
Set-Cookie: JSESSIONID=36C8502257CC6C72FD3BC9EBF91C4A0E; Path=/; Secure; HttpOnly
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Robots-Tag: none
X-XSS-Protection: 1; mode=block
</code></pre>

<ul>
<li>And from the server side, the nginx logs show:</li>
</ul>

<pre><code>78.x.x.x - - [07/Apr/2019:01:38:35 -0700] "GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 68078 "-" "HTTPie/1.0.2"
78.x.x.x - - [07/Apr/2019:01:39:01 -0700] "HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 0 "-" "HTTPie/1.0.2"
</code></pre>

<ul>
<li>So the <em>size</em> of the transfer is definitely smaller with a HEAD (0 bytes sent versus 68,078 for the GET in the nginx log), but I need to wait to see if these requests show up in Solr</li>
</ul>

<!-- vim: set sw=2 ts=2: -->

@ -4,7 +4,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-04/</loc>
<lastmod>2019-04-06T12:01:09+03:00</lastmod>
<lastmod>2019-04-06T12:06:14+03:00</lastmod>
</url>

<url>
@ -219,7 +219,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-04-06T12:01:09+03:00</lastmod>
<lastmod>2019-04-06T12:06:14+03:00</lastmod>
<priority>0</priority>
</url>

@ -230,7 +230,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-04-06T12:01:09+03:00</lastmod>
<lastmod>2019-04-06T12:06:14+03:00</lastmod>
<priority>0</priority>
</url>

@ -242,13 +242,13 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-04-06T12:01:09+03:00</lastmod>
<lastmod>2019-04-06T12:06:14+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-04-06T12:01:09+03:00</lastmod>
<lastmod>2019-04-06T12:06:14+03:00</lastmod>
<priority>0</priority>
</url>
