Add notes for 2019-04-07
@ -170,4 +170,128 @@ GET /handle/10568/72970/discover?filtertype_0=type&filtertype_1=author&filter_re
- Maria from Bioversity recommended that we use the phrase "AGROVOC subject" instead of "Subject" in Listings and Reports
- I made a pull request to update this and merged it to the `5_x-prod` branch ([#418](https://github.com/ilri/DSpace/pull/418))

## 2019-04-07

- Looking into the impact of harvesters like `45.5.184.72`, I see in Solr that this user is not categorized as a bot, so it definitely inflates the usage stats by some tens of thousands of hits *per day*
- Last week CTA switched their frontend code to use HEAD requests instead of GET requests for PDF bitstreams
- I am trying to see if these are registered as downloads in Solr or not
- I see 96,925 downloads from their AWS gateway IPs in 2019-03:

```
$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&fq=statistics_type%3Aview&fq=bundleName%3AORIGINAL&fq=dateYearMonth%3A2019-03&rows=0&wt=json&indent=true'
{
    "response": {
        "docs": [],
        "numFound": 96925,
        "start": 0
    },
    "responseHeader": {
        "QTime": 1,
        "params": {
            "fq": [
                "statistics_type:view",
                "bundleName:ORIGINAL",
                "dateYearMonth:2019-03"
            ],
            "indent": "true",
            "q": "type:0 AND (ip:18.196.196.108 OR ip:18.195.78.144 OR ip:18.195.218.6)",
            "rows": "0",
            "wt": "json"
        },
        "status": 0
    }
}
```

- Strangely, I see only 38 hits so far in 2019-04:

```
$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&fq=statistics_type%3Aview&fq=bundleName%3AORIGINAL&fq=dateYearMonth%3A2019-04&rows=0&wt=json&indent=true'
{
    "response": {
        "docs": [],
        "numFound": 38,
        "start": 0
    },
    "responseHeader": {
        "QTime": 1,
        "params": {
            "fq": [
                "statistics_type:view",
                "bundleName:ORIGINAL",
                "dateYearMonth:2019-04"
            ],
            "indent": "true",
            "q": "type:0 AND (ip:18.196.196.108 OR ip:18.195.78.144 OR ip:18.195.218.6)",
            "rows": "0",
            "wt": "json"
        },
        "status": 0
    }
}
```
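
- The two queries above differ only in the `dateYearMonth` filter, so a small helper could build them for any month. This is a minimal Python sketch, not something that was run as part of these notes; the function names `build_download_query` and `num_found` are hypothetical, and the Solr URL is assumed to be the same local statistics core used above:

```python
from urllib.parse import urlencode

# Assumption: the same local Solr statistics core as in the queries above
SOLR_SELECT_URL = "http://localhost:8081/solr/statistics/select"

# CTA's AWS gateway IPs, copied from the queries above
CTA_GATEWAY_IPS = ["18.196.196.108", "18.195.78.144", "18.195.218.6"]


def build_download_query(ips, year_month):
    """Return a Solr URL that counts ORIGINAL bitstream views for the given IPs in one month."""
    ip_clause = " OR ".join(f"ip:{ip}" for ip in ips)
    params = [
        ("q", f"type:0 AND ({ip_clause})"),
        ("fq", "statistics_type:view"),
        ("fq", "bundleName:ORIGINAL"),
        ("fq", f"dateYearMonth:{year_month}"),
        ("rows", "0"),  # only the count is needed, not the documents
        ("wt", "json"),
        ("indent", "true"),
    ]
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


def num_found(solr_response):
    """Extract the hit count from a decoded Solr JSON response."""
    return solr_response["response"]["numFound"]
```

- Fetching `build_download_query(CTA_GATEWAY_IPS, "2019-03")` with any HTTP client and passing the decoded JSON to `num_found` should correspond to the `numFound` values shown above; note that `urlencode` percent-encodes the parentheses, which Solr treats the same as the literal ones in the original URL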

- Running some tests to compare GET and HEAD requests for the [CTA Spore 192 item](https://dspacetest.cgiar.org/handle/10568/100289) on DSpace Test:

```
$ http --print Hh GET https://dspacetest.cgiar.org/bitstream/handle/10568/100289/Spore-192-EN-web.pdf
GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspacetest.cgiar.org
User-Agent: HTTPie/1.0.2

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:38:34 GMT
Expires: Sun, 07 Apr 2019 09:38:34 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Server: nginx
Set-Cookie: JSESSIONID=21A492CC31CA8845278DFA078BD2D9ED; Path=/; Secure; HttpOnly
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Robots-Tag: none
X-XSS-Protection: 1; mode=block

$ http --print Hh HEAD https://dspacetest.cgiar.org/bitstream/handle/10568/100289/Spore-192-EN-web.pdf
HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: dspacetest.cgiar.org
User-Agent: HTTPie/1.0.2

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:39:01 GMT
Expires: Sun, 07 Apr 2019 09:39:01 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Server: nginx
Set-Cookie: JSESSIONID=36C8502257CC6C72FD3BC9EBF91C4A0E; Path=/; Secure; HttpOnly
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Robots-Tag: none
X-XSS-Protection: 1; mode=block
```
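
- The two responses above look nearly identical, which is easier to confirm mechanically than by eye. A minimal Python sketch, assuming the response headers are pasted in as text; the helper names `parse_headers` and `differing_headers` are hypothetical, and the sample strings are only a subset of the headers shown above:

```python
def parse_headers(raw):
    """Parse 'Name: value' lines into a dict, skipping the status line and blank lines."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # the status line has no ': ' separator, so it is skipped
            headers[name] = value
    return headers


def differing_headers(first, second):
    """Return the sorted names of headers whose values differ between two responses."""
    names = first.keys() | second.keys()
    return sorted(name for name in names if first.get(name) != second.get(name))


# Subset of the GET and HEAD response headers from the transcript above
GET_RESPONSE = """HTTP/1.1 200 OK
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:38:34 GMT
Expires: Sun, 07 Apr 2019 09:38:34 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Set-Cookie: JSESSIONID=21A492CC31CA8845278DFA078BD2D9ED; Path=/; Secure; HttpOnly
"""

HEAD_RESPONSE = """HTTP/1.1 200 OK
Content-Length: 2069158
Content-Type: application/pdf;charset=ISO-8859-1
Date: Sun, 07 Apr 2019 08:39:01 GMT
Expires: Sun, 07 Apr 2019 09:39:01 GMT
Last-Modified: Thu, 14 Mar 2019 11:20:05 GMT
Set-Cookie: JSESSIONID=36C8502257CC6C72FD3BC9EBF91C4A0E; Path=/; Secure; HttpOnly
"""

print(differing_headers(parse_headers(GET_RESPONSE), parse_headers(HEAD_RESPONSE)))
```

- For this subset only the time-dependent and per-session headers (`Date`, `Expires`, `Set-Cookie`) differ; `Content-Length` is the same for both methods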

- And from the server side, the nginx logs show:

```
78.x.x.x - - [07/Apr/2019:01:38:35 -0700] "GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 68078 "-" "HTTPie/1.0.2"
78.x.x.x - - [07/Apr/2019:01:39:01 -0700] "HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 0 "-" "HTTPie/1.0.2"
```

- So the transfer is definitely much smaller with a HEAD request (0 bytes logged by nginx versus 68,078 bytes for the GET), but I need to wait to see if these requests show up in Solr
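
- To compare request methods and bytes sent across many log lines rather than two, the nginx entries can be parsed with a regular expression. A minimal Python sketch, assuming nginx's default `combined` log format (which the lines above appear to follow); `LOG_PATTERN` and `parse_log_line` are hypothetical names:

```python
import re

# Assumption: nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["body_bytes"] = int(entry["body_bytes"])
    return entry


# The two log lines from the test above
LINES = [
    '78.x.x.x - - [07/Apr/2019:01:38:35 -0700] "GET /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 68078 "-" "HTTPie/1.0.2"',
    '78.x.x.x - - [07/Apr/2019:01:39:01 -0700] "HEAD /bitstream/handle/10568/100289/Spore-192-EN-web.pdf HTTP/1.1" 200 0 "-" "HTTPie/1.0.2"',
]

for line in LINES:
    entry = parse_log_line(line)
    print(entry["method"], entry["status"], entry["body_bytes"])
```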

<!-- vim: set sw=2 ts=2: -->