mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2019-04-07
This commit is contained in:
@ -173,7 +173,7 @@ GET /handle/10568/72970/discover?filtertype_0=type&filtertype_1=author&filter_re
|
||||
## 2019-04-07
|
||||
|
||||
- Looking into the impact of harvesters like `45.5.184.72`, I see in Solr that this user is not categorized as a bot so it definitely impacts the usage stats by some tens of thousands *per day*
|
||||
- Last week CTA switched their frontend code to use HEAD requests instead of GET requests for PDF bitstreams
|
||||
- Last week CTA switched their frontend code to use HEAD requests instead of GET requests for bitstreams
|
||||
- I am trying to see if these are registered as downloads in Solr or not
|
||||
- I see 96,925 downloads from their AWS gateway IPs in 2019-03:
|
||||
|
||||
@ -293,5 +293,63 @@ X-XSS-Protection: 1; mode=block
|
||||
```
|
||||
|
||||
- So definitely the *size* of the transfer is more efficient with a HEAD, but I need to wait to see if these requests show up in Solr
|
||||
- After twenty minutes of waiting I still don't see any new requests in the statistics core, but when I try the requests from the command line again I see the following in the DSpace log:
|
||||
|
||||
```
|
||||
2019-04-07 02:05:30,966 INFO org.dspace.usage.LoggerUsageEventListener @ anonymous:session_id=EF2DB6E4F69926C5555B3492BB0071A8:ip_addr=78.x.x.x:view_bitstream:bitstream_id=165818
|
||||
2019-04-07 02:05:39,265 INFO org.dspace.usage.LoggerUsageEventListener @ anonymous:session_id=B6381FC590A5160D84930102D068C7A3:ip_addr=78.x.x.x:view_bitstream:bitstream_id=165818
|
||||
```
|
||||
|
||||
- So my inclination is that both HEAD and GET requests are registered as views as far as Solr and DSpace are concerned
|
||||
- Strangely, the statistics Solr core says it hasn't been modified in 24 hours, so I tried to start the "optimize" process from the Admin UI and I see this in the Solr log:
|
||||
|
||||
```
|
||||
2019-04-07 02:08:44,186 INFO org.apache.solr.update.UpdateHandler @ start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
|
||||
```
|
||||
|
||||
- Ugh, even after optimizing there are no Solr results for requests from my IP, and actually I only see 18 results from 2019-04 so far and none of them are `statistics_type:view`... very weird
|
||||
- I don't even see many hits for days after 2019-03-17, when I migrated the server to Ubuntu 18.04 and copied the statistics core from CGSpace (linode18)
|
||||
- I will try to re-deploy the `5_x-dev` branch and test again
|
||||
- According to the [DSpace 5.x Solr documentation](https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics) the default commit time is after 15 minutes or 10,000 documents (see `solrconfig.xml`)
|
||||
- I looped some GET and HEAD requests to a bitstream on my local instance and after some time I see that they *do* register as downloads (even though they are internal):
|
||||
|
||||
```
|
||||
$ http --print b 'http://localhost:8080/solr/statistics/select?q=type%3A0+AND+time%3A2019-04-07*&fq=statistics_type%3Aview&fq=isInternal%3Atrue&rows=0&wt=json&indent=true'
|
||||
{
|
||||
"response": {
|
||||
"docs": [],
|
||||
"numFound": 909,
|
||||
"start": 0
|
||||
},
|
||||
"responseHeader": {
|
||||
"QTime": 0,
|
||||
"params": {
|
||||
"fq": [
|
||||
"statistics_type:view",
|
||||
"isInternal:true"
|
||||
],
|
||||
"indent": "true",
|
||||
"q": "type:0 AND time:2019-04-07*",
|
||||
"rows": "0",
|
||||
"wt": "json"
|
||||
},
|
||||
"status": 0
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
- I confirmed the same on CGSpace itself after making one HEAD request
|
||||
- So I'm pretty sure it's something about DSpace Test using the CGSpace statistics core, and not that I deployed Solr 4.10.4 there last week
|
||||
- I deployed Solr 4.10.4 locally and ran a bunch of requests for bitstreams and they do show up in the Solr statistics log, so the issue must be with re-using the existing Solr core from CGSpace
|
||||
- Now this gets more frustrating: I did the same GET and HEAD tests on a local Ubuntu 16.04 VM with Solr 4.10.2 and 4.10.4 and the statistics are recorded
|
||||
- This leads me to believe there is something specifically wrong with DSpace Test (linode19)
|
||||
- The only thing I can think of is that the JVM is using G1GC instead of ConcMarkSweepGC
|
||||
- Holy shit, all this is actually because of the GeoIP1 deprecation and a missing `GeoLiteCity.dat`
|
||||
- For some reason the missing GeoIP data causes stats to not be recorded whatsoever and there is no error!
|
||||
- See: [DS-3986](https://jira.duraspace.org/browse/DS-3986)
|
||||
- See: [DS-4020](https://jira.duraspace.org/browse/DS-4020)
|
||||
- See: [DS-3832](https://jira.duraspace.org/browse/DS-3832)
|
||||
- DSpace 5.10 upgraded to use GeoIP2, but we are on 5.8 so I just copied the missing database file from another server because it has been *removed* from MaxMind's server as of 2018-04-01
|
||||
- Now I made 100 requests and I see them in the Solr statistics... fuck my life for wasting five hours debugging this
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
Reference in New Issue
Block a user