- I see Moayad was busy collecting item views and downloads from CGSpace yesterday:

```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    933 40.77.167.90
    971 95.108.181.88
   1043 41.204.190.40
```
- Another IP address (31.6.77.23) in the UK making a few hundred requests without a user agent
- I will add the IP addresses to the nginx badbots list
- Judging by its DNS, 31.6.77.23 belongs to a [web marketing company called Bronco](https://www.bronco.co.uk/)
- I looked for its DNS entry in Solr statistics and found a few hundred thousand over the years:

```
$ curl -s "http://localhost:8081/solr/statistics/select" -d "q=dns:/squeeze3.bronco.co.uk./&rows=0"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">4</int><lst name="params"><str name="q">dns:/squeeze3.bronco.co.uk./</str><str name="rows">0</str></lst></lst><result name="response" numFound="86044" start="0"></result>
</response>
```
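- Something like this loop should run the same query across the other yearly cores too (an untested sketch, using the core names listed below; it just pulls `numFound` out of each response):

```
$ for core in statistics statistics-2018 statistics-2017 statistics-2016 statistics-2015 statistics-2014; do echo -n "$core: "; curl -s "http://localhost:8081/solr/$core/select" -d "q=dns:/squeeze3.bronco.co.uk./&rows=0" | grep -o 'numFound="[0-9]*"'; done
```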
- The totals in each core are:
  - statistics: 86044
  - statistics-2018: 65144
  - statistics-2017: 79405
  - statistics-2016: 121316
  - statistics-2015: 30720
  - statistics-2014: 4524
- ... so about 387,000 hits!
- I will purge them from each core one by one, ie:

```
$ curl -s "http://localhost:8081/solr/statistics-2015/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>dns:squeeze3.bronco.co.uk.</query></delete>"
$ curl -s "http://localhost:8081/solr/statistics-2014/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>dns:squeeze3.bronco.co.uk.</query></delete>"
```
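- Or, as an untested sketch, one loop over the same core names could purge them all in one go:

```
$ for core in statistics statistics-2018 statistics-2017 statistics-2016 statistics-2015 statistics-2014; do curl -s "http://localhost:8081/solr/$core/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>dns:squeeze3.bronco.co.uk.</query></delete>"; done
```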
- Deploy latest Tomcat and PostgreSQL JDBC driver changes on CGSpace (linode18)
- Deploy latest `5_x-prod` branch on CGSpace (linode18)
- Run all system updates on CGSpace (linode18) server and reboot it
- After the server came back up Tomcat started, but there were errors loading some Solr statistics cores
- Luckily after restarting Tomcat once more they all came back up
- I ran the `dspace cleanup -v` process on CGSpace and got an error:

```
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
  Detail: Key (bitstream_id)=(183996) is still referenced from table "bundle".
```

- The solution is, as always:

```
# su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (183996);'
UPDATE 1
```
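- For reference, a query like this should show which bundle still points at the bitstream before nulling it out (just a sketch, using the same table and columns as the fix above):

```
$ psql dspace -c 'select bundle_id, primary_bitstream_id from bundle where primary_bitstream_id in (183996);'
```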
- Add one more new Bioversity ORCID iD to the controlled vocabulary on CGSpace
- Felix Shaw from Earlham emailed me to ask about his admin account on DSpace Test
- His old one got lost when I re-sync'd DSpace Test with CGSpace a few weeks ago
- I added a new account for him and added it to the Administrators group:

```
$ dspace user -a -m wow@me.com -g Felix -s Shaw -p 'fuananaaa'
```
- For some reason the Atmire Content and Usage Analysis (CUA) module's Usage Statistics is drawing blank graphs
- I looked in the dspace.log and see:

```
2020-02-23 11:28:13,696 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoClassDefFoundError: Could not initialize class org.jfree.chart.JFreeChart
```

- The same error happens on DSpace Test, but graphs are working on my local instance
- The only thing I've changed recently is the Tomcat version, but it's working locally...
- I see the following file on my local instance, CGSpace, and DSpace Test: `dspace/webapps/xmlui/WEB-INF/lib/jfreechart-1.0.5.jar`
- I deployed Tomcat 7.0.99 on DSpace Test but the JFreeChart class still can't be found...
- So it must be something with the library search path...
- Strange that it works with Tomcat 7.0.100 on my local machine
- I copied the `jfreechart-1.0.5.jar` file to the Tomcat lib folder and then there was a different error when I loaded Atmire CUA:

```
2020-02-23 16:25:10,841 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request! org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper
```
- Some search results suggested commenting out the following line in `/etc/java-8-openjdk/accessibility.properties`:

```
assistive_technologies=org.GNOME.Accessibility.AtkWrapper
```
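- Something like this sed one-liner ought to comment it out (untested; it keeps a backup of the original file):

```
# sed -i.bak 's/^assistive_technologies=/#assistive_technologies=/' /etc/java-8-openjdk/accessibility.properties
```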
- After commenting that line out, removing the extra jfreechart library, and restarting Tomcat I was able to load the usage statistics graph on DSpace Test...
- Hmm, actually I think this is a Java bug, perhaps introduced or at [least present in 18.04](https://bugs.openjdk.java.net/browse/JDK-8204862), with lots of [references](https://code-maven.com/slides/jenkins-intro/no-graph-error) to it [happening in other](https://issues.jenkins-ci.org/browse/JENKINS-39636) configurations like Debian 9 with Jenkins, etc...
- Apparently if you use the *non-headless* version of openjdk this doesn't happen... but that pulls in X11 stuff, so no thanks
- Also, I see dozens of occurrences of this going back over one month (we have logs for about that period):

```
# grep -c 'initialize class org.jfree.chart.JFreeChart' dspace.log.2020-0*
dspace.log.2020-01-12:4
dspace.log.2020-01-13:66
dspace.log.2020-01-14:4
dspace.log.2020-01-15:36
dspace.log.2020-01-16:88
dspace.log.2020-01-17:4
dspace.log.2020-01-18:4
dspace.log.2020-01-19:4
dspace.log.2020-01-20:4
dspace.log.2020-01-21:4
...
```
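- Piping that through awk should give a quick total across all the logs (the log filenames contain no colons, so splitting on ":" is safe):

```
# grep -c 'initialize class org.jfree.chart.JFreeChart' dspace.log.2020-0* | awk -F: '{sum += $2} END {print sum}'
```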
- I deployed the fix on CGSpace (linode18) and I was able to see the graphs in the Atmire CUA Usage Statistics...
- On an unrelated note there is something weird going on in that I see millions of hits from IP 34.218.226.147 in Solr statistics, but if I remember correctly that IP belongs to CodeObia's AReS explorer, which should only be using REST and therefore generate no Solr statistics...?

```
$ curl -s "http://localhost:8081/solr/statistics-2018/select" -d "q=ip:34.218.226.147&rows=0"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">811</int><lst name="params"><str name="q">ip:34.218.226.147</str><str name="rows">0</str></lst></lst><result name="response" numFound="5536097" start="0"></result>
</response>
```
- And there are apparently two million from last month (2020-01):

```
$ curl -s "http://localhost:8081/solr/statistics/select" -d "q=ip:34.218.226.147&fq=dateYearMonth:2020-01&rows=0"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">248</int><lst name="params"><str name="q">ip:34.218.226.147</str><str name="fq">dateYearMonth:2020-01</str><str name="rows">0</str></lst></lst><result name="response" numFound="2173455" start="0"></result>
</response>
```
- But when I look at the nginx access logs for the past month or so I only see 84,000, all of which are on `/rest` and none of which are to XMLUI:

```
# zcat /var/log/nginx/*.log.*.gz | grep -c 34.218.226.147
84322
# zcat /var/log/nginx/*.log.*.gz | grep 34.218.226.147 | grep -c '/rest'
84322
```
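- Inverting the match should surface any requests from that IP that are *not* to `/rest`, just to be sure (untested):

```
# zcat /var/log/nginx/*.log.*.gz | grep 34.218.226.147 | grep -v '/rest' | head
```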
- Either the requests didn't get logged, or there is some mixup with the Solr documents (fuck!)
- On second inspection, I *do* see lots of notes here about 34.218.226.147, including 150,000 on one day in October, 2018 alone...
- To make matters worse, I see hits from REST in the regular nginx access log!
- I did a few tests and I can't figure it out, but it seems that hits appear in one or the other (not both)
- Also, I see *zero* hits to `/rest` in the access.log on DSpace Test (linode19)
- Anyways, I faceted by IP in 2020-01 and see:

```
$ curl -s 'http://localhost:8081/solr/statistics/select?q=*:*&fq=dateYearMonth:2020-01&rows=0&wt=json&indent=true&facet=true&facet.field=ip'
...
"172.104.229.92",2686876,
"34.218.226.147",2173455,
"163.172.70.248",80945,
"163.172.71.24",55211,
"163.172.68.99",38427,
```
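- Adding `facet.limit` and `facet.mincount` to that query should trim the output to just the top offenders (standard Solr facet parameters):

```
$ curl -s 'http://localhost:8081/solr/statistics/select?q=*:*&fq=dateYearMonth:2020-01&rows=0&wt=json&indent=true&facet=true&facet.field=ip&facet.limit=10&facet.mincount=1000'
```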
- Surprise surprise, the top two IPs are from AReS servers... wtf.
- The next three are from Online in France and they are all using this weird user agent and making tens of thousands of requests to Discovery:

```
Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)
```

- And the same three are already inflating the statistics for 2020-02... hmmm.
- I need to see why AReS harvesting is inflating the stats, as it should only be making REST requests...
- Shiiiiit, I see 84,000 requests from the AReS IP today alone:

```
$ curl -s 'http://localhost:8081/solr/statistics/select?q=time:2020-02-22*+AND+ip:172.104.229.92&rows=0&wt=json&indent=true'
...
"response":{"numFound":84594,"start":0,"docs":[]
```
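- Faceting those hits by user agent might show why they are being recorded at all (assuming the statistics core's `userAgent` field is populated):

```
$ curl -s 'http://localhost:8081/solr/statistics/select?q=time:2020-02-22*+AND+ip:172.104.229.92&rows=0&wt=json&indent=true&facet=true&facet.field=userAgent'
```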
- Fuck! And of course the ILRI websites doing their daily REST harvesting are causing issues too; from today alone:

```
"2a01:7e00::f03c:91ff:fe9a:3a37",35512,
"2a01:7e00::f03c:91ff:fe18:7396",26155,
```

- I need to try to make some requests for these URLs and observe if they make a statistics hit (see the test sketched below):
  - `/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=50&offset=82450`
  - `/rest/handle/10568/28702?expand=all`
- Those are the requests AReS and ILRI servers are making... nearly 150,000 per day!
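- A rough test would be to replay one of those requests and then look for a fresh document in Solr, something like this (the hostname is a placeholder for the DSpace Test base URL, and the new hit might not be visible until Solr's next commit):

```
# note: the hostname below is a placeholder for the DSpace Test base URL
$ curl -s -o /dev/null 'https://dspacetest.cgiar.org/rest/handle/10568/28702?expand=all'
$ curl -s "http://localhost:8081/solr/statistics/select" -d "q=time:[NOW-5MINUTE+TO+NOW]&rows=0"
```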
<!-- vim: set sw=2 ts=2: -->