mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-25 08:00:18 +01:00)
Update notes for 2020-02-23
This commit is contained in:
parent
58738a19f3
commit
c88af71838
@@ -17,8 +17,7 @@ categories: ["Notes"]
 - I see Moayad was busy collecting item views and downloads from CGSpace yesterday:

 ```
-# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | awk '{print $1}
-' | sort | uniq -c | sort -n | tail -n 10
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
     933 40.77.167.90
     971 95.108.181.88
    1043 41.204.190.40

@@ -486,10 +486,182 @@ Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)

- Another IP address (31.6.77.23) in the UK making a few hundred requests without a user agent
- I will add the IP addresses to the nginx badbots list
- Judging by its DNS, 31.6.77.23 belongs to a [web marketing company called Bronco](https://www.bronco.co.uk/)
  - I looked for its DNS entry in Solr statistics and found a few hundred thousand hits over the years:

```
$ curl -s "http://localhost:8081/solr/statistics/select" -d "q=dns:/squeeze3.bronco.co.uk./&rows=0"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">4</int><lst name="params"><str name="q">dns:/squeeze3.bronco.co.uk./</str><str name="rows">0</str></lst></lst><result name="response" numFound="86044" start="0"></result>
</response>
```

- The totals in each core are:
  - statistics: 86044
  - statistics-2018: 65144
  - statistics-2017: 79405
  - statistics-2016: 121316
  - statistics-2015: 30720
  - statistics-2014: 4524
  - ... so about 387,000 hits!
- I will purge them from each core one by one, e.g.:

```
$ curl -s "http://localhost:8081/solr/statistics-2015/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>dns:squeeze3.bronco.co.uk.</query></delete>"
$ curl -s "http://localhost:8081/solr/statistics-2014/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>dns:squeeze3.bronco.co.uk.</query></delete>"
```
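
- Rather than repeating that for every core, a small loop works too (a sketch, assuming the same six core names listed above):

```
$ for core in statistics statistics-2018 statistics-2017 statistics-2016 statistics-2015 statistics-2014; do
    # delete by the same DNS query in each core
    curl -s "http://localhost:8081/solr/$core/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>dns:squeeze3.bronco.co.uk.</query></delete>"
  done
```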

- Deploy latest Tomcat and PostgreSQL JDBC driver changes on CGSpace (linode18)
- Deploy latest `5_x-prod` branch on CGSpace (linode18)
- Run all system updates on CGSpace (linode18) server and reboot it
  - After the server came back up Tomcat started, but there were errors loading some Solr statistics cores
  - Luckily after restarting Tomcat once more they all came back up
- I ran the `dspace cleanup -v` process on CGSpace and got an error:

```
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
  Detail: Key (bitstream_id)=(183996) is still referenced from table "bundle".
```

- The solution is, as always:

```
# su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (183996);'
UPDATE 1
```
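
- Since the error only reports one offending bitstream at a time, it might be possible to clear them all in one go (a sketch, assuming the DSpace 5 schema where bitstreams pending cleanup are flagged via the `deleted` column):

```
# hypothetical one-shot version of the same fix; verify on DSpace Test first
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (select bitstream_id from bitstream where deleted is true);'
```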

- Add one more new Bioversity ORCID iD to the controlled vocabulary on CGSpace
- Felix Shaw from Earlham emailed me to ask about his admin account on DSpace Test
  - His old one got lost when I re-sync'd DSpace Test with CGSpace a few weeks ago
  - I added a new account for him and added it to the Administrators group:

```
$ dspace user -a -m wow@me.com -g Felix -s Shaw -p 'fuananaaa'
```

- For some reason the Atmire Content and Usage Analysis (CUA) module's Usage Statistics is drawing blank graphs
  - I looked in the dspace.log and saw:

```
2020-02-23 11:28:13,696 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoClassDefFoundError: Could not initialize class org.jfree.chart.JFreeChart
```

- The same error happens on DSpace Test, but graphs are working on my local instance
  - The only thing I've changed recently is the Tomcat version, but it's working locally...
  - I see the following file on my local instance, CGSpace, and DSpace Test: `dspace/webapps/xmlui/WEB-INF/lib/jfreechart-1.0.5.jar`
  - I deployed Tomcat 7.0.99 on DSpace Test but the JFreeChart class still can't be found...
  - So it must be something with the library search path...
  - Strange that it works with Tomcat 7.0.100 on my local machine
- I copied the `jfreechart-1.0.5.jar` file to the Tomcat lib folder and then there was a different error when I loaded Atmire CUA:

```
2020-02-23 16:25:10,841 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request! org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper
```

- Some search results suggested commenting out the following line in `/etc/java-8-openjdk/accessibility.properties`:

```
assistive_technologies=org.GNOME.Accessibility.AtkWrapper
```
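
- Commenting it out non-interactively is easy enough (a sketch, assuming the stock Ubuntu 18.04 location of the file):

```
# prepend a '#' to the assistive_technologies line, keeping a backup of the original
$ sudo sed -i.bak 's/^assistive_technologies=/#&/' /etc/java-8-openjdk/accessibility.properties
```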

- After commenting that line out, removing the extra jfreechart library, and restarting Tomcat, I was able to load the usage statistics graph on DSpace Test...
  - Hmm, actually I think this is a Java bug, perhaps introduced or at [least present in 18.04](https://bugs.openjdk.java.net/browse/JDK-8204862), with lots of [references](https://code-maven.com/slides/jenkins-intro/no-graph-error) to it [happening in other](https://issues.jenkins-ci.org/browse/JENKINS-39636) configurations like Debian 9 with Jenkins, etc...
  - Apparently if you use the *non-headless* version of openjdk this doesn't happen... but that pulls in X11 stuff so no thanks
  - Also, I see dozens of occurrences of this going back over one month (we have logs for about that period):

```
# grep -c 'initialize class org.jfree.chart.JFreeChart' dspace.log.2020-0*
dspace.log.2020-01-12:4
dspace.log.2020-01-13:66
dspace.log.2020-01-14:4
dspace.log.2020-01-15:36
dspace.log.2020-01-16:88
dspace.log.2020-01-17:4
dspace.log.2020-01-18:4
dspace.log.2020-01-19:4
dspace.log.2020-01-20:4
dspace.log.2020-01-21:4
...
```
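
- To get a rough total across those days, sum the counts (`grep -c` prints one `file:count` pair per line, so the last colon-separated field is the count):

```
# grep -c 'initialize class org.jfree.chart.JFreeChart' dspace.log.2020-0* | awk -F: '{sum += $NF} END {print sum}'
```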

- I deployed the fix on CGSpace (linode18) and I was able to see the graphs in the Atmire CUA Usage Statistics...
- On an unrelated note, something weird is going on: I see millions of hits from IP 34.218.226.147 in Solr statistics, but if I remember correctly that IP belongs to CodeObia's AReS explorer, which should only be using the REST API and therefore shouldn't generate any Solr statistics...?

```
$ curl -s "http://localhost:8081/solr/statistics-2018/select" -d "q=ip:34.218.226.147&rows=0"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">811</int><lst name="params"><str name="q">ip:34.218.226.147</str><str name="rows">0</str></lst></lst><result name="response" numFound="5536097" start="0"></result>
</response>
```

- And there are apparently two million from last month (2020-01):

```
$ curl -s "http://localhost:8081/solr/statistics/select" -d "q=ip:34.218.226.147&fq=dateYearMonth:2020-01&rows=0"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">248</int><lst name="params"><str name="q">ip:34.218.226.147</str><str name="fq">dateYearMonth:2020-01</str><str name="rows">0</str></lst></lst><result name="response" numFound="2173455" start="0"></result>
</response>
```
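
- Rather than checking month by month, faceting on the same `dateYearMonth` field should show the whole distribution in one query (a sketch):

```
$ curl -s 'http://localhost:8081/solr/statistics/select?q=ip:34.218.226.147&rows=0&wt=json&indent=true&facet=true&facet.field=dateYearMonth'
```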

- But when I look at the nginx access logs for the past month or so I only see 84,000 requests, all of which are on `/rest` and none of which are to XMLUI:

```
# zcat /var/log/nginx/*.log.*.gz | grep -c 34.218.226.147
84322
# zcat /var/log/nginx/*.log.*.gz | grep 34.218.226.147 | grep -c '/rest'
84322
```
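
- As a sanity check, inverting the second match should show zero non-REST hits, since the two counts above are equal:

```
# count requests from that IP that are NOT on /rest (expected to be zero)
# zcat /var/log/nginx/*.log.*.gz | grep 34.218.226.147 | grep -vc '/rest'
0
```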

- Either the requests didn't get logged, or there is some mixup with the Solr documents (fuck!)
  - On second inspection, I *do* see lots of notes here about 34.218.226.147, including 150,000 hits on one day in October, 2018 alone...
- To make matters worse, I see hits from REST in the regular nginx access log!
  - I did a few tests and can't figure it out, but it seems that hits appear in either one log or the other (not both)
  - Also, I see *zero* hits to `/rest` in the access.log on DSpace Test (linode19)
- Anyways, I faceted by IP in 2020-01 and see:

```
$ curl -s 'http://localhost:8081/solr/statistics/select?q=*:*&fq=dateYearMonth:2020-01&rows=0&wt=json&indent=true&facet=true&facet.field=ip'
...
"172.104.229.92",2686876,
"34.218.226.147",2173455,
"163.172.70.248",80945,
"163.172.71.24",55211,
"163.172.68.99",38427,
```
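
- To double check who these top IPs are, a quick reverse DNS loop helps (a sketch; output depends on the resolver):

```
$ for ip in 172.104.229.92 34.218.226.147 163.172.70.248 163.172.71.24 163.172.68.99; do
    echo -n "$ip: "; host "$ip" | head -n1
  done
```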

- Surprise surprise, the top two IPs are from AReS servers... wtf.
- The next three are from Online in France, and they are all using this weird user agent and making tens of thousands of requests to Discovery:

```
Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)
```

- And the same three are already inflating the statistics for 2020-02... hmmm.
- I need to see why AReS harvesting is inflating the stats, as it should only be making REST requests...
- Shiiiiit, I see 84,000 requests from the AReS IP today alone:

```
$ curl -s 'http://localhost:8081/solr/statistics/select?q=time:2020-02-22*+AND+ip:172.104.229.92&rows=0&wt=json&indent=true'
...
"response":{"numFound":84594,"start":0,"docs":[]
```

- Fuck! And of course the ILRI websites doing their daily REST harvesting are causing issues too, from today alone:

```
"2a01:7e00::f03c:91ff:fe9a:3a37",35512,
"2a01:7e00::f03c:91ff:fe18:7396",26155,
```

- I need to try to make some requests for these URLs and observe if they make a statistics hit:
  - `/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=50&offset=82450`
  - `/rest/handle/10568/28702?expand=all`
- Those are the requests AReS and ILRI servers are making... nearly 150,000 per day!
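
- A minimal way to test it would be to make one such request with a distinctive user agent and then look for it in the statistics core (a sketch; `AlanTest` is just a hypothetical marker, and spider filtering might swallow it):

```
# make one REST request with a recognizable user agent on DSpace Test...
$ curl -s -A 'AlanTest' 'https://dspacetest.cgiar.org/rest/handle/10568/28702?expand=all' > /dev/null
# ...then check whether a corresponding document appeared in Solr statistics
$ curl -s 'http://localhost:8081/solr/statistics/select?q=userAgent:AlanTest&rows=0&wt=json&indent=true'
```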
<!-- vim: set sw=2 ts=2: -->