- I restarted it again and all the Solr cores came up properly...
## 2019-04-06

- Udana asked why item [10568/91278](https://cgspace.cgiar.org/handle/10568/91278) didn't have an Altmetric badge on CGSpace when it does have one on the [WLE website](https://wle.cgiar.org/food-and-agricultural-innovation-pathways-prosperity)
- I looked and saw that the WLE website is using the Altmetric score associated with the DOI, and that the Handle has no score at all
- I tweeted the item, and I assume this will link the Handle with the DOI in Altmetric's system
- Linode sent an alert that there was high CPU usage this morning on CGSpace (linode18) and these were the top IPs in the webserver access logs around the time:

```
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "06/Apr/2019:(06|07|08|09)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    222 18.195.78.144
    245 207.46.13.58
    303 207.46.13.194
    328 66.249.79.33
    564 207.46.13.210
    566 66.249.79.62
    575 40.77.167.66
   1803 66.249.79.59
   2834 2a01:4f8:140:3192::2
   9623 45.5.184.72
# zcat --force /var/log/nginx/{rest,oai}.log /var/log/nginx/{rest,oai}.log.1 | grep -E "06/Apr/2019:(06|07|08|09)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
     31 66.249.79.62
     41 207.46.13.210
     42 40.77.167.66
     54 42.113.50.219
    132 66.249.79.59
    785 2001:41d0:d:1990::
   1164 45.5.184.72
   2014 50.116.102.77
   4267 45.5.186.2
   4893 205.186.128.185
```
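
- For reuse, the same pipeline can be wrapped in a small shell function. This is only a sketch: `top_ips` is a hypothetical name, not something that exists on the server, and it reads log lines on stdin so it works with either `cat` or `zcat --force` as above:

```shell
# Sketch: count requests per client address inside a time window.
# Reads nginx log lines on stdin.
#   $1 = extended regex matched against each line, e.g. "06/Apr/2019:(06|07|08|09)"
#   $2 = how many of the busiest addresses to print (default 10)
top_ips() {
  local window=$1 n=${2:-10}
  grep -E "$window" |   # keep only lines from the time window
    awk '{print $1}' |  # the first field is the client address
    sort |              # put identical addresses next to each other...
    uniq -c |           # ...so uniq -c can count each one
    sort -n |           # order by count, smallest first
    tail -n "$n"        # the last N lines are the busiest clients
}
```

- Usage for the same window would be something like `zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | top_ips "06/Apr/2019:(06|07|08|09)" 10`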

- `45.5.184.72` is in Colombia, so it's probably CIAT, and I see they are indeed trying to crawl the Discover pages on CIAT's datasets collection:

```
GET /handle/10568/72970/discover?filtertype_0=type&filtertype_1=author&filter_relational_operator_1=contains&filter_relational_operator_0=equals&filter_1=&filter_0=Dataset&filtertype=dateIssued&filter_relational_operator=equals&filter=2014
```
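
- Each of these Discover URLs is just a faceted search with the filters in the query string, so a quick way to read one is to split it on `&`. A sketch, where `discover_params` is a made-up helper and I assume the parameters are not percent-encoded (true for the request above):

```shell
# Sketch: print each query-string parameter of a Discover URL on its own line.
# Assumes the parameters are not percent-encoded.
discover_params() {
  local url=$1
  # Strip everything up to and including the first "?" (if there is no "?",
  # the whole string is kept), then split the key=value pairs on "&".
  printf '%s\n' "${url#*[?]}" | tr '&' '\n'
}
```

- Run on the request above, it prints nine parameters, one per line, among them `filter_0=Dataset` and `filter=2014`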

- Their user agent is the one I added to the badbots list in nginx last week: "GuzzleHttp/6.3.3 curl/7.47.0 PHP/7.0.30-0ubuntu0.16.04.1"
- They made 22,000 requests to Discover on this collection today alone (and it's only 11AM):

```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "06/Apr/2019" | grep 45.5.184.72 | grep -oE '/handle/[0-9]+/[0-9]+/discover' | sort | uniq -c
  22077 /handle/10568/72970/discover
```
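
- To see the rate behind a daily total like that, here is a sketch that splits one client's Discover requests by hour. It assumes nginx's default combined log format (address in field 1, timestamp in field 4, request path in field 7), and `hourly_discover` is a hypothetical helper:

```shell
# Sketch: count one client's Discover requests per hour of a given day.
# Reads nginx access log lines (combined format) on stdin.
#   $1 = client IP, $2 = day in nginx's format, e.g. "06/Apr/2019"
hourly_discover() {
  local ip=$1 day=$2
  awk -v ip="$ip" -v day="$day" '
    # $1 is the client address, $4 looks like "[06/Apr/2019:10:15:01"
    $1 == ip && index($4, "[" day ":") == 1 && $7 ~ /^\/handle\/[0-9]+\/[0-9]+\/discover/ {
      hour = substr($4, length(day) + 3, 2)   # the two digits after "[day:"
      count[hour]++
    }
    END { for (h in count) print h, count[h] }
  ' | sort
}
```

- Comparing `$1 == ip` exactly also avoids a small pitfall of `grep 45.5.184.72`, where the dots match any character and the address can match inside a longer one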

- I need to find a contact at CIAT to tell them to use the REST API rather than crawling Discover
- Maria from Bioversity recommended that we use the phrase "AGROVOC subject" instead of "Subject" in Listings and Reports
- I made a pull request to update this and merged it to the `5_x-prod` branch ([#418](https://github.com/ilri/DSpace/pull/418))

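
- For CIAT, a rough sketch of the REST alternative, based on my understanding of the DSpace 5 REST API: look up the collection's numeric ID once via `/rest/handle/10568/72970`, then page through `/rest/collections/{id}/items` using `limit` and `offset`. The `BASE` variable, the function names, and the use of `jq` are assumptions for illustration, not anything CIAT or CGSpace actually uses:

```shell
# Sketch: page through a collection with the DSpace 5 REST API instead of
# crawling Discover. Assumptions: REST is served at $BASE/rest, curl and jq are
# installed, and the numeric collection ID was already looked up (for example
# from the "id" field returned by $BASE/rest/handle/10568/72970).
BASE=${BASE:-https://cgspace.cgiar.org}

# Build the URL for one page of items in a collection (hypothetical helper).
items_page_url() {
  local collection_id=$1 offset=$2 limit=${3:-100}
  printf '%s/rest/collections/%s/items?limit=%s&offset=%s\n' \
    "$BASE" "$collection_id" "$limit" "$offset"
}

# Fetch pages until one comes back empty; print the handle of every item.
fetch_collection_handles() {
  local collection_id=$1 limit=100 offset=0 page
  while :; do
    page=$(curl -sf -H 'Accept: application/json' \
      "$(items_page_url "$collection_id" "$offset" "$limit")") || return 1
    # Stop when the page is an empty JSON array (or not JSON at all).
    [ "$(printf '%s' "$page" | jq 'length')" -gt 0 ] 2>/dev/null || break
    printf '%s' "$page" | jq -r '.[].handle'
    offset=$((offset + limit))
  done
}
```

- A few hundred paged REST requests like this should be far cheaper for the server than tens of thousands of Discover queries, since each page returns many items at once instead of re-running a faceted search
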
<!-- vim: set sw=2 ts=2: -->