---
title: "October, 2018"
date: 2018-10-01T22:31:54+03:00
author: "Alan Orth"
tags: ["Notes"]
---

## 2018-10-01

- Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
- I created a GitHub issue to track this [#389](https://github.com/ilri/DSpace/issues/389), because I'm super busy in Nairobi right now

## 2018-10-03

- I see Moayad was busy collecting item views and downloads from CGSpace yesterday:

```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
933 40.77.167.90
971 95.108.181.88
1043 41.204.190.40
1454 157.55.39.54
1538 207.46.13.69
1719 66.249.64.61
2048 50.116.102.77
4639 66.249.64.59
4736 35.237.175.180
150362 34.218.226.147
```
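
For reference, the pipeline above can be sketched roughly in Python. This is only an illustration: it assumes nginx's default "combined" log format (client IP as the first field), and the function name `top_clients` and the sample lines are made up for the example.

```python
from collections import Counter


def top_clients(lines, day, n=10):
    """Return the n busiest client IPs among log lines that mention `day`.

    Assumes nginx's default "combined" format, where the client IP is the
    first whitespace-separated field (what awk's $1 picks out).
    """
    counts = Counter(line.split()[0] for line in lines if day in line)
    return counts.most_common(n)  # busiest first, unlike `sort -n | tail`


# Hypothetical sample lines (documentation-range IPs, not real CGSpace traffic)
sample = [
    '192.0.2.1 - - [02/Oct/2018:10:00:00 +0200] "GET /handle/10568/1 HTTP/1.1" 200 512 "-" "ua"',
    '192.0.2.1 - - [02/Oct/2018:10:00:01 +0200] "GET /handle/10568/2 HTTP/1.1" 500 0 "-" "ua"',
    '192.0.2.2 - - [02/Oct/2018:10:00:02 +0200] "GET /handle/10947/3 HTTP/1.1" 200 128 "-" "ua"',
    '192.0.2.2 - - [03/Oct/2018:09:00:00 +0200] "GET /handle/10947/4 HTTP/1.1" 200 128 "-" "ua"',
]
print(top_clients(sample, "02/Oct/2018"))
```

Unlike the shell version, this returns the busiest client first and does not read rotated or compressed logs by itself.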

- Of the 150,362 requests from `34.218.226.147`, about 21% were HTTP 500 responses (!):

```
$ zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | grep 34.218.226.147 | awk '{print $9}' | sort -n | uniq -c
118927 200
31435 500
```
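
The share of errors can be worked out from those two counts: 31435 / (118927 + 31435) is roughly 20.9%. A minimal Python sketch of the same breakdown follows, again assuming the "combined" log format, where the status code is the ninth whitespace-separated field. The name `status_share` and the sample lines are hypothetical.

```python
from collections import Counter


def status_share(lines, ip):
    """Return {status: (count, percent)} for requests from one client IP.

    Assumes nginx's "combined" format: field 1 is the client IP and
    field 9 is the HTTP status (what awk's $9 picks out).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 9 and fields[0] == ip:
            counts[fields[8]] += 1
    total = sum(counts.values())
    return {s: (c, round(100 * c / total, 1)) for s, c in counts.items()}


# Hypothetical sample lines (documentation-range IPs)
sample = [
    '192.0.2.1 - - [02/Oct/2018:10:00:00 +0200] "GET /a HTTP/1.1" 200 512 "-" "ua"',
    '192.0.2.1 - - [02/Oct/2018:10:00:01 +0200] "GET /b HTTP/1.1" 200 512 "-" "ua"',
    '192.0.2.1 - - [02/Oct/2018:10:00:02 +0200] "GET /c HTTP/1.1" 200 512 "-" "ua"',
    '192.0.2.1 - - [02/Oct/2018:10:00:03 +0200] "GET /d HTTP/1.1" 500 0 "-" "ua"',
    '192.0.2.9 - - [02/Oct/2018:10:00:04 +0200] "GET /e HTTP/1.1" 500 0 "-" "ua"',
]
print(status_share(sample, "192.0.2.1"))

# The counts from the log output above
print(round(100 * 31435 / (118927 + 31435), 1))
```

Note that this matches the IP against the first field exactly, whereas the `grep` above matches it anywhere in the line.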

- I added Phil Thornton and Sonal Henson's ORCID identifiers to the controlled vocabulary for `cg.creator.id` and then re-generated the names using my [resolve-orcids.py](https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b) script:

```
$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq > 2018-10-03-orcids.txt
$ ./resolve-orcids.py -i 2018-10-03-orcids.txt -o 2018-10-03-names.txt -d
```
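
The extraction step (the `grep -oE ... | sort | uniq` part) could look something like this in Python. The regular expression is the one from the command above. The XML snippet is a hypothetical stand-in, since the real structure of `cg-creator-id.xml` is not shown here, and `extract_orcids` is a name made up for this sketch.

```python
import re

# Same pattern as the grep above: four groups of four characters
ORCID_RE = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcids(text):
    """Return the unique ORCID iDs found in text, sorted (like sort | uniq)."""
    return sorted(set(ORCID_RE.findall(text)))


# Hypothetical vocabulary snippet; the real file's layout is an assumption
sample_xml = """
<node id="a" label="Sonal Henson: 0000-0002-2002-5462"/>
<node id="b" label="Philip Thornton: 0000-0002-1854-0182"/>
<node id="c" label="Philip Thornton: 0000-0002-1854-0182"/>
"""
print(extract_orcids(sample_xml))
```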

- I found a new corner case error that I need to check, given *and* family names deactivated:

```
Looking up the names associated with ORCID iD: 0000-0001-7930-5752
Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
```

- It appears to be Jim Lorenzen... I need to check that later!
- I merged the changes to the `5_x-prod` branch ([#390](https://github.com/ilri/DSpace/pull/390))
- Linode sent another alert about CPU usage on CGSpace (linode18) this evening
- It seems that Moayad is making quite a lot of requests today:

```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1594 157.55.39.160
1627 157.55.39.173
1774 136.243.6.84
4228 35.237.175.180
4497 70.32.83.92
4856 66.249.64.59
7120 50.116.102.77
12518 138.201.49.199
87646 34.218.226.147
111729 213.139.53.62
```

- But in super positive news, he says they are using my new [dspace-statistics-api](https://github.com/alanorth/dspace-statistics-api) and it's MUCH faster than using Atmire CUA's internal "restlet" API
- I don't recognize the `138.201.49.199` IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:

```
# grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c
8324 GET /bitstream
4193 GET /handle
```

- Suspiciously, it's almost exclusively grabbing the CGIAR System Office community (handle prefix 10947):

```
# grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c
7 GET /handle/10568
4186 GET /handle/10947
```
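
The same per-prefix tally can be sketched in Python, which makes it easier to slice further later (by hour, by user agent, and so on). The regular expression is taken from the `grep` above. The function name `handle_prefixes` and the sample lines are hypothetical, and unlike `grep -o` this only counts the first match on each line.

```python
import re
from collections import Counter

# Same pattern as the grep above: a five-digit handle prefix
HANDLE_RE = re.compile(r"GET /handle/[0-9]{5}")


def handle_prefixes(lines, ip):
    """Count 'GET /handle/NNNNN' prefixes on log lines that mention `ip`."""
    counts = Counter()
    for line in lines:
        if ip in line:
            match = HANDLE_RE.search(line)  # first match per line only
            if match:
                counts[match.group(0)] += 1
    return counts


# Hypothetical sample lines (documentation-range IPs)
sample = [
    '192.0.2.7 - - [03/Oct/2018:10:00:00 +0200] "GET /handle/10947/100 HTTP/1.1" 200 1 "-" "ua"',
    '192.0.2.7 - - [03/Oct/2018:10:00:01 +0200] "GET /handle/10947/101 HTTP/1.1" 200 1 "-" "ua"',
    '192.0.2.7 - - [03/Oct/2018:10:00:02 +0200] "GET /handle/10568/5 HTTP/1.1" 200 1 "-" "ua"',
    '192.0.2.7 - - [03/Oct/2018:10:00:03 +0200] "GET /bitstream/handle/10947/100/a.pdf HTTP/1.1" 200 1 "-" "ua"',
    '192.0.2.8 - - [03/Oct/2018:10:00:04 +0200] "GET /handle/10947/102 HTTP/1.1" 200 1 "-" "ua"',
]
print(handle_prefixes(sample, "192.0.2.7"))
```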

- The user agent is suspicious too:

```
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
```

- It's clearly a bot and it's not re-using its Tomcat session, so I will add its IP to the nginx bad bot list
- I looked in Solr's statistics core and these hits were actually all counted as `isBot:false` (of course)... hmmm
- I tagged all of Sonal and Phil's items with their ORCID identifiers on CGSpace using my [add-orcid-identifiers.py](https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050) script:

```
$ ./add-orcid-identifiers-csv.py -i 2018-10-03-add-orcids.csv -db dspace -u dspace -p 'fuuu'
```

- Where `2018-10-03-add-orcids.csv` contained:

```
dc.contributor.author,cg.creator.id
"Henson, Sonal P.",Sonal Henson: 0000-0002-2002-5462
"Henson, S.",Sonal Henson: 0000-0002-2002-5462
"Thornton, P.K.",Philip Thornton: 0000-0002-1854-0182
"Thornton, Philip K",Philip Thornton: 0000-0002-1854-0182
"Thornton, Phil",Philip Thornton: 0000-0002-1854-0182
"Thornton, Philip K.",Philip Thornton: 0000-0002-1854-0182
"Thornton, Phillip",Philip Thornton: 0000-0002-1854-0182
"Thornton, Phillip K.",Philip Thornton: 0000-0002-1854-0182
```
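
To illustrate the shape of that file, here is a small sketch of how such a CSV could be read into a lookup from author name variant to ORCID identifier. It is not the logic of the script used above, which is not shown here. The function name `load_orcid_map` is made up, and splitting the second column on the last `": "` is an assumption based on the rows above.

```python
import csv
import io


def load_orcid_map(csv_text):
    """Map each author name variant to a (display name, ORCID iD) pair."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumes values look like "Display Name: 0000-0000-0000-0000"
        display_name, orcid = row["cg.creator.id"].rsplit(": ", 1)
        mapping[row["dc.contributor.author"]] = (display_name, orcid)
    return mapping


# Two rows copied from the CSV above
sample_csv = '''dc.contributor.author,cg.creator.id
"Henson, S.",Sonal Henson: 0000-0002-2002-5462
"Thornton, P.K.",Philip Thornton: 0000-0002-1854-0182
'''
for author, (name, orcid) in load_orcid_map(sample_csv).items():
    print(author, "->", name, orcid)
```

The quoting matters: the author names contain commas, so they have to be quoted for the CSV parser to keep each one in a single column.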

<!-- vim: set sw=2 ts=2: -->