mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2020-04-20
@ -173,5 +173,64 @@ dspace=# UPDATE metadatavalue SET text_value='Knight-Jones, Theodore J.D.' WHERE
- Atmire responded to some of the issues I raised earlier this week about the DSpace 6 pull request
- They said they don't think the glyphicon encoding issue is due to their changes, but I built a new clean version of the vanilla `6_x-dev` branch from before their pull request and it *does not* have the encoding issue in the Mirage 2 header trails
- Also, they said we need to use something called `AtomicStatisticsUpdateCLI` to do the Solr legacy integer ID to UUID conversion so I asked for more information about that workflow

## 2020-04-20

- Looking into a high rate of outgoing bandwidth from yesterday on CGSpace (linode18):

```
# cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Apr/2020:0[6789]" | goaccess --log-format=COMBINED -
```

- One host in Russia (91.241.19.70) downloaded 23GiB over those few hours in the morning
- It looks like all the requests were for a single item's bitstreams:

```
# grep -c 91.241.19.70 /var/log/nginx/access.log.1
8900
# grep 91.241.19.70 /var/log/nginx/access.log.1 | grep -c '10568/35187'
8900
```

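- For reference, the same per-host tally can be sketched in Python. This is only an illustration, assuming the logs use the standard combined format; the `summarize` helper and the sample lines (with documentation-range IPs and made-up file names) are hypothetical and not part of any existing script:

```python
import re
from collections import Counter

# Combined log format: ip - user [time] "request" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def summarize(lines, time_pattern, path_fragment):
    """Return (bytes sent per IP, hits per IP whose request contains path_fragment).

    Only lines whose timestamp starts with a match for time_pattern are counted.
    """
    bytes_by_ip = Counter()
    hits_by_ip = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or re.match(time_pattern, match["time"]) is None:
            continue
        ip = match["ip"]
        sent = match["bytes"]
        bytes_by_ip[ip] += 0 if sent == "-" else int(sent)
        if path_fragment in match["request"]:
            hits_by_ip[ip] += 1
    return bytes_by_ip, hits_by_ip


if __name__ == "__main__":
    # Made-up sample lines for illustration, not taken from the real logs
    sample = [
        '192.0.2.10 - - [19/Apr/2020:06:15:02 +0200] '
        '"GET /bitstream/handle/10568/35187/a.pdf HTTP/1.1" 200 1000 "-" "ua"',
        '192.0.2.10 - - [19/Apr/2020:07:16:02 +0200] '
        '"GET /bitstream/handle/10568/35187/b.pdf HTTP/1.1" 200 2000 "-" "ua"',
        '198.51.100.7 - - [19/Apr/2020:08:17:00 +0200] '
        '"GET /handle/10568/1 HTTP/1.1" 200 500 "-" "ua"',
    ]
    total_bytes, item_hits = summarize(sample, r"19/Apr/2020:0[6789]", "10568/35187")
    for ip, sent in total_bytes.most_common():
        print(ip, sent, item_hits[ip])
```
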
- I thought the host might have been Yandex misbehaving, but its user agent is:

```
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_3; nl-nl) AppleWebKit/527 (KHTML, like Gecko) Version/3.1.1 Safari/525.20
```

- I will purge that IP from the Solr statistics using my `check-spider-ip-hits.sh` script:

```
$ ./check-spider-ip-hits.sh -d -f /tmp/ip -p
(DEBUG) Using spider IPs file: /tmp/ip
(DEBUG) Checking for hits from spider IP: 91.241.19.70
Purging 8909 hits from 91.241.19.70 in statistics

Total number of bot hits purged: 8909
```

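- A rough sketch of what such a purge amounts to on the Solr side, assuming the statistics core lives at `http://localhost:8081/solr/statistics` and stores the client address in an `ip` field. The URL and both helper names are assumptions for illustration, and this is not a description of how `check-spider-ip-hits.sh` is written. The code only builds the requests; it does not send anything:

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Assumed location of the statistics core; adjust for the real deployment
SOLR_STATISTICS = "http://localhost:8081/solr/statistics"


def count_url(ip, base=SOLR_STATISTICS):
    """URL for a query that reports how many hits an IPv4 address has (rows=0 returns no documents)."""
    params = {"q": f"ip:{ip}", "rows": 0, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def delete_request(ip, base=SOLR_STATISTICS):
    """URL and XML body for a delete-by-query that would remove that address's hits."""
    url = f"{base}/update?commit=true"
    body = f"<delete><query>ip:{escape(ip)}</query></delete>"
    return url, body


if __name__ == "__main__":
    print(count_url("91.241.19.70"))
    url, body = delete_request("91.241.19.70")
    print(url)
    print(body)
```

- Checking the count first and only then posting the delete body (with a `Content-Type: text/xml` header) keeps the purge auditable, which is the same order the script output above reports
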
- While investigating that I noticed ORCID identifiers missing from a few authors' names, so I added them with my `add-orcid-identifiers-csv.py` script:

```
$ ./add-orcid-identifiers-csv.py -i 2020-04-20-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
```

- The contents of `2020-04-20-add-orcids.csv` were:

```
dc.contributor.author,cg.creator.id
"Schut, Marc","Marc Schut: 0000-0002-3361-4581"
"Schut, M.","Marc Schut: 0000-0002-3361-4581"
"Kamau, G.","Geoffrey Kamau: 0000-0002-6995-4801"
"Kamau, G","Geoffrey Kamau: 0000-0002-6995-4801"
"Triomphe, Bernard","Bernard Triomphe: 0000-0001-6657-3002"
"Waters-Bayer, Ann","Ann Waters-Bayer: 0000-0003-1887-7903"
"Klerkx, Laurens","Laurens Klerkx: 0000-0002-1664-886X"
```

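- A file like this can be sanity-checked before it is applied. The sketch below is an illustration rather than part of `add-orcid-identifiers-csv.py`: it assumes the `Name: 0000-0000-0000-0000` layout shown above and verifies each identifier's final character with the ISO 7064 MOD 11-2 checksum that ORCID iDs use. The function names are made up for the example:

```python
import csv
import io
import re

# "Display Name: dddd-dddd-dddd-dddd", where the last character may be X
ORCID_RE = re.compile(r"^(?P<name>.+): (?P<orcid>\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_char(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def invalid_rows(csv_text):
    """Return the rows whose cg.creator.id is malformed or fails the checksum."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = ORCID_RE.match(row["cg.creator.id"])
        if match is None:
            bad.append(row)
            continue
        digits = match["orcid"].replace("-", "")
        if orcid_check_char(digits[:15]) != digits[15]:
            bad.append(row)
    return bad


if __name__ == "__main__":
    sample = (
        "dc.contributor.author,cg.creator.id\n"
        '"Schut, Marc","Marc Schut: 0000-0002-3361-4581"\n'
        '"Klerkx, Laurens","Laurens Klerkx: 0000-0002-1664-886X"\n'
    )
    print(invalid_rows(sample))
```
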
- I confirmed some of the authors' names from the report itself, then by looking at their profiles on ORCID.org
- Add new ILRI subject "COVID19" to the `5_x-prod` branch
- Add new CCAFS Phase II project tags to the `5_x-prod` branch
- I will deploy these to CGSpace in the next few days

<!-- vim: set sw=2 ts=2: -->