Add notes for 2020-04-20

2020-04-20 12:41:21 +03:00
parent 3b0dbf2f78
commit 32018333d1
3 changed files with 118 additions and 9 deletions

@ -173,5 +173,64 @@ dspace=# UPDATE metadatavalue SET text_value='Knight-Jones, Theodore J.D.' WHERE
- Atmire responded to some of the issues I raised earlier this week about the DSpace 6 pull request
- They said they don't think the glyphicon encoding issue is due to their changes, but I built a new clean version of the vanilla `6_x-dev` branch from before their pull request and it *does not* have the encoding issue in the Mirage 2 header trails
- Also, they said we need to use something called `AtomicStatisticsUpdateCLI` to do the Solr legacy integer ID to UUID conversion so I asked for more information about that workflow
## 2020-04-20
- Looking into a high rate of outgoing bandwidth from yesterday on CGSpace (linode18):
```
# cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Apr/2020:0[6789]" | goaccess --log-format=COMBINED -
```
- One host in Russia (91.241.19.70) downloaded 23GiB over those few hours in the morning
- It looks like all the requests were for one single item's bitstreams:
```
# grep -c 91.241.19.70 /var/log/nginx/access.log.1
8900
# grep 91.241.19.70 /var/log/nginx/access.log.1 | grep -c '10568/35187'
8900
```
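- As a rough cross-check of the goaccess numbers, the per-IP request count and bytes sent can also be tallied straight from the access log. This is only a sketch: it assumes nginx's standard `combined` log format, and the sample lines, the `example.pdf` file name, the byte counts, and the helper name `summarize` are made up for illustration:

```python
import re
from collections import Counter

# Leading fields of nginx's "combined" format:
# remote_addr - remote_user [time_local] "request" status body_bytes_sent ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def summarize(lines, time_prefixes):
    """Count hits and sum body bytes per IP for requests in the given hours."""
    hits = Counter()
    sent = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        if not match.group("time").startswith(time_prefixes):
            continue  # outside the time window of interest
        ip = match.group("ip")
        hits[ip] += 1
        if match.group("bytes") != "-":
            sent[ip] += int(match.group("bytes"))
    return hits, sent


# Same window as the grep above: 06:00 to 09:59 on 19 April 2020
window = tuple("19/Apr/2020:0%d" % hour for hour in (6, 7, 8, 9))

# Hypothetical sample lines, not real log data
sample = [
    '91.241.19.70 - - [19/Apr/2020:06:15:02 +0200] "GET /bitstream/handle/10568/35187/example.pdf HTTP/1.1" 200 2500000 "-" "Mozilla/5.0"',
    '91.241.19.70 - - [19/Apr/2020:07:01:44 +0200] "GET /bitstream/handle/10568/35187/example.pdf HTTP/1.1" 200 2500000 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [19/Apr/2020:08:30:00 +0200] "GET / HTTP/1.1" 200 1200 "-" "curl/7.68.0"',
    '91.241.19.70 - - [19/Apr/2020:12:00:00 +0200] "GET / HTTP/1.1" 200 999 "-" "Mozilla/5.0"',
]

hits, sent = summarize(sample, window)
for ip, total in sent.most_common():
    print(ip, hits[ip], "requests", total, "bytes")
```

- On the server the `sample` list would be replaced by the lines of the real log files, and the totals should roughly agree with what goaccess reports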
- I thought the host might have been Yandex misbehaving, but its user agent is:
```
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_3; nl-nl) AppleWebKit/527 (KHTML, like Gecko) Version/3.1.1 Safari/525.20
```
- I will purge that IP from the Solr statistics using my `check-spider-ip-hits.sh` script:
```
$ ./check-spider-ip-hits.sh -d -f /tmp/ip -p
(DEBUG) Using spider IPs file: /tmp/ip
(DEBUG) Checking for hits from spider IP: 91.241.19.70
Purging 8909 hits from 91.241.19.70 in statistics
Total number of bot hits purged: 8909
```
- While investigating that, I noticed ORCID identifiers missing from a few authors' names, so I added them with my `add-orcid-identifiers-csv.py` script:
```
$ ./add-orcid-identifiers-csv.py -i 2020-04-20-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
```
- The contents of `2020-04-20-add-orcids.csv` were:
```
dc.contributor.author,cg.creator.id
"Schut, Marc","Marc Schut: 0000-0002-3361-4581"
"Schut, M.","Marc Schut: 0000-0002-3361-4581"
"Kamau, G.","Geoffrey Kamau: 0000-0002-6995-4801"
"Kamau, G","Geoffrey Kamau: 0000-0002-6995-4801"
"Triomphe, Bernard","Bernard Triomphe: 0000-0001-6657-3002"
"Waters-Bayer, Ann","Ann Waters-Bayer: 0000-0003-1887-7903"
"Klerkx, Laurens","Laurens Klerkx: 0000-0002-1664-886X"
```
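- Before feeding a CSV like this to the script, the identifiers can be sanity-checked against ORCID's check digit (ISO 7064 MOD 11-2). The following is a minimal sketch, not part of `add-orcid-identifiers-csv.py`; the function names are my own, and it only catches typos in the identifier, not a wrong author-to-ORCID pairing:

```python
import csv
import io

CSV_TEXT = '''dc.contributor.author,cg.creator.id
"Schut, Marc","Marc Schut: 0000-0002-3361-4581"
"Schut, M.","Marc Schut: 0000-0002-3361-4581"
"Kamau, G.","Geoffrey Kamau: 0000-0002-6995-4801"
"Kamau, G","Geoffrey Kamau: 0000-0002-6995-4801"
"Triomphe, Bernard","Bernard Triomphe: 0000-0001-6657-3002"
"Waters-Bayer, Ann","Ann Waters-Bayer: 0000-0003-1887-7903"
"Klerkx, Laurens","Laurens Klerkx: 0000-0002-1664-886X"
'''


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the identifier has 16 characters and a matching check digit."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


def read_mappings(text):
    """Return (author name, ORCID label, ORCID identifier) tuples from the CSV."""
    mappings = []
    for row in csv.DictReader(io.StringIO(text)):
        # cg.creator.id looks like "Marc Schut: 0000-0002-3361-4581"
        label, _, orcid = row["cg.creator.id"].rpartition(": ")
        mappings.append((row["dc.contributor.author"], label, orcid))
    return mappings


mappings = read_mappings(CSV_TEXT)
invalid = [m for m in mappings if not is_valid_orcid(m[2])]
print(len(mappings), "rows,", len(invalid), "with a bad check digit")
```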
- I confirmed some of the authors' names first from the report itself, then by checking their profiles on ORCID.org
- Add new ILRI subject "COVID19" to the `5_x-prod` branch
- Add new CCAFS Phase II project tags to the `5_x-prod` branch
- I will deploy these to CGSpace in the next few days
<!-- vim: set sw=2 ts=2: -->