---
title: "April, 2019"
date: 2019-04-01T09:00:43+03:00
author: "Alan Orth"
tags: ["Notes"]
---
## 2019-04-01
- Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
- They asked if we had plans to enable RDF support in CGSpace
- There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
- I suspected that some of them might not be successful, because the stats show fewer, but today they were all HTTP 200!
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
```
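The same tally can be sketched in Python. This is only an illustration: it assumes nginx's default "combined" log format, the sample lines and their `/example/` paths are invented, `192.0.2.10` is a documentation address used as a non-matching client, and `count_statuses` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumes nginx "combined" format: ip - user [time] "request" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)

# The three Amazon IP addresses from the grep above
IPS = {"18.196.196.108", "18.195.78.144", "18.195.218.6"}


def count_statuses(lines, ips, path_fragment):
    """Count HTTP status codes for requests from `ips` that mention `path_fragment`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match["ip"] in ips and path_fragment in match["request"]:
            counts[match["status"]] += 1
    return counts


# Invented sample lines, only to exercise the function
sample = [
    '18.196.196.108 - - [01/Apr/2019:10:00:00 +0200] "GET /example/Spore-192-EN-web.pdf HTTP/1.1" 200 1234 "-" "example-agent"',
    '18.195.78.144 - - [01/Apr/2019:10:00:01 +0200] "GET /example/Spore-192-EN-web.pdf HTTP/1.1" 200 1234 "-" "example-agent"',
    '192.0.2.10 - - [01/Apr/2019:10:00:02 +0200] "GET /example/Spore-192-EN-web.pdf HTTP/1.1" 200 1234 "-" "example-agent"',
    '18.195.218.6 - - [01/Apr/2019:10:00:03 +0200] "GET /example/other.pdf HTTP/1.1" 404 0 "-" "example-agent"',
    '18.195.218.6 - - [01/Apr/2019:10:00:04 +0200] "GET /example/Spore-192-EN-web.pdf HTTP/1.1" 206 512 "-" "example-agent"',
]

status_counts = count_statuses(sample, IPS, "Spore-192-EN-web.pdf")
for status, count in sorted(status_counts.items()):
    print(count, status)
```

Comparing IP addresses as exact strings also sidesteps the fact that an unescaped `.` in a `grep -E` pattern matches any character.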
- In the last two weeks there have been 47,000 downloads of this *same exact PDF* by these three IP addresses
- Apply country and region corrections and deletions on DSpace Test and CGSpace:
```
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
```
<!--more-->
## 2019-04-02
- CTA says the Amazon IPs are AWS gateways for real user traffic
- I was trying to add Felix Shaw's account back to the Administrators group on DSpace Test, but I couldn't find his name in the user search of the groups page
- If I searched for "Felix" or "Shaw" I saw other matches, including one for his personal email address!
- I ended up finding him by searching for his email address
## 2019-04-03
- Maria from Bioversity emailed me a list of new ORCID identifiers for their researchers so I will add them to our controlled vocabulary
- First I need to combine their list with our existing one and extract the unique identifiers:
```
$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2019-04-03-orcid-ids.txt
```
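A rough Python equivalent of that pipeline, which also reports which identifiers are actually new. The regular expression is the one from the command above; `extract_orcids` and `merge_orcids` are hypothetical helper names, and the identifiers and text layout in the sample are placeholders rather than real ORCID iDs or the real vocabulary format.

```python
import re

# Same pattern as the grep -oE expression above
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcids(text):
    """Return the set of ORCID-shaped identifiers found anywhere in `text`."""
    return set(ORCID_PATTERN.findall(text))


def merge_orcids(existing_text, new_text):
    """Return (sorted union of all identifiers, sorted identifiers only in `new_text`)."""
    existing = extract_orcids(existing_text)
    incoming = extract_orcids(new_text)
    return sorted(existing | incoming), sorted(incoming - existing)


# Placeholder data, not real identifiers
existing_text = "Author One: 0000-0000-0000-0001\nAuthor Two: 0000-0000-0000-0002\n"
new_text = "0000-0000-0000-0002\n0000-0000-0000-000X\n"

merged, added = merge_orcids(existing_text, new_text)
print(len(merged), "unique identifiers,", len(added), "of them new")
```

Keeping the set difference separate makes it easy to see how many identifiers the new list really contributes, which `sort -u` alone does not show.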
- We currently have 1177 unique ORCID identifiers, and this brings our total to 1237!
- Next I will resolve all their names using my `resolve-orcids.py` script:
```
$ ./resolve-orcids.py -i /tmp/2019-04-03-orcid-ids.txt -o 2019-04-03-orcid-ids.txt -d
```
- After that I added the XML formatting, formatted the file with tidy, and sorted the names in vim
- One user's name has changed, so I will update the existing metadata values using my `fix-metadata-values.py` script:
```
$ ./fix-metadata-values.py -i 2019-04-03-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
```
- I created a pull request and merged the changes to the 5_x-prod branch ([#417](https://github.com/ilri/DSpace/pull/417))
- A few days ago I noticed some weird update process for the statistics-2018 Solr core and I see it's still going:
```
2019-04-03 16:34:02,262 INFO org.dspace.statistics.SolrLogger @ Updating : 1754500/21701 docs in http://localhost:8081/solr//statistics-2018
```
- Interestingly, there are 5666 occurrences, and they are mostly for the 2018 core:
```
$ grep 'org.dspace.statistics.SolrLogger @ Updating' /home/cgspace.cgiar.org/log/dspace.log.2019-04-03 | awk '{print $11}' | sort | uniq -c
1
3 http://localhost:8081/solr//statistics-2017
5662 http://localhost:8081/solr//statistics-2018
```
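For keeping an eye on this, a small Python sketch can count the update messages per core and flag lines where the first number exceeds the second. It assumes the two numbers are processed and total document counts, which these notes do not confirm; `summarize_updates` is a hypothetical helper name, and the two sample lines are the ones quoted in these notes.

```python
import re
from collections import Counter

UPDATE_LINE = re.compile(
    r"org\.dspace\.statistics\.SolrLogger @ Updating : "
    r"(?P<done>\d+)/(?P<total>\d+) docs in (?P<url>\S+)"
)


def summarize_updates(lines):
    """Return (messages per core, list of (core, done, total) where done > total)."""
    per_core = Counter()
    overruns = []
    for line in lines:
        match = UPDATE_LINE.search(line)
        if match is None:
            continue  # not an "Updating" message
        # The core name is the last path component of the Solr URL
        core = match["url"].rstrip("/").rsplit("/", 1)[-1]
        per_core[core] += 1
        done, total = int(match["done"]), int(match["total"])
        if done > total:
            overruns.append((core, done, total))
    return per_core, overruns


# Log lines quoted in these notes
sample = [
    "2019-04-03 16:34:02,262 INFO org.dspace.statistics.SolrLogger @ Updating : 1754500/21701 docs in http://localhost:8081/solr//statistics-2018",
    "2019-04-05 21:06:53,770 INFO org.dspace.statistics.SolrLogger @ Updating : 2444600/21697 docs in http://localhost:8081/solr//statistics-2018",
]

per_core, overruns = summarize_updates(sample)
for core, count in per_core.items():
    print(count, core)
for core, done, total in overruns:
    print(f"{core}: {done} reported against {total}")
```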
- I will have to keep an eye on it because nothing should be updating 2018 stats in 2019...
## 2019-04-05
- Uptime Robot reported that CGSpace (linode18) went down tonight
- I see there are lots of PostgreSQL connections:
```
$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
5 dspaceApi
10 dspaceCli
250 dspaceWeb
```
- I still see those weird messages about updating the statistics-2018 Solr core:
```
2019-04-05 21:06:53,770 INFO org.dspace.statistics.SolrLogger @ Updating : 2444600/21697 docs in http://localhost:8081/solr//statistics-2018
```
- Looking at `iostat 1 10` I also see some CPU steal has come back, and I can confirm it by looking at the Munin graphs:
![CPU usage week](/cgspace-notes/2019/04/cpu-week.png)
- The other thing visible there is that over the past few days the load has spiked to 500%, and I don't think it's a coincidence that the Solr updating thing is happening...
- I ran all system updates and rebooted the server
<!-- vim: set sw=2 ts=2: -->