- Yesterday I figured out how to monitor DSpace sessions using JMX
- I copied the logic in the `jmx_tomcat_dbpools` provided by Ubuntu's `munin-plugins-java` package and used the stuff I discovered about JMX [in 2018-01]({{< relref "2018-01.md" >}})
- Wow, I packaged up the `jmx_dspace_sessions` stuff in the [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public) and deployed it on CGSpace and it totally works:
- Bram from Atmire responded about the high load caused by the Solr updater script and said it will be fixed with the updates to DSpace 5.8 compatibility: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566
- We will close that ticket for now and wait for the 5.8 stuff: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560
- Generate a new list of affiliations for Peter to sort through:
```
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
COPY 3723
```
- Oh, and it looks like we processed over 3.1 million requests in January, up from 2.9 million in [December]({{< relref "2017-12.md" >}}):
```
# time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2018"
- Toying with correcting authors with trailing spaces via PostgreSQL:
```
dspace=# update metadatavalue set text_value=REGEXP_REPLACE(text_value, '\s+$' , '') where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*?\s+$';
UPDATE 20
```
- I tried the `TRIM(TRAILING from text_value)` function and it said it changed 20 items but the spaces didn't go away
- This is on a fresh import of the CGSpace database, but when I tried to apply it on CGSpace there were no changes detected. Weird.
- Anyways, Peter wants a new list of authors to clean up, so I exported another CSV:
```
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors-2018-02-05.csv with csv;
- I did notice in `/var/log/tomcat7/catalina.out` that Atmire's update thing was running though
- So I restarted Tomcat and now everything is fine
- Next time I see that many database connections I need to save the output so I can analyze it later
- I'm going to re-schedule the taskUpdateSolrStatsMetadata task as [Bram detailed in ticket 566](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566) to see if it makes CGSpace stop crashing every morning
- If I move the task from 3AM to 3PM, deally CGSpace will stop crashing in the morning, or start crashing ~12 hours later
- Eventually Atmire has said that there will be a fix for this high load caused by their script, but it will come with the 5.8 compatability they are already working on
- I re-deployed CGSpace with the new task time of 3PM, ran all system updates, and restarted the server
- Also, I changed the name of the DSpace fallback pool on DSpace Test and CGSpace to be called 'dspaceCli' so that I can distinguish it in `pg_stat_activity`
- I implemented some changes to the pooling in the [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public) so that each DSpace web application can use its own pool (web, api, and solr)
- Each pool uses its own name and hopefully this should help me figure out which one is using too many connections next time CGSpace goes down
- Also, this will mean that when a search bot comes along and hammers the XMLUI, the REST and OAI applications will be fine
- I'm not actually sure if the Solr web application uses the database though, so I'll have to check later and remove it if necessary
- Abenet wrote to ask a question about the ORCiD lookup not working for one CIAT user on CGSpace
- I tried on DSpace Test and indeed the lookup just doesn't work!
- The ORCiD code in DSpace appears to be using `http://pub.orcid.org/`, but when I go there in the browser it redirects me to `https://pub.orcid.org/v2.0/`
- According to [the announcement](https://groups.google.com/forum/#!topic/orcid-api-users/qfg-HwAB1bk) the v1 API was moved from `http://pub.orcid.org/` to `https://pub.orcid.org/v1.2` until March 1st when it will be discontinued for good
- But the old URL is hard coded in DSpace and it doesn't work anyways, because it currently redirects you to `https://pub.orcid.org/v2.0/v1.2`
- So I guess we have to disable that shit once and for all and switch to a controlled vocabulary