cgspace-notes/2018-02.md at 22a2323bcb6c71b8f97c45fa54dd494795f63c28

alanorth/cgspace-notes

Fork 0

mirror of https://github.com/alanorth/cgspace-notes.git synced 2024-11-17 04:17:02 +01:00

Alan Orth 22a2323bcb

Update notes

2018-02-04 11:26:07 +02:00

2.7 KiB

Raw Blame History

title

date

author

2018-02-01

Peter gave feedback on the dc.rights proof of concept that I had sent him last week
We don't need to distinguish between internal and external works, so that makes it just a simple list
Yesterday I figured out how to monitor DSpace sessions using JMX
I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu's munin-plugins-java package and used the stuff I discovered about JMX [in 2018-01]({{< relref "2018-01.md" >}})

Run all system updates and reboot DSpace Test
Wow, I packaged up the jmx_dspace_sessions stuff in the Ansible infrastructure scripts and deployed it on CGSpace and it totally works:

# munin-run jmx_dspace_sessions
v_.value 223
v_jspui.value 1
v_oai.value 0

2018-02-03

Bram from Atmire responded about the high load caused by the Solr updater script and said it will be fixed with the updates to DSpace 5.8 compatibility: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566
We will close that ticket for now and wait for the 5.8 stuff: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560
I finally took a look at the second round of cleanups Peter had sent me for author affiliations in mid January
After trimming whitespace and quickly scanning for encoding errors I applied them on CGSpace:

$ ./delete-metadata-values.py -i /tmp/2018-02-03-Affiliations-12-deletions.csv -f cg.contributor.affiliation -m 211 -d dspace -u dspace -p 'fuuu'
$ ./fix-metadata-values.py -i /tmp/2018-02-03-Affiliations-1116-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p 'fuuu'

Then I started a full Discovery reindex:

$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b

real    96m39.823s
user    14m10.975s
sys     2m29.088s

Generate a new list of affiliations for Peter to sort through:

dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
COPY 3723

Oh, and it looks like we processed over 3.1 million requests in January, up from 2.9 million in [December]({{< relref "2017-12.md" >}}):

# time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2018"
3126109

real    0m23.839s
user    0m27.225s
sys     0m1.905s

2.7 KiB Raw Blame History

2018-02-01

2018-02-03

2.7 KiB

Raw Blame History