diff --git a/content/posts/2021-10.md b/content/posts/2021-10.md
index 4687f4c70..b0d2601bb 100644
--- a/content/posts/2021-10.md
+++ b/content/posts/2021-10.md
@@ -330,8 +330,42 @@ $ psql -h localhost -p 5433 -U postgres dspace7 -c "DELETE FROM schema_version W
 ```
 
 - Now DSpace 7 starts with my CGSpace data... nice
+  - The Discovery indexing still takes seven hours... fuck
 - I tested the `metadata-export` on DSpace 7.1-SNAPSHOT and it still has the duplicate items issue introduced by DS-4211
   - I filed a GitHub issue and notified nwoodward: https://github.com/DSpace/DSpace/issues/7988
 - Start a full reindex on AReS
 
+## 2021-10-11
+
+- Start a full Discovery reindex on my local DSpace 6.3 instance:
+
+```console
+$ /usr/bin/time -f %M:%e chrt -b 0 ~/dspace63/bin/dspace index-discovery -b
+Loading @mire database changes for module MQM
+Changes have been processed
+836140:6543.6
+```
+
+- So that's 1.8 hours versus 7 on DSpace 7, with the same database!
+- Several users wrote to me that CGSpace was slow recently
+  - Looking at the PostgreSQL database I see connections look normal, but locks for `dspaceWeb` are high:
+
+```console
+$ psql -c 'SELECT * FROM pg_stat_activity' | wc -l
+53
+$ psql -c "SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid" | wc -l
+1697
+$ psql -c "SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE application_name='dspaceWeb'" | wc -l
+1681
+```
+
+- Looking at Munin, I see there are indeed a higher number of locks starting on the morning of 2021-10-07:
+
+![PostgreSQL locks week](/cgspace-notes/2021/10/postgres_locks_ALL-week.png)
+
+- The only thing I did on 2021-10-07 was import a few thousand metadata corrections...
+- I restarted PostgreSQL (instead of restarting Tomcat), so let's see if that helps
+- I filed [a bug for the DSpace 6/7 duplicate values metadata import issue](https://github.com/DSpace/DSpace/issues/7989)
+- I tested the two patches for removing abandoned submissions from the workflow but unfortunately it seems that they are for the configurable aka XML workflow, and we are using the basic workflow
+
diff --git a/docs/2021-10/index.html b/docs/2021-10/index.html
index 1b0ffe590..ca8366f15 100644
--- a/docs/2021-10/index.html
+++ b/docs/2021-10/index.html
@@ -25,7 +25,7 @@ So we have 1879/7100 (26.46%) matching already
-
+
@@ -56,9 +56,9 @@ So we have 1879/7100 (26.46%) matching already
   "@type": "BlogPosting",
   "headline": "October, 2021",
   "url": "https://alanorth.github.io/cgspace-notes/2021-10/",
-  "wordCount": "2199",
+  "wordCount": "2424",
   "datePublished": "2021-10-01T11:14:07+03:00",
-  "dateModified": "2021-10-09T22:00:59+03:00",
+  "dateModified": "2021-10-10T16:01:27+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -453,7 +453,11 @@ $ psql -h localhost -p 5433 -U postgres -c 'alter user dspacetest nosuperuser;'
 $ psql -h localhost -p 5433 -U postgres dspace7 -c "DELETE FROM schema_version WHERE description LIKE '%Atmire%' OR description LIKE '%CUA%' OR description LIKE '%cua%';"
 $ psql -h localhost -p 5433 -U postgres dspace7 -c "DELETE FROM schema_version WHERE version IN ('5.0.2017.09.25', '6.0.2017.01.30', '6.0.2017.09.25');"