From bf122d4ac391c1af208297510318b6ee5e88cec8 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Sun, 25 Dec 2022 16:48:19 +0200
Subject: [PATCH] Add notes for 2022-12-25

---
 content/posts/2022-12.md                | 54 +++++++++++++++++++
 docs/2022-12/index.html                 | 70 +++++++++++++++++++++++--
 docs/categories/index.html              |  2 +-
 docs/categories/notes/index.html        |  2 +-
 docs/categories/notes/page/2/index.html |  2 +-
 docs/categories/notes/page/3/index.html |  2 +-
 docs/categories/notes/page/4/index.html |  2 +-
 docs/categories/notes/page/5/index.html |  2 +-
 docs/categories/notes/page/6/index.html |  2 +-
 docs/categories/notes/page/7/index.html |  2 +-
 docs/index.html                         |  2 +-
 docs/page/2/index.html                  |  2 +-
 docs/page/3/index.html                  |  2 +-
 docs/page/4/index.html                  |  2 +-
 docs/page/5/index.html                  |  2 +-
 docs/page/6/index.html                  |  2 +-
 docs/page/7/index.html                  |  2 +-
 docs/page/8/index.html                  |  2 +-
 docs/page/9/index.html                  |  2 +-
 docs/posts/index.html                   |  2 +-
 docs/posts/page/2/index.html            |  2 +-
 docs/posts/page/3/index.html            |  2 +-
 docs/posts/page/4/index.html            |  2 +-
 docs/posts/page/5/index.html            |  2 +-
 docs/posts/page/6/index.html            |  2 +-
 docs/posts/page/7/index.html            |  2 +-
 docs/posts/page/8/index.html            |  2 +-
 docs/posts/page/9/index.html            |  2 +-
 docs/sitemap.xml                        | 10 ++--
 29 files changed, 152 insertions(+), 34 deletions(-)

diff --git a/content/posts/2022-12.md b/content/posts/2022-12.md
index 4ed10efa7..4c7d30df6 100644
--- a/content/posts/2022-12.md
+++ b/content/posts/2022-12.md
@@ -263,5 +263,59 @@ $ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid =
 - I exported the Initiatives collection to check the metadata quality
   - I fixed a few errors and missing regions using csv-metadata-quality
+- Abenet and Bizu noticed some strange characters in affiliations submitted by MEL
+  - They currently appear like this in four items: `Instituto Nacional de Investigaci�n y Tecnolog�a Agraria y Alimentaria, Spain`
+  - I submitted [an issue](https://github.com/CodeObia/MEL/issues/11108) on MEL's GitHub repository
+
+## 2022-12-24
+
+- Export the ILRI community to check whether any items with Initiative metadata are not mapped to Initiative collections
+  - I found about twenty...
+  - Then I did the same for the AICCRA community
+
+## 2022-12-25
+
+- The load on the server is high and I see some seemingly stuck PostgreSQL locks from dspaceCli:
+
+```console
+$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
+     44 dspaceApi
+     58 dspaceCli
+```
+
+- [Looking into this more](https://jaketrent.com/post/find-kill-locks-postgres/) I see the PIDs for the dspaceCli locks:
+
+```sql
+SELECT pl.pid FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE psa.application_name = 'dspaceCli';
+```
+
+- And the SQL queries themselves:
+
+```console
+postgres=# SELECT pid, state, usename, query, query_start
+FROM pg_stat_activity
+WHERE pid IN (
+  SELECT pl.pid FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE psa.application_name = 'dspaceCli'
+);
+```
+
+- For these fifty-eight locks there are only six queries running
+  - Interestingly, they all started at either 04:00 or 05:00 this morning...
+- I canceled one using `SELECT pg_cancel_backend(1098749);` and then two of the other PIDs died; perhaps they were dependent on it?
+  - Then I canceled the next one and the remaining ones died as well
+- I exported the entire CGSpace and then ran the `fix-initiative-mappings.py` script, which found 124 items to be mapped
+  - Getting only the items that have new mappings from the output file is currently tricky because you have to change the file to Unix encoding, capture the diff output from the original, and re-add the column headers, but at least it means the DSpace batch import has to check WAY fewer items
+  - For the record, I used grep to get only the new lines:
+
+```console
+$ grep -xvFf /tmp/orig.csv /tmp/cgspace-mappings.csv > /tmp/2022-12-25-fix-mappings.csv
+```
+
+- Then I imported to CGSpace, and will start an AReS harvest once it's done
+  - The import process was quick but it triggered a lot of Solr updates and I see locks rising from dspaceCli again
+  - After five hours the Solr updating from the metadata import wasn't finished, so I canceled it, and I see that the items were *not* mapped...
+  - I split the CSV into multiple files, each with ten items, and the first one imported, but the second one kept updating Solr seemingly forever...
+  - All twelve files worked except the second one, so it must be something with one of those items...
+- Now I started a harvest on AReS
diff --git a/docs/2022-12/index.html b/docs/2022-12/index.html
index 5d5857423..04c8a077a 100644
--- a/docs/2022-12/index.html
+++ b/docs/2022-12/index.html
@@ -20,7 +20,7 @@ Replace “East Asia” with “Eastern Asia” region on CGSpac
-
+
@@ -46,9 +46,9 @@ Replace “East Asia” with “Eastern Asia” region on CGSpac
   "@type": "BlogPosting",
   "headline": "December, 2022",
   "url": "https://alanorth.github.io/cgspace-notes/2022-12/",
-  "wordCount": "1727",
+  "wordCount": "2167",
   "datePublished": "2022-12-01T08:52:36+03:00",
-  "dateModified": "2022-12-21T20:39:09+02:00",
+  "dateModified": "2022-12-23T10:04:37+02:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -420,6 +420,70 @@ Replace “East Asia” with “Eastern Asia” region on CGSpac
    diff --git a/docs/categories/index.html b/docs/categories/index.html index 9e6ce62b9..8dcc0497e 100644 --- a/docs/categories/index.html +++ b/docs/categories/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html index e80671292..77394ebc4 100644 --- a/docs/categories/notes/index.html +++ b/docs/categories/notes/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html index 9ac052bf2..39796b739 100644 --- a/docs/categories/notes/page/2/index.html +++ b/docs/categories/notes/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html index ab5f05ba3..2ff17e2b1 100644 --- a/docs/categories/notes/page/3/index.html +++ b/docs/categories/notes/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html index aae55a837..b60a5c378 100644 --- a/docs/categories/notes/page/4/index.html +++ b/docs/categories/notes/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html index c7b1256c6..d8503b40e 100644 --- a/docs/categories/notes/page/5/index.html +++ b/docs/categories/notes/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html index 896a90ef7..9aa85c637 100644 --- a/docs/categories/notes/page/6/index.html +++ b/docs/categories/notes/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html index a9bd9b5e8..c1a0e8d4e 100644 --- a/docs/categories/notes/page/7/index.html +++ b/docs/categories/notes/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/index.html b/docs/index.html index 7c9c978a4..501b4704c 100644 --- a/docs/index.html +++ b/docs/index.html @@ 
-10,7 +10,7 @@ - + diff --git a/docs/page/2/index.html b/docs/page/2/index.html index 6d1357692..ba0c371f6 100644 --- a/docs/page/2/index.html +++ b/docs/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/3/index.html b/docs/page/3/index.html index 4b169d463..42f43cd43 100644 --- a/docs/page/3/index.html +++ b/docs/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/4/index.html b/docs/page/4/index.html index 51c85ca62..e7e5d5678 100644 --- a/docs/page/4/index.html +++ b/docs/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/5/index.html b/docs/page/5/index.html index d8235807e..79e3b1458 100644 --- a/docs/page/5/index.html +++ b/docs/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/6/index.html b/docs/page/6/index.html index 99786569b..2760f6e35 100644 --- a/docs/page/6/index.html +++ b/docs/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/7/index.html b/docs/page/7/index.html index 6dc92163b..f58c57c70 100644 --- a/docs/page/7/index.html +++ b/docs/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/8/index.html b/docs/page/8/index.html index 66c4c7582..934d2fe90 100644 --- a/docs/page/8/index.html +++ b/docs/page/8/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/9/index.html b/docs/page/9/index.html index ac536ca67..364d39975 100644 --- a/docs/page/9/index.html +++ b/docs/page/9/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/index.html b/docs/posts/index.html index 8fed10408..433f9513b 100644 --- a/docs/posts/index.html +++ b/docs/posts/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html index f97623f59..e69ccce30 100644 --- a/docs/posts/page/2/index.html +++ b/docs/posts/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html index 96a940c74..560db21a5 100644 --- a/docs/posts/page/3/index.html +++ b/docs/posts/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git 
a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html index f3147606c..825a56164 100644 --- a/docs/posts/page/4/index.html +++ b/docs/posts/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html index 476b28aff..0112e0dfa 100644 --- a/docs/posts/page/5/index.html +++ b/docs/posts/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html index 8977fc85a..3775df830 100644 --- a/docs/posts/page/6/index.html +++ b/docs/posts/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html index af6eacf95..dbb036d9b 100644 --- a/docs/posts/page/7/index.html +++ b/docs/posts/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html index ac0fdd163..2a86be16f 100644 --- a/docs/posts/page/8/index.html +++ b/docs/posts/page/8/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html index 1afc55bf3..313240787 100644 --- a/docs/posts/page/9/index.html +++ b/docs/posts/page/9/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/sitemap.xml b/docs/sitemap.xml index cf47d375d..af4c679f7 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -3,19 +3,19 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml"> https://alanorth.github.io/cgspace-notes/categories/ - 2022-12-21T20:39:09+02:00 + 2022-12-23T10:04:37+02:00 https://alanorth.github.io/cgspace-notes/ - 2022-12-21T20:39:09+02:00 + 2022-12-23T10:04:37+02:00 https://alanorth.github.io/cgspace-notes/2022-12/ - 2022-12-21T20:39:09+02:00 + 2022-12-23T10:04:37+02:00 https://alanorth.github.io/cgspace-notes/categories/notes/ - 2022-12-21T20:39:09+02:00 + 2022-12-23T10:04:37+02:00 https://alanorth.github.io/cgspace-notes/posts/ - 2022-12-21T20:39:09+02:00 + 2022-12-23T10:04:37+02:00 https://alanorth.github.io/cgspace-notes/2022-11/ 2022-12-03T10:46:29+03:00
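
A minimal sketch of the "get only the new lines" step from the 2022-12-25 notes, plus the later split into ten-item files. It is not taken from `fix-initiative-mappings.py`; the function names `read_lines`, `new_mappings`, and `split_batches` are hypothetical. It assumes that "unix encoding" in the notes refers to line endings, and that no CSV field contains an embedded newline, since it compares whole lines the way `grep -xvFf` does. Because the header line is identical in both files, a whole-line comparison drops it too, which is presumably why the notes mention re-adding the column headers.

```python
def read_lines(text):
    """Return non-empty lines, treating CRLF and CR line endings like LF."""
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return [line for line in unified.split("\n") if line]


def new_mappings(original_text, updated_text):
    """Keep the rows of the updated CSV that do not appear verbatim in the original.

    Mirrors `grep -xvFf original updated` (whole-line, fixed-string match),
    then puts the header row back in front of the result.
    Assumption: the first line of the updated file is the header.
    """
    original = set(read_lines(original_text))
    updated = read_lines(updated_text)
    if not updated:
        return []
    header, rows = updated[0], updated[1:]
    changed = [row for row in rows if row not in original]
    return [header] + changed if changed else []


def split_batches(lines, size=10):
    """Split header + rows into batches of at most `size` rows, each with the header."""
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [[header] + rows[i:i + size] for i in range(0, len(rows), size)]


if __name__ == "__main__":
    # Made-up sample data, not real CGSpace metadata
    original = "id,collection\r\n1,a\r\n2,b\r\n"
    updated = "id,collection\n1,a\n2,b||c\n3,d\n"
    for batch in split_batches(new_mappings(original, updated), size=1):
        print(batch)
```

Small batches mean a failing import can be narrowed down to a handful of rows, which is how the notes isolate the problem to the second file.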