- I exported the Initiatives collection to check the metadata quality
- I fixed a few errors and missing regions using csv-metadata-quality
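- For reference, a minimal csv-metadata-quality invocation (a sketch; the input and output paths are placeholders for the Initiatives export):
```console
$ csv-metadata-quality -i /tmp/initiatives.csv -o /tmp/initiatives-fixed.csv
```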
- Abenet and Bizu noticed some strange characters in affiliations submitted by MEL
- They currently appear in four items like so: `Instituto Nacional de Investigaci�n y Tecnolog�a Agraria y Alimentaria, Spain`
- I submitted [an issue](https://github.com/CodeObia/MEL/issues/11108) on MEL's GitHub repository

## 2022-12-24

- Export the ILRI community to check whether there are any items with Initiative metadata that are not mapped to Initiative collections
- I found about twenty...
- Then I did the same for the AICCRA community
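- Community exports like these use the DSpace CLI (a sketch; the handle and output path are placeholders):
```console
$ dspace metadata-export -i 10568/1 -f /tmp/ilri.csv
```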

## 2022-12-25

- The load on the server is high and I see some seemingly stuck PostgreSQL locks from dspaceCli:
```console
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
44 dspaceApi
58 dspaceCli
```
- [Looking into this more](https://jaketrent.com/post/find-kill-locks-postgres/) I see the PIDs for the dspaceCli locks:
```sql
SELECT pl.pid FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE psa.application_name = 'dspaceCli';
```
- And the SQL queries themselves:
```console
postgres=# SELECT pid, state, usename, query, query_start
FROM pg_stat_activity
WHERE pid IN (
SELECT pl.pid FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE psa.application_name = 'dspaceCli'
);
```
- For these fifty-eight locks there are only six queries running
- Interestingly, they all started at either 04:00 or 05:00 this morning...
- I canceled one using `SELECT pg_cancel_backend(1098749);` and then two of the other PIDs died as well; perhaps they were waiting on it?
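- In hindsight, `pg_blocking_pids()` would have shown which backends were waiting on which (a sketch, reusing the application_name filter from above):
```console
$ psql -c "SELECT pid, pg_blocking_pids(pid) FROM pg_stat_activity WHERE application_name = 'dspaceCli';"
```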
- Then I canceled the next one and the remaining ones died too
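- For next time, all the dspaceCli backends could be canceled in one statement (a sketch; `pg_cancel_backend()` returns a boolean for each backend it signals):
```console
$ psql -c "SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE application_name = 'dspaceCli';"
```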
- I exported the entire CGSpace and then ran the `fix-initiative-mappings.py` script, which found 124 items to be mapped
- Getting only the items with new mappings out of the output file is currently tricky: you have to convert the file to Unix line endings, diff it against the original export, and re-add the column headers. At least this means the DSpace batch import has to check WAY fewer items
- For the record, I used grep to get only the new lines:
```console
$ grep -xvFf /tmp/orig.csv /tmp/cgspace-mappings.csv > /tmp/2022-12-25-fix-mappings.csv
```
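- The whole extraction, with the line-ending fix and the re-added header, looks something like this (a sketch; assumes `dos2unix` is available):
```console
$ dos2unix /tmp/cgspace-mappings.csv
$ head -n1 /tmp/orig.csv > /tmp/2022-12-25-fix-mappings.csv
$ grep -xvFf /tmp/orig.csv /tmp/cgspace-mappings.csv >> /tmp/2022-12-25-fix-mappings.csv
```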
- Then I imported to CGSpace, and will start an AReS harvest once it's done
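- The import itself was a standard DSpace CSV metadata import (a sketch; the eperson email is a placeholder):
```console
$ dspace metadata-import -f /tmp/2022-12-25-fix-mappings.csv -e user@example.com
```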
- The import process was quick but it triggered a lot of Solr updates and I see locks rising from dspaceCli again
- After five hours the Solr updating from the metadata import still wasn't finished, so I canceled it, and I see that the items were *not* mapped...
- I split the CSV into multiple files, each with ten items; the first one imported fine, but the second got stuck doing Solr updates seemingly forever...
- All twelve files worked except the second one, so it must be something with one of those ten items...
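- Splitting the CSV while keeping the column headers on each chunk can be done with `split` (a sketch):
```console
$ head -n1 /tmp/2022-12-25-fix-mappings.csv > /tmp/header.csv
$ tail -n +2 /tmp/2022-12-25-fix-mappings.csv | split -l 10 - /tmp/mappings-
$ for f in /tmp/mappings-??; do cat /tmp/header.csv "$f" > "$f.csv"; done
```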
- Now I started a harvest on AReS
<!-- vim: set sw=2 ts=2: -->