mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2023-01-01
@ -344,5 +344,35 @@ UPDATE metadatavalue SET text_lang='en_US' WHERE dspace_object_id IN (SELECT uui
- I updated the text lang values on CGSpace and re-exported the community
- I fixed a bunch of invalid licenses in these items
- Then I added mappings for another handful of items
- I tagged ORCID identifiers for another thirty items or so
- At 8PM I got a notice from UptimeRobot again that CGSpace was down
- The load is still only around 2.x or 3.x, but there is a large (and increasing) number of PostgreSQL connections and locks
- They appear to be all from the frontend:

```console
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
2892 dspaceWeb
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
2950 dspaceWeb
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
3792 dspaceWeb
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
4460 dspaceWeb
```

- I don't see any other system statistics that look out of order...
- DSpace sessions, network throughput, CPU, etc all seem sane...
- And then all of a sudden, I didn't do anything, but all the locks disappeared and I was able to access the website... WTF

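The `grep -o | sort | uniq -c` tally used above can also be sketched in Python, which makes it easier to log the counts over time. This is only a rough sketch: the function name `tally_lock_owners` and the sample text are made up for illustration, and only the three application names come from the grep pattern in these notes.

```python
import re
from collections import Counter

# Application names taken from the grep pattern used in the notes.
APP_NAMES = re.compile(r"dspaceWeb|dspaceApi|dspaceCli")


def tally_lock_owners(psql_output: str) -> Counter:
    """Count every occurrence of a known application name.

    Like `grep -o`, this counts each match, not each line.
    """
    return Counter(APP_NAMES.findall(psql_output))


# Made-up stand-in for the psql output, not real data from the server.
sample_output = """
 relation | 1234 | dspaceWeb | idle in transaction
 relation | 1235 | dspaceWeb | idle in transaction
 relation | 1300 | dspaceCli | active
"""

for name, count in sorted(tally_lock_owners(sample_output).items()):
    print(f"{count:7d} {name}")
```

In practice the text would come from the same `psql -c '...'` command shown above, for example piped in or captured with `subprocess.run`.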
## 2022-12-30

- Start a harvest on AReS

## 2022-12-31

- I found a bunch of items on AReS that have issue dates in 2023, which made me curious
- Looking closer, I think all of these have been tagged incorrectly because they were already published online in 2022
- I sent a mail to Abenet and Bizu to ask, but I know for sure that PRMS will go by the date of first publication, no matter if that is online or in print
- I also added some ORCID identifiers to our list and generated thumbnails for some journal articles that were Creative Commons

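The issue date check described above could be sketched roughly like this. Everything here is an assumption for illustration: the field names `issued` and `published_online`, the item ids, and the dates are hypothetical and do not come from CGSpace or AReS.

```python
from datetime import date


def issued_after_first_publication(items):
    """Return items whose issue year is later than the year they first appeared online."""
    return [
        item
        for item in items
        if item["published_online"] is not None
        and item["issued"].year > item["published_online"].year
    ]


# Hypothetical records; the ids and dates are invented for this example.
items = [
    {"id": "item-a", "issued": date(2023, 1, 15), "published_online": date(2022, 11, 3)},
    {"id": "item-b", "issued": date(2023, 2, 1), "published_online": None},
    {"id": "item-c", "issued": date(2022, 12, 1), "published_online": date(2022, 12, 1)},
]

for item in issued_after_first_publication(items):
    print(item["id"], item["issued"].year, item["published_online"].year)
```

Items without a known online date are skipped rather than guessed at, since those would still need a manual look.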
<!-- vim: set sw=2 ts=2: -->
51
content/posts/2023-01.md
Normal file
@ -0,0 +1,51 @@
---
title: "January, 2023"
date: 2023-01-01T08:44:36+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2023-01-01

- Apply some more ORCID identifiers to items on CGSpace using my `2022-09-22-add-orcids.csv` file
- I want to update all ORCID names and refresh them in the database
- I see we have some new ones that aren't in our list if I combine with this file:

<!--more-->

```console
$ cat dspace/config/controlled-vocabularies/cg-creator-identifier.xml | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u | wc -l
1939
$ cat dspace/config/controlled-vocabularies/cg-creator-identifier.xml 2022-09-22-add-orcids.csv | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u | wc -l
1973
```

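The two counts differ by 34 (1973 versus 1939), which is the number of identifiers in the CSV that are missing from the controlled vocabulary. A rough Python sketch of the same comparison follows. It reuses the regular expression from the `grep` commands above, but the function names and the sample strings are made up for illustration, and the identifiers are placeholders rather than real ORCID iDs.

```python
import re

# Same pattern as the grep -oE command in the notes.
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcids(text: str) -> set:
    """Return the unique identifiers found in the text (like `grep -oE ... | sort -u`)."""
    return set(ORCID_PATTERN.findall(text))


def new_orcids(vocabulary_text: str, csv_text: str) -> list:
    """Return identifiers that appear in the CSV but not in the vocabulary, sorted."""
    return sorted(extract_orcids(csv_text) - extract_orcids(vocabulary_text))


# Placeholder content standing in for the XML vocabulary and the CSV file.
vocabulary_text = "Author One: 0000-0000-0000-0001\nAuthor Two: 0000-0000-0000-0002\n"
csv_text = "Author Two: 0000-0000-0000-0002\nAuthor Three: 0000-0000-0000-0003\n"

for orcid in new_orcids(vocabulary_text, csv_text):
    print(orcid)
```

With the real files, the two strings would be read with something like `pathlib.Path(...).read_text()`.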
- I will extract and process them with my `resolve-orcids.py` script:

```console
$ cat dspace/config/controlled-vocabularies/cg-creator-identifier.xml 2022-09-22-add-orcids.csv | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2023-01-01-orcids.txt
$ ./ilri/resolve-orcids.py -i /tmp/2023-01-01-orcids.txt -o /tmp/2023-01-01-orcids-names.txt -d
```

- Then update the names in the database with my `update-orcids.py` script:

```console
$ ./ilri/update-orcids.py -i /tmp/2023-01-01-orcids-names.txt -db dspace -u dspace -p 'fuuu' -m 247
```

- Load on CGSpace is high, around 9.x
- I see there is a CIAT bot harvesting via the REST API with IP 45.5.186.2
- Other than that I don't see any particular system stats as alarming
- There has been a marked increase in load in the last few weeks, perhaps due to Initiative activity...
- Perhaps there are some stuck PostgreSQL locks from CLI tools?

```console
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
58 dspaceCli
46 dspaceWeb
```

- The current time on the server is 08:52 and I see the dspaceCli locks were started at 04:00 and 05:00... so I need to check which cron jobs those belong to as I think I noticed this last month too
- I'm going to wait and see if they finish, but by tomorrow I will kill them

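Spotting sessions that have been running for hours, like the 04:00 and 05:00 ones above, could be sketched like this. It is a minimal example under stated assumptions: the row shape (dicts with `application_name` and `started` keys), the function name `stale_sessions`, and the one hour threshold are all hypothetical. Only the clock times come from these notes.

```python
from datetime import datetime, timedelta


def stale_sessions(sessions, now, max_age):
    """Return the sessions that started more than max_age before now."""
    return [s for s in sessions if now - s["started"] > max_age]


# Made-up rows; only the times (04:00, 05:00, server time 08:52) come from the notes.
sessions = [
    {"application_name": "dspaceCli", "started": datetime(2023, 1, 1, 4, 0)},
    {"application_name": "dspaceCli", "started": datetime(2023, 1, 1, 5, 0)},
    {"application_name": "dspaceWeb", "started": datetime(2023, 1, 1, 8, 51)},
]
now = datetime(2023, 1, 1, 8, 52)

# Assumed threshold: anything older than one hour is worth a closer look.
for s in stale_sessions(sessions, now, timedelta(hours=1)):
    print(s["application_name"], s["started"].strftime("%H:%M"), now - s["started"])
```

In PostgreSQL the start times would come from `pg_stat_activity`, which has `backend_start`, `xact_start`, and `query_start` columns. Which one matters depends on whether the session or a single transaction is stuck.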
<!-- vim: set sw=2 ts=2: -->