mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-22 13:12:19 +01:00
Add notes for 2021-12
parent
80c9765cc7
commit
803d91481e
90
content/posts/2021-12.md
Normal file
@@ -0,0 +1,90 @@
---
title: "December, 2021"
date: 2021-12-01T16:07:07+02:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2021-12-01

- Atmire merged some changes I had submitted to the COUNTER-Robots project
- I updated our local spider user agents and then re-ran the list with my `check-spider-hits.sh` script on CGSpace:

```console
$ ./ilri/check-spider-hits.sh -f /tmp/agents -p
Purging 1989 hits from The Knowledge AI in statistics
Purging 1235 hits from MaCoCu in statistics
Purging 455 hits from WhatsApp in statistics

Total number of bot hits purged: 3679
```
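
As a quick sanity check, the per-agent counts in the output above can be added up and compared with the reported total. This is only a sketch: `sum_purged_hits` is a hypothetical helper and not part of `check-spider-hits.sh`.

```python
import re

# Output lines copied from the console block above.
OUTPUT = """\
Purging 1989 hits from The Knowledge AI in statistics
Purging 1235 hits from MaCoCu in statistics
Purging 455 hits from WhatsApp in statistics
"""


def sum_purged_hits(output):
    """Add up the hit counts from 'Purging N hits from ...' lines."""
    counts = re.findall(r"^Purging (\d+) hits from ", output, flags=re.MULTILINE)
    return sum(int(count) for count in counts)


print(sum_purged_hits(OUTPUT))  # 1989 + 1235 + 455
```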

<!--more-->

## 2021-12-02

- Francesca from Alliance asked me for help with approving a submission that gets stuck
- I looked at the PostgreSQL activity and the locks are back up like they were earlier this week

```console
$ psql -c "SELECT application_name FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid" | sort | uniq -c | sort -n
1
1 ------------------
1 (1437 rows)
1 application_name
9 psql
1428 dspaceWeb
```

- Munin shows the same:

![PostgreSQL locks week](/cgspace-notes/2021/12/postgres_locks_ALL-week.png)
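
The `sort | uniq -c | sort -n` tally in the `psql` command above also counts the header and footer lines of the output. A minimal sketch of the same tally in Python, assuming the `application_name` values have already been fetched into a list (the database access itself is not shown, and `tally_by_application` is a hypothetical helper):

```python
from collections import Counter


def tally_by_application(rows):
    """Count lock rows per application_name, least frequent first."""
    counts = Counter(rows)
    return sorted(counts.items(), key=lambda item: item[1])


# Row counts taken from the psql output above: 9 + 1428 = 1437 rows.
rows = ["psql"] * 9 + ["dspaceWeb"] * 1428
for name, count in tally_by_application(rows):
    print(f"{count:7d} {name}")
```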

- Last month I enabled `log_lock_waits` in PostgreSQL, so I checked the log and was surprised to find only a few lock waits since I restarted PostgreSQL three days ago:

```console
# grep -E '^2021-(11-29|11-30|12-01|12-02)' /var/log/postgresql/postgresql-10-main.log | grep -c 'still waiting for'
15
```
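
The two `grep` calls above filter by date prefix and then count the lock-wait messages. A rough Python equivalent is sketched below; the sample log lines are made up for illustration and are not taken from the real log, and `count_lock_waits` is a hypothetical helper.

```python
def count_lock_waits(lines, date_prefixes):
    """Count log lines that start with one of the dates and report a lock wait."""
    prefixes = tuple(date_prefixes)
    return sum(
        1
        for line in lines
        if line.startswith(prefixes) and "still waiting for" in line
    )


# Made-up sample lines for illustration only.
sample = [
    "2021-11-28 23:59:59 LOG:  process 111 still waiting for ShareLock on transaction 1 after 1000.1 ms",
    "2021-11-29 08:00:00 LOG:  process 222 still waiting for ShareLock on transaction 2 after 1000.2 ms",
    "2021-12-02 09:30:00 LOG:  checkpoint starting: time",
    "2021-12-02 09:31:00 LOG:  process 333 still waiting for ShareLock on transaction 3 after 1000.3 ms",
]
dates = ["2021-11-29", "2021-11-30", "2021-12-01", "2021-12-02"]
print(count_lock_waits(sample, dates))
```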

- I think you could analyze the locks for the `dspaceWeb` application (XMLUI) and find out what queries were locking... but it's so much information and I don't know where to start
- For now I just restarted PostgreSQL...
- Francesca was able to do her submission immediately...
- On a related note, I want to enable the `pg_stat_statements` feature to see which queries get run the most, so I created the extension on the CGSpace database
- I was doing some research on PostgreSQL locks and found some interesting things to consider
- The default `lock_timeout` is 0, aka disabled
- The default `statement_timeout` is 0, aka disabled
- It seems to be recommended to start by setting `statement_timeout` first, with a rule of thumb of [ten times longer than your longest query](https://github.com/jberkus/annotated.conf/blob/master/postgresql.10.simple.conf#L211)
- Mark Wood mentioned the `checker` cron job that apparently runs in one transaction and might be an issue
- I definitely saw it holding a bunch of locks for ~30 minutes during the first part of its execution, then it dropped them and did some other less-intensive things without locks
- Bizuwork was still not receiving emails even after we fixed the SMTP access on CGSpace
- After some troubleshooting it turns out that the emails from CGSpace were going into her Junk folder!
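
The `statement_timeout` rule of thumb mentioned above is simple arithmetic. A minimal sketch, assuming a hypothetical longest query of 30 seconds (an invented figure, not something measured on CGSpace):

```python
def suggested_statement_timeout_ms(longest_query_ms, factor=10):
    """Return a statement_timeout in milliseconds: factor times the longest query."""
    if longest_query_ms <= 0:
        raise ValueError("longest_query_ms must be positive")
    return longest_query_ms * factor


# Hypothetical figure: 30 seconds is an assumed longest query, not a measurement.
timeout_ms = suggested_statement_timeout_ms(30_000)
# PostgreSQL reads a statement_timeout value without units as milliseconds.
print(f"statement_timeout = {timeout_ms}")
```

The longest legitimate query would have to be measured first, which is where something like `pg_stat_statements` could help.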

## 2021-12-03

- I see GARDIAN is now using a "GARDIAN" user agent finally
  - I will add them to our local spider agent override in DSpace so that the hits don't get counted in Solr
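
To illustrate the idea of a spider agent override, here is a minimal sketch that checks a user agent string against a list of regular expression patterns. It assumes one pattern per entry and case-sensitive matching. Both are assumptions for illustration, not a description of how DSpace implements it, and `is_spider` is a hypothetical helper.

```python
import re


def is_spider(user_agent, patterns):
    """Return True if any pattern matches somewhere in the user agent string."""
    return any(re.search(pattern, user_agent) for pattern in patterns)


# Assumed pattern list; only "GARDIAN" comes from the note above.
patterns = ["GARDIAN"]
print(is_spider("GARDIAN", patterns))
print(is_spider("Mozilla/5.0 (X11; Linux x86_64)", patterns))
```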

## 2021-12-05

- Proofed fifty records Abenet sent me from Africa Rice Center ("AfricaRice 1st batch Import")
  - Fixed forty-six incorrect collections
  - Cleaned up and normalized affiliations
  - Cleaned up dates (extra `*` character in all?)
  - Cleaned up citation format
  - Fixed some encoding issues in abstracts
  - Removed empty columns
  - Removed one duplicate: Enhancing Rice Productivity and Soil Nitrogen Using Dual-Purpose Cowpea-NERICA® Rice Sequence in Degraded Savanna
  - Added volume and issue metadata by extracting it from the citations
  - All PDFs hosted on davidpublishing.com are dead...
  - All DOIs linking to African Journal of Agricultural Research are dead...
  - Fixed a handful of items marked as "Open Access" that are actually closed
  - Added many missing ISSNs
  - Added many missing countries/regions
  - Fixed invalid AGROVOC terms and added some more based on article subjects
- I also made some minor changes to the [CSV Metadata Quality Checker](https://github.com/ilri/csv-metadata-quality)
  - Added the ability to check if the item's title exists in the citation
  - Updated to only run the mojibake check if we're not running in unsafe mode (so we don't print the same warning during both the check and fix steps)
- I ran the re-harvesting on AReS
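
The title-in-citation check mentioned above could look roughly like the sketch below. It is not the actual csv-metadata-quality implementation: `title_in_citation` is a hypothetical function, and the citation strings are placeholders with invented author and journal fields.

```python
def title_in_citation(title, citation):
    """Return True if the title appears in the citation, ignoring case and extra whitespace."""

    def normalize(text):
        return " ".join(text.split()).casefold()

    normalized_title = normalize(title)
    if not normalized_title:
        # An empty title would trivially match any citation, so treat it as a failure.
        return False
    return normalized_title in normalize(citation)


title = (
    "Enhancing Rice Productivity and Soil Nitrogen Using Dual-Purpose "
    "Cowpea-NERICA® Rice Sequence in Degraded Savanna"
)
# Placeholder citations for illustration only.
matching = f"Author, A. 2021. {title}. Example Journal 1(1): 1-10."
different = "Author, A. 2021. A different title. Example Journal 1(1): 1-10."
print(title_in_citation(title, matching))
print(title_in_citation(title, different))
```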

<!-- vim: set sw=2 ts=2: -->
@@ -50,7 +50,7 @@ Total number of bot hits purged: 3679
"@type": "BlogPosting",
"headline": "December, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-12/",
"wordCount": "404",
"wordCount": "597",
"datePublished": "2021-12-01T16:07:07+02:00",
"dateModified": "2021-12-01T16:07:07+02:00",
"author": {
@@ -191,10 +191,38 @@ Purging 455 hits from WhatsApp in statistics
<ul>
<li>I see GARDIAN is now using a “GARDIAN” user agent finally
<ul>
<li>I will add them to our local bot override for Solr</li>
<li>I will add them to our local spider agent override in DSpace so that the hits don’t get counted in Solr</li>
</ul>
</li>
</ul>
<h2 id="2021-12-05">2021-12-05</h2>
<ul>
<li>Proofed fifty records Abenet sent me from Africa Rice Center (“AfricaRice 1st batch Import”)
<ul>
<li>Fixed forty-six incorrect collections</li>
<li>Cleaned up and normalized affiliations</li>
<li>Cleaned up dates (extra <code>*</code> character in all?)</li>
<li>Cleaned up citation format</li>
<li>Fixed some encoding issues in abstracts</li>
<li>Removed empty columns</li>
<li>Removed one duplicate: Enhancing Rice Productivity and Soil Nitrogen Using Dual-Purpose Cowpea-NERICA® Rice Sequence in Degraded Savanna</li>
<li>Added volume and issue metadata by extracting it from the citations</li>
<li>All PDFs hosted on davidpublishing.com are dead…</li>
<li>All DOIs linking to African Journal of Agricultural Research are dead…</li>
<li>Fixed a handful of items marked as “Open Access” that are actually closed</li>
<li>Added many missing ISSNs</li>
<li>Added many missing countries/regions</li>
<li>Fixed invalid AGROVOC terms and added some more based on article subjects</li>
</ul>
</li>
<li>I also made some minor changes to the <a href="https://github.com/ilri/csv-metadata-quality">CSV Metadata Quality Checker</a>
<ul>
<li>Added the ability to check if the item’s title exists in the citation</li>
<li>Updated to only run the mojibake check if we’re not running in unsafe mode (so we don’t print the same warning during both the check and fix steps)</li>
</ul>
</li>
<li>I ran the re-harvesting on AReS</li>
</ul>
<!-- raw HTML omitted -->