Add notes for 2023-01-04

Alan Orth 2023-01-04 17:08:14 +03:00
parent 676eefafbb
commit d1278a67d8
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
31 changed files with 114 additions and 42 deletions

@ -48,4 +48,31 @@ $ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid =
- The current time on the server is 08:52 and I see the dspaceCli locks were started at 04:00 and 05:00, so I need to check which cron jobs they belong to; I think I noticed this last month too
- I'm going to wait and see if they finish, but by tomorrow I will kill them
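
The `pg_locks`/`pg_stat_activity` query used above returns, among other columns, each backend's `pid`, `application_name` and `xact_start` (when its current transaction began; locks are held until it ends). A minimal Python sketch of picking out the dspaceCli PIDs whose transactions have been open longer than some threshold, assuming the rows were already fetched as dictionaries. The `stale_pids` helper, the two-hour threshold and the sample rows are hypothetical. The resulting PIDs are what one would pass to PostgreSQL's `pg_terminate_backend(pid)`.

```python
from datetime import datetime, timedelta


def stale_pids(rows, application_name, now, max_age):
    """Return sorted, unique PIDs of backends named `application_name`
    whose transaction started more than `max_age` before `now`.

    Each row is assumed to be a dict with the pg_stat_activity columns
    "pid", "application_name" and "xact_start" (None when idle).
    """
    cutoff = now - max_age
    pids = {
        row["pid"]
        for row in rows
        if row["application_name"] == application_name
        and row["xact_start"] is not None
        and row["xact_start"] < cutoff
    }
    return sorted(pids)


# Hypothetical sample rows; one backend usually holds several locks,
# so the same PID can appear more than once.
rows = [
    {"pid": 101, "application_name": "dspaceCli", "xact_start": datetime(2023, 1, 1, 4, 0)},
    {"pid": 101, "application_name": "dspaceCli", "xact_start": datetime(2023, 1, 1, 4, 0)},
    {"pid": 102, "application_name": "dspaceCli", "xact_start": datetime(2023, 1, 1, 5, 0)},
    {"pid": 200, "application_name": "dspaceWeb", "xact_start": datetime(2023, 1, 1, 3, 0)},
    {"pid": 103, "application_name": "dspaceCli", "xact_start": datetime(2023, 1, 1, 8, 45)},
]

now = datetime(2023, 1, 1, 8, 52)
print(stale_pids(rows, "dspaceCli", now, timedelta(hours=2)))  # [101, 102]
```

Using a set drops the duplicate PIDs that appear once per held lock, so each backend would only be terminated once.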
## 2023-01-02
- The load on the server is now very low and there are no more locks from dspaceCli
  - So there *was* a long-running process that had to finish!
  - That finally sheds some light on the "high load on Sunday" problem where I couldn't find any other distinct pattern in the nginx or Tomcat requests
## 2023-01-03
- The high load on the server on Sundays, which I have noticed for a long time, seems to be coming from the DSpace checker cron job
  - This checks the checksums of all bitstreams to see if they match the ones in the database
- I exported the entire CGSpace metadata to do country/region checks with `csv-metadata-quality`
  - I extracted only the items with countries (about 48,000), then split the file into parts of 10,000 items, but the upload found 2,000 changes in the first part alone and took several hours to complete...
- IWMI sent me ORCID identifiers for new scientists, bringing our total to 2,010
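
The checker compares each bitstream's checksum with the value stored in the database. A minimal Python sketch of that comparison for a single file, assuming MD5 (the algorithm DSpace usually records for bitstreams). The `file_md5` and `checksum_matches` helpers and the sample file are hypothetical and not part of DSpace. Because every file has to be read in full, a pass over a whole repository is naturally long and I/O heavy.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks so that
    large bitstreams never have to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, stored_checksum):
    """Compare a file's freshly computed MD5 with the stored value
    (case-insensitively, since hex digests may be stored in either case)."""
    return file_md5(path) == stored_checksum.lower()


# Hypothetical bitstream: a small temporary file with a known MD5.
stored = "b1946ac92492d2347c6235b4d2611184"  # MD5 of b"hello\n"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"hello\n")
try:
    ok = checksum_matches(path, stored)
finally:
    os.remove(path)

print(ok)  # True
```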
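
Splitting the exported CSV into parts of 10,000 items can be sketched with Python's standard `csv` module, repeating the header row in each part so that every part can be uploaded on its own. This is a minimal sketch, not the exact workflow used here: the `split_csv` helper, the `part-N.csv` file names and the sample columns are hypothetical. With about 48,000 items it would produce five parts, the last one smaller.

```python
import csv
import os
import tempfile


def split_csv(source_path, out_dir, rows_per_part=10_000):
    """Split a CSV file into parts of at most `rows_per_part` data rows,
    repeating the header row in every part. Returns the part paths."""
    part_paths = []
    with open(source_path, newline="", encoding="utf-8") as source:
        reader = csv.reader(source)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return part_paths
        out_file = None
        writer = None
        for index, row in enumerate(reader):
            # Close the current part and start a new one every `rows_per_part` rows.
            if index % rows_per_part == 0:
                if out_file is not None:
                    out_file.close()
                part_path = os.path.join(out_dir, f"part-{len(part_paths) + 1}.csv")
                out_file = open(part_path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out_file)
                writer.writerow(header)
                part_paths.append(part_path)
            writer.writerow(row)
        if out_file is not None:
            out_file.close()
    return part_paths


# Hypothetical export with 25 items, split into parts of 10.
with tempfile.TemporaryDirectory() as tmp_dir:
    export_path = os.path.join(tmp_dir, "export.csv")
    with open(export_path, "w", newline="", encoding="utf-8") as handle:
        export_writer = csv.writer(handle)
        export_writer.writerow(["id", "country"])
        for i in range(25):
            export_writer.writerow([f"item-{i}", "Kenya"])

    parts = split_csv(export_path, tmp_dir, rows_per_part=10)
    rows_per_file = []
    for part in parts:
        with open(part, newline="", encoding="utf-8") as handle:
            rows_per_file.append(sum(1 for _ in csv.reader(handle)) - 1)  # minus header

print(rows_per_file)  # [10, 10, 5]
```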
## 2023-01-04
- I finally finished applying the region imports (in five batches of 10,000)
  - There were about 7,500 missing regions in total...
- Now I will move on to doing the Initiative mappings
  - I modified my `fix-initiative-mappings.py` script to only write out the items that have updated mappings
  - This makes it way easier to apply fixes to the entire CGSpace because we no longer try to import 100,000 items whose mappings haven't changed
- More dspaceCli locks from 04:00 this morning (current time on server is 07:33) and today is a Wednesday
  - The checker cron job runs on day-of-week `0,3` (Sunday and Wednesday), so these locks are from that...
  - Finally at 16:30 I decided to kill the PIDs associated with those locks...
  - I am going to disable that cron job for now and watch the server load for a few weeks
- Start a harvest on AReS
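
Writing out only the items whose mappings actually change amounts to comparing each item's current collections with the recomputed ones and keeping the rows that differ. A minimal Python sketch of that idea, not the actual `fix-initiative-mappings.py`. It assumes a DSpace-style CSV with `id` and `collection` columns (multiple values separated by `||`) plus a hypothetical `initiative` column. The Initiative-to-collection lookup, the handles and the helper names are made up for illustration.

```python
import csv
import io

# Hypothetical lookup from Initiative name to the handle of its collection.
INITIATIVE_COLLECTIONS = {
    "Initiative A": "123456789/100",
    "Initiative B": "123456789/200",
}


def split_values(value):
    """Split a multi-value CSV cell on "||", dropping empty strings."""
    return [v for v in value.split("||") if v]


def desired_collections(row):
    """Keep every non-Initiative collection the item is already in, then
    add the collection of each Initiative listed in its "initiative" cell."""
    initiative_handles = set(INITIATIVE_COLLECTIONS.values())
    collections = [c for c in split_values(row["collection"]) if c not in initiative_handles]
    for name in split_values(row["initiative"]):
        handle = INITIATIVE_COLLECTIONS.get(name)
        if handle is not None and handle not in collections:
            collections.append(handle)
    return collections


def changed_rows(rows, compute_collections):
    """Yield an {id, collection} row only for items whose mappings change."""
    for row in rows:
        current = split_values(row["collection"])
        desired = compute_collections(row)
        # Compare as sets so that a different order alone is not a change.
        if set(current) != set(desired):
            yield {"id": row["id"], "collection": "||".join(desired)}


source = io.StringIO(
    "id,collection,initiative\n"
    "a1,123456789/1||123456789/100,Initiative A\n"
    "a2,123456789/1,Initiative B\n"
    "a3,123456789/1||123456789/200,\n"
)
updates = list(changed_rows(csv.DictReader(source), desired_collections))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "collection"], lineterminator="\n")
writer.writeheader()
writer.writerows(updates)
print(out.getvalue(), end="")
```

Item `a1` is already mapped correctly, so only `a2` (missing a mapping) and `a3` (stale mapping) end up in the output file.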
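
Assuming `0,3` is the day-of-week field of the checker's crontab entry, cron numbers days from 0 (Sunday) to 6 (Saturday), with 7 also accepted for Sunday. A small Python sketch that checks a date against such a comma-separated list; the `cron_dow_matches` helper is hypothetical and ignores ranges and step values. It shows that 2023-01-01 (Sunday) and 2023-01-04 (Wednesday) both match while 2023-01-03 does not.

```python
from datetime import date


def cron_dow_matches(field, day):
    """Return True if `day` falls on one of the days in a cron
    day-of-week field written as a comma-separated list such as "0,3".

    Python's isoweekday() runs from 1 (Monday) to 7 (Sunday), so taking
    it modulo 7 gives cron's numbering (0 = Sunday). Values are also
    reduced modulo 7 so that "7" means Sunday. Ranges ("1-5") and
    steps ("*/2") are not handled in this sketch.
    """
    allowed = {int(part) % 7 for part in field.split(",")}
    return day.isoweekday() % 7 in allowed


days = (date(2023, 1, 1), date(2023, 1, 3), date(2023, 1, 4))
results = {d.isoformat(): cron_dow_matches("0,3", d) for d in days}
print(results)  # {'2023-01-01': True, '2023-01-03': False, '2023-01-04': True}
```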
<!-- vim: set sw=2 ts=2: -->

@ -24,7 +24,7 @@ I reverted the Cocoon autosave change because it was more of a nuissance that Pe
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-11/" />
<meta property="article:published_time" content="2022-11-01T09:11:36+03:00" />
<meta property="article:modified_time" content="2022-12-03T10:46:29+03:00" />
<meta property="article:modified_time" content="2023-01-04T10:53:02+03:00" />
@ -54,9 +54,9 @@ I reverted the Cocoon autosave change because it was more of a nuissance that Pe
"@type": "BlogPosting",
"headline": "November, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-11/",
"wordCount": "3414",
"wordCount": "3411",
"datePublished": "2022-11-01T09:11:36+03:00",
"dateModified": "2022-12-03T10:46:29+03:00",
"dateModified": "2023-01-04T10:53:02+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -721,7 +721,7 @@ I reverted the Cocoon autosave change because it was more of a nuissance that Pe
</span></span><span style="display:flex;"><span> 60 dspaceCli
</span></span><span style="display:flex;"><span> 176 dspaceApi
</span></span><span style="display:flex;"><span> 1194 dspaceWeb
</span></span></code></pre></div><p><a href="/cgspace-notes/2022/11/postgres_locks_cgspace-day.png">!PostgreSQL database locks</a></p>
</span></span></code></pre></div><p><img src="/cgspace-notes/2022/11/postgres_locks_cgspace-day.png" alt="PostgreSQL database locks"></p>
<ul>
<li>The timing looks suspiciously close to when I was running the batch updates on the ILRI community this morning.
<ul>

@ -20,7 +20,7 @@ Replace &ldquo;East Asia&rdquo; with &ldquo;Eastern Asia&rdquo; region on CGSpac
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-12/" />
<meta property="article:published_time" content="2022-12-01T08:52:36+03:00" />
<meta property="article:modified_time" content="2022-12-29T08:32:08+02:00" />
<meta property="article:modified_time" content="2023-01-01T10:12:13+02:00" />
@ -48,7 +48,7 @@ Replace &ldquo;East Asia&rdquo; with &ldquo;Eastern Asia&rdquo; region on CGSpac
"url": "https://alanorth.github.io/cgspace-notes/2022-12/",
"wordCount": "2671",
"datePublished": "2022-12-01T08:52:36+03:00",
"dateModified": "2022-12-29T08:32:08+02:00",
"dateModified": "2023-01-01T10:12:13+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"

@ -19,7 +19,7 @@ I see we have some new ones that aren&rsquo;t in our list if I combine with this
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-01/" />
<meta property="article:published_time" content="2023-01-01T08:44:36+03:00" />
<meta property="article:modified_time" content="2023-01-01T08:44:36+03:00" />
<meta property="article:modified_time" content="2023-01-01T10:12:13+02:00" />
@ -44,9 +44,9 @@ I see we have some new ones that aren&rsquo;t in our list if I combine with this
"@type": "BlogPosting",
"headline": "January, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-01/",
"wordCount": "263",
"wordCount": "567",
"datePublished": "2023-01-01T08:44:36+03:00",
"dateModified": "2023-01-01T08:44:36+03:00",
"dateModified": "2023-01-01T10:12:13+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -162,6 +162,51 @@ I see we have some new ones that aren&rsquo;t in our list if I combine with this
</ul>
</li>
</ul>
<h2 id="2023-01-02">2023-01-02</h2>
<ul>
<li>The load on the server is now very low and there are no more locks from dspaceCli
<ul>
<li>So there <em>was</em> a long-running process that had to finish!</li>
<li>That finally sheds some light on the &ldquo;high load on Sunday&rdquo; problem where I couldn&rsquo;t find any other distinct pattern in the nginx or Tomcat requests</li>
</ul>
</li>
</ul>
<h2 id="2023-01-03">2023-01-03</h2>
<ul>
<li>The high load on the server on Sundays, which I have noticed for a long time, seems to be coming from the DSpace checker cron job
<ul>
<li>This checks the checksums of all bitstreams to see if they match the ones in the database</li>
</ul>
</li>
<li>I exported the entire CGSpace metadata to do country/region checks with <code>csv-metadata-quality</code>
<ul>
<li>I extracted only the items with countries (about 48,000), then split the file into parts of 10,000 items, but the upload found 2,000 changes in the first part alone and took several hours to complete&hellip;</li>
</ul>
</li>
<li>IWMI sent me ORCID identifiers for new scientists, bringing our total to 2,010</li>
</ul>
<h2 id="2023-01-04">2023-01-04</h2>
<ul>
<li>I finally finished applying the region imports (in five batches of 10,000)
<ul>
<li>There were about 7,500 missing regions in total&hellip;</li>
</ul>
</li>
<li>Now I will move on to doing the Initiative mappings
<ul>
<li>I modified my <code>fix-initiative-mappings.py</code> script to only write out the items that have updated mappings</li>
<li>This makes it way easier to apply fixes to the entire CGSpace because we no longer try to import 100,000 items whose mappings haven&rsquo;t changed</li>
</ul>
</li>
<li>More dspaceCli locks from 04:00 this morning (current time on server is 07:33) and today is a Wednesday
<ul>
<li>The checker cron job runs on day-of-week <code>0,3</code> (Sunday and Wednesday), so these locks are from that&hellip;</li>
<li>Finally at 16:30 I decided to kill the PIDs associated with those locks&hellip;</li>
<li>I am going to disable that cron job for now and watch the server load for a few weeks</li>
</ul>
</li>
<li>Start a harvest on AReS</li>
</ul>
<!-- raw HTML omitted -->

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-01T08:44:36+03:00" />
<meta property="og:updated_time" content="2023-01-04T10:53:02+03:00" />

@ -3,25 +3,25 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2023-01-01T08:44:36+03:00</lastmod>
<lastmod>2023-01-04T10:53:02+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2023-01-01T08:44:36+03:00</lastmod>
<lastmod>2023-01-04T10:53:02+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-01/</loc>
<lastmod>2023-01-01T08:44:36+03:00</lastmod>
<lastmod>2023-01-01T10:12:13+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2023-01-01T08:44:36+03:00</lastmod>
<lastmod>2023-01-04T10:53:02+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2023-01-01T08:44:36+03:00</lastmod>
<lastmod>2023-01-04T10:53:02+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-12/</loc>
<lastmod>2022-12-29T08:32:08+02:00</lastmod>
<lastmod>2023-01-01T10:12:13+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-11/</loc>
<lastmod>2022-12-03T10:46:29+03:00</lastmod>
<lastmod>2023-01-04T10:53:02+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-10/</loc>
<lastmod>2022-10-31T16:59:47+03:00</lastmod>