Add notes for 2022-12-25

Alan Orth 2022-12-25 16:48:19 +02:00
parent 249a63404b
commit bf122d4ac3
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
29 changed files with 152 additions and 34 deletions

@ -263,5 +263,59 @@ $ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid =
- I exported the Initiatives collection to check the metadata quality
- I fixed a few errors and missing regions using csv-metadata-quality
- Abenet and Bizu noticed some strange characters in affiliations submitted by MEL
- They appear like so in four items currently `Instituto Nacional de Investigaci<63>n y Tecnolog<6F>a Agraria y Alimentaria, Spain`
- I submitted [an issue](https://github.com/CodeObia/MEL/issues/11108) on MEL's GitHub repository
## 2022-12-24
- I exported the ILRI community to see whether any items have Initiative metadata but are not mapped to Initiative collections
- I found about twenty...
- Then I did the same for the AICCRA community
## 2022-12-25
- The load on the server is high and I see some seemingly stuck PostgreSQL locks from dspaceCli:
```console
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
44 dspaceApi
58 dspaceCli
```
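The grep above counts every occurrence of the three application names in psql's table output, so it would also count a name that happens to appear in a query's text, and it hides any other applications holding locks. A minimal sketch, assuming a POSIX shell and a hypothetical helper name `lock_count_sql`, that does the grouping inside PostgreSQL instead:

```shell
# Hypothetical helper (name is illustrative): print a query that counts
# pg_locks rows per application_name, grouping inside PostgreSQL rather
# than grepping psql's table output.
lock_count_sql() {
  cat <<'EOF'
SELECT psa.application_name, count(*) AS locks
FROM pg_locks pl
LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
GROUP BY psa.application_name
ORDER BY locks DESC;
EOF
}

# Usage (add the usual connection options, e.g. database name):
#   lock_count_sql | psql
```

Locks from backends without an application name end up grouped under an empty or NULL name rather than being silently dropped.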
- [Looking into this more](https://jaketrent.com/post/find-kill-locks-postgres/), I see the PIDs for the dspaceCli locks:
```sql
SELECT pl.pid FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE psa.application_name = 'dspaceCli'
```
- And the SQL queries themselves:
```console
postgres=# SELECT pid, state, usename, query, query_start
FROM pg_stat_activity
WHERE pid IN (
SELECT pl.pid FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE psa.application_name = 'dspaceCli'
);
```
- Only six queries account for these fifty-eight locks
- Interestingly, they all started at either 04:00 or 05:00 this morning...
- I canceled one using `SELECT pg_cancel_backend(1098749);`, and then two of the other PIDs died too; perhaps they were dependent?
- Then I canceled the next one, and the remaining ones died as well
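
Whether one backend was blocking the others can be checked before canceling anything: PostgreSQL 9.6 and later provide `pg_blocking_pids(pid)`, which returns the PIDs a given backend is waiting on. A minimal sketch, assuming a POSIX shell and a hypothetical helper name `blocking_sql`, that prints such a query for one application name:

```shell
# Hypothetical helper (name is illustrative): print a query listing the
# backends of one application_name, with the PIDs each one is waiting on.
# pg_blocking_pids() requires PostgreSQL 9.6 or later.
blocking_sql() {
  local app
  # Double any single quotes so the name is a valid SQL string literal
  app=$(printf '%s' "${1:-dspaceCli}" | sed "s/'/''/g")
  cat <<EOF
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query_start,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE application_name = '$app'
ORDER BY query_start;
EOF
}

# Usage: blocking_sql dspaceCli | psql
```

A backend whose `blocked_by` array contains another dspaceCli PID is waiting on that one, which would be consistent with the others exiting after it was canceled with `pg_cancel_backend()`.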
- I exported the entire CGSpace and then ran the `fix-initiative-mappings.py` script, which found 124 items to be mapped
- Extracting only the items with new mappings from the output file is currently tricky: you have to convert the file to Unix line endings, capture the lines that differ from the original export, and re-add the column headers; at least this means the DSpace batch import has to check WAY fewer items
- For the record, I used grep to get only the new lines:
```console
$ grep -xvFf /tmp/orig.csv /tmp/cgspace-mappings.csv > /tmp/2022-12-25-fix-mappings.csv
```
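The manual steps described above (normalize line endings, keep only changed rows, re-add the headers) could be wrapped in one function. A minimal sketch, assuming a POSIX shell, a hypothetical function name `changed_rows`, that the only formatting difference between the two files is CRLF versus LF line endings, and that no CSV field contains an embedded newline:

```shell
# Hypothetical helper (name is illustrative): print the header plus every
# data row of NEW that does not appear verbatim in ORIG.
# Assumes one CSV record per line (no newlines inside quoted fields).
changed_rows() {
  local orig new o n
  orig=$1 new=$2
  o=$(mktemp) && n=$(mktemp) || return 1
  # Strip carriage returns so CRLF and LF files compare line for line
  tr -d '\r' < "$orig" > "$o"
  tr -d '\r' < "$new" > "$n"
  # Re-add the column headers from the new file
  head -n 1 "$n"
  # -x whole-line match, -v invert, -F fixed strings, -f patterns from file
  tail -n +2 "$n" | grep -xvFf "$o"
  rm -f "$o" "$n"
}

# Usage:
#   changed_rows /tmp/orig.csv /tmp/cgspace-mappings.csv > /tmp/2022-12-25-fix-mappings.csv
```

Like the grep above, this compares whole lines, so a quoted field spanning several lines would be split into fragments that no longer match.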
- Then I imported to CGSpace, and will start an AReS harvest once it's done
- The import process was quick but it triggered a lot of Solr updates and I see locks rising from dspaceCli again
- After five hours the Solr updates from the metadata import still hadn't finished, so I canceled the import, and I see that the items were *not* mapped...
- I split the CSV into multiple files of ten items each; the first one imported, but the second kept doing Solr updates indefinitely...
- All twelve files imported except the second one, so the problem must be with one of the items in that file...
- Now I started a harvest on AReS
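
Splitting the CSV into ten-item files, each with its own header row, is what lets a failing batch be narrowed down. A minimal sketch using awk, assuming a POSIX shell, a hypothetical function name `split_csv`, output names like `PREFIX-01.csv`, and one CSV record per line:

```shell
# Hypothetical helper (name is illustrative): split FILE into chunks of SIZE
# data rows, writing PREFIX-01.csv, PREFIX-02.csv, ... each with the header.
# Assumes one CSV record per line (no newlines inside quoted fields).
split_csv() {
  local file size prefix
  file=$1 size=${2:-10} prefix=${3:-part}
  awk -v size="$size" -v prefix="$prefix" '
    NR == 1 { header = $0; next }      # remember the column headers
    (NR - 2) % size == 0 {             # start a new file every size rows
      if (out != "") close(out)
      out = sprintf("%s-%02d.csv", prefix, ++n)
      print header > out
    }
    { print > out }
  ' "$file"
}

# Usage: split_csv /tmp/2022-12-25-fix-mappings.csv 10 /tmp/fix-mappings
```

Running it again on the failing file with a size of 1 would give one file per item, which could then be imported individually to isolate the problematic one.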
<!-- vim: set sw=2 ts=2: -->

@ -20,7 +20,7 @@ Replace &ldquo;East Asia&rdquo; with &ldquo;Eastern Asia&rdquo; region on CGSpac
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-12/" />
<meta property="article:published_time" content="2022-12-01T08:52:36+03:00" />
<meta property="article:modified_time" content="2022-12-21T20:39:09+02:00" />
<meta property="article:modified_time" content="2022-12-23T10:04:37+02:00" />
@ -46,9 +46,9 @@ Replace &ldquo;East Asia&rdquo; with &ldquo;Eastern Asia&rdquo; region on CGSpac
"@type": "BlogPosting",
"headline": "December, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-12/",
"wordCount": "1727",
"wordCount": "2167",
"datePublished": "2022-12-01T08:52:36+03:00",
"dateModified": "2022-12-21T20:39:09+02:00",
"dateModified": "2022-12-23T10:04:37+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -420,6 +420,70 @@ Replace &ldquo;East Asia&rdquo; with &ldquo;Eastern Asia&rdquo; region on CGSpac
<li>I fixed a few errors and missing regions using csv-metadata-quality</li>
</ul>
</li>
<li>Abenet and Bizu noticed some strange characters in affiliations submitted by MEL
<ul>
<li>They appear like so in four items currently <code>Instituto Nacional de Investigaci<63>n y Tecnolog<6F>a Agraria y Alimentaria, Spain</code></li>
<li>I submitted <a href="https://github.com/CodeObia/MEL/issues/11108">an issue</a> on MEL&rsquo;s GitHub repository</li>
</ul>
</li>
</ul>
<h2 id="2022-12-24">2022-12-24</h2>
<ul>
<li>I exported the ILRI community to see whether any items have Initiative metadata but are not mapped to Initiative collections
<ul>
<li>I found about twenty&hellip;</li>
<li>Then I did the same for the AICCRA community</li>
</ul>
</li>
</ul>
<h2 id="2022-12-25">2022-12-25</h2>
<ul>
<li>The load on the server is high and I see some seemingly stuck PostgreSQL locks from dspaceCli:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">&#39;SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;&#39;</span> | grep -o -E <span style="color:#e6db74">&#39;(dspaceWeb|dspaceApi|dspaceCli)&#39;</span> | sort | uniq -c
</span></span><span style="display:flex;"><span> 44 dspaceApi
</span></span><span style="display:flex;"><span> 58 dspaceCli
</span></span></code></pre></div><ul>
<li><a href="https://jaketrent.com/post/find-kill-locks-postgres/">Looking into this more</a>, I see the PIDs for the dspaceCli locks:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-sql" data-lang="sql"><span style="display:flex;"><span><span style="color:#66d9ef">SELECT</span> pl.pid <span style="color:#66d9ef">FROM</span> pg_locks pl <span style="color:#66d9ef">LEFT</span> <span style="color:#66d9ef">JOIN</span> pg_stat_activity psa <span style="color:#66d9ef">ON</span> pl.pid <span style="color:#f92672">=</span> psa.pid <span style="color:#66d9ef">WHERE</span> psa.application_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#39;dspaceCli&#39;</span>
</span></span></code></pre></div><ul>
<li>And the SQL queries themselves:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>postgres=# SELECT pid, state, usename, query, query_start
</span></span><span style="display:flex;"><span>FROM pg_stat_activity
</span></span><span style="display:flex;"><span>WHERE pid IN (
</span></span><span style="display:flex;"><span> SELECT pl.pid FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid WHERE psa.application_name = &#39;dspaceCli&#39;
</span></span><span style="display:flex;"><span>);
</span></span></code></pre></div><ul>
<li>Only six queries account for these fifty-eight locks
<ul>
<li>Interestingly, they all started at either 04:00 or 05:00 this morning&hellip;</li>
</ul>
</li>
<li>I canceled one using <code>SELECT pg_cancel_backend(1098749);</code>, and then two of the other PIDs died too; perhaps they were dependent?
<ul>
<li>Then I canceled the next one, and the remaining ones died as well</li>
</ul>
</li>
<li>I exported the entire CGSpace and then ran the <code>fix-initiative-mappings.py</code> script, which found 124 items to be mapped
<ul>
<li>Extracting only the items with new mappings from the output file is currently tricky: you have to convert the file to Unix line endings, capture the lines that differ from the original export, and re-add the column headers; at least this means the DSpace batch import has to check WAY fewer items</li>
<li>For the record, I used grep to get only the new lines:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -xvFf /tmp/orig.csv /tmp/cgspace-mappings.csv &gt; /tmp/2022-12-25-fix-mappings.csv
</span></span></code></pre></div><ul>
<li>Then I imported to CGSpace, and will start an AReS harvest once it&rsquo;s done
<ul>
<li>The import process was quick but it triggered a lot of Solr updates and I see locks rising from dspaceCli again</li>
<li>After five hours the Solr updates from the metadata import still hadn&rsquo;t finished, so I canceled the import, and I see that the items were <em>not</em> mapped&hellip;</li>
<li>I split the CSV into multiple files of ten items each; the first one imported, but the second kept doing Solr updates indefinitely&hellip;</li>
<li>All twelve files imported except the second one, so the problem must be with one of the items in that file&hellip;</li>
</ul>
</li>
<li>Now I started a harvest on AReS</li>
</ul>
<!-- raw HTML omitted -->

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-12-21T20:39:09+02:00" />
<meta property="og:updated_time" content="2022-12-23T10:04:37+02:00" />

@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2022-12-21T20:39:09+02:00</lastmod>
<lastmod>2022-12-23T10:04:37+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2022-12-21T20:39:09+02:00</lastmod>
<lastmod>2022-12-23T10:04:37+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-12/</loc>
<lastmod>2022-12-21T20:39:09+02:00</lastmod>
<lastmod>2022-12-23T10:04:37+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2022-12-21T20:39:09+02:00</lastmod>
<lastmod>2022-12-23T10:04:37+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2022-12-21T20:39:09+02:00</lastmod>
<lastmod>2022-12-23T10:04:37+02:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-11/</loc>
<lastmod>2022-12-03T10:46:29+03:00</lastmod>