Compare commits

2 Commits

SHA1        Message                   Date
9b4498de04  Add notes for 2022-02-14  2022-02-14 16:43:12 +03:00
e3109b7483  Update notes              2022-02-14 09:40:59 +03:00
26 changed files with 172 additions and 31 deletions

@@ -341,4 +341,75 @@ Total number of bot hits purged: 14696
- Peter asked me to add a new item type on CGSpace: Opinion Piece
- Map an item on CGSpace for Maria since she couldn't find it in the item mapper
## 2022-02-11
- CGSpace is slow and the load has been over 400% for a few hours
- The number of DSpace sessions seems normal, even lower than a few days ago
- The number of PostgreSQL connections is low, but I see there are lots of "AccessShare" locks (green on Munin, not blue like usual)
- I will run all system updates, copy the latest config changes, and restart the server
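One way to see which lock modes pile up while the load is high is to summarize PostgreSQL's `pg_locks` view directly, rather than relying only on the Munin graph. A minimal sketch, assuming `psql` can connect to the local server as the `postgres` user; the function name `pg_lock_summary` is a hypothetical placeholder:

```shell
# Hypothetical helper: count the current PostgreSQL locks by mode and by
# whether they have been granted, most frequent first. pg_locks covers the
# whole server, so the database psql connects to does not matter here.
pg_lock_summary() {
  psql -h localhost -U postgres -A -t -c \
    "SELECT mode, granted, count(*) FROM pg_locks GROUP BY mode, granted ORDER BY count(*) DESC;"
}
```

Each output row has the form `mode|granted|count`, with modes reported by name (for example `AccessShareLock`).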
## 2022-02-12
- Install PostgreSQL 12 on my local dev environment to start testing DSpace 6.x workflows with it:
```console
$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:12-alpine
$ createuser -h localhost -p 5432 -U postgres --pwprompt dspacetest
$ createdb -h localhost -p 5432 -U postgres -O dspacetest --encoding=UNICODE dspacetest
$ psql -h localhost -U postgres -c 'ALTER USER dspacetest SUPERUSER;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/dspace-2022-02-12.backup
$ psql -h localhost -U postgres -c 'ALTER USER dspacetest NOSUPERUSER;'
```
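The SUPERUSER grant in these steps is only needed while `pg_restore` runs (presumably because some objects in the dump, such as extensions, can only be created by a superuser). A minimal sketch that wraps the same three commands so the privilege is always revoked, even when the restore fails; the function name `restore_dspace_dump` is hypothetical, and it assumes the role and database were already created with `createuser` and `createdb` as above:

```shell
# Hypothetical wrapper around the restore steps above. Grants SUPERUSER to the
# role only for the duration of pg_restore, always revokes it afterwards, and
# returns pg_restore's own exit status.
restore_dspace_dump() {
  local dump="$1" db="${2:-dspacetest}" role="${3:-dspacetest}"
  local status=0

  psql -h localhost -U postgres -c "ALTER USER ${role} SUPERUSER;" || return 1
  # -O skips restoring the original object ownership; --role makes
  # pg_restore issue SET ROLE so objects end up owned by $role
  pg_restore -h localhost -U postgres -d "$db" -O --role="$role" "$dump" || status=$?
  psql -h localhost -U postgres -c "ALTER USER ${role} NOSUPERUSER;" || return 1
  return "$status"
}
```

For example: `restore_dspace_dump ~/Downloads/dspace-2022-02-12.backup`.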
- Eventually I will update DSpace Test, then CGSpace (time to start paying off some technical debt!)
- Start a full Discovery re-index on CGSpace:
```console
$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 292m49.263s
user 201m26.097s
sys 3m2.459s
```
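The `chrt -b 0 ionice -c2 -n7 nice -n19` prefix runs the re-index at the lowest practical priority: the `SCHED_BATCH` CPU scheduling policy, the lowest best-effort I/O priority, and the maximum niceness, so the job (almost five hours here) competes less with the live site. A minimal sketch of a reusable wrapper; the name `run_low_priority` is hypothetical, and it simply skips `chrt` or `ionice` when they are not installed:

```shell
# Hypothetical wrapper that builds the same prefix used above:
#   chrt -b 0 ionice -c2 -n7 nice -n19 <command>
# nice is always applied; ionice and chrt are added only if available.
run_low_priority() {
  set -- nice -n19 "$@"
  if command -v ionice >/dev/null 2>&1; then
    set -- ionice -c2 -n7 "$@"
  fi
  if command -v chrt >/dev/null 2>&1; then
    set -- chrt -b 0 "$@"
  fi
  "$@"
}
```

Usage would then be `time run_low_priority dspace index-discovery -b`.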
- Start a full harvest on AReS
## 2022-02-14
- Last week Gaia sent me her notes on the second batch of TAC/ICW documents (items 201 to 400 in the spreadsheet)
- I created a filter in LibreOffice and selected the IDs for items with the action "delete", then I created a custom text facet in OpenRefine with this GREL:
```
or(
isNotNull(value.match('201')),
isNotNull(value.match('203')),
isNotNull(value.match('209')),
isNotNull(value.match('215')),
isNotNull(value.match('220')),
isNotNull(value.match('225')),
isNotNull(value.match('226')),
isNotNull(value.match('227')),
...
isNotNull(value.match('396'))
)
```
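Typing one `isNotNull(value.match('...'))` clause per ID by hand is error-prone for a long list. A minimal sketch of a hypothetical `make_grel` shell function that reads the IDs selected in LibreOffice (one per line) and prints an equivalent `or(...)` expression with duplicates removed. Note that GREL's `match()` has to match the entire cell value, so `'201'` does not match `'2010'`:

```shell
# Hypothetical helper: read numeric IDs (one per line) on stdin and print a
# GREL expression that is true when the cell value equals any of them.
# IDs are sorted numerically; duplicates and blank lines are dropped.
make_grel() {
  sort -un | awk '
    NF { ids[n++] = $1 }
    END {
      print "or("
      for (i = 0; i < n; i++) {
        sep = (i < n - 1) ? "," : ""
        printf "isNotNull(value.match(\047%s\047))%s\n", ids[i], sep
      }
      print ")"
    }'
}
```

For example, save the copied IDs to a text file, run `make_grel < ids.txt`, and paste the output into the custom text facet.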
- Then I flagged all matching records and exported a CSV to use with SAFBuilder
- Then I imported the SAF bundle on DSpace Test:
```console
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" dspace import --add --eperson=fuuu@umm.com --source /tmp/SimpleArchiveFormat --mapfile=./2022-02-14-tac-batch2-201to400.map
```
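After an import it can be worth checking that every item in the SAF bundle ended up in the mapfile, which gets one line per imported item (the item directory name followed by its new handle). A minimal sketch, assuming one subdirectory per item in the source directory; the function name `check_saf_import` is hypothetical:

```shell
# Hypothetical post-import check: compare the number of item directories in a
# SAF bundle with the number of non-empty lines in the mapfile written by
# `dspace import --mapfile=...`. Returns non-zero if they differ.
check_saf_import() {
  local src="$1" map="$2" dirs mapped
  [ -f "$map" ] || return 1
  dirs=$(find "$src" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
  # grep -c prints 0 but exits 1 when nothing matches; ignore that status
  mapped=$(grep -c . "$map" || true)
  echo "item directories: $dirs, mapped items: $mapped"
  [ "$dirs" -eq "$mapped" ]
}
```

For this batch that would be `check_saf_import /tmp/SimpleArchiveFormat ./2022-02-14-tac-batch2-201to400.map`.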
- Export the next batch from OpenRefine (items with ID 401 to 700), check duplicates, and then join with the file names:
```console
$ csvcut -c id,dc.title,dcterms.issued,dcterms.type ~/Downloads/2022-01-21-CGSpace-TAC-ICW-batch3-401to700.csv > /tmp/tac3.csv
$ ./ilri/check-duplicates.py -i /tmp/tac3.csv -db dspacetest -u dspacetest -p 'dom@in34sniper' -o /tmp/2022-02-14-tac-batch3-401-700.csv
$ csvcut -c id,filename ~/Downloads/2022-01-21-CGSpace-TAC-ICW-batch3-401to700.csv > /tmp/tac3-filenames.csv
$ csvjoin -c id /tmp/2022-02-14-tac-batch3-401-700.csv /tmp/tac3-filenames.csv > /tmp/2022-02-14-tac-batch3-401-700-filenames.csv
```
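Since `csvjoin` performs an inner join by default, an ID missing from either input would silently drop out of the joined file. A minimal sketch of a hypothetical `missing_ids` function that lists any IDs absent from an expected inclusive range, assuming the `id` column holds plain integers:

```shell
# Hypothetical helper: read integer IDs (one per line) on stdin and print every
# ID in the inclusive range [lo, hi] that does not appear in the input.
missing_ids() {
  awk -v lo="$1" -v hi="$2" '
    NF { seen[$1 + 0] = 1 }
    END {
      for (i = lo + 0; i <= hi + 0; i++)
        if (!(i in seen)) print i
    }'
}
```

For example, `csvcut -c id /tmp/2022-02-14-tac-batch3-401-700-filenames.csv | tail -n +2 | missing_ids 401 700` prints nothing if all 300 IDs are present (`tail -n +2` skips the header row).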
- I sent these 300 items to Gaia...
<!-- vim: set sw=2 ts=2: -->

@@ -21,7 +21,7 @@ We agreed to try to do more alignment of affiliations/funders with ROR
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-02/" />
<meta property="article:published_time" content="2022-02-01T14:06:54+02:00" />
<meta property="article:modified_time" content="2022-02-10T20:35:40+03:00" />
<meta property="article:modified_time" content="2022-02-14T09:40:59+03:00" />
@@ -48,9 +48,9 @@ We agreed to try to do more alignment of affiliations/funders with ROR
"@type": "BlogPosting",
"headline": "February, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-02/",
"wordCount": "1838",
"wordCount": "2194",
"datePublished": "2022-02-01T14:06:54+02:00",
"dateModified": "2022-02-10T20:35:40+03:00",
"dateModified": "2022-02-14T09:40:59+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -462,6 +462,76 @@ Purging 217 hits from 1science in statistics
<li>Peter asked me to add a new item type on CGSpace: Opinion Piece</li>
<li>Map an item on CGSpace for Maria since she couldn&rsquo;t find it in the item mapper</li>
</ul>
<h2 id="2022-02-11">2022-02-11</h2>
<ul>
<li>CGSpace is slow and the load has been over 400% for a few hours
<ul>
<li>The number of DSpace sessions seems normal, even lower than a few days ago</li>
<li>The number of PostgreSQL connections is low, but I see there are lots of &ldquo;AccessShare&rdquo; locks (green on Munin, not blue like usual)</li>
<li>I will run all system updates, copy the latest config changes, and restart the server</li>
</ul>
</li>
</ul>
<h2 id="2022-02-12">2022-02-12</h2>
<ul>
<li>Install PostgreSQL 12 on my local dev environment to start testing DSpace 6.x workflows with it:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD<span style="color:#f92672">=</span>postgres -p 5432:5432 -d postgres:12-alpine
$ createuser -h localhost -p <span style="color:#ae81ff">5432</span> -U postgres --pwprompt dspacetest
$ createdb -h localhost -p <span style="color:#ae81ff">5432</span> -U postgres -O dspacetest --encoding<span style="color:#f92672">=</span>UNICODE dspacetest
$ psql -h localhost -U postgres -c <span style="color:#e6db74">&#39;ALTER USER dspacetest SUPERUSER;&#39;</span>
$ pg_restore -h localhost -U postgres -d dspacetest -O --role<span style="color:#f92672">=</span>dspacetest ~/Downloads/dspace-2022-02-12.backup
$ psql -h localhost -U postgres -c <span style="color:#e6db74">&#39;ALTER USER dspacetest NOSUPERUSER;&#39;</span>
</code></pre></div><ul>
<li>Eventually I will update DSpace Test, then CGSpace (time to start paying off some technical debt!)</li>
<li>Start a full Discovery re-index on CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ time chrt -b <span style="color:#ae81ff">0</span> ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 292m49.263s
user 201m26.097s
sys 3m2.459s
</code></pre></div><ul>
<li>Start a full harvest on AReS</li>
</ul>
<h2 id="2022-02-14">2022-02-14</h2>
<ul>
<li>Last week Gaia sent me her notes on the second batch of TAC/ICW documents (items 201 to 400 in the spreadsheet)
<ul>
<li>I created a filter in LibreOffice and selected the IDs for items with the action &ldquo;delete&rdquo;, then I created a custom text facet in OpenRefine with this GREL:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code>or(
isNotNull(value.match('201')),
isNotNull(value.match('203')),
isNotNull(value.match('209')),
isNotNull(value.match('215')),
isNotNull(value.match('220')),
isNotNull(value.match('225')),
isNotNull(value.match('226')),
isNotNull(value.match('227')),
...
isNotNull(value.match('396'))
)
</code></pre><ul>
<li>Then I flagged all matching records and exported a CSV to use with SAFBuilder
<ul>
<li>Then I imported the SAF bundle on DSpace Test:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-Xmx1024m -Dfile.encoding=UTF-8&#34;</span> dspace import --add --eperson<span style="color:#f92672">=</span>fuuu@umm.com --source /tmp/SimpleArchiveFormat --mapfile<span style="color:#f92672">=</span>./2022-02-14-tac-batch2-201to400.map
</code></pre></div><ul>
<li>Export the next batch from OpenRefine (items with ID 401 to 700), check duplicates, and then join with the file names:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c id,dc.title,dcterms.issued,dcterms.type ~/Downloads/2022-01-21-CGSpace-TAC-ICW-batch3-401to700.csv &gt; /tmp/tac3.csv
$ ./ilri/check-duplicates.py -i /tmp/tac3.csv -db dspacetest -u dspacetest -p <span style="color:#e6db74">&#39;dom@in34sniper&#39;</span> -o /tmp/2022-02-14-tac-batch3-401-700.csv
$ csvcut -c id,filename ~/Downloads/2022-01-21-CGSpace-TAC-ICW-batch3-401to700.csv &gt; /tmp/tac3-filenames.csv
$ csvjoin -c id /tmp/2022-02-14-tac-batch3-401-700.csv /tmp/tac3-filenames.csv &gt; /tmp/2022-02-14-tac-batch3-401-700-filenames.csv
</code></pre></div><ul>
<li>I sent these 300 items to Gaia&hellip;</li>
</ul>
<!-- raw HTML omitted -->

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2022-02-10T20:35:40+03:00" />
<meta property="og:updated_time" content="2022-02-14T09:40:59+03:00" />

@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2022-02-10T20:35:40+03:00</lastmod>
<lastmod>2022-02-14T09:40:59+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2022-02-10T20:35:40+03:00</lastmod>
<lastmod>2022-02-14T09:40:59+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-02/</loc>
<lastmod>2022-02-10T20:35:40+03:00</lastmod>
<lastmod>2022-02-14T09:40:59+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2022-02-10T20:35:40+03:00</lastmod>
<lastmod>2022-02-14T09:40:59+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2022-02-10T20:35:40+03:00</lastmod>
<lastmod>2022-02-14T09:40:59+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-01/</loc>
<lastmod>2022-02-07T09:49:34+03:00</lastmod>