mirror of https://github.com/alanorth/cgspace-notes.git (synced 2025-01-27 05:49:12 +01:00)
Add notes for 2022-03-04
@ -24,7 +24,7 @@ Start a full harvest on AReS
Start a full harvest on AReS
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -122,12 +122,12 @@ Start a full harvest on AReS
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat 2022-01-06-add-orcids.csv
|
||||
dc.contributor.author,cg.creator.identifier
|
||||
"Jones, Chris","Chris Jones: 0000-0001-9096-9728"
|
||||
"Jones, Christopher S.","Chris Jones: 0000-0001-9096-9728"
|
||||
$ ./ilri/add-orcid-identifiers-csv.py -i 2022-01-06-add-orcids.csv -db dspace63 -u dspacetest -p <span style="color:#e6db74">'dom@in34sniper'</span>
|
||||
</code></pre></div><h2 id="2022-01-09">2022-01-09</h2>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ cat 2022-01-06-add-orcids.csv
|
||||
</span></span><span style="display:flex;"><span>dc.contributor.author,cg.creator.identifier
|
||||
</span></span><span style="display:flex;"><span>"Jones, Chris","Chris Jones: 0000-0001-9096-9728"
|
||||
</span></span><span style="display:flex;"><span>"Jones, Christopher S.","Chris Jones: 0000-0001-9096-9728"
|
||||
</span></span><span style="display:flex;"><span>$ ./ilri/add-orcid-identifiers-csv.py -i 2022-01-06-add-orcids.csv -db dspace63 -u dspacetest -p <span style="color:#e6db74">'dom@in34sniper'</span>
|
||||
</span></span></code></pre></div><h2 id="2022-01-09">2022-01-09</h2>
|
||||
<ul>
|
||||
<li>Validate and register CGSpace on <a href="https://www.openarchives.org/Register/ValidateSite?log=Z2V7WCT7">OpenArchives</a>
|
||||
<ul>
|
||||
@ -147,21 +147,21 @@ $ ./ilri/add-orcid-identifiers-csv.py -i 2022-01-06-add-orcids.csv -db dspace63
|
||||
<ul>
|
||||
<li>I tried to re-build the Docker image for OpenRXV and got an error in the backend:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">...
|
||||
> openrxv-backend@0.0.1 build
|
||||
> nest build
|
||||
<span style="color:#960050;background-color:#1e0010">
|
||||
</span><span style="color:#960050;background-color:#1e0010"></span>node_modules/@elastic/elasticsearch/api/types.d.ts:2454:13 - error TS2456: Type alias 'AggregationsAggregate' circularly references itself.
|
||||
<span style="color:#960050;background-color:#1e0010">
|
||||
</span><span style="color:#960050;background-color:#1e0010"></span>2454 export type AggregationsAggregate = AggregationsSingleBucketAggregate | AggregationsAutoDateHistogramAggregate | AggregationsFiltersAggregate | AggregationsSignificantTermsAggregate<any> | AggregationsTermsAggregate<any> | AggregationsBucketAggregate | AggregationsCompositeBucketAggregate | AggregationsMultiBucketAggregate<AggregationsBucket> | AggregationsMatrixStatsAggregate | AggregationsKeyedValueAggregate | AggregationsMetricAggregate
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
node_modules/@elastic/elasticsearch/api/types.d.ts:3209:13 - error TS2456: Type alias 'AggregationsSingleBucketAggregate' circularly references itself.
|
||||
<span style="color:#960050;background-color:#1e0010">
|
||||
</span><span style="color:#960050;background-color:#1e0010"></span>3209 export type AggregationsSingleBucketAggregate = AggregationsSingleBucketAggregateKeys
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
<span style="color:#960050;background-color:#1e0010">
|
||||
</span><span style="color:#960050;background-color:#1e0010"></span>Found 2 error(s).
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>...
|
||||
</span></span><span style="display:flex;"><span>> openrxv-backend@0.0.1 build
|
||||
</span></span><span style="display:flex;"><span>> nest build
|
||||
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
|
||||
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>node_modules/@elastic/elasticsearch/api/types.d.ts:2454:13 - error TS2456: Type alias 'AggregationsAggregate' circularly references itself.
|
||||
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
|
||||
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>2454 export type AggregationsAggregate = AggregationsSingleBucketAggregate | AggregationsAutoDateHistogramAggregate | AggregationsFiltersAggregate | AggregationsSignificantTermsAggregate<any> | AggregationsTermsAggregate<any> | AggregationsBucketAggregate | AggregationsCompositeBucketAggregate | AggregationsMultiBucketAggregate<AggregationsBucket> | AggregationsMatrixStatsAggregate | AggregationsKeyedValueAggregate | AggregationsMetricAggregate
|
||||
</span></span><span style="display:flex;"><span> ~~~~~~~~~~~~~~~~~~~~~
|
||||
</span></span><span style="display:flex;"><span>node_modules/@elastic/elasticsearch/api/types.d.ts:3209:13 - error TS2456: Type alias 'AggregationsSingleBucketAggregate' circularly references itself.
|
||||
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
|
||||
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>3209 export type AggregationsSingleBucketAggregate = AggregationsSingleBucketAggregateKeys
|
||||
</span></span><span style="display:flex;"><span> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
|
||||
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Found 2 error(s).
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>Ah, it seems the code on the server was slightly out of date
|
||||
<ul>
|
||||
<li>I checked out the latest master branch and it built</li>
|
||||
@ -180,20 +180,20 @@ node_modules/@elastic/elasticsearch/api/types.d.ts:3209:13 - error TS2456: Type
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c <span style="color:#e6db74">"SELECT application_name FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid"</span> | sort | uniq -c | sort -n
|
||||
1
|
||||
1 ------------------
|
||||
1 (3506 rows)
|
||||
1 application_name
|
||||
9 psql
|
||||
10
|
||||
3487 dspaceWeb
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">"SELECT application_name FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid"</span> | sort | uniq -c | sort -n
|
||||
</span></span><span style="display:flex;"><span> 1
|
||||
</span></span><span style="display:flex;"><span> 1 ------------------
|
||||
</span></span><span style="display:flex;"><span> 1 (3506 rows)
|
||||
</span></span><span style="display:flex;"><span> 1 application_name
|
||||
</span></span><span style="display:flex;"><span> 9 psql
|
||||
</span></span><span style="display:flex;"><span> 10
|
||||
</span></span><span style="display:flex;"><span> 3487 dspaceWeb
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>As before, I see messages from PostgreSQL about processes waiting for locks since I enabled the <code>log_lock_waits</code> setting last month:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -E <span style="color:#e6db74">'^2022-01*'</span> /var/log/postgresql/postgresql-10-main.log | grep -c <span style="color:#e6db74">'still waiting for'</span>
|
||||
12
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -E <span style="color:#e6db74">'^2022-01*'</span> /var/log/postgresql/postgresql-10-main.log | grep -c <span style="color:#e6db74">'still waiting for'</span>
|
||||
</span></span><span style="display:flex;"><span>12
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>I set a system alert on DSpace and then restarted the server</li>
|
||||
</ul>
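A side note on the `grep -E '^2022-01*'` pattern in this hunk: in an extended regular expression, `1*` means "zero or more `1` characters", so the pattern matches any line starting with `2022-0`, not only January lines. The following is a minimal sketch of that behaviour, assuming made-up log lines (they are not taken from the real PostgreSQL log) and using Python's `re` module, which treats this particular pattern the same way:

```python
import re

# Hypothetical log lines, invented for this illustration only.
lines = [
    "2022-01-19 10:00:01 UTC LOG:  process 123 still waiting for ShareLock",
    "2022-02-02 08:15:42 UTC LOG:  process 456 still waiting for ShareLock",
    "2021-12-31 23:59:59 UTC LOG:  process 789 still waiting for ShareLock",
]

loose = re.compile(r"^2022-01*")   # pattern as written in the notes
strict = re.compile(r"^2022-01-")  # anchors on the full month prefix

# Keep only the lines each pattern accepts.
loose_hits = [line for line in lines if loose.search(line)]
strict_hits = [line for line in lines if strict.search(line)]

print(len(loose_hits), len(strict_hits))
```

If the log file only contains January entries, both patterns give the same count, so the numbers recorded in the notes are not necessarily affected.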
|
||||
<h2 id="2022-01-20">2022-01-20</h2>
|
||||
@ -204,8 +204,8 @@ node_modules/@elastic/elasticsearch/api/types.d.ts:3209:13 - error TS2456: Type
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">"-Xmx1024m -Dfile.encoding=UTF-8"</span> dspace import --add --eperson<span style="color:#f92672">=</span>aorth@mjanja.ch --source /tmp/SimpleArchiveFormat --mapfile<span style="color:#f92672">=</span>./2022-01-20-green-covers.map
</code></pre></div><h2 id="2022-01-21">2022-01-21</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">"-Xmx1024m -Dfile.encoding=UTF-8"</span> dspace import --add --eperson<span style="color:#f92672">=</span>aorth@mjanja.ch --source /tmp/SimpleArchiveFormat --mapfile<span style="color:#f92672">=</span>./2022-01-20-green-covers.map
</span></span></code></pre></div><h2 id="2022-01-21">2022-01-21</h2>
<ul>
<li>Start working on the rest of the ~980 CGIAR TAC and ICW documents from Gaia
<ul>
@ -243,21 +243,21 @@ node_modules/@elastic/elasticsearch/api/types.d.ts:3209:13 - error TS2456: Type
|
||||
</li>
|
||||
<li>Normalize the metadata <code>text_lang</code> attributes on CGSpace database:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">dspace=# SELECT DISTINCT text_lang, count(text_lang) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) GROUP BY text_lang ORDER BY count DESC;
|
||||
text_lang | count
|
||||
-----------+---------
|
||||
en_US | 2803350
|
||||
en | 6232
|
||||
| 3200
|
||||
fr | 2
|
||||
vn | 2
|
||||
92 | 1
|
||||
sp | 1
|
||||
| 0
|
||||
(8 rows)
|
||||
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN ('en', '92', '');
|
||||
UPDATE 9433
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>dspace=# SELECT DISTINCT text_lang, count(text_lang) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) GROUP BY text_lang ORDER BY count DESC;
|
||||
</span></span><span style="display:flex;"><span> text_lang | count
|
||||
</span></span><span style="display:flex;"><span>-----------+---------
|
||||
</span></span><span style="display:flex;"><span> en_US | 2803350
|
||||
</span></span><span style="display:flex;"><span> en | 6232
|
||||
</span></span><span style="display:flex;"><span> | 3200
|
||||
</span></span><span style="display:flex;"><span> fr | 2
|
||||
</span></span><span style="display:flex;"><span> vn | 2
|
||||
</span></span><span style="display:flex;"><span> 92 | 1
|
||||
</span></span><span style="display:flex;"><span> sp | 1
|
||||
</span></span><span style="display:flex;"><span> | 0
|
||||
</span></span><span style="display:flex;"><span>(8 rows)
|
||||
</span></span><span style="display:flex;"><span>dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN ('en', '92', '');
|
||||
</span></span><span style="display:flex;"><span>UPDATE 9433
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>Then export the WLE Journal Articles collection again so there are fewer columns to mess with</li>
|
||||
</ul>
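The `UPDATE 9433` result in this hunk is consistent with the counts in the preceding `SELECT`: 6232 (`en`) + 3200 (empty string) + 1 (`92`) = 9433. The group with a count of 0 is presumably the `NULL` values, which `count(text_lang)` does not count and which `IN (...)` does not match. The sketch below reproduces that behaviour on a toy table, assuming an in-memory SQLite database rather than the real PostgreSQL schema; the rows are illustrative and the `dspace_object_id` subquery is left out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadatavalue (id INTEGER PRIMARY KEY, text_lang TEXT)")

# Toy rows standing in for the real table (values are illustrative only).
rows = [("en_US",), ("en",), ("",), ("92",), ("fr",), (None,)]
conn.executemany("INSERT INTO metadatavalue (text_lang) VALUES (?)", rows)

# Same normalization as in the notes, minus the item subquery.
cur = conn.execute(
    "UPDATE metadatavalue SET text_lang='en_US' WHERE text_lang IN ('en', '92', '')"
)
updated = cur.rowcount  # number of rows the UPDATE changed

# count(*) counts NULL rows too, unlike count(text_lang).
counts = dict(
    conn.execute(
        "SELECT text_lang, count(*) FROM metadatavalue GROUP BY text_lang"
    ).fetchall()
)
print(updated, counts)
```

The `NULL` row is left untouched, which matches the notes: only `en`, `92`, and the empty string are folded into `en_US`.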
|
||||
<h2 id="2022-01-26">2022-01-26</h2>
|
||||
@ -273,7 +273,7 @@ UPDATE 9433
</ul>
</li>
</ul>
<pre tabindex="0"><code>cells['dcterms.bibliographicCitation[en_US]'].value.split("doi: ")[1]
<pre tabindex="0"><code>cells['dcterms.bibliographicCitation[en_US]'].value.split("doi: ")[1]
</code></pre><ul>
<li>I also spent a bit of time cleaning up ILRI Journal Articles, but I notice that we don’t put DOIs in the citation so it’s not possible to fix items that are missing DOIs that way
<ul>
@ -286,17 +286,17 @@ UPDATE 9433
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c <span style="color:#e6db74">"SELECT application_name FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid"</span> | sort | uniq -c | sort -n
|
||||
1
|
||||
1 ------------------
|
||||
1 (537 rows)
|
||||
1 application_name
|
||||
9 psql
|
||||
51 dspaceApi
|
||||
477 dspaceWeb
|
||||
$ grep -E <span style="color:#e6db74">'^2022-01*'</span> /var/log/postgresql/postgresql-10-main.log | grep -c <span style="color:#e6db74">'still waiting for'</span>
|
||||
3
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">"SELECT application_name FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid"</span> | sort | uniq -c | sort -n
|
||||
</span></span><span style="display:flex;"><span> 1
|
||||
</span></span><span style="display:flex;"><span> 1 ------------------
|
||||
</span></span><span style="display:flex;"><span> 1 (537 rows)
|
||||
</span></span><span style="display:flex;"><span> 1 application_name
|
||||
</span></span><span style="display:flex;"><span> 9 psql
|
||||
</span></span><span style="display:flex;"><span> 51 dspaceApi
|
||||
</span></span><span style="display:flex;"><span> 477 dspaceWeb
|
||||
</span></span><span style="display:flex;"><span>$ grep -E <span style="color:#e6db74">'^2022-01*'</span> /var/log/postgresql/postgresql-10-main.log | grep -c <span style="color:#e6db74">'still waiting for'</span>
|
||||
</span></span><span style="display:flex;"><span>3
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>I set a system alert on CGSpace and then restarted Tomcat and PostgreSQL
|
||||
<ul>
|
||||
<li>The issue in Francesca’s case was actually that someone had taken the task, not that PostgreSQL transactions were locked!</li>
|
||||
@ -344,19 +344,19 @@ $ grep -E <span style="color:#e6db74">'^2022-01*'</span> /var/log/postgr
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">value.contains(/:\s?\d+(-|–)\d+/)
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>value.contains(/:\s?\d+(-|–)\d+/)
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>Then I faceted by blank on <code>dcterms.extent</code> and did a transform to extract the page information for over 1,000 items!</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">'p. ' +
|
||||
cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*:\s?(\d+)(-|–)(\d+).*/)[0] +
|
||||
'-' +
|
||||
cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*:\s?(\d+)(-|–)(\d+).*/)[2]
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>'p. ' +
|
||||
</span></span><span style="display:flex;"><span>cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*:\s?(\d+)(-|–)(\d+).*/)[0] +
|
||||
</span></span><span style="display:flex;"><span>'-' +
|
||||
</span></span><span style="display:flex;"><span>cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*:\s?(\d+)(-|–)(\d+).*/)[2]
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>Then I did similar for <code>cg.volume</code> and <code>cg.issue</code>, also based on the citation, for example to extract the “16” from “Journal of Blah 16(1)”, where “16” is the second capture group in a zero-based match:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*( |;)(\d+)\((\d+)\).*/)[1]
|
||||
</code></pre></div><ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*( |;)(\d+)\((\d+)\).*/)[1]
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>This was 3,000 items so I imported the changes on CGSpace 1,000 at a time…</li>
|
||||
</ul>
|
||||
<!-- raw HTML omitted -->
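The GREL expressions in this hunk index into the array of capture groups returned by `match`, starting at zero. As a rough illustration of the same indexing, here is a minimal Python sketch that applies the two regular expressions from the notes to a made-up citation string. The citation, the variable names, and the use of index `[2]` for the issue are assumptions for this example; the notes only show the volume expression.

```python
import re

# Made-up citation, in the "Journal of Blah 16(1)" style used in the notes.
citation = "Journal of Blah 16(1): 123-145"

# Same patterns as the GREL transforms; groups() is a zero-based tuple of
# capture groups, comparable to the array GREL's match() returns.
pages = re.match(r".*:\s?(\d+)(-|–)(\d+).*", citation)
vol_issue = re.match(r".*( |;)(\d+)\((\d+)\).*", citation)

# Groups for pages: first page, the dash, last page.
extent = "p. " + pages.groups()[0] + "-" + pages.groups()[2]

# Groups for volume/issue: separator, volume, issue.
volume = vol_issue.groups()[1]
issue = vol_issue.groups()[2]  # assumed index; not shown in the notes

print(extent, volume, issue)
```

Citations that do not follow this shape make `re.match` return `None`, so a real transform would need to guard against that, which is roughly what the `value.contains(...)` filter in the notes does before the transform is applied.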