mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-11-08
@@ -10,14 +10,14 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
-<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />
+<meta property="og:updated_time" content="2021-11-07T11:26:32+02:00" />
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Posts"/>
 <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
-<meta name="generator" content="Hugo 0.88.1" />
+<meta name="generator" content="Hugo 0.89.2" />
@@ -110,9 +110,9 @@
 <li>I experimented with manually sharding the Solr statistics on DSpace Test</li>
 <li>First I exported all the 2019 stats from CGSpace:</li>
 </ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">$ ./run.sh -s http://localhost:8081/solr/statistics -f 'time:2019-*' -a export -o statistics-2019.json -k uid
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./run.sh -s http://localhost:8081/solr/statistics -f <span style="color:#e6db74">'time:2019-*'</span> -a export -o statistics-2019.json -k uid
 $ zstd statistics-2019.json
-</code></pre>
+</code></pre></div>
 <a href='https://alanorth.github.io/cgspace-notes/2021-11/'>Read more →</a>
 </article>
@@ -134,15 +134,15 @@ $ zstd statistics-2019.json
 <ul>
 <li>Export all affiliations on CGSpace and run them against the latest RoR data dump:</li>
 </ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
-$ csvcut -c 1 /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
+$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
 $ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affili
 ations-matching.csv
 $ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l
 1879
 $ wc -l /tmp/2021-10-01-affiliations.txt
 7100 /tmp/2021-10-01-affiliations.txt
-</code></pre><ul>
+</code></pre></div><ul>
 <li>So we have 1879/7100 (26.46%) matching already</li>
 </ul>
 <a href='https://alanorth.github.io/cgspace-notes/2021-10/'>Read more →</a>
@@ -199,8 +199,8 @@ $ wc -l /tmp/2021-10-01-affiliations.txt
 <ul>
 <li>Update Docker images on AReS server (linode20) and reboot the server:</li>
 </ul>
-<pre tabindex="0"><code class="language-console" data-lang="console"># docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
-</code></pre><ul>
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
+</code></pre></div><ul>
 <li>I decided to upgrade linode20 from Ubuntu 18.04 to 20.04</li>
 </ul>
 <a href='https://alanorth.github.io/cgspace-notes/2021-08/'>Read more →</a>
@@ -224,9 +224,9 @@ $ wc -l /tmp/2021-10-01-affiliations.txt
 <ul>
 <li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
 </ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
 COPY 20994
-</code></pre>
+</code></pre></div>
 <a href='https://alanorth.github.io/cgspace-notes/2021-07/'>Read more →</a>
 </article>
@@ -18,9 +18,9 @@
 <li>I experimented with manually sharding the Solr statistics on DSpace Test</li>
 <li>First I exported all the 2019 stats from CGSpace:</li>
 </ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">$ ./run.sh -s http://localhost:8081/solr/statistics -f 'time:2019-*' -a export -o statistics-2019.json -k uid
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./run.sh -s http://localhost:8081/solr/statistics -f <span style="color:#e6db74">&#39;time:2019-*&#39;</span> -a export -o statistics-2019.json -k uid
 $ zstd statistics-2019.json
-</code></pre></description>
+</code></pre></div></description>
 </item>
 <item>
@@ -33,15 +33,15 @@ $ zstd statistics-2019.json
 <ul>
 <li>Export all affiliations on CGSpace and run them against the latest RoR data dump:</li>
 </ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &quot;cg.contributor.affiliation&quot;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
-$ csvcut -c 1 /tmp/2021-10-01-affiliations.csv | sed 1d &gt; /tmp/2021-10-01-affiliations.txt
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT text_value as &#34;cg.contributor.affiliation&#34;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
+$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-10-01-affiliations.csv | sed 1d &gt; /tmp/2021-10-01-affiliations.txt
 $ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affili
 ations-matching.csv
 $ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l
 1879
 $ wc -l /tmp/2021-10-01-affiliations.txt
 7100 /tmp/2021-10-01-affiliations.txt
-</code></pre><ul>
+</code></pre></div><ul>
 <li>So we have 1879/7100 (26.46%) matching already</li>
 </ul></description>
 </item>
@@ -80,8 +80,8 @@ $ wc -l /tmp/2021-10-01-affiliations.txt
 <ul>
 <li>Update Docker images on AReS server (linode20) and reboot the server:</li>
 </ul>
-<pre tabindex="0"><code class="language-console" data-lang="console"># docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
-</code></pre><ul>
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># docker images | grep -v ^REPO | sed <span style="color:#e6db74">&#39;s/ \+/:/g&#39;</span> | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
+</code></pre></div><ul>
 <li>I decided to upgrade linode20 from Ubuntu 18.04 to 20.04</li>
 </ul></description>
 </item>
@@ -96,9 +96,9 @@ $ wc -l /tmp/2021-10-01-affiliations.txt
 <ul>
 <li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
 </ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
 COPY 20994
-</code></pre></description>
+</code></pre></div></description>
 </item>
 <item>
@@ -203,17 +203,17 @@ COPY 20994
 <li>I had a call with CodeObia to discuss the work on OpenRXV</li>
 <li>Check the results of the AReS harvesting from last night:</li>
 </ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&amp;pretty'
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s <span style="color:#e6db74">&#39;http://localhost:9200/openrxv-items-temp/_count?q=*&amp;pretty&#39;</span>
 {
-&quot;count&quot; : 100875,
-&quot;_shards&quot; : {
-&quot;total&quot; : 1,
-&quot;successful&quot; : 1,
-&quot;skipped&quot; : 0,
-&quot;failed&quot; : 0
+&#34;count&#34; : 100875,
+&#34;_shards&#34; : {
+&#34;total&#34; : 1,
+&#34;successful&#34; : 1,
+&#34;skipped&#34; : 0,
+&#34;failed&#34; : 0
 }
 }
-</code></pre></description>
+</code></pre></div></description>
 </item>
 <item>
@@ -10,14 +10,14 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
-<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />
+<meta property="og:updated_time" content="2021-11-07T11:26:32+02:00" />
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Posts"/>
 <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
-<meta name="generator" content="Hugo 0.88.1" />
+<meta name="generator" content="Hugo 0.89.2" />
@@ -116,17 +116,17 @@
 <li>I had a call with CodeObia to discuss the work on OpenRXV</li>
 <li>Check the results of the AReS harvesting from last night:</li>
 </ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'</span>
 {
-"count" : 100875,
-"_shards" : {
-"total" : 1,
-"successful" : 1,
-"skipped" : 0,
-"failed" : 0
+"count" : 100875,
+"_shards" : {
+"total" : 1,
+"successful" : 1,
+"skipped" : 0,
+"failed" : 0
 }
 }
-</code></pre>
+</code></pre></div>
 <a href='https://alanorth.github.io/cgspace-notes/2021-02/'>Read more →</a>
 </article>
@@ -10,14 +10,14 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
-<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />
+<meta property="og:updated_time" content="2021-11-07T11:26:32+02:00" />
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Posts"/>
 <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
-<meta name="generator" content="Hugo 0.88.1" />
+<meta name="generator" content="Hugo 0.89.2" />
@@ -10,14 +10,14 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
-<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />
+<meta property="og:updated_time" content="2021-11-07T11:26:32+02:00" />
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Posts"/>
 <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
-<meta name="generator" content="Hugo 0.88.1" />
+<meta name="generator" content="Hugo 0.89.2" />
@@ -10,14 +10,14 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
-<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />
+<meta property="og:updated_time" content="2021-11-07T11:26:32+02:00" />
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Posts"/>
 <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
-<meta name="generator" content="Hugo 0.88.1" />
+<meta name="generator" content="Hugo 0.89.2" />
@@ -10,14 +10,14 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
-<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />
+<meta property="og:updated_time" content="2021-11-07T11:26:32+02:00" />
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Posts"/>
 <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
-<meta name="generator" content="Hugo 0.88.1" />
+<meta name="generator" content="Hugo 0.89.2" />
@@ -10,14 +10,14 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
-<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />
+<meta property="og:updated_time" content="2021-11-07T11:26:32+02:00" />
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Posts"/>
 <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
-<meta name="generator" content="Hugo 0.88.1" />
+<meta name="generator" content="Hugo 0.89.2" />
@@ -10,14 +10,14 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
-<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />
+<meta property="og:updated_time" content="2021-11-07T11:26:32+02:00" />
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Posts"/>
 <meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
-<meta name="generator" content="Hugo 0.88.1" />
+<meta name="generator" content="Hugo 0.89.2" />