Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2025-01-27 05:49:12 +01:00)
Add notes for 2021-11-08
@@ -10,14 +10,14 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
-<meta property="og:updated_time" content="2021-11-03T15:56:15+02:00" />
+<meta property="og:updated_time" content="2021-11-07T11:26:32+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
-<meta name="generator" content="Hugo 0.88.1" />
+<meta name="generator" content="Hugo 0.89.2" />
@@ -110,9 +110,9 @@
<li>I experimented with manually sharding the Solr statistics on DSpace Test</li>
<li>First I exported all the 2019 stats from CGSpace:</li>
</ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">$ ./run.sh -s http://localhost:8081/solr/statistics -f 'time:2019-*' -a export -o statistics-2019.json -k uid
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./run.sh -s http://localhost:8081/solr/statistics -f <span style="color:#e6db74">'time:2019-*'</span> -a export -o statistics-2019.json -k uid
$ zstd statistics-2019.json
-</code></pre>
+</code></pre></div>
<a href='https://alanorth.github.io/cgspace-notes/2021-11/'>Read more →</a>
</article>
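The export above covers one year. If the same approach were repeated to shard other years, a small helper could generate one export command per year, plus its `zstd` step. This is a hypothetical sketch and does not come from the notes. `export_commands` and the year list are illustrative, and the helper reuses only the `run.sh` flags and values shown in the command above.

```python
import shlex


def export_commands(years, solr_url="http://localhost:8081/solr/statistics"):
    """Return shell commands that export and compress one year of statistics each.

    Hypothetical helper: it reuses only the run.sh flags and values that
    appear in the notes (-s, -f, -a export, -o, -k uid).
    """
    commands = []
    for year in years:
        outfile = f"statistics-{year}.json"
        export = [
            "./run.sh", "-s", solr_url,
            "-f", f"time:{year}-*",
            "-a", "export",
            "-o", outfile,
            "-k", "uid",
        ]
        # shlex.join quotes arguments containing shell metacharacters, e.g. the * wildcard
        commands.append(shlex.join(export))
        commands.append(shlex.join(["zstd", outfile]))
    return commands


if __name__ == "__main__":
    for command in export_commands([2019, 2020]):
        print(command)
```

`shlex.join` (Python 3.8+) quotes the wildcard filter, so the first command generated for 2019 is identical to the one in the notes.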
@@ -134,15 +134,15 @@ $ zstd statistics-2019.json
<ul>
<li>Export all affiliations on CGSpace and run them against the latest RoR data dump:</li>
</ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
-$ csvcut -c 1 /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
+$ csvcut -c <span style="color:#ae81ff">1</span> /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
$ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affiliations-matching.csv
$ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l
1879
$ wc -l /tmp/2021-10-01-affiliations.txt
7100 /tmp/2021-10-01-affiliations.txt
-</code></pre><ul>
+</code></pre></div><ul>
<li>So we have 1879/7100 (26.46%) matching already</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2021-10/'>Read more →</a>
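The 26.46% figure is the ratio of two counts: rows in the lookup CSV whose `matched` column is true, and lines in the plain-text affiliation list. A minimal Python sketch of the same arithmetic follows. It assumes the lookup CSV has a header row and a `matched` column holding the string `true` for hits; any other column names used with it are hypothetical. `csvgrep -m` matches a substring, whereas this sketch compares the value exactly.

```python
import csv


def match_rate(matching_rows, affiliation_lines):
    """Return (matched, total, percent) for a RoR lookup run.

    matching_rows: lines of the lookup CSV, header row first; assumes a
    "matched" column that holds "true" for hits (exact comparison).
    affiliation_lines: one affiliation per line, as in the .txt file.
    """
    reader = csv.DictReader(matching_rows)
    matched = sum(1 for row in reader if row["matched"] == "true")
    # Count lines, like wc -l does for a newline-terminated file
    total = sum(1 for _ in affiliation_lines)
    percent = round(100 * matched / total, 2) if total else 0.0
    return matched, total, percent
```

Both arguments can be open file objects (open the CSV with `newline=""`). With the counts in the notes, 100 × 1879 / 7100 rounds to 26.46.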
@@ -199,8 +199,8 @@ $ wc -l /tmp/2021-10-01-affiliations.txt
<ul>
<li>Update Docker images on AReS server (linode20) and reboot the server:</li>
</ul>
-<pre tabindex="0"><code class="language-console" data-lang="console"># docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
-</code></pre><ul>
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
+</code></pre></div><ul>
<li>I decided to upgrade linode20 from Ubuntu 18.04 to 20.04</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2021-08/'>Read more →</a>
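The `docker images` pipeline works in five steps:

1. Skip the header row.
2. Collapse runs of spaces into colons with `sed`.
3. Keep the first two fields (repository and tag) with `cut`.
4. Drop dangling `<none>` images.
5. Pull each remaining reference.

A rough Python equivalent follows. It is a hypothetical sketch, not the command actually used, and it splits on whitespace instead of colons.

```python
import subprocess


def images_to_pull(docker_images_output):
    """Turn `docker images` output into REPOSITORY:TAG references.

    Skips the header row (like grep -v ^REPO), keeps the first two
    whitespace-separated columns and drops dangling images whose
    repository or tag is <none>.
    """
    refs = []
    for line in docker_images_output.splitlines():
        if not line.strip() or line.startswith("REPO"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a data row
        repo, tag = fields[0], fields[1]
        if "<none>" in (repo, tag):
            continue
        refs.append(f"{repo}:{tag}")
    return refs


def pull_all(refs):
    """Run one `docker pull` per reference, like `xargs -L1 docker pull`."""
    for ref in refs:
        # Unlike xargs, check=True stops at the first failed pull
        subprocess.run(["docker", "pull", ref], check=True)
```

Splitting on whitespace keeps a registry host and port such as `localhost:5000/app` intact. With `cut -d: -f1,2`, that same line would yield `localhost:5000/app` and lose its tag. The sketch also skips only the literal `<none>` value, whereas `grep -v none` drops any line containing `none`. The input can come from `subprocess.run(["docker", "images"], capture_output=True, text=True, check=True).stdout`. Alternatively, `docker images --format '{{.Repository}}:{{.Tag}}'` prints the pairs directly.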
@@ -224,9 +224,9 @@ $ wc -l /tmp/2021-10-01-affiliations.txt
<ul>
<li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
</ul>
-<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
+<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
COPY 20994
-</code></pre>
+</code></pre></div>
<a href='https://alanorth.github.io/cgspace-notes/2021-07/'>Read more →</a>
</article>
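The subjects `\COPY` query lowercases each subject before grouping, so case variants of the same term are counted together. It then sorts the groups by frequency. The sketch below is a Python version of that aggregation. It assumes the `text_value` strings have already been fetched for the listed fields; those numeric field IDs come from CGSpace's own metadata field registry. `subject_counts` and `to_csv` are hypothetical helpers, not part of the notes.

```python
import csv
import io
from collections import Counter


def subject_counts(values):
    """Count subjects case-insensitively, most frequent first.

    Mirrors LOWER(text_value) AS subject ... GROUP BY subject
    ORDER BY count DESC from the query above.
    """
    counts = Counter(value.lower() for value in values)
    # most_common() sorts by count, descending; ties keep first-seen order
    return counts.most_common()


def to_csv(rows):
    """Render (subject, count) rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["subject", "count"])  # like WITH CSV HEADER
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = ["Maize", "maize", "LIVESTOCK", "Climate change", "livestock"]
    print(to_csv(subject_counts(sample)), end="")
```

Python's `str.lower()` and PostgreSQL's `LOWER()` may disagree on some non-ASCII characters. Ties are also ordered by first occurrence here, while the SQL leaves their order unspecified.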