Add notes for 2022-03-04

This commit is contained in:
2022-03-04 15:30:06 +03:00
parent 7453499827
commit 27acbac859
115 changed files with 6550 additions and 6444 deletions

@ -32,7 +32,7 @@ Update Docker images on AReS server (linode20) and reboot the server:
I decided to upgrade linode20 from Ubuntu 18.04 to 20.04
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -122,37 +122,37 @@ I decided to upgrade linode20 from Ubuntu 18.04 to 20.04
<ul>
<li>Update Docker images on AReS server (linode20) and reboot the server:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># docker images | grep -v ^REPO | sed <span style="color:#e6db74">&#39;s/ \+/:/g&#39;</span> | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># docker images | grep -v ^REPO | sed <span style="color:#e6db74">&#39;s/ \+/:/g&#39;</span> | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
</span></span></code></pre></div><ul>
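<li>For reference, a rough Python sketch of what that <code>grep | sed | cut</code> pipeline extracts from the <code>docker images</code> output; the function name <code>image_refs</code> is hypothetical and it assumes the default column layout (repository first, then tag):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">def image_refs(docker_images_output):
    """Return repository:tag strings parsed from `docker images` text output."""
    refs = []
    for line in docker_images_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (like grep -v ^REPO)
        if len(fields) in (0, 1) or fields[0].startswith("REPO"):
            continue
        ref = fields[0] + ":" + fields[1]
        # Skip dangling images without a name or tag (like grep -v none)
        if "none" in ref:
            continue
        refs.append(ref)
    return refs
</code></pre></div><ul>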
<li>I decided to upgrade linode20 from Ubuntu 18.04 to 20.04</li>
</ul>
<ul>
<li>First running all existing updates, taking some backups, checking for broken packages, and then rebooting:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># apt update <span style="color:#f92672">&amp;&amp;</span> apt dist-upgrade
# apt autoremove <span style="color:#f92672">&amp;&amp;</span> apt autoclean
# check <span style="color:#66d9ef">for</span> any packages with residual configs we can purge
# dpkg -l | grep -E <span style="color:#e6db74">&#39;^rc&#39;</span> | awk <span style="color:#e6db74">&#39;{print $2}&#39;</span>
# dpkg -l | grep -E <span style="color:#e6db74">&#39;^rc&#39;</span> | awk <span style="color:#e6db74">&#39;{print $2}&#39;</span> | xargs dpkg -P
# dpkg -C
# dpkg -l &gt; 2021-08-01-linode20-dpkg.txt
# tar -I zstd -cvf 2021-08-01-etc.tar.zst /etc
# reboot
# sed -i <span style="color:#e6db74">&#39;s/bionic/focal/&#39;</span> /etc/apt/sources.list.d/*.list
# <span style="color:#66d9ef">do</span>-release-upgrade
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># apt update <span style="color:#f92672">&amp;&amp;</span> apt dist-upgrade
</span></span><span style="display:flex;"><span># apt autoremove <span style="color:#f92672">&amp;&amp;</span> apt autoclean
</span></span><span style="display:flex;"><span># check <span style="color:#66d9ef">for</span> any packages with residual configs we can purge
</span></span><span style="display:flex;"><span># dpkg -l | grep -E <span style="color:#e6db74">&#39;^rc&#39;</span> | awk <span style="color:#e6db74">&#39;{print $2}&#39;</span>
</span></span><span style="display:flex;"><span># dpkg -l | grep -E <span style="color:#e6db74">&#39;^rc&#39;</span> | awk <span style="color:#e6db74">&#39;{print $2}&#39;</span> | xargs dpkg -P
</span></span><span style="display:flex;"><span># dpkg -C
</span></span><span style="display:flex;"><span># dpkg -l &gt; 2021-08-01-linode20-dpkg.txt
</span></span><span style="display:flex;"><span># tar -I zstd -cvf 2021-08-01-etc.tar.zst /etc
</span></span><span style="display:flex;"><span># reboot
</span></span><span style="display:flex;"><span># sed -i <span style="color:#e6db74">&#39;s/bionic/focal/&#39;</span> /etc/apt/sources.list.d/*.list
</span></span><span style="display:flex;"><span># <span style="color:#66d9ef">do</span>-release-upgrade
</span></span></code></pre></div><ul>
<li>&hellip; but of course it hit <a href="https://bugs.launchpad.net/ubuntu/+source/libxcrypt/+bug/1903838">the libxcrypt bug</a></li>
<li>I had to get a copy of libcrypt.so.1.1.0 from a working Ubuntu 20.04 system and finish the upgrade manually</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># apt install -f
# apt dist-upgrade
# reboot
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># apt install -f
</span></span><span style="display:flex;"><span># apt dist-upgrade
</span></span><span style="display:flex;"><span># reboot
</span></span></code></pre></div><ul>
<li>After rebooting I purged all packages with residual configs and cleaned up again:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># dpkg -l | grep -E <span style="color:#e6db74">&#39;^rc&#39;</span> | awk <span style="color:#e6db74">&#39;{print $2}&#39;</span> | xargs dpkg -P
# apt autoremove <span style="color:#f92672">&amp;&amp;</span> apt autoclean
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># dpkg -l | grep -E <span style="color:#e6db74">&#39;^rc&#39;</span> | awk <span style="color:#e6db74">&#39;{print $2}&#39;</span> | xargs dpkg -P
</span></span><span style="display:flex;"><span># apt autoremove <span style="color:#f92672">&amp;&amp;</span> apt autoclean
</span></span></code></pre></div><ul>
<li>Then I cleared my local Ansible fact cache and re-ran the <a href="https://github.com/ilri/rmg-ansible-public">infrastructure playbooks</a></li>
<li>Open <a href="https://github.com/ilri/OpenRXV/issues/111">an issue for the value mappings global replacement bug in OpenRXV</a></li>
<li>Advise Peter and Abenet on expected CGSpace budget for 2022</li>
@ -190,21 +190,21 @@ I decided to upgrade linode20 from Ubuntu 18.04 to 20.04
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 /var/log/nginx/access.log.3 /var/log/nginx/access.log.4 /var/log/nginx/access.log.5 /var/log/nginx/access.log.6 /var/log/nginx/access.log.7 /var/log/nginx/access.log.8 | grep -E <span style="color:#e6db74">&#34; (200|499) &#34;</span> | grep -v -E <span style="color:#e6db74">&#34;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq &gt; /tmp/2021-08-05-all-ips.txt
# wc -l /tmp/2021-08-05-all-ips.txt
43428 /tmp/2021-08-05-all-ips.txt
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/access.log.2 /var/log/nginx/access.log.3 /var/log/nginx/access.log.4 /var/log/nginx/access.log.5 /var/log/nginx/access.log.6 /var/log/nginx/access.log.7 /var/log/nginx/access.log.8 | grep -E <span style="color:#e6db74">&#34; (200|499) &#34;</span> | grep -v -E <span style="color:#e6db74">&#34;(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)&#34;</span> | awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> | sort | uniq &gt; /tmp/2021-08-05-all-ips.txt
</span></span><span style="display:flex;"><span># wc -l /tmp/2021-08-05-all-ips.txt
</span></span><span style="display:flex;"><span>43428 /tmp/2021-08-05-all-ips.txt
</span></span></code></pre></div><ul>
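<li>The same filtering can be sketched in Python, assuming the log lines are already decompressed and read into a list; <code>unique_ips</code> and the two pattern names are hypothetical, and Python&rsquo;s sort order may differ from <code>sort</code> under some locales:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import re

# Same patterns as the two grep -E calls in the shell pipeline
STATUS = re.compile(r" (200|499) ")
BOTS = re.compile(r"(mahider|Googlebot|Turnitin|Grammarly|Unpaywall|UptimeRobot|bot)")


def unique_ips(lines):
    """Return the sorted unique client IPs from nginx access log lines."""
    ips = set()
    for line in lines:
        # Keep only 200/499 responses that do not look like known bots
        if STATUS.search(line) and not BOTS.search(line):
            # Like awk '{print $1}': the first whitespace-separated field
            ips.add(line.split()[0])
    return sorted(ips)
</code></pre></div><ul>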
<li>Already I can see that the total is much less than during the attack on one weekend last month (over 50,000!)
<ul>
<li>Indeed, I now see that no IPs from those networks are coming in:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/resolve-addresses-geoip2.py -i /tmp/2021-08-05-all-ips.txt -o /tmp/2021-08-05-all-ips.csv
$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(49453|46844|206485|62282|36352|35913|35624|8100)$&#39;</span> /tmp/2021-08-05-all-ips.csv | csvcut -c ip | sed 1d | sort | uniq &gt; /tmp/2021-08-05-all-ips-to-purge.csv
$ wc -l /tmp/2021-08-05-all-ips-to-purge.csv
0 /tmp/2021-08-05-all-ips-to-purge.csv
</code></pre></div><h2 id="2021-08-08">2021-08-08</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/resolve-addresses-geoip2.py -i /tmp/2021-08-05-all-ips.txt -o /tmp/2021-08-05-all-ips.csv
</span></span><span style="display:flex;"><span>$ csvgrep -c asn -r <span style="color:#e6db74">&#39;^(49453|46844|206485|62282|36352|35913|35624|8100)$&#39;</span> /tmp/2021-08-05-all-ips.csv | csvcut -c ip | sed 1d | sort | uniq &gt; /tmp/2021-08-05-all-ips-to-purge.csv
</span></span><span style="display:flex;"><span>$ wc -l /tmp/2021-08-05-all-ips-to-purge.csv
</span></span><span style="display:flex;"><span>0 /tmp/2021-08-05-all-ips-to-purge.csv
</span></span></code></pre></div><h2 id="2021-08-08">2021-08-08</h2>
<ul>
<li>Advise IWMI colleagues on best practices for thumbnails</li>
<li>Add a handful of mappings for incorrect countries, regions, and licenses on AReS and start a new harvest
@ -220,8 +220,8 @@ $ wc -l /tmp/2021-08-05-all-ips-to-purge.csv
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
</span></span></code></pre></div><ul>
<li>That IP is on Amazon, and from looking at the DSpace logs I don&rsquo;t see them logging in at all, only scraping&hellip; so I will purge hits from that IP</li>
<li>I see 93.158.90.30 is some Swedish IP that also has a normal-looking user agent, but it never logs in and requests thousands of XMLUI pages, so I will purge its hits too
<ul>
@ -232,14 +232,14 @@ $ wc -l /tmp/2021-08-05-all-ips-to-purge.csv
<li>3.225.28.105 uses a normal-looking user agent but makes thousands of requests to the REST API a few seconds apart</li>
<li>61.143.40.50 is in China and uses this hilarious user agent:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.{random.randint(0, 9999)} Safari/537.{random.randint(0, 99)}&#34;
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.{random.randint(0, 9999)} Safari/537.{random.randint(0, 99)}&#34;
</span></span></code></pre></div><ul>
<li>47.252.80.214 is owned by Alibaba in the US and has the same user agent</li>
<li>159.138.131.15 is in Hong Kong and also seems to be a bot because I never see it log in and it downloads 4,300 PDFs over the course of a few hours</li>
<li>95.87.154.12 seems to be a new bot with the following user agent:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">Mozilla/5.0 (compatible; MaCoCu; +https://www.clarin.si/info/macocu-massive-collection-and-curation-of-monolingual-and-bilingual-data/
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>Mozilla/5.0 (compatible; MaCoCu; +https://www.clarin.si/info/macocu-massive-collection-and-curation-of-monolingual-and-bilingual-data/
</span></span></code></pre></div><ul>
<li>They have a legitimate EU-funded project to enrich data for under-resourced languages in the EU
<ul>
<li>I will purge the hits and add them to our list of bot overrides in the meantime before I submit it to COUNTER-Robots</li>
@ -247,37 +247,37 @@ $ wc -l /tmp/2021-08-05-all-ips-to-purge.csv
</li>
<li>I see a new bot using this user agent:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">nettle (+https://www.nettle.sk)
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>nettle (+https://www.nettle.sk)
</span></span></code></pre></div><ul>
<li>129.0.211.251 is in Cameroon and uses a normal-looking user agent, but seems to be a bot of some sort, as it downloaded 900 PDFs over a short period.</li>
<li>217.182.21.193 is on OVH in France and uses a Linux user agent, but never logs in and makes several requests per minute, over 1,000 in a day</li>
<li>103.135.104.139 is in Hong Kong and also seems to be making real requests, but makes way too many to be a human</li>
<li>There are probably more, but those are most of the ones with over 1,000 hits last month, so I will purge them:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
Purging 10796 hits from 35.174.144.154 in statistics
Purging 9993 hits from 93.158.90.30 in statistics
Purging 6092 hits from 130.255.162.173 in statistics
Purging 24863 hits from 3.225.28.105 in statistics
Purging 2988 hits from 93.158.90.91 in statistics
Purging 2497 hits from 61.143.40.50 in statistics
Purging 13866 hits from 159.138.131.15 in statistics
Purging 2721 hits from 95.87.154.12 in statistics
Purging 2786 hits from 47.252.80.214 in statistics
Purging 1485 hits from 129.0.211.251 in statistics
Purging 8952 hits from 217.182.21.193 in statistics
Purging 3446 hits from 103.135.104.139 in statistics
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 90485
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
</span></span><span style="display:flex;"><span>Purging 10796 hits from 35.174.144.154 in statistics
</span></span><span style="display:flex;"><span>Purging 9993 hits from 93.158.90.30 in statistics
</span></span><span style="display:flex;"><span>Purging 6092 hits from 130.255.162.173 in statistics
</span></span><span style="display:flex;"><span>Purging 24863 hits from 3.225.28.105 in statistics
</span></span><span style="display:flex;"><span>Purging 2988 hits from 93.158.90.91 in statistics
</span></span><span style="display:flex;"><span>Purging 2497 hits from 61.143.40.50 in statistics
</span></span><span style="display:flex;"><span>Purging 13866 hits from 159.138.131.15 in statistics
</span></span><span style="display:flex;"><span>Purging 2721 hits from 95.87.154.12 in statistics
</span></span><span style="display:flex;"><span>Purging 2786 hits from 47.252.80.214 in statistics
</span></span><span style="display:flex;"><span>Purging 1485 hits from 129.0.211.251 in statistics
</span></span><span style="display:flex;"><span>Purging 8952 hits from 217.182.21.193 in statistics
</span></span><span style="display:flex;"><span>Purging 3446 hits from 103.135.104.139 in statistics
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 90485
</span></span></code></pre></div><ul>
<li>Then I purged a few thousand more by user agent:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri
Found 2707 hits from MaCoCu in statistics
Found 1785 hits from nettle in statistics
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Total number of hits from bots: 4492
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri
</span></span><span style="display:flex;"><span>Found 2707 hits from MaCoCu in statistics
</span></span><span style="display:flex;"><span>Found 1785 hits from nettle in statistics
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of hits from bots: 4492
</span></span></code></pre></div><ul>
<li>I found some CGSpace metadata in the wrong fields
<ul>
<li>Seven metadata values in dc.subject (57) should be in dcterms.subject (187)</li>
@ -289,8 +289,8 @@ Found 1785 hits from nettle in statistics
</li>
<li>I exported the entire CGSpace repository as a CSV to do some work on ISSNs and ISBNs:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c <span style="color:#e6db74">&#39;id,cg.issn,cg.issn[],cg.issn[en],cg.issn[en_US],cg.isbn,cg.isbn[],cg.isbn[en_US]&#39;</span> /tmp/2021-08-08-cgspace.csv &gt; /tmp/2021-08-08-issn-isbn.csv
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c <span style="color:#e6db74">&#39;id,cg.issn,cg.issn[],cg.issn[en],cg.issn[en_US],cg.isbn,cg.isbn[],cg.isbn[en_US]&#39;</span> /tmp/2021-08-08-cgspace.csv &gt; /tmp/2021-08-08-issn-isbn.csv
</span></span></code></pre></div><ul>
<li>Then in OpenRefine I merged all null, blank, and en fields into the <code>en_US</code> one for each, removed all spaces, fixed invalid multi-value separators, removed everything other than ISSN/ISBNs themselves
<ul>
<li>In total it was a few thousand metadata entries, so I had to split the CSV with <code>xsv split</code> in order to process it</li>
@ -303,20 +303,20 @@ Found 1785 hits from nettle in statistics
<ul>
<li>Extract all unique ISSNs to look up on Sherpa Romeo and Crossref</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c <span style="color:#e6db74">&#39;cg.issn[en_US]&#39;</span> ~/Downloads/2021-08-08-CGSpace-ISBN-ISSN.csv | csvgrep -c <span style="color:#ae81ff">1</span> -r <span style="color:#e6db74">&#39;^[0-9]{4}&#39;</span> | sed 1d | sort | uniq &gt; /tmp/2021-08-09-issns.txt
$ ./ilri/sherpa-issn-lookup.py -a mehhhhhhhhhhhhh -i /tmp/2021-08-09-issns.txt -o /tmp/2021-08-09-journals-sherpa-romeo.csv
$ ./ilri/crossref-issn-lookup.py -e me@cgiar.org -i /tmp/2021-08-09-issns.txt -o /tmp/2021-08-09-journals-crossref.csv
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c <span style="color:#e6db74">&#39;cg.issn[en_US]&#39;</span> ~/Downloads/2021-08-08-CGSpace-ISBN-ISSN.csv | csvgrep -c <span style="color:#ae81ff">1</span> -r <span style="color:#e6db74">&#39;^[0-9]{4}&#39;</span> | sed 1d | sort | uniq &gt; /tmp/2021-08-09-issns.txt
</span></span><span style="display:flex;"><span>$ ./ilri/sherpa-issn-lookup.py -a mehhhhhhhhhhhhh -i /tmp/2021-08-09-issns.txt -o /tmp/2021-08-09-journals-sherpa-romeo.csv
</span></span><span style="display:flex;"><span>$ ./ilri/crossref-issn-lookup.py -e me@cgiar.org -i /tmp/2021-08-09-issns.txt -o /tmp/2021-08-09-journals-crossref.csv
</span></span></code></pre></div><ul>
<li>Then I updated the CSV headers for each and joined the CSVs on the issn column:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ sed -i <span style="color:#e6db74">&#39;1s/journal title/sherpa romeo journal title/&#39;</span> /tmp/2021-08-09-journals-sherpa-romeo.csv
$ sed -i <span style="color:#e6db74">&#39;1s/journal title/crossref journal title/&#39;</span> /tmp/2021-08-09-journals-crossref.csv
$ csvjoin -c issn /tmp/2021-08-09-journals-sherpa-romeo.csv /tmp/2021-08-09-journals-crossref.csv &gt; /tmp/2021-08-09-journals-all.csv
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ sed -i <span style="color:#e6db74">&#39;1s/journal title/sherpa romeo journal title/&#39;</span> /tmp/2021-08-09-journals-sherpa-romeo.csv
</span></span><span style="display:flex;"><span>$ sed -i <span style="color:#e6db74">&#39;1s/journal title/crossref journal title/&#39;</span> /tmp/2021-08-09-journals-crossref.csv
</span></span><span style="display:flex;"><span>$ csvjoin -c issn /tmp/2021-08-09-journals-sherpa-romeo.csv /tmp/2021-08-09-journals-crossref.csv &gt; /tmp/2021-08-09-journals-all.csv
</span></span></code></pre></div><ul>
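<li>As a sketch of what the join does, here is a minimal Python version that assumes an inner join (only ISSNs present in both files are kept); <code>read_rows</code> and <code>inner_join</code> are hypothetical helpers, not part of csvkit:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import csv
import io


def read_rows(csv_text):
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def inner_join(left_rows, right_rows, key):
    """Join two lists of dict rows on key, keeping only keys present in both."""
    # Index the right-hand rows by their key value
    right_by_key = {}
    for row in right_rows:
        right_by_key.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        for match in right_by_key.get(row[key], []):
            merged = dict(row)
            # Add the right-hand columns, without repeating the key column
            merged.update({k: v for k, v in match.items() if k != key})
            joined.append(merged)
    return joined
</code></pre></div><ul>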
<li>In OpenRefine I faceted by blank in each column and copied the values from the other, then created a new column to indicate whether the values were the same with this GREL:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">if(cells[&#39;sherpa romeo journal title&#39;].value == cells[&#39;crossref journal title&#39;].value,&#34;same&#34;,&#34;different&#34;)
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>if(cells[&#39;sherpa romeo journal title&#39;].value == cells[&#39;crossref journal title&#39;].value,&#34;same&#34;,&#34;different&#34;)
</span></span></code></pre></div><ul>
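<li>The GREL above is an exact string comparison, so titles that differ only in case or whitespace count as different; a rough Python equivalent, with the hypothetical name <code>compare_titles</code> and a row represented as a dict:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">def compare_titles(row):
    """Return "same" when both title columns hold identical strings."""
    if row["sherpa romeo journal title"] == row["crossref journal title"]:
        return "same"
    return "different"
</code></pre></div><ul>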
<li>Then I exported the list of journals that differ and sent it to Peter for comments and corrections
<ul>
<li>I want to build an updated controlled vocabulary so I can update CGSpace and reconcile our existing metadata against it</li>
@ -332,15 +332,15 @@ $ csvjoin -c issn /tmp/2021-08-09-journals-sherpa-romeo.csv /tmp/2021-08-09-jour
</li>
<li>I did some tests of the memory used and time elapsed with libvips, GraphicsMagick, and ImageMagick:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ /usr/bin/time -f %M:%e vipsthumbnail IPCC.pdf -s <span style="color:#ae81ff">600</span> -o <span style="color:#e6db74">&#39;%s-vips.jpg[Q=85,optimize_coding,strip]&#39;</span>
39004:0.08
$ /usr/bin/time -f %M:%e gm convert IPCC.pdf<span style="color:#ae81ff">\[</span>0<span style="color:#ae81ff">\]</span> -quality <span style="color:#ae81ff">85</span> -thumbnail x600 -flatten IPCC-gm.jpg
40932:0.53
$ /usr/bin/time -f %M:%e convert IPCC.pdf<span style="color:#ae81ff">\[</span>0<span style="color:#ae81ff">\]</span> -flatten -profile /usr/share/ghostscript/9.54.0/iccprofiles/default_cmyk.icc -profile /usr/share/ghostscript/9.54.0/iccprofiles/default_rgb.icc /tmp/impdfthumb2862933674765647409.pdf.jpg
41724:0.59
$ /usr/bin/time -f %M:%e convert -auto-orient /tmp/impdfthumb2862933674765647409.pdf.jpg -quality <span style="color:#ae81ff">85</span> -thumbnail 600x600 IPCC-im.jpg
24736:0.04
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ /usr/bin/time -f %M:%e vipsthumbnail IPCC.pdf -s <span style="color:#ae81ff">600</span> -o <span style="color:#e6db74">&#39;%s-vips.jpg[Q=85,optimize_coding,strip]&#39;</span>
</span></span><span style="display:flex;"><span>39004:0.08
</span></span><span style="display:flex;"><span>$ /usr/bin/time -f %M:%e gm convert IPCC.pdf<span style="color:#ae81ff">\[</span>0<span style="color:#ae81ff">\]</span> -quality <span style="color:#ae81ff">85</span> -thumbnail x600 -flatten IPCC-gm.jpg
</span></span><span style="display:flex;"><span>40932:0.53
</span></span><span style="display:flex;"><span>$ /usr/bin/time -f %M:%e convert IPCC.pdf<span style="color:#ae81ff">\[</span>0<span style="color:#ae81ff">\]</span> -flatten -profile /usr/share/ghostscript/9.54.0/iccprofiles/default_cmyk.icc -profile /usr/share/ghostscript/9.54.0/iccprofiles/default_rgb.icc /tmp/impdfthumb2862933674765647409.pdf.jpg
</span></span><span style="display:flex;"><span>41724:0.59
</span></span><span style="display:flex;"><span>$ /usr/bin/time -f %M:%e convert -auto-orient /tmp/impdfthumb2862933674765647409.pdf.jpg -quality <span style="color:#ae81ff">85</span> -thumbnail 600x600 IPCC-im.jpg
</span></span><span style="display:flex;"><span>24736:0.04
</span></span></code></pre></div><ul>
<li>The ImageMagick way is the same as how DSpace does it (first creating an intermediary image, then getting a thumbnail)
<ul>
<li>libvips does use less time and memory&hellip; I should do more tests!</li>
@ -359,17 +359,17 @@ $ /usr/bin/time -f %M:%e convert -auto-orient /tmp/impdfthumb2862933674765647409
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c cgspace ~/Downloads/2021-08-09-CGSpace-Journals-PB.csv | sort -u | sed 1d &gt; /tmp/journals1.txt
$ csvcut -c <span style="color:#e6db74">&#39;sherpa romeo journal title&#39;</span> ~/Downloads/2021-08-09-CGSpace-Journals-All.csv | sort -u | sed 1d &gt; /tmp/journals2.txt
$ cat /tmp/journals1.txt /tmp/journals2.txt | sort -u | wc -l
1911
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c cgspace ~/Downloads/2021-08-09-CGSpace-Journals-PB.csv | sort -u | sed 1d &gt; /tmp/journals1.txt
</span></span><span style="display:flex;"><span>$ csvcut -c <span style="color:#e6db74">&#39;sherpa romeo journal title&#39;</span> ~/Downloads/2021-08-09-CGSpace-Journals-All.csv | sort -u | sed 1d &gt; /tmp/journals2.txt
</span></span><span style="display:flex;"><span>$ cat /tmp/journals1.txt /tmp/journals2.txt | sort -u | wc -l
</span></span><span style="display:flex;"><span>1911
</span></span></code></pre></div><ul>
<li>Now I will create a controlled vocabulary out of this list and reconcile our existing journal title metadata with it in OpenRefine</li>
<li>I exported a list of all the journal titles we have in the <code>cg.journal</code> field:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= &gt; \COPY (SELECT DISTINCT(text_value) AS &#34;cg.journal&#34; FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (251)) to /tmp/2021-08-11-journals.csv WITH CSV;
COPY 3245
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= &gt; \COPY (SELECT DISTINCT(text_value) AS &#34;cg.journal&#34; FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (251)) to /tmp/2021-08-11-journals.csv WITH CSV;
</span></span><span style="display:flex;"><span>COPY 3245
</span></span></code></pre></div><ul>
<li>I started looking at reconciling them with reconcile-csv in OpenRefine, but ouch, there are 1,600 journal titles that don&rsquo;t match, so I&rsquo;d have to go check many of them manually before selecting a match or fixing them&hellip;
<ul>
<li>I think it&rsquo;s better if I try to write a Python script to fetch the ISSNs for each journal article and update them that way</li>
@ -421,10 +421,10 @@ COPY 3245
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/114644 --child<span style="color:#f92672">=</span>10568/72600
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/114644 --child<span style="color:#f92672">=</span>10568/35730
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/114644 --child<span style="color:#f92672">=</span>10568/76451
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/114644 --child<span style="color:#f92672">=</span>10568/72600
</span></span><span style="display:flex;"><span>$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/114644 --child<span style="color:#f92672">=</span>10568/35730
</span></span><span style="display:flex;"><span>$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/114644 --child<span style="color:#f92672">=</span>10568/76451
</span></span></code></pre></div><ul>
<li>I made a minor fix to OpenRXV to prefix all image names with <code>docker.io</code> so it works with fewer changes on podman
<ul>
<li>Docker assumes the <code>docker.io</code> registry by default, but we should be explicit</li>
@ -446,40 +446,40 @@ $ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10
</li>
<li>Lower case all AGROVOC metadata, as I had noticed a few in sentence case:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ &#39;[[:upper:]]&#39;;
UPDATE 484
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ &#39;[[:upper:]]&#39;;
</span></span><span style="display:flex;"><span>UPDATE 484
</span></span></code></pre></div><ul>
<li>Also update some DOIs that still use the <code>dx.doi.org</code> format to <code>doi.org</code>, just to keep things uniform:</li>
</ul>
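<p>As a rough illustration of the DOI rewrite described here, the same prefix replacement could be sketched in Python. The helper name <code>normalize_doi</code> and the sample DOI are hypothetical and are not part of DSpace or of these notes. Unlike the SQL pattern, this sketch anchors the match at the start of the string and escapes the dots.</p>

```python
import re

# Hypothetical helper, not part of DSpace: it mirrors the intent of the SQL
# regexp_replace() by rewriting only a leading https://dx.doi.org prefix.
DX_DOI_PREFIX = re.compile(r"^https://dx\.doi\.org")


def normalize_doi(url: str) -> str:
    """Return url with a leading https://dx.doi.org replaced by https://doi.org."""
    return DX_DOI_PREFIX.sub("https://doi.org", url, count=1)


# The DOI below is a made-up example value.
print(normalize_doi("https://dx.doi.org/10.1000/example"))
```

<p>Values that already use <code>https://doi.org</code> pass through unchanged, which matches the effect of the <code>LIKE</code> filter in the SQL statement.</p>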
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, &#39;https://dx.doi.org&#39;, &#39;https://doi.org&#39;) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 220 AND text_value LIKE &#39;https://dx.doi.org%&#39;;
UPDATE 469
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, &#39;https://dx.doi.org&#39;, &#39;https://doi.org&#39;) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 220 AND text_value LIKE &#39;https://dx.doi.org%&#39;;
</span></span><span style="display:flex;"><span>UPDATE 469
</span></span></code></pre></div><ul>
<li>Then start a full Discovery re-indexing to update the Feed the Future community item counts that have been stuck at 0 since we moved the three projects to be a subcommunity a few days ago:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ time chrt -b <span style="color:#ae81ff">0</span> ionice -c2 -n7 nice -n19 dspace index-discovery -b
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>real 322m16.917s
user 226m43.121s
sys 3m17.469s
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ time chrt -b <span style="color:#ae81ff">0</span> ionice -c2 -n7 nice -n19 dspace index-discovery -b
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>real 322m16.917s
</span></span><span style="display:flex;"><span>user 226m43.121s
</span></span><span style="display:flex;"><span>sys 3m17.469s
</span></span></code></pre></div><ul>
<li>I learned how to use the OpenRXV API, which is just a thin wrapper around Elasticsearch:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -X POST <span style="color:#e6db74">&#39;https://cgspace.cgiar.org/explorer/api/search?scroll=1d&#39;</span> <span style="color:#ae81ff">\
</span><span style="color:#ae81ff"></span> -H &#39;Content-Type: application/json&#39; \
-d &#39;{
&#34;size&#34;: 10,
&#34;query&#34;: {
&#34;bool&#34;: {
&#34;filter&#34;: {
&#34;term&#34;: {
&#34;repo.keyword&#34;: &#34;CGSpace&#34;
}
}
}
}
}&#39;
$ curl -X POST <span style="color:#e6db74">&#39;https://cgspace.cgiar.org/explorer/api/search/scroll/DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAASekWMTRwZ3lEMkVRYUtKZjgyMno4dV9CUQ==&#39;</span>
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -X POST <span style="color:#e6db74">&#39;https://cgspace.cgiar.org/explorer/api/search?scroll=1d&#39;</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -H &#39;Content-Type: application/json&#39; \
</span></span><span style="display:flex;"><span> -d &#39;{
</span></span><span style="display:flex;"><span> &#34;size&#34;: 10,
</span></span><span style="display:flex;"><span> &#34;query&#34;: {
</span></span><span style="display:flex;"><span> &#34;bool&#34;: {
</span></span><span style="display:flex;"><span> &#34;filter&#34;: {
</span></span><span style="display:flex;"><span> &#34;term&#34;: {
</span></span><span style="display:flex;"><span> &#34;repo.keyword&#34;: &#34;CGSpace&#34;
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span> }
</span></span><span style="display:flex;"><span>}&#39;
</span></span><span style="display:flex;"><span>$ curl -X POST <span style="color:#e6db74">&#39;https://cgspace.cgiar.org/explorer/api/search/scroll/DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAASekWMTRwZ3lEMkVRYUtKZjgyMno4dV9CUQ==&#39;</span>
</span></span></code></pre></div><ul>
<li>This uses the Elasticsearch scroll ID to page through results
<ul>
<li>The second query doesn&rsquo;t need the request body because it is saved for 1 day as part of the first request</li>
@ -525,46 +525,46 @@ $ curl -X POST <span style="color:#e6db74">&#39;https://cgspace.cgiar.org/explor
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-identifier.xml /tmp/bioversity-orcids.txt | grep -oE <span style="color:#e6db74">&#39;[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}&#39;</span> | sort | uniq &gt; /tmp/2021-08-25-combined-orcids.txt
$ wc -l /tmp/2021-08-25-combined-orcids.txt
1331
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-identifier.xml /tmp/bioversity-orcids.txt | grep -oE <span style="color:#e6db74">&#39;[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}&#39;</span> | sort | uniq &gt; /tmp/2021-08-25-combined-orcids.txt
</span></span><span style="display:flex;"><span>$ wc -l /tmp/2021-08-25-combined-orcids.txt
</span></span><span style="display:flex;"><span>1331
</span></span></code></pre></div><ul>
<li>After I combined them and removed duplicates, I resolved all the names using my <code>resolve-orcids.py</code> script:</li>
</ul>
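<p>The combine-and-deduplicate step above can also be sketched in Python, assuming the same ORCID iD pattern as the <code>grep -oE</code> expression. The function name <code>extract_unique_orcids</code> is hypothetical, and the sample strings reuse two identifiers that appear elsewhere in these notes.</p>

```python
import re

# Same pattern as the grep -oE expression in the shell pipeline.
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_unique_orcids(*texts: str) -> list[str]:
    """Collect every ORCID-like identifier from the given texts, de-duplicated and sorted."""
    found = set()
    for text in texts:
        found.update(ORCID_PATTERN.findall(text))
    # sorted() orders by code point, which may differ from a locale-aware sort(1).
    return sorted(found)


# Hypothetical inputs standing in for the vocabulary XML and the text file.
vocabulary = "Eric Rahn: 0000-0001-6280-7430\n"
new_list = "Boaz Waswa: 0000-0002-0066-0215\nEric Rahn: 0000-0001-6280-7430\n"
for orcid in extract_unique_orcids(vocabulary, new_list):
    print(orcid)
```

<p>In practice the two arguments would be the contents of the controlled vocabulary file and the new list, read with <code>pathlib.Path.read_text()</code>.</p>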
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/resolve-orcids.py -i /tmp/2021-08-25-combined-orcids.txt -o /tmp/2021-08-25-combined-orcids-names.txt
</code></pre></div><ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/resolve-orcids.py -i /tmp/2021-08-25-combined-orcids.txt -o /tmp/2021-08-25-combined-orcids-names.txt
</span></span></code></pre></div><ul>
<li>Tag existing items from the Alliance&rsquo;s new authors with ORCID iDs using <code>add-orcid-identifiers-csv.py</code> (181 new metadata fields added):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ cat 2021-08-25-add-orcids.csv
dc.contributor.author,cg.creator.identifier
&#34;Chege, Christine G. Kiria&#34;,&#34;Christine G.Kiria Chege: 0000-0001-8360-0279&#34;
&#34;Chege, Christine Kiria&#34;,&#34;Christine G.Kiria Chege: 0000-0001-8360-0279&#34;
&#34;Kiria, C.&#34;,&#34;Christine G.Kiria Chege: 0000-0001-8360-0279&#34;
&#34;Kinyua, Ivy&#34;,&#34;Ivy Kinyua :0000-0002-1978-8833&#34;
&#34;Rahn, E.&#34;,&#34;Eric Rahn: 0000-0001-6280-7430&#34;
&#34;Rahn, Eric&#34;,&#34;Eric Rahn: 0000-0001-6280-7430&#34;
&#34;Jager M.&#34;,&#34;Matthias Jager: 0000-0003-1059-3949&#34;
&#34;Jager, M.&#34;,&#34;Matthias Jager: 0000-0003-1059-3949&#34;
&#34;Jager, Matthias&#34;,&#34;Matthias Jager: 0000-0003-1059-3949&#34;
&#34;Waswa, Boaz&#34;,&#34;Boaz Waswa: 0000-0002-0066-0215&#34;
&#34;Waswa, Boaz S.&#34;,&#34;Boaz Waswa: 0000-0002-0066-0215&#34;
&#34;Rivera, Tatiana&#34;,&#34;Tatiana Rivera: 0000-0003-4876-5873&#34;
&#34;Andrade, Robert&#34;,&#34;Robert Andrade: 0000-0002-5764-3854&#34;
&#34;Ceccarelli, Viviana&#34;,&#34;Viviana Ceccarelli: 0000-0003-2160-9483&#34;
&#34;Ceccarellia, Viviana&#34;,&#34;Viviana Ceccarelli: 0000-0003-2160-9483&#34;
&#34;Nyawira, Sylvia&#34;,&#34;Sylvia Sarah Nyawira: 0000-0003-4913-1389&#34;
&#34;Nyawira, Sylvia S.&#34;,&#34;Sylvia Sarah Nyawira: 0000-0003-4913-1389&#34;
&#34;Nyawira, Sylvia Sarah&#34;,&#34;Sylvia Sarah Nyawira: 0000-0003-4913-1389&#34;
&#34;Groot, J.C.&#34;,&#34;Groot, J.C.J.: 0000-0001-6516-5170&#34;
&#34;Groot, J.C.J.&#34;,&#34;Groot, J.C.J.: 0000-0001-6516-5170&#34;
&#34;Groot, Jeroen C.J.&#34;,&#34;Groot, J.C.J.: 0000-0001-6516-5170&#34;
&#34;Groot, Jeroen CJ&#34;,&#34;Groot, J.C.J.: 0000-0001-6516-5170&#34;
&#34;Abera, W.&#34;,&#34;Wuletawu Abera: 0000-0002-3657-5223&#34;
&#34;Abera, Wuletawu&#34;,&#34;Wuletawu Abera: 0000-0002-3657-5223&#34;
&#34;Kanyenga Lubobo, Antoine&#34;,&#34;Antoine Lubobo Kanyenga: 0000-0003-0806-9304&#34;
&#34;Lubobo Antoine, Kanyenga&#34;,&#34;Antoine Lubobo Kanyenga: 0000-0003-0806-9304&#34;
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-08-25-add-orcids.csv -db dspace -u dspace -p <span style="color:#e6db74">&#39;fuuu&#39;</span>
</code></pre></div><h2 id="2021-08-29">2021-08-29</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ cat 2021-08-25-add-orcids.csv
</span></span><span style="display:flex;"><span>dc.contributor.author,cg.creator.identifier
</span></span><span style="display:flex;"><span>&#34;Chege, Christine G. Kiria&#34;,&#34;Christine G.Kiria Chege: 0000-0001-8360-0279&#34;
</span></span><span style="display:flex;"><span>&#34;Chege, Christine Kiria&#34;,&#34;Christine G.Kiria Chege: 0000-0001-8360-0279&#34;
</span></span><span style="display:flex;"><span>&#34;Kiria, C.&#34;,&#34;Christine G.Kiria Chege: 0000-0001-8360-0279&#34;
</span></span><span style="display:flex;"><span>&#34;Kinyua, Ivy&#34;,&#34;Ivy Kinyua :0000-0002-1978-8833&#34;
</span></span><span style="display:flex;"><span>&#34;Rahn, E.&#34;,&#34;Eric Rahn: 0000-0001-6280-7430&#34;
</span></span><span style="display:flex;"><span>&#34;Rahn, Eric&#34;,&#34;Eric Rahn: 0000-0001-6280-7430&#34;
</span></span><span style="display:flex;"><span>&#34;Jager M.&#34;,&#34;Matthias Jager: 0000-0003-1059-3949&#34;
</span></span><span style="display:flex;"><span>&#34;Jager, M.&#34;,&#34;Matthias Jager: 0000-0003-1059-3949&#34;
</span></span><span style="display:flex;"><span>&#34;Jager, Matthias&#34;,&#34;Matthias Jager: 0000-0003-1059-3949&#34;
</span></span><span style="display:flex;"><span>&#34;Waswa, Boaz&#34;,&#34;Boaz Waswa: 0000-0002-0066-0215&#34;
</span></span><span style="display:flex;"><span>&#34;Waswa, Boaz S.&#34;,&#34;Boaz Waswa: 0000-0002-0066-0215&#34;
</span></span><span style="display:flex;"><span>&#34;Rivera, Tatiana&#34;,&#34;Tatiana Rivera: 0000-0003-4876-5873&#34;
</span></span><span style="display:flex;"><span>&#34;Andrade, Robert&#34;,&#34;Robert Andrade: 0000-0002-5764-3854&#34;
</span></span><span style="display:flex;"><span>&#34;Ceccarelli, Viviana&#34;,&#34;Viviana Ceccarelli: 0000-0003-2160-9483&#34;
</span></span><span style="display:flex;"><span>&#34;Ceccarellia, Viviana&#34;,&#34;Viviana Ceccarelli: 0000-0003-2160-9483&#34;
</span></span><span style="display:flex;"><span>&#34;Nyawira, Sylvia&#34;,&#34;Sylvia Sarah Nyawira: 0000-0003-4913-1389&#34;
</span></span><span style="display:flex;"><span>&#34;Nyawira, Sylvia S.&#34;,&#34;Sylvia Sarah Nyawira: 0000-0003-4913-1389&#34;
</span></span><span style="display:flex;"><span>&#34;Nyawira, Sylvia Sarah&#34;,&#34;Sylvia Sarah Nyawira: 0000-0003-4913-1389&#34;
</span></span><span style="display:flex;"><span>&#34;Groot, J.C.&#34;,&#34;Groot, J.C.J.: 0000-0001-6516-5170&#34;
</span></span><span style="display:flex;"><span>&#34;Groot, J.C.J.&#34;,&#34;Groot, J.C.J.: 0000-0001-6516-5170&#34;
</span></span><span style="display:flex;"><span>&#34;Groot, Jeroen C.J.&#34;,&#34;Groot, J.C.J.: 0000-0001-6516-5170&#34;
</span></span><span style="display:flex;"><span>&#34;Groot, Jeroen CJ&#34;,&#34;Groot, J.C.J.: 0000-0001-6516-5170&#34;
</span></span><span style="display:flex;"><span>&#34;Abera, W.&#34;,&#34;Wuletawu Abera: 0000-0002-3657-5223&#34;
</span></span><span style="display:flex;"><span>&#34;Abera, Wuletawu&#34;,&#34;Wuletawu Abera: 0000-0002-3657-5223&#34;
</span></span><span style="display:flex;"><span>&#34;Kanyenga Lubobo, Antoine&#34;,&#34;Antoine Lubobo Kanyenga: 0000-0003-0806-9304&#34;
</span></span><span style="display:flex;"><span>&#34;Lubobo Antoine, Kanyenga&#34;,&#34;Antoine Lubobo Kanyenga: 0000-0003-0806-9304&#34;
</span></span><span style="display:flex;"><span>$ ./ilri/add-orcid-identifiers-csv.py -i 2021-08-25-add-orcids.csv -db dspace -u dspace -p <span style="color:#e6db74">&#39;fuuu&#39;</span>
</span></span></code></pre></div><h2 id="2021-08-29">2021-08-29</h2>
<ul>
<li>Run a full harvest on AReS</li>
<li>Also did more work on OpenRXV over the past few days