mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-03-04
@ -12,7 +12,7 @@
|
||||
Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
|
||||
Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
|
||||
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
440 17.58.101.255
|
||||
441 157.55.39.101
|
||||
485 207.46.13.43
|
||||
@ -23,7 +23,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
814 207.46.13.212
|
||||
2472 163.172.71.23
|
||||
6092 3.94.211.189
|
||||
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
33 2a01:7e00::f03c:91ff:fe16:fcb
|
||||
57 3.83.192.124
|
||||
57 3.87.77.25
|
||||
@ -49,7 +49,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
|
||||
Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
|
||||
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
440 17.58.101.255
|
||||
441 157.55.39.101
|
||||
485 207.46.13.43
|
||||
@ -60,7 +60,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
814 207.46.13.212
|
||||
2472 163.172.71.23
|
||||
6092 3.94.211.189
|
||||
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
33 2a01:7e00::f03c:91ff:fe16:fcb
|
||||
57 3.83.192.124
|
||||
57 3.87.77.25
|
||||
@ -72,7 +72,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
7249 2a01:7e00::f03c:91ff:fe18:7396
|
||||
9124 45.5.186.2
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.92.2" />
|
||||
<meta name="generator" content="Hugo 0.93.1" />
|
||||
|
||||
|
||||
|
||||
@ -163,7 +163,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
|
||||
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
440 17.58.101.255
|
||||
441 157.55.39.101
|
||||
485 207.46.13.43
|
||||
@ -174,7 +174,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
814 207.46.13.212
|
||||
2472 163.172.71.23
|
||||
6092 3.94.211.189
|
||||
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
33 2a01:7e00::f03c:91ff:fe16:fcb
|
||||
57 3.83.192.124
|
||||
57 3.87.77.25
|
||||
@ -193,14 +193,14 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
</code></pre><ul>
|
||||
<li>It actually got mostly HTTP 200 responses:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | awk '{print $9}' | sort | uniq -c
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | awk '{print $9}' | sort | uniq -c
|
||||
1775 200
|
||||
703 499
|
||||
72 503
|
||||
</code></pre><ul>
|
||||
<li>And it was mostly requesting Discover pages:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
|
||||
2350 discover
|
||||
71 handle
|
||||
</code></pre><ul>
|
||||
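As a cross-check on the two per-IP shell pipelines above, the same tallies could be computed in Python. This is only a sketch: `tally_for_ip` is a hypothetical helper, and it assumes nginx's default combined log format, where the client IP, request path, and status code are the first, seventh, and ninth whitespace-separated fields. Unlike the `grep` versions, it matches the IP exactly against the first field and looks for `bitstream`, `discover`, or `handle` only in the request path.

```python
import re
from collections import Counter

# Same alternation as the grep -o -E pattern in the notes.
SECTION_RE = re.compile(r"(bitstream|discover|handle)")


def tally_for_ip(lines, ip, date_prefix):
    """Count status codes and requested sections for one client IP.

    Assumes combined log format: fields[0] is the client IP,
    fields[6] the request path, and fields[8] the HTTP status.
    """
    statuses = Counter()
    sections = Counter()
    for line in lines:
        fields = line.split()
        # Skip short lines, other clients, and other time windows.
        if len(fields) < 9 or fields[0] != ip or date_prefix not in line:
            continue
        statuses[fields[8]] += 1
        sections.update(SECTION_RE.findall(fields[6]))
    return statuses, sections
```

To use it on real logs, pass an iterable of decoded lines (for example from `gzip.open(path, "rt")` for rotated files) together with the IP and a date prefix such as `"01/Sep/2019:0"`.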
@ -284,11 +284,11 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
<li>Around the same time I see the following in the DSpace log:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>2019-09-15 15:32:18,079 INFO org.dspace.usage.LoggerUsageEventListener @ aorth@blah:session_id=A11C362A7127004C24E77198AF9E4418:ip_addr=x.x.x.x:view_item:handle=10568/103644
|
||||
2019-09-15 15:32:18,135 WARN org.dspace.core.PluginManager @ Cannot find named plugin for interface=org.dspace.content.crosswalk.DisseminationCrosswalk, name="METSRIGHTS"
|
||||
2019-09-15 15:32:18,135 WARN org.dspace.core.PluginManager @ Cannot find named plugin for interface=org.dspace.content.crosswalk.DisseminationCrosswalk, name="METSRIGHTS"
|
||||
</code></pre><ul>
|
||||
<li>I see a lot of these errors today, but not earlier this month:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># grep -c 'Cannot find named plugin' dspace.log.2019-09-*
|
||||
<pre tabindex="0"><code># grep -c 'Cannot find named plugin' dspace.log.2019-09-*
|
||||
dspace.log.2019-09-01:0
|
||||
dspace.log.2019-09-02:0
|
||||
dspace.log.2019-09-03:0
|
||||
@ -307,9 +307,9 @@ dspace.log.2019-09-15:808
|
||||
</code></pre><ul>
|
||||
<li>Something must have happened when I restarted Tomcat a few hours ago, because earlier in the DSpace log I see a bunch of errors like this:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class="org.dspace.content.crosswalk.METSRightsCrosswalk", name="METSRIGHTS"
|
||||
2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class="org.dspace.content.crosswalk.OREDisseminationCrosswalk", name="ore"
|
||||
2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class="org.dspace.content.crosswalk.DIMDisseminationCrosswalk", name="dim"
|
||||
<pre tabindex="0"><code>2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class="org.dspace.content.crosswalk.METSRightsCrosswalk", name="METSRIGHTS"
|
||||
2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class="org.dspace.content.crosswalk.OREDisseminationCrosswalk", name="ore"
|
||||
2019-09-15 13:59:24,136 ERROR org.dspace.core.PluginManager @ Name collision in named plugin, implementation class="org.dspace.content.crosswalk.DIMDisseminationCrosswalk", name="dim"
|
||||
</code></pre><ul>
|
||||
<li>I restarted Tomcat and the item views came back, but then the Solr statistics cores didn’t all load properly
|
||||
<ul>
|
||||
@ -326,9 +326,9 @@ dspace.log.2019-09-15:808
|
||||
# docker run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
|
||||
$ createuser -h localhost -U postgres --pwprompt dspacetest
|
||||
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
|
||||
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
|
||||
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
|
||||
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2019-08-31.backup
|
||||
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
|
||||
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
|
||||
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
|
||||
</code></pre><ul>
|
||||
<li>Elizabeth from CIAT sent me a list of sixteen authors who need to have their ORCID identifiers tagged with their publications
|
||||
@ -339,26 +339,26 @@ $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-s
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>dc.contributor.author,cg.creator.id
|
||||
"Kihara, Job","Job Kihara: 0000-0002-4394-9553"
|
||||
"Twyman, Jennifer","Jennifer Twyman: 0000-0002-8581-5668"
|
||||
"Ishitani, Manabu","Manabu Ishitani: 0000-0002-6950-4018"
|
||||
"Arango, Jacobo","Jacobo Arango: 0000-0002-4828-9398"
|
||||
"Chavarriaga Aguirre, Paul","Paul Chavarriaga-Aguirre: 0000-0001-7579-3250"
|
||||
"Paul, Birthe","Birthe Paul: 0000-0002-5994-5354"
|
||||
"Eitzinger, Anton","Anton Eitzinger: 0000-0001-7317-3381"
|
||||
"Hoek, Rein van der","Rein van der Hoek: 0000-0003-4528-7669"
|
||||
"Aranzales Rondón, Ericson","Ericson Aranzales Rondon: 0000-0001-7487-9909"
|
||||
"Staiger-Rivas, Simone","Simone Staiger: 0000-0002-3539-0817"
|
||||
"de Haan, Stef","Stef de Haan: 0000-0001-8690-1886"
|
||||
"Pulleman, Mirjam","Mirjam Pulleman: 0000-0001-9950-0176"
|
||||
"Abera, Wuletawu","Wuletawu Abera: 0000-0002-3657-5223"
|
||||
"Tamene, Lulseged","Lulseged Tamene: 0000-0002-3806-8890"
|
||||
"Andrieu, Nadine","Nadine Andrieu: 0000-0001-9558-9302"
|
||||
"Ramírez-Villegas, Julián","Julian Ramirez-Villegas: 0000-0002-8044-583X"
|
||||
"Kihara, Job","Job Kihara: 0000-0002-4394-9553"
|
||||
"Twyman, Jennifer","Jennifer Twyman: 0000-0002-8581-5668"
|
||||
"Ishitani, Manabu","Manabu Ishitani: 0000-0002-6950-4018"
|
||||
"Arango, Jacobo","Jacobo Arango: 0000-0002-4828-9398"
|
||||
"Chavarriaga Aguirre, Paul","Paul Chavarriaga-Aguirre: 0000-0001-7579-3250"
|
||||
"Paul, Birthe","Birthe Paul: 0000-0002-5994-5354"
|
||||
"Eitzinger, Anton","Anton Eitzinger: 0000-0001-7317-3381"
|
||||
"Hoek, Rein van der","Rein van der Hoek: 0000-0003-4528-7669"
|
||||
"Aranzales Rondón, Ericson","Ericson Aranzales Rondon: 0000-0001-7487-9909"
|
||||
"Staiger-Rivas, Simone","Simone Staiger: 0000-0002-3539-0817"
|
||||
"de Haan, Stef","Stef de Haan: 0000-0001-8690-1886"
|
||||
"Pulleman, Mirjam","Mirjam Pulleman: 0000-0001-9950-0176"
|
||||
"Abera, Wuletawu","Wuletawu Abera: 0000-0002-3657-5223"
|
||||
"Tamene, Lulseged","Lulseged Tamene: 0000-0002-3806-8890"
|
||||
"Andrieu, Nadine","Nadine Andrieu: 0000-0001-9558-9302"
|
||||
"Ramírez-Villegas, Julián","Julian Ramirez-Villegas: 0000-0002-8044-583X"
|
||||
</code></pre><ul>
|
||||
<li>I tested the file on my local development machine with the following invocation:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ ./add-orcid-identifiers-csv.py -i 2019-09-19-ciat-orcids.csv -db dspace -u dspace -p 'fuuu'
|
||||
<pre tabindex="0"><code>$ ./add-orcid-identifiers-csv.py -i 2019-09-19-ciat-orcids.csv -db dspace -u dspace -p 'fuuu'
|
||||
</code></pre><ul>
|
||||
<li>In my test environment this added 390 ORCID identifiers</li>
|
||||
<li>I ran the same updates on CGSpace and DSpace Test and then started a Discovery re-index to force the search index to update</li>
|
||||
@ -386,11 +386,11 @@ $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-s
|
||||
<li>Follow up with Marissa again about the CCAFS phase II project tags</li>
|
||||
<li>Generate a list of the top 1500 authors on CGSpace:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>dspace=# \copy (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = (SELECT metadata_field_id FROM metadatafieldregistry WHERE element = 'contributor' AND qualifier = 'author') AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-09-19-top-1500-authors.csv WITH CSV HEADER;
|
||||
<pre tabindex="0"><code>dspace=# \copy (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = (SELECT metadata_field_id FROM metadatafieldregistry WHERE element = 'contributor' AND qualifier = 'author') AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-09-19-top-1500-authors.csv WITH CSV HEADER;
|
||||
</code></pre><ul>
|
||||
<li>Then I used <code>csvcut</code> to select the column of author names, strip the header and quote characters, and saved the sorted file:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ csvcut -c text_value /tmp/2019-09-19-top-1500-authors.csv | grep -v text_value | sed 's/"//g' | sort > dspace/config/controlled-vocabularies/dc-contributor-author.xml
|
||||
<pre tabindex="0"><code>$ csvcut -c text_value /tmp/2019-09-19-top-1500-authors.csv | grep -v text_value | sed 's/"//g' | sort > dspace/config/controlled-vocabularies/dc-contributor-author.xml
|
||||
</code></pre><ul>
|
||||
<li>After adding the XML formatting back to the file I formatted it using XML tidy:</li>
|
||||
</ul>
|
||||
@ -416,7 +416,7 @@ $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-s
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ perl-rename -n 's/_{2,3}/_/g' *.pdf
|
||||
<pre tabindex="0"><code>$ perl-rename -n 's/_{2,3}/_/g' *.pdf
|
||||
</code></pre><ul>
|
||||
<li>I was preparing to run SAFBuilder for the Bioversity migration and decided to check the list of PDFs on my local machine versus on DSpace Test (where I had downloaded them last month)
|
||||
<ul>
|
||||
@ -426,25 +426,25 @@ $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-s
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ rename -v 's/___/_/g' *.pdf
|
||||
$ rename -v 's/__/_/g' *.pdf
|
||||
<pre tabindex="0"><code>$ rename -v 's/___/_/g' *.pdf
|
||||
$ rename -v 's/__/_/g' *.pdf
|
||||
</code></pre><ul>
|
||||
<li>I’m still waiting to hear what Carol and Francesca want to do with the <code>1195.pdf.LCK</code> file (for now I’ve removed it from the CSV, but for future reference it has the number 630 in its permalink)</li>
|
||||
<li>I wrote two fairly long GREL expressions to clean up the institutional author names in the <code>dc.contributor.author</code> and <code>dc.identifier.citation</code> fields using OpenRefine
|
||||
<ul>
|
||||
<li>The first targets acronyms in parentheses like “International Livestock Research Institute (ILRI)":</li>
|
||||
<li>The first targets acronyms in parentheses like “International Livestock Research Institute (ILRI)”:</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>value.replace(/,? ?\((ANDES|APAFRI|APFORGEN|Canada|CFC|CGRFA|China|CacaoNet|CATAS|CDU|CIAT|CIRF|CIP|CIRNMA|COSUDE|Colombia|COA|COGENT|CTDT|Denmark|DfLP|DSE|ECPGR|ECOWAS|ECP\/GR|England|EUFORGEN|FAO|France|Francia|FFTC|Germany|GEF|GFU|GGCO|GRPI|italy|Italy|Italia|India|ICCO|ICAR|ICGR|ICRISAT|IDRC|INFOODS|IPGRI|IBPGR|ICARDA|ILRI|INIBAP|INBAR|IPK|ISG|IT|Japan|JIRCAS|Kenya|LI\-BIRD|Malaysia|NARC|NBPGR|Nepal|OOAS|RDA|RISBAP|Rome|ROPPA|SEARICE|Senegal|SGRP|Sweden|Syrian Arab Republic|The Netherlands|UNDP|UK|UNEP|UoB|UoM|United Kingdom|WAHO)\)/,"")
|
||||
<pre tabindex="0"><code>value.replace(/,? ?\((ANDES|APAFRI|APFORGEN|Canada|CFC|CGRFA|China|CacaoNet|CATAS|CDU|CIAT|CIRF|CIP|CIRNMA|COSUDE|Colombia|COA|COGENT|CTDT|Denmark|DfLP|DSE|ECPGR|ECOWAS|ECP\/GR|England|EUFORGEN|FAO|France|Francia|FFTC|Germany|GEF|GFU|GGCO|GRPI|italy|Italy|Italia|India|ICCO|ICAR|ICGR|ICRISAT|IDRC|INFOODS|IPGRI|IBPGR|ICARDA|ILRI|INIBAP|INBAR|IPK|ISG|IT|Japan|JIRCAS|Kenya|LI\-BIRD|Malaysia|NARC|NBPGR|Nepal|OOAS|RDA|RISBAP|Rome|ROPPA|SEARICE|Senegal|SGRP|Sweden|Syrian Arab Republic|The Netherlands|UNDP|UK|UNEP|UoB|UoM|United Kingdom|WAHO)\)/,"")
|
||||
</code></pre><ul>
|
||||
<li>The second targets cities and countries after names like “International Livestock Research Institute, Kenya”:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>replace(/,? ?(ali|Aleppo|Amsterdam|Beijing|Bonn|Burkina Faso|CN|Dakar|Gatersleben|London|Montpellier|Nairobi|New Delhi|Kaski|Kepong|Malaysia|Khumaltar|Lima|Ltpur|Ottawa|Patancheru|Peru|Pokhara|Rome|Uppsala|University of Mauritius|Tsukuba)/,"")
|
||||
<pre tabindex="0"><code>replace(/,? ?(ali|Aleppo|Amsterdam|Beijing|Bonn|Burkina Faso|CN|Dakar|Gatersleben|London|Montpellier|Nairobi|New Delhi|Kaski|Kepong|Malaysia|Khumaltar|Lima|Ltpur|Ottawa|Patancheru|Peru|Pokhara|Rome|Uppsala|University of Mauritius|Tsukuba)/,"")
|
||||
</code></pre><ul>
|
||||
<li>I imported the 1,427 Bioversity records with bitstreams to a new collection called <a href="https://dspacetest.cgiar.org/handle/10568/103688">2019-09-20 Bioversity Migration Test</a> on DSpace Test (after splitting them in two batches of about 700 each):</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx768m'
|
||||
<pre tabindex="0"><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx768m'
|
||||
$ dspace import -a me@cgiar.org -m 2019-09-20-bioversity1.map -s /home/aorth/Bioversity/bioversity1
|
||||
$ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bioversity/bioversity2
|
||||
</code></pre><ul>
|
||||
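The two GREL clean-ups for institutional author names shown earlier can be tried outside OpenRefine before being applied to the whole column. The following is a minimal sketch, not the expressions actually used: the alternations are shortened to a few entries from the original lists, `clean_institution` is a hypothetical name, and chaining both substitutions in one function is an assumption (in the notes they are two separate expressions).

```python
import re

# Shortened versions of the two GREL patterns; the real lists are much longer.
ACRONYM_RE = re.compile(r",? ?\((CIAT|FAO|ILRI|IPGRI)\)")
PLACE_RE = re.compile(r",? ?(Lima|Nairobi|Rome)")


def clean_institution(value):
    """Strip a parenthesised acronym, then a trailing city or country."""
    value = ACRONYM_RE.sub("", value)
    return PLACE_RE.sub("", value)
```

Like the GREL originals, these patterns are unanchored, so a short alternative can also match inside a longer word; running the function over a sample of values first is a cheap way to spot such cases.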
@ -513,7 +513,7 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
|
||||
</li>
|
||||
<li>Get a list of institutions from CCAFS’s Clarisa API and try to parse it with <code>jq</code>, do some small cleanups and add a header in <code>sed</code>, and then pass it through <code>csvcut</code> to add line numbers:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed -e 's/"//g' -e 's/^ //' -e '1iname' | csvcut -l | sed '1s/line_number/id/' > /tmp/clarisa-institutions.csv
|
||||
<pre tabindex="0"><code>$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed -e 's/"//g' -e 's/^ //' -e '1iname' | csvcut -l | sed '1s/line_number/id/' > /tmp/clarisa-institutions.csv
|
||||
$ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institutions-cleaned.csv -u
|
||||
</code></pre><ul>
|
||||
<li>The csv-metadata-quality tool caught a few records with excessive spacing and unnecessary Unicode</li>
|
||||
|