Add notes for 2022-03-04

2022-03-04 15:30:06 +03:00
parent 7453499827
commit 27acbac859
115 changed files with 6550 additions and 6444 deletions

@ -42,7 +42,7 @@ You need to download this into the DSpace 6.x source and compile it
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -141,7 +141,7 @@ You need to download this into the DSpace 6.x source and compile it
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ export JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot;
<pre tabindex="0"><code>$ export JAVA_OPTS=&#34;-Xmx1024m -Dfile.encoding=UTF-8&#34;
$ ~/dspace63/bin/dspace solr-upgrade-statistics-6x
</code></pre><h2 id="2020-03-03">2020-03-03</h2>
<ul>
@ -160,7 +160,7 @@ $ ~/dspace63/bin/dspace solr-upgrade-statistics-6x
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2020-03-04-fix-1-ilri-subject.csv -db dspace -u dspace -p 'fuuu' -f cg.subject.ilri -m 203 -t correct -d
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2020-03-04-fix-1-ilri-subject.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.subject.ilri -m 203 -t correct -d
</code></pre><ul>
<li>But I have not run it on CGSpace yet because we want to ask Peter if he is sure about it&hellip;</li>
<li>Send a message to Macaroni Bros to ask them about their Drupal module and its readiness for DSpace 6 UUIDs</li>
@ -179,16 +179,16 @@ $ ~/dspace63/bin/dspace solr-upgrade-statistics-6x
</ul>
<pre tabindex="0"><code>$ ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o /tmp/statistics-2010.json -k uid
$ ./run.sh -s http://localhost:8081/solr/statistics -a import -o /tmp/statistics-2010.json -k uid
$ curl -s &quot;http://localhost:8081/solr/statistics-2010/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;time:2010*&lt;/query&gt;&lt;/delete&gt;&quot;
$ curl -s &#34;http://localhost:8081/solr/statistics-2010/update?softCommit=true&#34; -H &#34;Content-Type: text/xml&#34; --data-binary &#34;&lt;delete&gt;&lt;query&gt;time:2010*&lt;/query&gt;&lt;/delete&gt;&#34;
$ ./run.sh -s http://localhost:8081/solr/statistics-2011 -a export -o /tmp/statistics-2011.json -k uid
$ ./run.sh -s http://localhost:8081/solr/statistics -a import -o /tmp/statistics-2011.json -k uid
$ curl -s &quot;http://localhost:8081/solr/statistics-2011/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;time:2011*&lt;/query&gt;&lt;/delete&gt;&quot;
$ curl -s &#34;http://localhost:8081/solr/statistics-2011/update?softCommit=true&#34; -H &#34;Content-Type: text/xml&#34; --data-binary &#34;&lt;delete&gt;&lt;query&gt;time:2011*&lt;/query&gt;&lt;/delete&gt;&#34;
$ ./run.sh -s http://localhost:8081/solr/statistics -a import -o /tmp/statistics-2012.json -k uid
$ curl -s 'http://localhost:8081/solr/statistics/select?q=time:2012*&amp;rows=0&amp;wt=json&amp;indent=true' | grep numFound
&quot;response&quot;:{&quot;numFound&quot;:3761989,&quot;start&quot;:0,&quot;docs&quot;:[]
$ curl -s 'http://localhost:8081/solr/statistics-2012/select?q=time:2012*&amp;rows=0&amp;wt=json&amp;indent=true' | grep numFound
&quot;response&quot;:{&quot;numFound&quot;:3761989,&quot;start&quot;:0,&quot;docs&quot;:[]
$ curl -s &quot;http://localhost:8081/solr/statistics-2012/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;time:2012*&lt;/query&gt;&lt;/delete&gt;&quot;
$ curl -s &#39;http://localhost:8081/solr/statistics/select?q=time:2012*&amp;rows=0&amp;wt=json&amp;indent=true&#39; | grep numFound
&#34;response&#34;:{&#34;numFound&#34;:3761989,&#34;start&#34;:0,&#34;docs&#34;:[]
$ curl -s &#39;http://localhost:8081/solr/statistics-2012/select?q=time:2012*&amp;rows=0&amp;wt=json&amp;indent=true&#39; | grep numFound
&#34;response&#34;:{&#34;numFound&#34;:3761989,&#34;start&#34;:0,&#34;docs&#34;:[]
$ curl -s &#34;http://localhost:8081/solr/statistics-2012/update?softCommit=true&#34; -H &#34;Content-Type: text/xml&#34; --data-binary &#34;&lt;delete&gt;&lt;query&gt;time:2012*&lt;/query&gt;&lt;/delete&gt;&#34;
</code></pre><ul>
<li>I will do this for as many cores as I can (disk space limited) and then monitor the effect on the system and JVM memory usage
<ul>
@ -196,7 +196,7 @@ $ curl -s &quot;http://localhost:8081/solr/statistics-2012/update?softCommit=tru
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ ./run.sh -s http://localhost:8081/solr/statistics-2014 -a export -o /tmp/statistics-2014-1.json -k uid -f 'time:/2014-0[1-6].*/'
<pre tabindex="0"><code>$ ./run.sh -s http://localhost:8081/solr/statistics-2014 -a export -o /tmp/statistics-2014-1.json -k uid -f &#39;time:/2014-0[1-6].*/&#39;
</code></pre><ul>
<li>Upgrade PostgreSQL from 9.6 to 10 on DSpace Test (linode19)
<ul>
@ -213,7 +213,7 @@ $ curl -s &quot;http://localhost:8081/solr/statistics-2012/update?softCommit=tru
# pg_dropcluster 10 main
# pg_upgradecluster 9.6 main
# pg_dropcluster 9.6 main
# dpkg -l | grep postgresql | grep 9.6 | awk '{print $2}' | xargs dpkg -r
# dpkg -l | grep postgresql | grep 9.6 | awk &#39;{print $2}&#39; | xargs dpkg -r
</code></pre><h2 id="2020-03-09">2020-03-09</h2>
<ul>
<li>Peter noticed that the Solr stats were not showing anything before 2020
@ -250,7 +250,7 @@ $ curl -s &quot;http://localhost:8081/solr/statistics-2012/update?softCommit=tru
<li>In Solr the IP is 127.0.0.1, but in the nginx logs I can luckily see the real IP (64.225.40.66), which is on Digital Ocean</li>
<li>I will purge them from Solr statistics:</li>
</ul>
<pre tabindex="0"><code>$ curl -s &quot;http://localhost:8081/solr/statistics/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary '&lt;delete&gt;&lt;query&gt;userAgent:&quot;Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)&quot;&lt;/query&gt;&lt;/delete&gt;'
<pre tabindex="0"><code>$ curl -s &#34;http://localhost:8081/solr/statistics/update?softCommit=true&#34; -H &#34;Content-Type: text/xml&#34; --data-binary &#39;&lt;delete&gt;&lt;query&gt;userAgent:&#34;Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)&#34;&lt;/query&gt;&lt;/delete&gt;&#39;
</code></pre><ul>
<li>Another user agent that seems to be a bot is:</li>
</ul>
@ -258,14 +258,14 @@ $ curl -s &quot;http://localhost:8081/solr/statistics-2012/update?softCommit=tru
</code></pre><ul>
<li>In Solr the IP is 127.0.0.1 because of the misconfiguration, but in nginx&rsquo;s logs I see it belongs to three IPs on Online.net in France:</li>
</ul>
<pre tabindex="0"><code># zcat /var/log/nginx/access.log.*.gz /var/log/nginx/rest.log.*.gz | grep 'Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)' | awk '{print $1}' | sort | uniq -c
<pre tabindex="0"><code># zcat /var/log/nginx/access.log.*.gz /var/log/nginx/rest.log.*.gz | grep &#39;Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)&#39; | awk &#39;{print $1}&#39; | sort | uniq -c
63090 163.172.68.99
183428 163.172.70.248
147608 163.172.71.24
</code></pre><ul>
<li>It is making 10,000 to 40,000 requests to XMLUI per day&hellip;</li>
</ul>
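<p>The per-IP tally of that user agent can also be sketched in Python. This is a minimal sketch, not the method used in these notes: the function name <code>count_ips_for_agent</code> and the sample log lines are hypothetical, it assumes the client IP is the first whitespace-separated field (as in the <code>awk</code> step), and it matches the user agent as a plain substring rather than as a <code>grep</code> regular expression.</p>

```python
from collections import Counter
from typing import Iterable


def count_ips_for_agent(lines: Iterable[str], user_agent: str) -> Counter:
    """Count requests per client IP for log lines containing user_agent.

    Assumption: the client IP is the first whitespace-separated field,
    mirroring the awk '{print $1}' step of the shell pipeline.
    """
    counts: Counter = Counter()
    for line in lines:
        if user_agent in line:  # plain substring match, not a regex
            fields = line.split()
            if fields:
                counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    agent = "Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)"
    # Hypothetical sample lines (documentation IP range), not real log data
    sample = [
        '192.0.2.10 - - "GET /handle/1 HTTP/1.1" 200 "-" "' + agent + '"',
        '192.0.2.10 - - "GET /handle/2 HTTP/1.1" 200 "-" "' + agent + '"',
        '192.0.2.20 - - "GET /handle/3 HTTP/1.1" 200 "-" "curl/7.58.0"',
    ]
    for ip, hits in count_ips_for_agent(sample, agent).most_common():
        print(hits, ip)
```

<p>For rotated, compressed logs the lines could be read with <code>gzip.open(path, "rt")</code> from the standard library before being passed in.</p>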
<pre tabindex="0"><code># zgrep -c 'Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)' /var/log/nginx/access.log.{1..9}
<pre tabindex="0"><code># zgrep -c &#39;Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)&#39; /var/log/nginx/access.log.{1..9}
/var/log/nginx/access.log.30.gz:18687
/var/log/nginx/access.log.31.gz:28936
/var/log/nginx/access.log.32.gz:36402
@ -284,7 +284,7 @@ $ curl -s &quot;http://localhost:8081/solr/statistics-2012/update?softCommit=tru
</code></pre><ul>
<li>I will purge those hits too!</li>
</ul>
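<p>The delete-by-query payloads passed to <code>curl --data-binary</code> in these notes all have the same shape, so they can be generated rather than typed by hand. The following is a minimal sketch: the helper name <code>build_delete_by_query</code> is hypothetical, and it only builds the XML string, leaving the POST to the Solr update handler as it is done in the notes.</p>

```python
from xml.sax.saxutils import escape


def build_delete_by_query(field: str, value: str) -> str:
    """Build a Solr XML delete-by-query body matching an exact phrase.

    Hypothetical helper: it escapes backslashes and double quotes so the
    value stays a single quoted phrase, then XML-escapes &, < and >.
    """
    phrase = value.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{phrase}"'
    return f"<delete><query>{escape(query)}</query></delete>"


if __name__ == "__main__":
    agent = "Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)"
    print(build_delete_by_query("userAgent", agent))
```

<p>Escaping matters mostly for user agents that contain characters such as <code>&amp;</code> or <code>&lt;</code>, which would otherwise produce malformed XML.</p>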
<pre tabindex="0"><code>$ curl -s &quot;http://localhost:8081/solr/statistics/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary '&lt;delete&gt;&lt;query&gt;userAgent:&quot;Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)&quot;&lt;/query&gt;&lt;/delete&gt;'
<pre tabindex="0"><code>$ curl -s &#34;http://localhost:8081/solr/statistics/update?softCommit=true&#34; -H &#34;Content-Type: text/xml&#34; --data-binary &#39;&lt;delete&gt;&lt;query&gt;userAgent:&#34;Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)&#34;&lt;/query&gt;&lt;/delete&gt;&#39;
</code></pre><ul>
<li>Shit, something went wrong and a few thousand hits from user agents containing &ldquo;Bot&rdquo; got through
<ul>
@ -348,7 +348,7 @@ Purging 62 hits from [Ss]pider in statistics
</ul>
<pre tabindex="0"><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2020-03-12-affiliations.csv WITH CSV HEADER;
dspace=# \q
$ csvcut -l -c 0 /tmp/2020-03-12-affiliations.csv | sed -e 's/^line_number/id/' -e 's/text_value/name/' &gt; /tmp/affiliations.csv
$ csvcut -l -c 0 /tmp/2020-03-12-affiliations.csv | sed -e &#39;s/^line_number/id/&#39; -e &#39;s/text_value/name/&#39; &gt; /tmp/affiliations.csv
$ lein run /tmp/affiliations.csv name id
</code></pre><ul>
<li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li>
@ -417,7 +417,7 @@ $ lein run /tmp/affiliations.csv name id
<li>Update Tomcat to version 7.0.103 in the Ansible infrastructure playbooks and deploy on DSpace Test (linode26)</li>
<li>Maria sent me a few new ORCID identifiers from Bioversity so I combined them with our existing ones, filtered the unique ones, and then resolved their names using my <code>resolve-orcids.py</code> script:</li>
</ul>
<pre tabindex="0"><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity-orcids | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2020-03-26-combined-orcids.txt
<pre tabindex="0"><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity-orcids | grep -oE &#39;[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}&#39; | sort | uniq &gt; /tmp/2020-03-26-combined-orcids.txt
$ ./resolve-orcids.py -i /tmp/2020-03-26-combined-orcids.txt -o /tmp/2020-03-26-combined-names.txt -d
# sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents)
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
@ -425,16 +425,16 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
<li>I checked the database for likely matches to the author name and then created a CSV with the author names and ORCID iDs:</li>
</ul>
<pre tabindex="0"><code>dc.contributor.author,cg.creator.id
&quot;King, Brian&quot;,&quot;Brian King: 0000-0002-7056-9214&quot;
&quot;Ortiz-Crespo, Berta&quot;,&quot;Berta Ortiz-Crespo: 0000-0002-6664-0815&quot;
&quot;Ekesa, Beatrice&quot;,&quot;Beatrice Ekesa: 0000-0002-2630-258X&quot;
&quot;Ekesa, B.&quot;,&quot;Beatrice Ekesa: 0000-0002-2630-258X&quot;
&quot;Ekesa, B.N.&quot;,&quot;Beatrice Ekesa: 0000-0002-2630-258X&quot;
&quot;Gullotta, G.&quot;,&quot;Gaia Gullotta: 0000-0002-2240-3869&quot;
&#34;King, Brian&#34;,&#34;Brian King: 0000-0002-7056-9214&#34;
&#34;Ortiz-Crespo, Berta&#34;,&#34;Berta Ortiz-Crespo: 0000-0002-6664-0815&#34;
&#34;Ekesa, Beatrice&#34;,&#34;Beatrice Ekesa: 0000-0002-2630-258X&#34;
&#34;Ekesa, B.&#34;,&#34;Beatrice Ekesa: 0000-0002-2630-258X&#34;
&#34;Ekesa, B.N.&#34;,&#34;Beatrice Ekesa: 0000-0002-2630-258X&#34;
&#34;Gullotta, G.&#34;,&#34;Gaia Gullotta: 0000-0002-2240-3869&#34;
</code></pre><ul>
<li>Running the <code>add-orcid-identifiers-csv.py</code> script I added 32 ORCID iDs to items on CGSpace!</li>
</ul>
<pre tabindex="0"><code>$ ./add-orcid-identifiers-csv.py -i /tmp/2020-03-26-ciat-orcids.csv -db dspace -u dspace -p 'fuuu'
<pre tabindex="0"><code>$ ./add-orcid-identifiers-csv.py -i /tmp/2020-03-26-ciat-orcids.csv -db dspace -u dspace -p &#39;fuuu&#39;
</code></pre><ul>
<li>Udana from IWMI asked about some items that are missing Altmetric donuts on CGSpace
<ul>
@ -447,13 +447,13 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</ul>
<h2 id="2020-03-29">2020-03-29</h2>
<ul>
<li>Add two more Bioversity ORCID iDs to CGSpace and then tag ~70 of the authors' existing publications in the database using this CSV with my <code>add-orcid-identifiers-csv.py</code> script:</li>
<li>Add two more Bioversity ORCID iDs to CGSpace and then tag ~70 of the authors&rsquo; existing publications in the database using this CSV with my <code>add-orcid-identifiers-csv.py</code> script:</li>
</ul>
<pre tabindex="0"><code>dc.contributor.author,cg.creator.id
&quot;Snook, L.K.&quot;,&quot;Laura Snook: 0000-0002-9168-1301&quot;
&quot;Snook, L.&quot;,&quot;Laura Snook: 0000-0002-9168-1301&quot;
&quot;Zheng, S.J.&quot;,&quot;Sijun Zheng: 0000-0003-1550-3738&quot;
&quot;Zheng, S.&quot;,&quot;Sijun Zheng: 0000-0003-1550-3738&quot;
&#34;Snook, L.K.&#34;,&#34;Laura Snook: 0000-0002-9168-1301&#34;
&#34;Snook, L.&#34;,&#34;Laura Snook: 0000-0002-9168-1301&#34;
&#34;Zheng, S.J.&#34;,&#34;Sijun Zheng: 0000-0003-1550-3738&#34;
&#34;Zheng, S.&#34;,&#34;Sijun Zheng: 0000-0003-1550-3738&#34;
</code></pre><ul>
<li>Deploy latest Bioversity and CIAT updates on CGSpace (linode18) and DSpace Test (linode26)</li>
<li>Deploy latest Ansible infrastructure playbooks on CGSpace and DSpace Test to get the latest dspace-statistics-api (v1.1.1) and Tomcat (7.0.103) versions</li>
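<p>The ORCID iD extraction used earlier in these notes (<code>grep -oE</code> piped through <code>sort</code> and <code>uniq</code>) can be sketched in Python with the same pattern. The function names are hypothetical. The check-digit test is an addition that is not part of the original workflow: it follows the ISO 7064 MOD 11-2 scheme that ORCID documents for the final character, and is offered as an optional sanity check before the iDs are written to a CSV.</p>

```python
import re
from typing import List

# Same pattern as the grep -oE expression in the notes
ORCID_RE = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcids(text: str) -> List[str]:
    """Return the sorted, de-duplicated ORCID-like strings found in text."""
    return sorted(set(ORCID_RE.findall(text)))


def has_valid_check_digit(orcid: str) -> bool:
    """Check the last character using ISO 7064 MOD 11-2 (assumed scheme)."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


if __name__ == "__main__":
    sample = "Laura Snook: 0000-0002-9168-1301, Sijun Zheng: 0000-0003-1550-3738"
    for orcid in extract_orcids(sample):
        print(orcid, has_valid_check_digit(orcid))
```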