Add notes for 2020-10-15

2020-10-15 18:11:00 +03:00
parent ae2c5bd8f6
commit d44615936a
89 changed files with 225 additions and 115 deletions


@ -23,7 +23,7 @@ During the FlywayDB migration I got an error:
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-10/" />
<meta property="article:published_time" content="2020-10-06T16:55:54+03:00" />
<meta property="article:modified_time" content="2020-10-12T17:53:24+03:00" />
<meta property="article:modified_time" content="2020-10-14T22:21:03+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2020"/>
@ -41,7 +41,7 @@ During the FlywayDB migration I got an error:
"/>
<meta name="generator" content="Hugo 0.76.4" />
<meta name="generator" content="Hugo 0.76.5" />
@ -51,9 +51,9 @@ During the FlywayDB migration I got an error:
"@type": "BlogPosting",
"headline": "October, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-10/",
"wordCount": "2381",
"wordCount": "2831",
"datePublished": "2020-10-06T16:55:54+03:00",
"dateModified": "2020-10-12T17:53:24+03:00",
"dateModified": "2020-10-14T22:21:03+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -539,7 +539,66 @@ sys 2m22.713s
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
<h2 id="2020-10-15">2020-10-15</h2>
<ul>
<li>Re-deploy the latest code on both CGSpace and DSpace Test to get the input form changes
<ul>
<li>Run system updates and reboot each server (linode18 and linode26)</li>
<li>I had to restart Tomcat seven times on CGSpace before all Solr stats cores came up OK</li>
</ul>
</li>
<li>Skype with Peter and Abenet about AReS and CGSpace
<ul>
<li>We agreed to lower-case the AGROVOC subjects on CGSpace to harmonize them with MELSpace and WorldFish</li>
<li>We agreed to separate the AGROVOC subjects from the other center- and CRP-specific subjects so that the search and tag clouds are cleaner and more useful</li>
<li>We added a filter for journal title</li>
</ul>
</li>
<li>I enabled anonymous access to the &ldquo;Export search metadata&rdquo; option on DSpace Test
<ul>
<li>If I search for an author containing &ldquo;Orth, Alan&rdquo; or &ldquo;Orth Alan&rdquo;, the &ldquo;Export search metadata&rdquo; request returns HTTP 400</li>
<li>If I search for an author containing &ldquo;Orth&rdquo;, it exports a CSV properly&hellip;</li>
</ul>
</li>
<li>I created issues on the OpenRXV repository:
<ul>
<li><a href="https://github.com/ilri/OpenRXV/issues/42">Can&rsquo;t download templates that have spaces in their file name</a></li>
<li><a href="https://github.com/ilri/OpenRXV/issues/43">Can&rsquo;t search for text values with a space in &ldquo;Mapping Values&rdquo; interface</a></li>
</ul>
</li>
<li>Atmire responded about the Listings and Reports (L&amp;R) and Content and Usage Analysis (CUA) issues with DSpace 6 that I reported last week
<ul>
<li>They said that the CUA issue was a mistake and should be fixed in a minor version bump</li>
<li>They asked me to confirm that the L&amp;R version bump from last week did not solve the issue there (I had only tested it locally, not on DSpace Test)</li>
<li>I will test them both again on DSpace Test and report back</li>
</ul>
</li>
<li>I posted a message on Yammer to inform all our users about the changes to countries, regions, and AGROVOC subjects</li>
<li>I modified all AGROVOC subjects to be lower case in PostgreSQL and then exported a list of the top 1500 to update the controlled vocabulary in our submission form:</li>
</ul>
<pre><code>dspace=&gt; BEGIN;
dspace=&gt; UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE resource_type_id=2 AND metadata_field_id=57;
UPDATE 335063
dspace=&gt; COMMIT;
dspace=&gt; \COPY (SELECT DISTINCT text_value as &quot;dc.subject&quot;, count(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=57 GROUP BY &quot;dc.subject&quot; ORDER BY count DESC LIMIT 1500) TO /tmp/2020-10-15-top-1500-agrovoc-subject.csv WITH CSV HEADER;
COPY 1500
</code></pre><ul>
<li>Use my <code>agrovoc-lookup.py</code> script to validate subject terms against the AGROVOC REST API, extract matches with <code>csvgrep</code>, and then update and format the controlled vocabulary:</li>
</ul>
<pre><code>$ csvcut -c 1 /tmp/2020-10-15-top-1500-agrovoc-subject.csv | tail -n 1500 &gt; /tmp/subjects.txt
$ ./agrovoc-lookup.py -i /tmp/subjects.txt -o /tmp/subjects.csv -d
$ csvgrep -c 4 -m 0 -i /tmp/subjects.csv | csvcut -c 1 | sed '1d' &gt; dspace/config/controlled-vocabularies/dc-subject.xml
# apply formatting in XML file
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.xml
</code></pre><ul>
<li>Then I started a full re-indexing on CGSpace:</li>
</ul>
<pre><code>$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 88m21.678s
user 7m59.182s
sys 2m22.713s
</code></pre><!-- raw HTML omitted -->
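<ul>
<li>The pipeline above writes one bare term per line into <code>dc-subject.xml</code>, and the XML wrapping is applied by hand before running <code>tidy</code>. As a minimal sketch, that wrapping step could be scripted with Python&rsquo;s standard-library ElementTree, assuming DSpace&rsquo;s flat controlled-vocabulary layout (a root <code>&lt;node&gt;</code> whose <code>&lt;isComposedBy&gt;</code> child holds one <code>&lt;node id=&quot;&hellip;&quot; label=&quot;&hellip;&quot;/&gt;</code> per term). The function names and the <code>dc-subject</code> root id/label are hypothetical and not taken from the actual CGSpace file.</li>
</ul>

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_vocabulary(
    terms: Iterable[str], root_id: str = "dc-subject", root_label: str = "dc-subject"
) -> ET.Element:
    """Wrap a flat list of subject terms in DSpace-style controlled vocabulary XML.

    Assumes a flat layout: one root <node> whose <isComposedBy> child holds a
    <node id="..." label="..."/> per term. root_id and root_label are hypothetical.
    """
    root = ET.Element("node", id=root_id, label=root_label)
    children = ET.SubElement(root, "isComposedBy")
    seen = set()
    for raw in terms:
        term = raw.strip()  # drop trailing newlines when reading lines from a file
        # Skip blank lines and duplicates, keeping the order of first occurrence
        if not term or term in seen:
            continue
        seen.add(term)
        # Using the term as both id and label is an assumption for a flat list
        ET.SubElement(children, "node", id=term, label=term)
    return root


def vocabulary_to_string(root: ET.Element) -> str:
    """Serialize with two-space indentation (ET.indent needs Python 3.9+)."""
    ET.indent(root, space="  ")
    # ElementTree escapes characters such as & and " in attribute values
    return ET.tostring(root, encoding="unicode")
```

<ul>
<li>For example, <code>vocabulary_to_string(build_vocabulary(open('/tmp/terms.txt')))</code> (with a hypothetical <code>/tmp/terms.txt</code> holding one term per line) would produce indented XML in one step. Letting ElementTree build the markup also guarantees that characters like <code>&amp;</code> in a label are escaped, which is easy to miss when editing the file by hand.</li>
</ul>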