mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-09-13
@@ -46,7 +46,7 @@ After rebooting, all statistics cores were loaded… wow, that’s luck
 
 Run system updates on DSpace Test (linode19) and reboot it
 "/>
-<meta name="generator" content="Hugo 0.87.0" />
+<meta name="generator" content="Hugo 0.88.1" />
 
 
 
@@ -194,7 +194,7 @@ Run system updates on DSpace Test (linode19) and reboot it
 </ul>
 </li>
 </ul>
-<pre><code>or(
+<pre tabindex="0"><code>or(
 isNotNull(value.match(/^.*’.*$/)),
 isNotNull(value.match(/^.*é.*$/)),
 isNotNull(value.match(/^.*á.*$/)),
@@ -235,14 +235,14 @@ Run system updates on DSpace Test (linode19) and reboot it
 </ul>
 </li>
 </ul>
-<pre><code># /opt/certbot-auto renew --standalone --pre-hook "/usr/bin/docker stop angular_nginx; /bin/systemctl stop firewalld" --post-hook "/bin/systemctl start firewalld; /usr/bin/docker start angular_nginx"
+<pre tabindex="0"><code># /opt/certbot-auto renew --standalone --pre-hook "/usr/bin/docker stop angular_nginx; /bin/systemctl stop firewalld" --post-hook "/bin/systemctl start firewalld; /usr/bin/docker start angular_nginx"
 </code></pre><ul>
 <li>It is important that the firewall starts back up before the Docker container or else Docker will complain about missing iptables chains</li>
 <li>Also, I updated to the latest TLS Intermediate settings as appropriate for Ubuntu 18.04’s <a href="https://ssl-config.mozilla.org/#server=nginx&server-version=1.16.0&config=intermediate&openssl-version=1.1.0g&hsts=false&ocsp=false">OpenSSL 1.1.0g with nginx 1.16.0</a></li>
 <li>Run all system updates on AReS dev server (linode20) and reboot it</li>
 <li>Get a list of all PDFs from the Bioversity migration that fail to download and save them so I can try again with a different path in the URL:</li>
 </ul>
-<pre><code>$ ./generate-thumbnails.py -i /tmp/2019-08-05-Bioversity-Migration.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs.txt
+<pre tabindex="0"><code>$ ./generate-thumbnails.py -i /tmp/2019-08-05-Bioversity-Migration.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs.txt
 $ grep -B1 "Download failed" /tmp/2019-08-08-download-pdfs.txt | grep "Downloading" | sed -e 's/> Downloading //' -e 's/\.\.\.//' | sed -r 's/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g' | csvcut -H -c 1,1 > /tmp/user-upload.csv
 $ ./generate-thumbnails.py -i /tmp/user-upload.csv -w --url-field-name url -d | tee /tmp/2019-08-08-download-pdfs2.txt
 $ grep -B1 "Download failed" /tmp/2019-08-08-download-pdfs2.txt | grep "Downloading" | sed -e 's/> Downloading //' -e 's/\.\.\.//' | sed -r 's/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g' | csvcut -H -c 1,1 > /tmp/user-upload2.csv
@@ -277,7 +277,7 @@ $ ./generate-thumbnails.py -i /tmp/user-upload2.csv -w --url-field-name url -d |
 </ul>
 </li>
 </ul>
-<pre><code>proxy_set_header Host dev.ares.codeobia.com;
+<pre tabindex="0"><code>proxy_set_header Host dev.ares.codeobia.com;
 </code></pre><ul>
 <li>Though I am really wondering why this happened now, because the configuration has been working for months…</li>
 <li>Improve the output of the suspicious characters check in <a href="https://github.com/alanorth/csv-metadata-quality">csv-metadata-quality</a> script and tag version 0.2.0</li>
@@ -329,7 +329,7 @@ $ ./generate-thumbnails.py -i /tmp/user-upload2.csv -w --url-field-name url -d |
 <ul>
 <li>Create a test user on DSpace Test for Mohammad Salem to attempt depositing:</li>
 </ul>
-<pre><code>$ dspace user -a -m blah@blah.com -g Mohammad -s Salem -p 'domoamaaa'
+<pre tabindex="0"><code>$ dspace user -a -m blah@blah.com -g Mohammad -s Salem -p 'domoamaaa'
 </code></pre><ul>
 <li>Create and merge a pull request (<a href="https://github.com/ilri/DSpace/pull/429">#429</a>) to add eleven new CCAFS Phase II Project Tags to CGSpace</li>
 <li>Atmire responded to the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr cores issue</a> last week, but they could not reproduce the issue
@@ -339,13 +339,13 @@ $ ./generate-thumbnails.py -i /tmp/user-upload2.csv -w --url-field-name url -d |
 </li>
 <li>Testing an import of 1,429 Bioversity items (metadata only) on my local development machine and got an error with Java memory after about 1,000 items:</li>
 </ul>
-<pre><code>$ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
+<pre tabindex="0"><code>$ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
 ...
 java.lang.OutOfMemoryError: GC overhead limit exceeded
 </code></pre><ul>
 <li>I increased the heap size to 1536m and tried again:</li>
 </ul>
-<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1536m"
+<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1536m"
 $ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
 </code></pre><ul>
 <li>This time it succeeded, and using VisualVM I noticed that the import process used a maximum of 620MB of RAM</li>
@@ -361,7 +361,7 @@ $ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
 </ul>
 </li>
 </ul>
-<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx512m'
+<pre tabindex="0"><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx512m'
 $ dspace metadata-import -f /tmp/bioversity1.csv -e blah@blah.com
 $ dspace metadata-import -f /tmp/bioversity2.csv -e blah@blah.com
 </code></pre><ul>
@@ -377,7 +377,7 @@ $ dspace metadata-import -f /tmp/bioversity2.csv -e blah@blah.com
 <li>Deploy Tomcat 7.0.96 and PostgreSQL JDBC 42.2.6 driver on CGSpace (linode18)</li>
 <li>After restarting Tomcat one of the Solr statistics cores failed to start up:</li>
 </ul>
-<pre><code>statistics-2015: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
+<pre tabindex="0"><code>statistics-2015: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
 </code></pre><ul>
 <li>I decided to run all system updates on the server and reboot it</li>
 <li>After reboot the statistics-2018 core failed to load so I restarted <code>tomcat7</code> again</li>
@@ -393,7 +393,7 @@ $ dspace metadata-import -f /tmp/bioversity2.csv -e blah@blah.com
 </ul>
 </li>
 </ul>
-<pre><code>import os
+<pre tabindex="0"><code>import os
 
 return os.path.basename(value)
 </code></pre><ul>
@@ -429,7 +429,7 @@ return os.path.basename(value)
 </ul>
 </li>
 </ul>
-<pre><code>$ ./fix-metadata-values.py -i ~/Downloads/2019-08-26-Peter-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t correct
+<pre tabindex="0"><code>$ ./fix-metadata-values.py -i ~/Downloads/2019-08-26-Peter-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t correct
 </code></pre><ul>
 <li>Apply the corrections on CGSpace and DSpace Test
 <ul>
@@ -437,7 +437,7 @@ return os.path.basename(value)
 </ul>
 </li>
 </ul>
-<pre><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
+<pre tabindex="0"><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
 
 real 81m47.057s
 user 8m5.265s
@@ -478,21 +478,21 @@ sys 2m24.715s
 </ul>
 </li>
 </ul>
-<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-08-28-all-authors.csv with csv header;
+<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-08-28-all-authors.csv with csv header;
 COPY 65597
 </code></pre><ul>
 <li>Then I created a new CSV with two author columns (edit title of second column after):</li>
 </ul>
-<pre><code>$ csvcut -c dc.contributor.author,dc.contributor.author /tmp/2019-08-28-all-authors.csv > /tmp/all-authors.csv
+<pre tabindex="0"><code>$ csvcut -c dc.contributor.author,dc.contributor.author /tmp/2019-08-28-all-authors.csv > /tmp/all-authors.csv
 </code></pre><ul>
 <li>Then I ran my script on the new CSV, skipping one of the author columns:</li>
 </ul>
-<pre><code>$ csv-metadata-quality -u -i /tmp/all-authors.csv -o /tmp/authors.csv -x dc.contributor.author
+<pre tabindex="0"><code>$ csv-metadata-quality -u -i /tmp/all-authors.csv -o /tmp/authors.csv -x dc.contributor.author
 </code></pre><ul>
 <li>This fixed a bunch of issues with spaces, commas, unnecessary Unicode characters, etc</li>
 <li>Then I ran the corrections on my test server and there were 185 of them!</li>
 </ul>
-<pre><code>$ ./fix-metadata-values.py -i /tmp/authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t correctauthor
+<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t correctauthor
 </code></pre><ul>
 <li>I very well might run these on CGSpace soon…</li>
 </ul>
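The author clean-up described in the hunk above (duplicate the author column, clean one copy, then apply only the values that changed) could be sketched roughly as follows. This is a minimal illustration and not the csv-metadata-quality implementation: the `normalize_author`, `build_corrections` and `corrections_to_csv` helpers and the specific whitespace and comma rules are assumptions made for the example, and the `correctauthor` header is only assumed to match the renamed second column.

```python
import csv
import io
import re
import unicodedata


def normalize_author(name: str) -> str:
    """Hypothetical clean-up: NFC-normalize, collapse whitespace, tidy commas."""
    name = unicodedata.normalize("NFC", name)
    name = re.sub(r"\s+", " ", name).strip()   # collapse runs of whitespace
    name = re.sub(r"\s*,\s*", ", ", name)      # exactly one space after each comma
    return name.rstrip(", ")                   # drop dangling trailing commas/spaces


def build_corrections(authors):
    """Return (original, corrected) pairs only where the clean-up changed the value."""
    pairs = []
    for original in authors:
        corrected = normalize_author(original)
        if corrected != original:
            pairs.append((original, corrected))
    return pairs


def corrections_to_csv(pairs) -> str:
    """Serialize the pairs as a two-column CSV (column names are assumptions)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["dc.contributor.author", "correctauthor"])
    writer.writerows(pairs)
    return buf.getvalue()
```

Emitting only the changed rows keeps the corrections file small, which makes it easier to review by eye before running it against a live database.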
@@ -506,7 +506,7 @@ COPY 65597
 </ul>
 </li>
 </ul>
-<pre><code>$ find dspace/modules/xmlui-mirage2/src/main/webapp/themes -iname "*.xsl" -exec ./cgcore-xsl-replacements.sed {} \;
+<pre tabindex="0"><code>$ find dspace/modules/xmlui-mirage2/src/main/webapp/themes -iname "*.xsl" -exec ./cgcore-xsl-replacements.sed {} \;
 </code></pre><ul>
 <li>I think I got everything in the XMLUI themes, but there may be some things I should check once I get a deployment up and running:
 <ul>
@@ -526,7 +526,7 @@ COPY 65597
 </ul>
 </li>
 </ul>
-<pre><code>"handles":["10986/30568","10568/97825"],"handle":"10986/30568"
+<pre tabindex="0"><code>"handles":["10986/30568","10568/97825"],"handle":"10986/30568"
 </code></pre><ul>
 <li>So this is the same issue we had before, where Altmetric <em>knows</em> this Handle is associated with a DOI that has a score, but the client-side JavaScript code doesn’t show it because it seems to be a secondary handle or something</li>
 </ul>
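The primary versus secondary handle distinction in the fragment above can be checked with a short sketch. This is not Altmetric's client code: the `handle_role` helper is hypothetical, and treating `handle` as the primary identifier and `handles` as the full list is an assumption read off the quoted fragment.

```python
import json


def handle_role(record: dict, handle: str) -> str:
    """Classify a handle as 'primary', 'secondary', or 'unknown' for one record."""
    if record.get("handle") == handle:
        return "primary"
    if handle in record.get("handles", []):
        return "secondary"
    return "unknown"


# The fragment quoted above, wrapped in braces so it parses as a JSON object.
fragment = '{"handles":["10986/30568","10568/97825"],"handle":"10986/30568"}'
record = json.loads(fragment)

print(handle_role(record, "10568/97825"))  # the CGSpace handle from the fragment
print(handle_role(record, "10986/30568"))  # the handle listed in the "handle" field
```

Under that reading, a lookup that compares only against the single `handle` field would miss the 10568/97825 handle even though it appears in `handles`.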
@@ -535,7 +535,7 @@ COPY 65597
 <li>Run system updates on DSpace Test (linode19) and reboot the server</li>
 <li>Run the author fixes on DSpace Test and CGSpace and start a full Discovery re-index:</li>
 </ul>
-<pre><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
+<pre tabindex="0"><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
 
 real 90m47.967s
 user 8m12.826s