mirror of https://github.com/alanorth/cgspace-notes.git, synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-03-04
@@ -32,7 +32,7 @@ So far we’ve spent at least fifty hours to process the statistics and stat
 "/>
-<meta name="generator" content="Hugo 0.92.2" />
+<meta name="generator" content="Hugo 0.93.1" />
@@ -150,8 +150,8 @@ So far we’ve spent at least fifty hours to process the statistics and stat
 </ul>
 </li>
 </ul>
-<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2020-11-05-fix-862-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t 'correct' -m 211
-$ ./delete-metadata-values.py -i 2020-11-05-delete-29-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
+<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2020-11-05-fix-862-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t 'correct' -m 211
+$ ./delete-metadata-values.py -i 2020-11-05-delete-29-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
 </code></pre><ul>
 <li>Then I started a Discovery re-index on CGSpace:</li>
 </ul>
@@ -191,7 +191,7 @@ sys 2m26.931s
 <li>Since I was going to restart CGSpace and update the Discovery indexes anyways I decided to check for any straggling upper case AGROVOC entries and lower case them:</li>
 </ul>
 <pre tabindex="0"><code>dspace=# BEGIN;
-dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE resource_type_id=2 AND metadata_field_id=57 AND text_value ~ '[[:upper:]]';
+dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE resource_type_id=2 AND metadata_field_id=57 AND text_value ~ '[[:upper:]]';
 UPDATE 164
 dspace=# COMMIT;
 </code></pre><ul>
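The AGROVOC clean-up in the hunk above selects values containing at least one uppercase character and lowercases them. A minimal Python sketch of the same selection logic follows, for illustration only. The function name `lowercase_mixed_case` is hypothetical, it works on an in-memory list rather than the `metadatavalue` table, and Python's `str.isupper()` / `str.lower()` may differ from PostgreSQL's locale-dependent `[[:upper:]]` and `LOWER()` on some non-ASCII text.

```python
def lowercase_mixed_case(values):
    """Lowercase every value that contains an uppercase character.

    Returns the updated list and the number of values that changed,
    loosely mirroring the row count reported by the SQL UPDATE.
    """
    updated = []
    changed = 0
    for value in values:
        # Rough equivalent of: text_value ~ '[[:upper:]]'
        if any(ch.isupper() for ch in value):
            updated.append(value.lower())
            changed += 1
        else:
            updated.append(value)
    return updated, changed
```

Running it on a copy of the values first gives a dry-run count to compare against the `UPDATE` result before committing the transaction.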
@@ -314,8 +314,8 @@ $ git checkout origin/6_x-dev-atmire-modules
 $ npm install -g yarn
 $ chrt -b 0 mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2,\!dspace-jspui clean package
 $ sudo su - postgres
-$ psql dspace -c 'CREATE EXTENSION pgcrypto;'
-$ psql dspace -c "DELETE FROM schema_version WHERE version IN ('5.8.2015.12.03.3');"
+$ psql dspace -c 'CREATE EXTENSION pgcrypto;'
+$ psql dspace -c "DELETE FROM schema_version WHERE version IN ('5.8.2015.12.03.3');"
 $ exit
 $ rm -rf /home/cgspace/config/spring
 $ ant update
@@ -338,7 +338,7 @@ $ sudo systemctl start tomcat7
 # pg_upgradecluster 9.6 main
 # pg_dropcluster 9.6 main
 # systemctl start postgresql
-# dpkg -l | grep postgresql | grep 9.6 | awk '{print $2}' | xargs dpkg -r
+# dpkg -l | grep postgresql | grep 9.6 | awk '{print $2}' | xargs dpkg -r
 </code></pre><ul>
 <li>Then I ran all system updates and rebooted the server…</li>
 <li>After the server came back up I re-ran the Ansible playbook to make sure all configs and services were updated</li>
@@ -372,13 +372,13 @@ Error sending email:
 <li>I copied the <code>mail.extraproperties = mail.smtp.starttls.enable=true</code> setting from the old DSpace 5 <code>dspace.cfg</code> and now the emails are working</li>
 <li>After the Discovery indexing finished I started processing the Solr stats one core and 2.5 million records at a time:</li>
 </ul>
-<pre tabindex="0"><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
+<pre tabindex="0"><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
 $ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics
 </code></pre><ul>
 <li>After about 6,000,000 records I got the same error that I’ve gotten every time I test this migration process:</li>
 </ul>
-<pre tabindex="0"><code>Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
-org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
+<pre tabindex="0"><code>Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
+org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
 at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
 at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
 at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
@@ -407,7 +407,7 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error whil
 <ul>
 <li>There are almost 1,500 locks:</li>
 </ul>
-<pre tabindex="0"><code>$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
+<pre tabindex="0"><code>$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
 1494
 </code></pre><ul>
 <li>I sent a mail to the dspace-tech mailing list to ask for help…
@@ -454,8 +454,8 @@ java.lang.OutOfMemoryError: Java heap space
 </ul>
 </li>
 </ul>
-<pre tabindex="0"><code>Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
-org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
+<pre tabindex="0"><code>Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
+org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
 at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
 at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
 at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
@@ -486,7 +486,7 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error whil
 <ul>
 <li>There are over 2,000 locks:</li>
 </ul>
-<pre tabindex="0"><code>$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
+<pre tabindex="0"><code>$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
 2071
 </code></pre><h2 id="2020-11-18">2020-11-18</h2>
 <ul>
@@ -603,7 +603,7 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error whil
 </ul>
 </li>
 </ul>
-<pre tabindex="0"><code>dspace=# \COPY (SELECT item_id,uuid FROM item WHERE in_archive='t' AND withdrawn='f' AND item_id IS NOT NULL) TO /tmp/2020-11-22-item-id2uuid.csv WITH CSV HEADER;
+<pre tabindex="0"><code>dspace=# \COPY (SELECT item_id,uuid FROM item WHERE in_archive='t' AND withdrawn='f' AND item_id IS NOT NULL) TO /tmp/2020-11-22-item-id2uuid.csv WITH CSV HEADER;
 COPY 87411
 </code></pre><ul>
 <li>Saving some notes I wrote down about faceting by community and collection in Solr, for potential use in the future in the DSpace Statistics API</li>
@@ -688,11 +688,11 @@ COPY 87411
 </ul>
 </li>
 </ul>
-<pre tabindex="0"><code>$ xml sel -t -m '//value-pairs[@value-pairs-name="ilrisubject"]/pair/displayed-value/text()' -c '.' -n dspace/config/input-forms.xml
+<pre tabindex="0"><code>$ xml sel -t -m '//value-pairs[@value-pairs-name="ilrisubject"]/pair/displayed-value/text()' -c '.' -n dspace/config/input-forms.xml
 </code></pre><ul>
 <li>IWMI sent me a few new ORCID identifiers so I combined them with our existing ones as well as another ILRI one that Tezira asked me to update, filtered the unique ones, and then resolved their names using my <code>resolve-orcids.py</code> script:</li>
 </ul>
-<pre tabindex="0"><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/iwmi-orcids.txt /tmp/hung.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2020-11-30-combined-orcids.txt
+<pre tabindex="0"><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/iwmi-orcids.txt /tmp/hung.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2020-11-30-combined-orcids.txt
 $ ./resolve-orcids.py -i /tmp/2020-11-30-combined-orcids.txt -o /tmp/2020-11-30-combined-orcids-names.txt -d
 # sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents)
 $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
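The `cat | grep -oE | sort | uniq` pipeline in the hunk above pulls every ORCID-shaped token out of several files and keeps one copy of each. A rough Python equivalent is sketched below, for illustration only. It reuses the regular expression from the notes unchanged. The function name `extract_unique_orcids` is an assumption, and Python's code-point ordering may differ from locale-aware `sort` in edge cases.

```python
import re

# Same pattern as the grep -oE call in the notes: four hyphen-separated
# groups of four uppercase letters or digits.
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_unique_orcids(texts):
    """Collect every ORCID-shaped token from the given strings, deduplicated and sorted."""
    found = set()
    for text in texts:
        found.update(ORCID_PATTERN.findall(text))
    return sorted(found)
```

Passing the contents of each input file as one string reproduces the combined list. The pattern is deliberately loose (it accepts any uppercase letter in any position), so it matches what the notes do, not the strict ORCID checksum format.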
@@ -701,15 +701,15 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
 </ul>
 <pre tabindex="0"><code>$ cat 2020-11-30-fix-hung-orcid.csv
 cg.creator.id,correct
-"Hung Nguyen-Viet: 0000-0001-9877-0596","Hung Nguyen-Viet: 0000-0003-1549-2733"
-"Adriana Tofiño: 0000-0001-7115-7169","Adriana Tofiño Rivera: 0000-0001-7115-7169"
-"Cristhian Puerta Rodriguez: 0000-0001-5992-1697","David Puerta: 0000-0001-5992-1697"
-"Ermias Betemariam: 0000-0002-1955-6995","Ermias Aynekulu: 0000-0002-1955-6995"
-"Hirut Betaw: 0000-0002-1205-3711","Betaw Hirut: 0000-0002-1205-3711"
-"Megan Zandstra: 0000-0002-3326-6492","Megan McNeil Zandstra: 0000-0002-3326-6492"
-"Tolu Eyinla: 0000-0003-1442-4392","Toluwalope Emmanuel: 0000-0003-1442-4392"
-"VInay Nangia: 0000-0001-5148-8614","Vinay Nangia: 0000-0001-5148-8614"
-$ ./fix-metadata-values.py -i 2020-11-30-fix-hung-orcid.csv -db dspace63 -u dspacetest -p 'dom@in34sniper' -f cg.creator.id -t 'correct' -m 240
+"Hung Nguyen-Viet: 0000-0001-9877-0596","Hung Nguyen-Viet: 0000-0003-1549-2733"
+"Adriana Tofiño: 0000-0001-7115-7169","Adriana Tofiño Rivera: 0000-0001-7115-7169"
+"Cristhian Puerta Rodriguez: 0000-0001-5992-1697","David Puerta: 0000-0001-5992-1697"
+"Ermias Betemariam: 0000-0002-1955-6995","Ermias Aynekulu: 0000-0002-1955-6995"
+"Hirut Betaw: 0000-0002-1205-3711","Betaw Hirut: 0000-0002-1205-3711"
+"Megan Zandstra: 0000-0002-3326-6492","Megan McNeil Zandstra: 0000-0002-3326-6492"
+"Tolu Eyinla: 0000-0003-1442-4392","Toluwalope Emmanuel: 0000-0003-1442-4392"
+"VInay Nangia: 0000-0001-5148-8614","Vinay Nangia: 0000-0001-5148-8614"
+$ ./fix-metadata-values.py -i 2020-11-30-fix-hung-orcid.csv -db dspace63 -u dspacetest -p 'dom@in34sniper' -f cg.creator.id -t 'correct' -m 240
 </code></pre><!-- raw HTML omitted -->
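The corrections file in the hunk above pairs each existing `cg.creator.id` value with its replacement in a `correct` column. The sketch below shows how such a two-column CSV could be applied to a list of values in memory. It is not the implementation of `fix-metadata-values.py`, which updates the DSpace database. The helper names `load_corrections` and `apply_corrections` are hypothetical.

```python
import csv
import io


def load_corrections(csv_text, from_column, to_column):
    """Parse CSV text into a dict mapping each old value to its corrected value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[from_column]: row[to_column] for row in reader}


def apply_corrections(values, corrections):
    """Replace values that have a correction; leave all other values unchanged."""
    return [corrections.get(value, value) for value in values]
```

With the file from the notes, `load_corrections(text, "cg.creator.id", "correct")` would give a mapping in which, for example, the `VInay Nangia` entry points to the `Vinay Nangia` spelling.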