Add notes for 2022-03-04

This commit is contained in:
2022-03-04 15:30:06 +03:00
parent 7453499827
commit 27acbac859
115 changed files with 6550 additions and 6444 deletions

View File

@ -34,7 +34,7 @@ I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -166,7 +166,7 @@ I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2
</ul>
</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;07/May/2020:(01|03|04)&quot; | goaccess --log-format=COMBINED -
<pre tabindex="0"><code># cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &#34;07/May/2020:(01|03|04)&#34; | goaccess --log-format=COMBINED -
</code></pre><ul>
<li>The two main IPs making requests around then are 188.134.31.88 and 212.34.8.188
<ul>
@ -211,9 +211,9 @@ Total number of bot hits purged: 192332
</ul>
<pre tabindex="0"><code>$ cat 2020-05-11-add-orcids.csv
dc.contributor.author,cg.creator.id
&quot;Lutakome, P.&quot;,&quot;Pius Lutakome: 0000-0002-0804-2649&quot;
&quot;Lutakome, Pius&quot;,&quot;Pius Lutakome: 0000-0002-0804-2649&quot;
$ ./add-orcid-identifiers-csv.py -i 2020-05-11-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
&#34;Lutakome, P.&#34;,&#34;Pius Lutakome: 0000-0002-0804-2649&#34;
&#34;Lutakome, Pius&#34;,&#34;Pius Lutakome: 0000-0002-0804-2649&#34;
$ ./add-orcid-identifiers-csv.py -i 2020-05-11-add-orcids.csv -db dspace -u dspace -p &#39;fuuu&#39; -d
</code></pre><ul>
<li>Run system updates on CGSpace (linode18) and reboot it
<ul>
@ -265,8 +265,8 @@ $ ./add-orcid-identifiers-csv.py -i 2020-05-11-add-orcids.csv -db dspace -u dspa
</ul>
<pre tabindex="0"><code>$ cat 2020-05-19-add-orcids.csv
dc.contributor.author,cg.creator.id
&quot;Bahta, Sirak T.&quot;,&quot;Sirak Bahta: 0000-0002-5728-2489&quot;
$ ./add-orcid-identifiers-csv.py -i 2020-05-19-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
&#34;Bahta, Sirak T.&#34;,&#34;Sirak Bahta: 0000-0002-5728-2489&#34;
$ ./add-orcid-identifiers-csv.py -i 2020-05-19-add-orcids.csv -db dspace -u dspace -p &#39;fuuu&#39; -d
</code></pre><ul>
<li>An IITA user is having issues submitting to CGSpace and I see there are a rising number of PostgreSQL connections waiting in transaction and in lock:</li>
</ul>
@ -300,9 +300,9 @@ $ ./add-orcid-identifiers-csv.py -i 2020-05-19-add-orcids.csv -db dspace -u dspa
</ul>
<pre tabindex="0"><code>$ cat 2020-05-25-add-orcids.csv
dc.contributor.author,cg.creator.id
&quot;Díaz, Manuel F.&quot;,&quot;Manuel Francisco Diaz Baca: 0000-0001-8996-5092&quot;
&quot;Díaz, Manuel Francisco&quot;,&quot;Manuel Francisco Diaz Baca: 0000-0001-8996-5092&quot;
$ ./add-orcid-identifiers-csv.py -i 2020-05-25-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
&#34;Díaz, Manuel F.&#34;,&#34;Manuel Francisco Diaz Baca: 0000-0001-8996-5092&#34;
&#34;Díaz, Manuel Francisco&#34;,&#34;Manuel Francisco Diaz Baca: 0000-0001-8996-5092&#34;
$ ./add-orcid-identifiers-csv.py -i 2020-05-25-add-orcids.csv -db dspace -u dspace -p &#39;fuuu&#39; -d
</code></pre><ul>
<li>Last week Maria asked again about searching for items by accession or issue date
<ul>
@ -327,7 +327,7 @@ $ ./add-orcid-identifiers-csv.py -i 2020-05-25-add-orcids.csv -db dspace -u dspa
</ul>
</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/*.log.1 | grep -E &quot;29/May/2020:(02|03|04|05)&quot; | goaccess --log-format=COMBINED -
<pre tabindex="0"><code># cat /var/log/nginx/*.log.1 | grep -E &#34;29/May/2020:(02|03|04|05)&#34; | goaccess --log-format=COMBINED -
</code></pre><ul>
<li>The top is 172.104.229.92, which is the AReS harvester (still not using a user agent, but it&rsquo;s tagged as a bot in the nginx mapping)</li>
<li>Second is 188.134.31.88, which is a Russian host that we also saw in the last few weeks, using a browser user agent and hitting the XMLUI (but it is tagged as a bot in nginx as well)</li>
@ -361,13 +361,13 @@ $ ./add-orcid-identifiers-csv.py -i 2020-05-25-add-orcids.csv -db dspace -u dspa
<pre tabindex="0"><code>$ sudo su - postgres
$ dropdb dspacetest
$ createdb -O dspacetest --encoding=UNICODE dspacetest
$ psql dspacetest -c 'alter user dspacetest superuser;'
$ psql dspacetest -c &#39;alter user dspacetest superuser;&#39;
$ pg_restore -d dspacetest -O --role=dspacetest /tmp/cgspace_2020-05-31.backup
$ psql dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql dspacetest -c &#39;alter user dspacetest nosuperuser;&#39;
# run DSpace 5 version of update-sequences.sql!!!
$ psql -f /home/dspace/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
$ psql dspacetest -c &quot;DELETE FROM schema_version WHERE version IN ('5.8.2015.12.03.3');&quot;
$ psql dspacetest -c 'CREATE EXTENSION pgcrypto;'
$ psql dspacetest -c &#34;DELETE FROM schema_version WHERE version IN (&#39;5.8.2015.12.03.3&#39;);&#34;
$ psql dspacetest -c &#39;CREATE EXTENSION pgcrypto;&#39;
$ exit
</code></pre><ul>
<li>Now switch to the DSpace 6.x branch and start a build:</li>
@ -391,7 +391,7 @@ $ ant update
<li>I had a mistake in my Solr internal URL parameter so DSpace couldn&rsquo;t find it, but once I fixed that DSpace starts up OK!</li>
<li>Once the initial Discovery reindexing was completed (after three hours or so!) I started the Solr statistics UUID migration:</li>
</ul>
<pre tabindex="0"><code>$ export JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot;
<pre tabindex="0"><code>$ export JAVA_OPTS=&#34;-Xmx1024m -Dfile.encoding=UTF-8&#34;
$ dspace solr-upgrade-statistics-6x -i statistics -n 250000
$ dspace solr-upgrade-statistics-6x -i statistics -n 1000000
$ dspace solr-upgrade-statistics-6x -i statistics -n 1000000
@ -400,8 +400,8 @@ $ dspace solr-upgrade-statistics-6x -i statistics -n 1000000
<li>It&rsquo;s taking about 35 minutes for 1,000,000 records&hellip;</li>
<li>Some issues towards the end of this core:</li>
</ul>
<pre tabindex="0"><code>Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
<pre tabindex="0"><code>Exception: Error while creating field &#39;p_group_id{type=uuid,properties=indexed,stored,multiValued}&#39; from value &#39;10&#39;
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field &#39;p_group_id{type=uuid,properties=indexed,stored,multiValued}&#39; from value &#39;10&#39;
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
@ -425,17 +425,17 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error whil
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ ./run.sh -s http://localhost:8081/solr/statistics -a export -o statistics-unmigrated.json -k uid -f '(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)'
$ curl -s &quot;http://localhost:8081/solr/statistics/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)&lt;/query&gt;&lt;/delete&gt;&quot;
<pre tabindex="0"><code>$ ./run.sh -s http://localhost:8081/solr/statistics -a export -o statistics-unmigrated.json -k uid -f &#39;(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)&#39;
$ curl -s &#34;http://localhost:8081/solr/statistics/update?softCommit=true&#34; -H &#34;Content-Type: text/xml&#34; --data-binary &#34;&lt;delete&gt;&lt;query&gt;(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)&lt;/query&gt;&lt;/delete&gt;&#34;
</code></pre><ul>
<li>Now the UUID conversion script says there is nothing left to convert, so I can try to run the Atmire CUA conversion utility:</li>
</ul>
<pre tabindex="0"><code>$ export JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot;
<pre tabindex="0"><code>$ export JAVA_OPTS=&#34;-Xmx1024m -Dfile.encoding=UTF-8&#34;
$ dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 1
</code></pre><ul>
<li>The processing is very slow and there are lots of errors like this:</li>
</ul>
<pre tabindex="0"><code>Record uid: 7b5b3900-28e8-417f-9c1c-e7d88a753221 couldn't be processed
<pre tabindex="0"><code>Record uid: 7b5b3900-28e8-417f-9c1c-e7d88a753221 couldn&#39;t be processed
com.atmire.statistics.util.update.atomic.ProcessingException: something went wrong while processing record uid: 7b5b3900-28e8-417f-9c1c-e7d88a753221, an error occured in the com.atmire.statistics.util.update.atomic.processor.ContainerOwnerDBProcessor
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(AtomicStatisticsUpdater.java:304)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(AtomicStatisticsUpdater.java:176)