mirror of https://github.com/alanorth/cgspace-notes.git (synced 2025-01-27 05:49:12 +01:00)
Add notes for 2021-09-13
<meta name="generator" content="Hugo 0.88.1" />
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics-2015
</code></pre><ul>
<li>AReS went down when the <code>renew-letsencrypt</code> service stopped the <code>angular_nginx</code> container in the pre-update hook and failed to bring it back up
<ul>
</li>
<li>Start testing export/import of yearly Solr statistics data into the main statistics core on DSpace Test, for example:</li>
</ul>
<pre tabindex="0"><code>$ ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o statistics-2010.json -k uid
$ ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2010.json -k uid
$ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
</code></pre><ul>
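These steps can be wrapped in a small helper so the yearly core is only cleared after both the export and the import succeed. This is just a sketch under stated assumptions: the `migrate_year` name is mine, and it assumes the `run.sh` wrapper from solr-import-export-json sits in the current directory with Solr on localhost:8081.

```shell
# Hypothetical helper (sketch): migrate one yearly statistics core into the
# main core, and clear the yearly core only if export and import both succeed.
migrate_year() {
    year="$1"
    ./run.sh -s "http://localhost:8081/solr/statistics-$year" -a export -o "statistics-$year.json" -k uid || return 1
    ./run.sh -s http://localhost:8081/solr/statistics -a import -o "statistics-$year.json" -k uid || return 1
    curl -s "http://localhost:8081/solr/statistics-$year/update?softCommit=true" \
      -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
}
```

Then `migrate_year 2010`, `migrate_year 2015`, and so on, one core at a time.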
<ul>
<li>First the 2010 core:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o statistics-2010.json -k uid
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2010.json -k uid
$ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
</code></pre><ul>
<li>Judging by the DSpace logs all these cores had a problem starting up in the last month:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console"># grep -rsI "Unable to create core" [dspace]/log/dspace.log.2020-* | grep -o -E "statistics-[0-9]+" | sort | uniq -c
24 statistics-2010
24 statistics-2015
18 statistics-2016
</code></pre><ul>
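The grep/sort/uniq pipeline above can be sanity-checked on fabricated log lines before pointing it at the real logs (the file names and core numbers below are invented for the demo):

```shell
# Sketch with fabricated log lines: count "Unable to create core" failures
# per yearly core, the same extraction as above but reading from stdin.
printf '%s\n' \
  'dspace.log.2020-11-03:... Unable to create core [statistics-2010]' \
  'dspace.log.2020-11-04:... Unable to create core [statistics-2010]' \
  'dspace.log.2020-11-04:... Unable to create core [statistics-2016]' \
  | grep -o -E "statistics-[0-9]+" | sort | uniq -c
# prints one count per core (2 for statistics-2010, 1 for statistics-2016)
```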
<li>The message is always this:</li>
</ul>
<pre tabindex="0"><code>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'statistics-2016': Unable to create core [statistics-2016] Caused by: Lock obtain timed out: NativeFSLock@/[dspace]/solr/statistics-2016/data/index/write.lock
</code></pre><ul>
<li>I will migrate all these cores and see if it makes a difference, then probably end up migrating all of them
<ul>
<ul>
<li>There are apparently 1,700 locks right now:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
1739
</code></pre><h2 id="2020-12-08">2020-12-08</h2>
<ul>
</ul>
</li>
</ul>
<pre tabindex="0"><code>Record uid: 64387815-d9a7-4605-8024-1c0a5c7520e0 couldn't be processed
com.atmire.statistics.util.update.atomic.ProcessingException: something went wrong while processing record uid: 64387815-d9a7-4605-8024-1c0a5c7520e0, an error occured in the com.atmire.statistics.util.update.atomic.processor.DeduplicateValuesProcessor
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(SourceFile:304)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(SourceFile:176)
<ul>
<li>I was running the AtomicStatisticsUpdateCLI to remove duplicates on DSpace Test but it failed near the end of the statistics core (after 20 hours or so) with a memory error:</li>
</ul>
<pre tabindex="0"><code>Successfully finished updating Solr Storage Reports | Wed Dec 09 15:25:11 CET 2020
Run 1 — 67% — 10,000/14,935 docs — 6m 6s — 6m 6s
Exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
<li>I increased the JVM heap to 2048m and tried again, but it failed with a memory error again…</li>
<li>I increased the JVM heap to 4096m and tried again, but it failed with another error:</li>
</ul>
<pre tabindex="0"><code>Successfully finished updating Solr Storage Reports | Wed Dec 09 15:53:40 CET 2020
Exception: parsing error
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: parsing error
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:530)
<ul>
<li>I can see it in the <code>openrxv-items-final</code> index:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*' | json_pp
{
"_shards" : {
"failed" : 0,
<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/64">https://github.com/ilri/OpenRXV/issues/64</a></li>
<li>For now I will try to delete the index and start a re-harvest in the Admin UI:</li>
</ul>
<pre tabindex="0"><code>$ curl -XDELETE http://localhost:9200/openrxv-items-final
{"acknowledged":true}%
</code></pre><ul>
<li>Moayad said he’s working on the harvesting so I stopped it for now to re-deploy his latest changes</li>
<li>I updated Tomcat to version 7.0.107 on CGSpace (linode18), ran all updates, and restarted the server</li>
<li>I deleted both items indexes and restarted the harvesting:</li>
</ul>
<pre tabindex="0"><code>$ curl -XDELETE http://localhost:9200/openrxv-items-final
$ curl -XDELETE http://localhost:9200/openrxv-items-temp
</code></pre><ul>
<li>Peter asked me for a list of all submitters and approvers that were active recently on CGSpace
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= > SELECT * FROM metadatavalue WHERE metadata_field_id=28 AND text_value ~ '^.*on 2020-[0-9]{2}-*';
</code></pre><h2 id="2020-12-14">2020-12-14</h2>
<ul>
<li>The re-harvesting finished last night on AReS but there are no records in the <code>openrxv-items-final</code> index
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*' | json_pp
{
"count" : 99992,
"_shards" : {
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-final
{"acknowledged":true,"shards_acknowledged":true,"index":"openrxv-items-final"}
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
</code></pre><ul>
<li>Now I see that the <code>openrxv-items-final</code> index has items, but there are still none in AReS Explorer UI!</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*&pretty'
{
"count" : 99992,
"_shards" : {
</code></pre><ul>
<li>The api logs show this from last night after the harvesting:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">[Nest] 92 - 12/13/2020, 1:58:52 PM [HarvesterService] Starting Harvest
[Nest] 92 - 12/13/2020, 10:50:20 PM [FetchConsumer] OnGlobalQueueDrained
[Nest] 92 - 12/13/2020, 11:00:20 PM [PluginsConsumer] OnGlobalQueueDrained
[Nest] 92 - 12/13/2020, 11:00:20 PM [HarvesterService] reindex function is called
<li>I cloned the <code>openrxv-items-final</code> index to <code>openrxv-items</code> index and now I see items in the explorer UI</li>
<li>The PDF report was broken and I looked in the API logs and saw this:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">(node:94) UnhandledPromiseRejectionWarning: Error: Error: Could not find soffice binary
at ExportService.downloadFile (/backend/dist/export/services/export/export.service.js:51:19)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
</code></pre><ul>
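When the exporter complains about a missing binary like this, the first thing worth checking is whether it exists in the backend container at all. A quick sketch (just a PATH lookup, not part of the original debugging):

```shell
# Sketch: verify the soffice binary the PDF export needs is on the PATH
# before digging into the report code itself.
if command -v soffice >/dev/null 2>&1; then
    echo "soffice found at $(command -v soffice)"
else
    echo "soffice is missing; the PDF report will fail until LibreOffice is installed"
fi
```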
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ http --print b 'https://cgspace.cgiar.org/rest/collections/defee001-8cc8-4a6c-8ac8-21bb5adab2db?expand=all&limit=100&offset=0' | json_pp > /tmp/policy1.json
$ http --print b 'https://cgspace.cgiar.org/rest/collections/defee001-8cc8-4a6c-8ac8-21bb5adab2db?expand=all&limit=100&offset=100' | json_pp > /tmp/policy2.json
$ query-json '.items | length' /tmp/policy1.json
100
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-2020-12-14
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
</code></pre><h2 id="2020-12-15">2020-12-15</h2>
</li>
<li>I checked the 1,534 fixes in Open Refine (had to fix a few UTF-8 errors, as always from Peter’s CSVs) and then applied them using the <code>fix-metadata-values.py</code> script:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./fix-metadata-values.py -i /tmp/2020-10-28-fix-1534-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t 'correct' -m 3
$ ./delete-metadata-values.py -i /tmp/2020-10-28-delete-2-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3
</code></pre><ul>
<li>Since I was re-indexing Discovery anyways I decided to check for any uppercase AGROVOC and lowercase them:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">dspace=# BEGIN;
BEGIN
dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=57 AND text_value ~ '[[:upper:]]';
UPDATE 406
</code></pre><ul>
<li>I also updated the Font Awesome icon classes for version 5 syntax:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">dspace=# BEGIN;
dspace=# UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'fa fa-rss','fas fa-rss', 'g') WHERE text_value LIKE '%fa fa-rss%';
UPDATE 74
dspace=# UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'fa fa-at','fas fa-at', 'g') WHERE text_value LIKE '%fa fa-at%';
</code></pre><ul>
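The same version 4 to version 5 class substitutions can be previewed with sed on a sample value before running the equivalent REGEXP_REPLACE in PostgreSQL (the HTML snippet below is fabricated for the demo):

```shell
# Preview the Font Awesome 4 -> 5 class rewrite on a made-up sample value,
# mirroring the REGEXP_REPLACE statements run against metadatavalue.
sample='<i class="fa fa-rss"></i> <i class="fa fa-at"></i>'
echo "$sample" | sed -e 's/fa fa-rss/fas fa-rss/g' -e 's/fa fa-at/fas fa-at/g'
# prints: <i class="fas fa-rss"></i> <i class="fas fa-at"></i>
```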
<li>Then I started a full Discovery re-index:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m"
$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 265m11.224s
<ul>
<li>After the Discovery re-indexing finished on CGSpace I prepared to start re-harvesting AReS by making sure the <code>openrxv-items-temp</code> index was empty and that the backup index I made yesterday was still there:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp?pretty'
{
"acknowledged" : true
}
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
{
"count" : 100046,
"_shards" : {
</li>
<li>Generate a list of submitters and approvers active in the last months using the Provenance field on CGSpace:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ psql -h localhost -U postgres dspace -c "SELECT text_value FROM metadatavalue WHERE metadata_field_id=28 AND text_value ~ '^.*on 2020-(06|07|08|09|10|11|12)-*'" > /tmp/provenance.txt
$ grep -o -E 'by .*)' /tmp/provenance.txt | grep -v -E "( on |checksum)" | sed -e 's/by //' -e 's/ (/,/' -e 's/)//' | sort | uniq > /tmp/recent-submitters-approvers.csv
</code></pre><ul>
<li>Peter wanted it to send some mail to the users…</li>
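The extraction pipeline above can be tested on fabricated provenance strings first (the names and addresses below are invented; the closing paren is escaped as `\)` here, which stricter grep versions require):

```shell
# Sketch: run the submitter/approver extraction on made-up provenance lines.
printf '%s\n' \
  'Submitted by Jane Doe (jane@example.org) on 2020-12-01T10:00:00Z' \
  'Approved for entry into archive by Bob Roe (bob@example.org)' \
  | grep -o -E 'by .*\)' | grep -v -E '( on |checksum)' \
  | sed -e 's/by //' -e 's/ (/,/' -e 's/)//' | sort | uniq
# prints one "name,email" line per person
```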
<ul>
<li>I see some errors from CUA in our Tomcat logs:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">Thu Dec 17 07:35:27 CET 2020 | Query:containerItem:b049326a-0e76-45a8-ac0c-d8ec043a50c6
Error while updating
java.lang.UnsupportedOperationException: Multiple update components target the same field:solr_update_time_stamp
at com.atmire.dspace.cua.CUASolrLoggerServiceImpl$5.visit(SourceFile:1155)
</li>
<li>I was trying to export the ILRI community on CGSpace so I could update one of the ILRI author’s names, but it throws an error…</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ dspace metadata-export -i 10568/1 -f /tmp/2020-12-17-ILRI.csv
Loading @mire database changes for module MQM
Changes have been processed
Exporting community 'International Livestock Research Institute (ILRI)' (10568/1)
</code></pre><ul>
<li>I did it via CSV with <code>fix-metadata-values.py</code> instead:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat 2020-12-17-update-ILRI-author.csv
dc.contributor.author,correct
"Padmakumar, V.P.","Varijakshapanicker, Padmakumar"
$ ./fix-metadata-values.py -i 2020-12-17-update-ILRI-author.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t 'correct' -m 3
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ csvcut -c 'dc.identifier.citation[en_US],dc.identifier.uri,dc.identifier.uri[],dc.identifier.uri[en_US],dc.date.issued,dc.date.issued[],dc.date.issued[en_US],cg.identifier.status[en_US]' ~/Downloads/10568-80099.csv | csvgrep -c 'cg.identifier.status[en_US]' -m 'Limited Access' | csvgrep -c 'dc.date.issued' -m 2020 -c 'dc.date.issued[]' -m 2020 -c 'dc.date.issued[en_US]' -m 2020 > /tmp/limited-2020.csv
</code></pre><h2 id="2020-12-18">2020-12-18</h2>
<ul>
<li>I added support for indexing community views and downloads to <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a>
<ul>
<li>The DeduplicateValuesProcessor has been running on DSpace Test since two days ago and it almost completed its second twelve-hour run, but crashed near the end:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">...
Run 1 — 100% — 8,230,000/8,239,228 docs — 39s — 9h 8m 31s
Exception: Java heap space
java.lang.OutOfMemoryError: Java heap space
<li>The AReS harvest finished this morning and I moved the Elasticsearch index manually</li>
<li>First, check the number of records in the temp index to make sure it is complete and does not contain duplicated data:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
{
"count" : 100135,
"_shards" : {
</code></pre><ul>
<li>Then delete the old backup and clone the current items index as a backup:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-2020-12-14?pretty'
$ curl -X PUT "localhost:9200/openrxv-items/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2020-12-21
</code></pre><ul>
<li>Then delete the current items index and clone it from temp:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items?pretty'
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
</ul>
</li>
</ul>
<pre tabindex="0"><code>statistics-2012: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
</code></pre><ul>
<li>I exported the 2012 stats from the year core and imported them to the main statistics core with solr-import-export-json:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics-2012 -a export -o statistics-2012.json -k uid
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2012.json -k uid
$ curl -s "http://localhost:8081/solr/statistics-2012/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
</code></pre><ul>
</ul>
</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items/_count?q=*&pretty'
{
"count" : 100135,
"_shards" : {
<ul>
<li>The indexing on AReS finished so I cloned the <code>openrxv-items-temp</code> index to <code>openrxv-items</code> and deleted the backup index:</li>
</ul>
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items?pretty'
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
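This block/clone/unblock sequence recurs throughout these notes, so it can be wrapped in a small shell function. A sketch only: the `clone_index` name is mine, and it assumes Elasticsearch listening on localhost:9200.

```shell
# Hypothetical helper (sketch): the recurring Elasticsearch clone sequence
# from these notes wrapped in one function. Assumes ES on localhost:9200.
clone_index() {
    src="$1"; dst="$2"
    # 1. make the source index read-only so it can be cloned
    curl -s -X PUT "localhost:9200/$src/_settings" \
      -H 'Content-Type: application/json' \
      -d '{"settings": {"index.blocks.write": true}}'
    # 2. clone it to the destination index
    curl -s -X POST "localhost:9200/$src/_clone/$dst"
    # 3. make the source writable again
    curl -s -X PUT "localhost:9200/$src/_settings" \
      -H 'Content-Type: application/json' \
      -d '{"settings": {"index.blocks.write": false}}'
}
```

For example, `clone_index openrxv-items-temp openrxv-items` replays the three commands above.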