Add notes for 2021-09-13

2021-09-13 16:21:16 +03:00
parent 8b487a4a77
commit c05c7213c2
109 changed files with 2627 additions and 2530 deletions

@@ -36,7 +36,7 @@ I started processing those (about 411,000 records):
"/>
<meta name="generator" content="Hugo 0.87.0" />
<meta name="generator" content="Hugo 0.88.1" />
@@ -132,7 +132,7 @@ I started processing those (about 411,000 records):
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics-2015
<pre tabindex="0"><code class="language-console" data-lang="console">$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics-2015
</code></pre><ul>
<li>AReS went down when the <code>renew-letsencrypt</code> service stopped the <code>angular_nginx</code> container in the pre-update hook and failed to bring it back up
<ul>
@@ -151,7 +151,7 @@ I started processing those (about 411,000 records):
</li>
<li>Start testing export/import of yearly Solr statistics data into the main statistics core on DSpace Test, for example:</li>
</ul>
<pre><code>$ ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o statistics-2010.json -k uid
<pre tabindex="0"><code>$ ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o statistics-2010.json -k uid
$ ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2010.json -k uid
$ curl -s &quot;http://localhost:8081/solr/statistics-2010/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:*&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><ul>
@@ -179,13 +179,13 @@ $ curl -s &quot;http://localhost:8081/solr/statistics-2010/update?softCommit=tru
<ul>
<li>First the 2010 core:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o statistics-2010.json -k uid
<pre tabindex="0"><code class="language-console" data-lang="console">$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o statistics-2010.json -k uid
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2010.json -k uid
$ curl -s &quot;http://localhost:8081/solr/statistics-2010/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:*&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><ul>
<li>Judging by the DSpace logs all these cores had a problem starting up in the last month:</li>
</ul>
<pre><code class="language-console" data-lang="console"># grep -rsI &quot;Unable to create core&quot; [dspace]/log/dspace.log.2020-* | grep -o -E &quot;statistics-[0-9]+&quot; | sort | uniq -c
<pre tabindex="0"><code class="language-console" data-lang="console"># grep -rsI &quot;Unable to create core&quot; [dspace]/log/dspace.log.2020-* | grep -o -E &quot;statistics-[0-9]+&quot; | sort | uniq -c
24 statistics-2010
24 statistics-2015
18 statistics-2016
@@ -193,7 +193,7 @@ $ curl -s &quot;http://localhost:8081/solr/statistics-2010/update?softCommit=tru
</code></pre><ul>
<li>The message is always this:</li>
</ul>
<pre><code>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'statistics-2016': Unable to create core [statistics-2016] Caused by: Lock obtain timed out: NativeFSLock@/[dspace]/solr/statistics-2016/data/index/write.lock
<pre tabindex="0"><code>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'statistics-2016': Unable to create core [statistics-2016] Caused by: Lock obtain timed out: NativeFSLock@/[dspace]/solr/statistics-2016/data/index/write.lock
</code></pre><ul>
<li>I will migrate all these cores and see if it makes a difference, then probably end up migrating all of them
<ul>
@@ -223,7 +223,7 @@ $ curl -s &quot;http://localhost:8081/solr/statistics-2010/update?softCommit=tru
<ul>
<li>There are apparently 1,700 locks right now:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
<pre tabindex="0"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
1739
</code></pre><h2 id="2020-12-08">2020-12-08</h2>
<ul>
@@ -233,7 +233,7 @@ $ curl -s &quot;http://localhost:8081/solr/statistics-2010/update?softCommit=tru
</ul>
</li>
</ul>
<pre><code>Record uid: 64387815-d9a7-4605-8024-1c0a5c7520e0 couldn't be processed
<pre tabindex="0"><code>Record uid: 64387815-d9a7-4605-8024-1c0a5c7520e0 couldn't be processed
com.atmire.statistics.util.update.atomic.ProcessingException: something went wrong while processing record uid: 64387815-d9a7-4605-8024-1c0a5c7520e0, an error occured in the com.atmire.statistics.util.update.atomic.processor.DeduplicateValuesProcessor
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(SourceFile:304)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(SourceFile:176)
@@ -270,7 +270,7 @@ Caused by: java.lang.UnsupportedOperationException
<ul>
<li>I was running the AtomicStatisticsUpdateCLI to remove duplicates on DSpace Test but it failed near the end of the statistics core (after 20 hours or so) with a memory error:</li>
</ul>
<pre><code>Successfully finished updating Solr Storage Reports | Wed Dec 09 15:25:11 CET 2020
<pre tabindex="0"><code>Successfully finished updating Solr Storage Reports | Wed Dec 09 15:25:11 CET 2020
Run 1 —  67% — 10,000/14,935 docs — 6m 6s — 6m 6s
Exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
@@ -279,7 +279,7 @@ java.lang.OutOfMemoryError: GC overhead limit exceeded
<li>I increased the JVM heap to 2048m and tried again, but it failed with a memory error again&hellip;</li>
<li>I increased the JVM heap to 4096m and tried again, but it failed with another error:</li>
</ul>
<pre><code>Successfully finished updating Solr Storage Reports | Wed Dec 09 15:53:40 CET 2020
<pre tabindex="0"><code>Successfully finished updating Solr Storage Reports | Wed Dec 09 15:53:40 CET 2020
Exception: parsing error
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: parsing error
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:530)
@@ -341,7 +341,7 @@ Caused by: org.apache.http.TruncatedChunkException: Truncated chunk ( expected s
<ul>
<li>I can see it in the <code>openrxv-items-final</code> index:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*' | json_pp
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*' | json_pp
{
&quot;_shards&quot; : {
&quot;failed&quot; : 0,
@@ -355,14 +355,14 @@ Caused by: org.apache.http.TruncatedChunkException: Truncated chunk ( expected s
<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/64">https://github.com/ilri/OpenRXV/issues/64</a></li>
<li>For now I will try to delete the index and start a re-harvest in the Admin UI:</li>
</ul>
<pre><code>$ curl -XDELETE http://localhost:9200/openrxv-items-final
<pre tabindex="0"><code>$ curl -XDELETE http://localhost:9200/openrxv-items-final
{&quot;acknowledged&quot;:true}%
</code></pre><ul>
<li>Moayad said he&rsquo;s working on the harvesting so I stopped it for now to re-deploy his latest changes</li>
<li>I updated Tomcat to version 7.0.107 on CGSpace (linode18), ran all updates, and restarted the server</li>
<li>I deleted both items indexes and restarted the harvesting:</li>
</ul>
<pre><code>$ curl -XDELETE http://localhost:9200/openrxv-items-final
<pre tabindex="0"><code>$ curl -XDELETE http://localhost:9200/openrxv-items-final
$ curl -XDELETE http://localhost:9200/openrxv-items-temp
</code></pre><ul>
<li>Peter asked me for a list of all submitters and approvers that were active recently on CGSpace
@@ -371,7 +371,7 @@ $ curl -XDELETE http://localhost:9200/openrxv-items-temp
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">localhost/dspace63= &gt; SELECT * FROM metadatavalue WHERE metadata_field_id=28 AND text_value ~ '^.*on 2020-[0-9]{2}-*';
<pre tabindex="0"><code class="language-console" data-lang="console">localhost/dspace63= &gt; SELECT * FROM metadatavalue WHERE metadata_field_id=28 AND text_value ~ '^.*on 2020-[0-9]{2}-*';
</code></pre><h2 id="2020-12-14">2020-12-14</h2>
<ul>
<li>The re-harvesting finished last night on AReS but there are no records in the <code>openrxv-items-final</code> index
@@ -380,7 +380,7 @@ $ curl -XDELETE http://localhost:9200/openrxv-items-temp
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*' | json_pp
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*' | json_pp
{
&quot;count&quot; : 99992,
&quot;_shards&quot; : {
@@ -397,14 +397,14 @@ $ curl -XDELETE http://localhost:9200/openrxv-items-temp
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-final
{&quot;acknowledged&quot;:true,&quot;shards_acknowledged&quot;:true,&quot;index&quot;:&quot;openrxv-items-final&quot;}
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: false}}'
</code></pre><ul>
<li>Now I see that the <code>openrxv-items-final</code> index has items, but there are still none in AReS Explorer UI!</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*&amp;pretty'
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*&amp;pretty'
{
&quot;count&quot; : 99992,
&quot;_shards&quot; : {
@@ -417,7 +417,7 @@ $ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H
</code></pre><ul>
<li>The api logs show this from last night after the harvesting:</li>
</ul>
<pre><code class="language-console" data-lang="console">[Nest] 92 - 12/13/2020, 1:58:52 PM [HarvesterService] Starting Harvest
<pre tabindex="0"><code class="language-console" data-lang="console">[Nest] 92 - 12/13/2020, 1:58:52 PM [HarvesterService] Starting Harvest
[Nest] 92 - 12/13/2020, 10:50:20 PM [FetchConsumer] OnGlobalQueueDrained
[Nest] 92 - 12/13/2020, 11:00:20 PM [PluginsConsumer] OnGlobalQueueDrained
[Nest] 92 - 12/13/2020, 11:00:20 PM [HarvesterService] reindex function is called
@@ -432,7 +432,7 @@ $ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H
<li>I cloned the <code>openrxv-items-final</code> index to <code>openrxv-items</code> index and now I see items in the explorer UI</li>
<li>The PDF report was broken and I looked in the API logs and saw this:</li>
</ul>
<pre><code class="language-console" data-lang="console">(node:94) UnhandledPromiseRejectionWarning: Error: Error: Could not find soffice binary
<pre tabindex="0"><code class="language-console" data-lang="console">(node:94) UnhandledPromiseRejectionWarning: Error: Error: Could not find soffice binary
at ExportService.downloadFile (/backend/dist/export/services/export/export.service.js:51:19)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
</code></pre><ul>
@@ -457,7 +457,7 @@ $ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H
</ul>
</li>
</ul>
<pre><code>$ http --print b 'https://cgspace.cgiar.org/rest/collections/defee001-8cc8-4a6c-8ac8-21bb5adab2db?expand=all&amp;limit=100&amp;offset=0' | json_pp &gt; /tmp/policy1.json
<pre tabindex="0"><code>$ http --print b 'https://cgspace.cgiar.org/rest/collections/defee001-8cc8-4a6c-8ac8-21bb5adab2db?expand=all&amp;limit=100&amp;offset=0' | json_pp &gt; /tmp/policy1.json
$ http --print b 'https://cgspace.cgiar.org/rest/collections/defee001-8cc8-4a6c-8ac8-21bb5adab2db?expand=all&amp;limit=100&amp;offset=100' | json_pp &gt; /tmp/policy2.json
$ query-json '.items | length' /tmp/policy1.json
100
@@ -487,7 +487,7 @@ $ query-json '.items | length' /tmp/policy2.json
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-2020-12-14
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: false}}'
</code></pre><h2 id="2020-12-15">2020-12-15</h2>
@@ -499,12 +499,12 @@ $ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H
</li>
<li>I checked the 1,534 fixes in Open Refine (had to fix a few UTF-8 errors, as always from Peter&rsquo;s CSVs) and then applied them using the <code>fix-metadata-values.py</code> script:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ ./fix-metadata-values.py -i /tmp/2020-10-28-fix-1534-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t 'correct' -m 3
<pre tabindex="0"><code class="language-console" data-lang="console">$ ./fix-metadata-values.py -i /tmp/2020-10-28-fix-1534-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t 'correct' -m 3
$ ./delete-metadata-values.py -i /tmp/2020-10-28-delete-2-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3
</code></pre><ul>
<li>Since I was re-indexing Discovery anyways I decided to check for any uppercase AGROVOC and lowercase them:</li>
</ul>
<pre><code class="language-console" data-lang="console">dspace=# BEGIN;
<pre tabindex="0"><code class="language-console" data-lang="console">dspace=# BEGIN;
BEGIN
dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=57 AND text_value ~ '[[:upper:]]';
UPDATE 406
@@ -513,7 +513,7 @@ COMMIT
</code></pre><ul>
<li>I also updated the Font Awesome icon classes for version 5 syntax:</li>
</ul>
<pre><code class="language-console" data-lang="console">dspace=# BEGIN;
<pre tabindex="0"><code class="language-console" data-lang="console">dspace=# BEGIN;
dspace=# UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'fa fa-rss','fas fa-rss', 'g') WHERE text_value LIKE '%fa fa-rss%';
UPDATE 74
dspace=# UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'fa fa-at','fas fa-at', 'g') WHERE text_value LIKE '%fa fa-at%';
@@ -522,7 +522,7 @@ dspace=# COMMIT;
</code></pre><ul>
<li>Then I started a full Discovery re-index:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot;
<pre tabindex="0"><code class="language-console" data-lang="console">$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot;
$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 265m11.224s
@@ -544,7 +544,7 @@ sys 2m41.097s
<ul>
<li>After the Discovery re-indexing finished on CGSpace I prepared to start re-harvesting AReS by making sure the <code>openrxv-items-temp</code> index was empty and that the backup index I made yesterday was still there:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp?pretty'
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp?pretty'
{
&quot;acknowledged&quot; : true
}
@@ -576,7 +576,7 @@ $ curl -s 'http://localhost:9200/openrxv-items-2020-12-14/_count?q=*&amp;pretty'
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&amp;pretty'
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&amp;pretty'
{
&quot;count&quot; : 100046,
&quot;_shards&quot; : {
@@ -611,7 +611,7 @@ $ curl -XDELETE 'http://localhost:9200/openrxv-items-temp?pretty'
</li>
<li>Generate a list of submitters and approvers active in the last months using the Provenance field on CGSpace:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ psql -h localhost -U postgres dspace -c &quot;SELECT text_value FROM metadatavalue WHERE metadata_field_id=28 AND text_value ~ '^.*on 2020-(06|07|08|09|10|11|12)-*'&quot; &gt; /tmp/provenance.txt
<pre tabindex="0"><code class="language-console" data-lang="console">$ psql -h localhost -U postgres dspace -c &quot;SELECT text_value FROM metadatavalue WHERE metadata_field_id=28 AND text_value ~ '^.*on 2020-(06|07|08|09|10|11|12)-*'&quot; &gt; /tmp/provenance.txt
$ grep -o -E 'by .*)' /tmp/provenance.txt | grep -v -E &quot;( on |checksum)&quot; | sed -e 's/by //' -e 's/ (/,/' -e 's/)//' | sort | uniq &gt; /tmp/recent-submitters-approvers.csv
</code></pre><ul>
<li>Peter wanted it to send some mail to the users&hellip;</li>
@@ -620,7 +620,7 @@ $ grep -o -E 'by .*)' /tmp/provenance.txt | grep -v -E &quot;( on |checksum)&quo
<ul>
<li>I see some errors from CUA in our Tomcat logs:</li>
</ul>
<pre><code class="language-console" data-lang="console">Thu Dec 17 07:35:27 CET 2020 | Query:containerItem:b049326a-0e76-45a8-ac0c-d8ec043a50c6
<pre tabindex="0"><code class="language-console" data-lang="console">Thu Dec 17 07:35:27 CET 2020 | Query:containerItem:b049326a-0e76-45a8-ac0c-d8ec043a50c6
Error while updating
java.lang.UnsupportedOperationException: Multiple update components target the same field:solr_update_time_stamp
at com.atmire.dspace.cua.CUASolrLoggerServiceImpl$5.visit(SourceFile:1155)
@@ -636,7 +636,7 @@ java.lang.UnsupportedOperationException: Multiple update components target the s
</li>
<li>I was trying to export the ILRI community on CGSpace so I could update one of the ILRI author&rsquo;s names, but it throws an error&hellip;</li>
</ul>
<pre><code class="language-console" data-lang="console">$ dspace metadata-export -i 10568/1 -f /tmp/2020-12-17-ILRI.csv
<pre tabindex="0"><code class="language-console" data-lang="console">$ dspace metadata-export -i 10568/1 -f /tmp/2020-12-17-ILRI.csv
Loading @mire database changes for module MQM
Changes have been processed
Exporting community 'International Livestock Research Institute (ILRI)' (10568/1)
@@ -657,7 +657,7 @@ java.lang.NullPointerException
</code></pre><ul>
<li>I did it via CSV with <code>fix-metadata-values.py</code> instead:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ cat 2020-12-17-update-ILRI-author.csv
<pre tabindex="0"><code class="language-console" data-lang="console">$ cat 2020-12-17-update-ILRI-author.csv
dc.contributor.author,correct
&quot;Padmakumar, V.P.&quot;,&quot;Varijakshapanicker, Padmakumar&quot;
$ ./fix-metadata-values.py -i 2020-12-17-update-ILRI-author.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t 'correct' -m 3
@@ -668,7 +668,7 @@ $ ./fix-metadata-values.py -i 2020-12-17-update-ILRI-author.csv -db dspace -u ds
</ul>
</li>
</ul>
<pre><code>$ csvcut -c 'dc.identifier.citation[en_US],dc.identifier.uri,dc.identifier.uri[],dc.identifier.uri[en_US],dc.date.issued,dc.date.issued[],dc.date.issued[en_US],cg.identifier.status[en_US]' ~/Downloads/10568-80099.csv | csvgrep -c 'cg.identifier.status[en_US]' -m 'Limited Access' | csvgrep -c 'dc.date.issued' -m 2020 -c 'dc.date.issued[]' -m 2020 -c 'dc.date.issued[en_US]' -m 2020 &gt; /tmp/limited-2020.csv
<pre tabindex="0"><code>$ csvcut -c 'dc.identifier.citation[en_US],dc.identifier.uri,dc.identifier.uri[],dc.identifier.uri[en_US],dc.date.issued,dc.date.issued[],dc.date.issued[en_US],cg.identifier.status[en_US]' ~/Downloads/10568-80099.csv | csvgrep -c 'cg.identifier.status[en_US]' -m 'Limited Access' | csvgrep -c 'dc.date.issued' -m 2020 -c 'dc.date.issued[]' -m 2020 -c 'dc.date.issued[en_US]' -m 2020 &gt; /tmp/limited-2020.csv
</code></pre><h2 id="2020-12-18">2020-12-18</h2>
<ul>
<li>I added support for indexing community views and downloads to <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a>
@@ -689,7 +689,7 @@ $ ./fix-metadata-values.py -i 2020-12-17-update-ILRI-author.csv -db dspace -u ds
<ul>
<li>The DeduplicateValuesProcessor has been running on DSpace Test since two days ago and it almost completed its second twelve-hour run, but crashed near the end:</li>
</ul>
<pre><code class="language-console" data-lang="console">...
<pre tabindex="0"><code class="language-console" data-lang="console">...
Run 1 — 100% — 8,230,000/8,239,228 docs — 39s — 9h 8m 31s
Exception: Java heap space
java.lang.OutOfMemoryError: Java heap space
@@ -744,7 +744,7 @@ java.lang.OutOfMemoryError: Java heap space
<li>The AReS harvest finished this morning and I moved the Elasticsearch index manually</li>
<li>First, check the number of records in the temp index to make sure it seems complete and not with double data:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&amp;pretty'
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&amp;pretty'
{
&quot;count&quot; : 100135,
&quot;_shards&quot; : {
@@ -757,13 +757,13 @@ java.lang.OutOfMemoryError: Java heap space
</code></pre><ul>
<li>Then delete the old backup and clone the current items index as a backup:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-2020-12-14?pretty'
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-2020-12-14?pretty'
$ curl -X PUT &quot;localhost:9200/openrxv-items/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2020-12-21
</code></pre><ul>
<li>Then delete the current items index and clone it from temp:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items?pretty'
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items?pretty'
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: false}}'
@@ -806,11 +806,11 @@ $ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H
</ul>
</li>
</ul>
<pre><code>statistics-2012: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
<pre tabindex="0"><code>statistics-2012: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
</code></pre><ul>
<li>I exported the 2012 stats from the year core and imported them to the main statistics core with solr-import-export-json:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics-2012 -a export -o statistics-2012.json -k uid
<pre tabindex="0"><code class="language-console" data-lang="console">$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics-2012 -a export -o statistics-2012.json -k uid
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2010.json -k uid
$ curl -s &quot;http://localhost:8081/solr/statistics-2012/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:*&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><ul>
@@ -824,7 +824,7 @@ $ curl -s &quot;http://localhost:8081/solr/statistics-2012/update?softCommit=tru
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items/_count?q=*&amp;pretty'
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items/_count?q=*&amp;pretty'
{
&quot;count&quot; : 100135,
&quot;_shards&quot; : {
@@ -842,7 +842,7 @@ $ curl -X PUT &quot;localhost:9200/openrxv-items/_settings?pretty&quot; -H 'Cont
<ul>
<li>The indexing on AReS finished so I cloned the <code>openrxv-items-temp</code> index to <code>openrxv-items</code> and deleted the backup index:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items?pretty'
<pre tabindex="0"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items?pretty'
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
$ curl -X PUT &quot;localhost:9200/openrxv-items-temp/_settings?pretty&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: false}}'