Add notes for 2020-11-18

Alan Orth 2020-11-18 23:15:06 +02:00
parent 2557931751
commit efbfbf46af
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
28 changed files with 372 additions and 55 deletions

View File

@ -352,4 +352,49 @@ $ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid =
2071
```
## 2020-11-18
- I decided to enable the `rollbackOnReturn=true` option in [Tomcat's JDBC connection pool parameters](https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html) because I noticed that all of the "idle in transaction" connections waiting for locks were SELECT queries
- There are many posts on the Internet about people having this issue with Hibernate
- The number of locks is lower now, but Peter and Abenet are still having issues approving items and Tezira forwarded one strange case where an item was "approved" and was assigned a handle, but it doesn't exist...
- I sent another mail to the dspace-tech mailing list to ask for help
- I reverted the `rollbackOnReturn` change in Tomcat...
- I sent a message to Atmire to ask for urgent help
- Call with IWMI and Abenet about them potentially moving from InMagic to CGSpace
- They have questions about the reporting on AReS
- We told them that we can use collections to infer Strategic Priorities and Research Groups and WLE Flagships
- It sounds like we will create this structure under the top-level IWMI community:
- IWMI Strategic Priorities (sub-community)
- Water, Food and Ecosystems (sub-community)
- Sustainable and Resilient Food Production Systems (collection)
- Sustainable Water infrastructure and Ecosystems (collection)
- Integrated Basin and Aquifer Management
- Water, Climate Change and Resilience (sub-community)
- Climate Change Adaptation and Resilience (collection)
- etc...
- They will submit items to their normal output type collections and map to these
- In other news I finally finished processing the Solr statistics for UUIDs and re-indexed the stats with the dspace-statistics-api
- I started the Atmire stats processing, notes in the dedicated [CGSpace DSpace 6 Upgrade section]({{< relref "cgspace-dspace6-upgrade.md" >}})
- Peter got a strange message this evening when trying to update metadata:
```
2020-11-18 16:57:33,309 ERROR org.hibernate.engine.jdbc.batch.internal.BatchingBatch @ HHH000315: Exception executing batch [Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1]
2020-11-18 16:57:33,316 ERROR org.hibernate.engine.jdbc.batch.internal.BatchingBatch @ HHH000315: Exception executing batch [Batch update returned unexpected row count from update [13]; actual row count: 0; expected: 1]
2020-11-18 16:57:33,385 INFO org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl @ HHH000010: On release of batch it still contained JDBC statements
```
- Minor bug fixes to limit parameter in DSpace Statistics API
- Release [version 1.3.2](https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.2)
- Send a list of potential ToRs for a next phase of OpenRXV development to Michael Victor for feedback:
- Enable advanced reporting templates using "Angular expressions" in Docxtemplater (would be used immediately for IWMI and BioversityCIAT)
- Enable embedding of charts like world map and word cloud in reports
- Enable embedding of item thumbnails in reports, similar to the "list of information products"
- Enable something like the "Statistics" Excel report Peter wanted in 2019 so we can get community and collection statistics reports
- Add a new "metrics" block with statistics about top authors and items by number of views and downloads for the current search terms
- Add ability to change the explorer UI to "Usage Statistics" mode where lists of authors, affiliations, sponsors, CRPs, communities, collections, etc are sorted according to the number of views or downloads for the current search results, rather than by number of occurrences of metadata values
- Add ability to "drill down" or modify search filter terms by clicking on countries in the map
- Enable date-based usage statistics (currently only "all time" statistics are available)
- Fixing minor bugs for all issues filed on GitHub
- I also added GitHub issues for each of them
<!-- vim: set sw=2 ts=2: -->

View File

@ -11,7 +11,8 @@ Notes about the DSpace 6 upgrade on CGSpace in 2020-11.
<!--more-->
- [Processing Solr Statistics With solr-upgrade-statistics-6x](#processing-solr-statistics-with-solr-upgrade-statistics-6x)
- [Re-import OAI with clean index](#re-import-oai-with-clean-index)
- [Processing Solr statistics with solr-upgrade-statistics-6x](#processing-solr-statistics-with-solr-upgrade-statistics-6x)
- [Current year's statistics core](#statistics)
- [statistics-2019 core](#statistics-2019)
- [statistics-2018 core](#statistics-2018)
@ -20,11 +21,28 @@ Notes about the DSpace 6 upgrade on CGSpace in 2020-11.
- [statistics-2015 core](#statistics-2015)
- [statistics-2014 core](#statistics-2014)
- [statistics-2013 core](#statistics-2013)
- [statistics-2012 core](#statistics-2012)
- [statistics-2011 core](#statistics-2011)
- [statistics-2010 core](#statistics-2010)
- [Processing Solr statistics with AtomicStatisticsUpdateCLI](#processing-solr-statistics-with-atomicstatisticsupdatecli)
## Processing Solr Statistics With solr-upgrade-statistics-6x
### Re-import OAI with clean index
After the upgrade is complete, re-index all items into OAI with a clean index:
```console
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m"
$ dspace oai -c import
```
The process ran out of memory several times so I had to keep trying again with more JVM heap memory.
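The retry-with-more-heap approach described above could be scripted roughly as follows. This is only a sketch, not what was actually run: `retry_with_heap` and the `HEAP_SIZES_MB` list are hypothetical names, and it assumes the `dspace` launcher honours `JAVA_OPTS` (as in the command above) and exits non-zero when the JVM runs out of memory.

```shell
#!/usr/bin/env bash
# Run a command with increasing JVM heap sizes until it succeeds.
# Usage: retry_with_heap <command> [args...]
# HEAP_SIZES_MB is a hypothetical list; adjust it to the memory available.
HEAP_SIZES_MB="${HEAP_SIZES_MB:-2048 3072 4096}"

retry_with_heap() {
    local heap
    for heap in $HEAP_SIZES_MB; do
        # Same options as the manual run, with only the heap size changing
        export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx${heap}m"
        echo "Trying with -Xmx${heap}m" >&2
        if "$@"; then
            return 0
        fi
    done
    echo "Command failed with all heap sizes" >&2
    return 1
}

# Example (not run here): retry_with_heap dspace oai -c import
```

The function stops at the first heap size that works and leaves `JAVA_OPTS` set to that value, so the size that worked is easy to note down afterwards.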
### Processing Solr Statistics With solr-upgrade-statistics-6x
After the main upgrade process was finished and DSpace was running I started processing the Solr statistics with `solr-upgrade-statistics-6x` to migrate all IDs to UUIDs.
### statistics
## statistics
First process the current year's statistics core:
```console
@ -57,7 +75,7 @@ After several rounds of processing it finished. Here are some statistics about u
$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2019
## statistics-2019
Processing the statistics-2019 core:
```console
@ -89,7 +107,7 @@ After several rounds of processing it finished. Here are some statistics about u
$ curl -s "http://localhost:8081/solr/statistics-2019/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2018
## statistics-2018
Processing the statistics-2018 core:
```console
@ -161,7 +179,7 @@ Eventually the processing finished. Here are some statistics about unmigrated do
$ curl -s "http://localhost:8081/solr/statistics-2017/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2016
## statistics-2016
Processing the statistics-2016 core:
@ -192,7 +210,8 @@ $ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2016
$ curl -s "http://localhost:8081/solr/statistics-2016/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2015
## statistics-2015
Processing the statistics-2015 core:
@ -225,6 +244,7 @@ Summary of stats after processing:
$ curl -s "http://localhost:8081/solr/statistics-2015/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
## statistics-2014
Processing the statistics-2014 core:
@ -259,6 +279,7 @@ Summary of unmigrated documents after processing:
$ curl -s "http://localhost:8081/solr/statistics-2014/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
## statistics-2013
Processing the statistics-2013 core:
@ -292,3 +313,105 @@ Summary of unmigrated docs after processing:
```console
$ curl -s "http://localhost:8081/solr/statistics-2013/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
## statistics-2012
Processing the statistics-2012 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2012
...
=================================================================
*** Statistics Records with Legacy Id ***
2,229,332 Item View
913,577 Bistream View
215,577 Collection View
104,734 Community View
--------------------------------------
3,463,220 TOTAL
=================================================================
```
Summary of unmigrated docs after processing:
- 0: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 33,161: `id:/.+-unmigrated/`
- 33,161: `*:* NOT id:/.{36}/`
- 33,161 are `type: 3` (COLLECTION), which is different than I've seen previously... but I suppose I still have to purge them because there will be errors in the Atmire modules otherwise:
```console
$ curl -s "http://localhost:8081/solr/statistics-2012/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
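The purge query works because Lucene regular expressions match the whole term, so `id:/.{36}/` only matches IDs that are exactly 36 characters long, which is the length of a UUID in its hyphenated form. As a rough local illustration, the sketch below shows which IDs the delete query would keep and which it would purge. The sample IDs are made up, `classify_id` is a hypothetical helper, and `grep -E` with explicit anchors only approximates Solr's behaviour.

```shell
#!/usr/bin/env bash
# Classify an ID the way the purge query does: anything that is not exactly
# 36 characters long is selected by "*:* NOT id:/.{36}/" and gets deleted.
# The anchors are explicit because grep matches substrings by default.
classify_id() {
    if printf '%s\n' "$1" | grep -Eq '^.{36}$'; then
        echo "keep"
    else
        echo "purge"
    fi
}

# Hypothetical sample IDs, for illustration only
classify_id "0b4e2c8a-7c1d-4e7a-9f3b-2d5a6c7e8f90"   # a migrated UUID
classify_id "12345-unmigrated"                       # marked unmigrated by the tool
classify_id "12345"                                  # a legacy integer ID
```

Note that the check is purely about length: any 36-character ID would be kept, whether or not it is a valid UUID.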
## statistics-2011
Processing the statistics-2011 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2011
...
=================================================================
*** Statistics Records with Legacy Id ***
904,896 Item View
385,789 Bistream View
154,356 Collection View
62,978 Community View
--------------------------------------
1,508,019 TOTAL
=================================================================
```
Summary of unmigrated docs after processing:
- 0: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 17,551: `id:/.+-unmigrated/`
- 17,551: `*:* NOT id:/.{36}/`
- 12,116 are `type: 3` (COLLECTION), which is different than I've seen previously... but I suppose I still have to purge them because there will be errors in the Atmire modules otherwise:
```console
$ curl -s "http://localhost:8081/solr/statistics-2011/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
## statistics-2010
Processing the statistics-2010 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2010
...
=================================================================
*** Statistics Records with Legacy Id ***
26,067 Item View
15,615 Bistream View
4,116 Collection View
1,094 Community View
--------------------------------------
46,892 TOTAL
=================================================================
```
Summary of unmigrated docs after processing:
- 0: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 1,012: `id:/.+-unmigrated/`
- 1,012: `*:* NOT id:/.{36}/`
- 654 are `type: 3` (COLLECTION), which is different than I've seen previously... but I suppose I still have to purge them because there will be errors in the Atmire modules otherwise:
```console
$ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
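The same two commands repeat for every yearly core, with only the core name changing. A small helper could generate them for review. This is a sketch: `print_core_commands` is a hypothetical name, and it deliberately prints the commands instead of running them, since each core needed several rounds of processing and a look at the unmigrated documents before purging.

```shell
#!/usr/bin/env bash
# Print (do not run) the migrate and purge commands for one statistics core.
print_core_commands() {
    local core="$1"
    echo "chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i ${core}"
    echo "curl -s \"http://localhost:8081/solr/${core}/update?softCommit=true\" -H \"Content-Type: text/xml\" --data-binary \"<delete><query>*:* NOT id:/.{36}/</query></delete>\""
}

# The three cores covered in the sections above
for year in 2012 2011 2010; do
    print_core_commands "statistics-${year}"
done
```

Printing first keeps the destructive delete query as a manual step, to be pasted only after checking the counts for that core.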
### Processing Solr statistics with AtomicStatisticsUpdateCLI
On 2020-11-18 I finished processing the Solr statistics with solr-upgrade-statistics-6x and I started processing them with AtomicStatisticsUpdateCLI:
```console
$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics
```

View File

@ -16,7 +16,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu&rsquo;s munin-pl
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-02/" />
<meta property="article:published_time" content="2018-02-01T16:28:54+02:00" />
<meta property="article:modified_time" content="2019-10-28T13:39:25+02:00" />
<meta property="article:modified_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="February, 2018"/>
@ -39,7 +39,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu&rsquo;s munin-pl
"url": "https://alanorth.github.io/cgspace-notes/2018-02/",
"wordCount": "6410",
"datePublished": "2018-02-01T16:28:54+02:00",
"dateModified": "2019-10-28T13:39:25+02:00",
"dateModified": "2020-11-18T17:15:23+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -208,7 +208,7 @@ Tue Feb 6 09:30:32 UTC 2018
<li>So I restarted Tomcat and now everything is fine</li>
<li>Next time I see that many database connections I need to save the output so I can analyze it later</li>
<li>I&rsquo;m going to re-schedule the taskUpdateSolrStatsMetadata task as <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566">Bram detailed in ticket 566</a> to see if it makes CGSpace stop crashing every morning</li>
<li>If I move the task from 3AM to 3PM, deally CGSpace will stop crashing in the morning, or start crashing ~12 hours later</li>
<li>If I move the task from 3AM to 3PM, ideally CGSpace will stop crashing in the morning, or start crashing ~12 hours later</li>
<li>Eventually Atmire has said that there will be a fix for this high load caused by their script, but it will come with the 5.8 compatibility they are already working on</li>
<li>I re-deployed CGSpace with the new task time of 3PM, ran all system updates, and restarted the server</li>
<li>Also, I changed the name of the DSpace fallback pool on DSpace Test and CGSpace to be called &lsquo;dspaceCli&rsquo; so that I can distinguish it in <code>pg_stat_activity</code></li>

View File

@ -17,7 +17,7 @@ So far we&rsquo;ve spent at least fifty hours to process the statistics and stat
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-11/" />
<meta property="article:published_time" content="2020-11-01T13:11:54+02:00" />
<meta property="article:modified_time" content="2020-11-16T10:53:45+02:00" />
<meta property="article:modified_time" content="2020-11-17T22:14:56+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="November, 2020"/>
@ -39,9 +39,9 @@ So far we&rsquo;ve spent at least fifty hours to process the statistics and stat
"@type": "BlogPosting",
"headline": "November, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-11/",
"wordCount": "2131",
"wordCount": "2665",
"datePublished": "2020-11-01T13:11:54+02:00",
"dateModified": "2020-11-16T10:53:45+02:00",
"dateModified": "2020-11-17T22:14:56+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -486,7 +486,77 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error whil
</ul>
<pre><code>$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2071
</code></pre><!-- raw HTML omitted -->
</code></pre><h2 id="2020-11-18">2020-11-18</h2>
<ul>
<li>I decided to enable the <code>rollbackOnReturn=true</code> option in <a href="https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html">Tomcat&rsquo;s JDBC connection pool parameters</a> because I noticed that all of the &ldquo;idle in transaction&rdquo; connections waiting for locks were SELECT queries
<ul>
<li>There are many posts on the Internet about people having this issue with Hibernate</li>
<li>The number of locks is lower now, but Peter and Abenet are still having issues approving items and Tezira forwarded one strange case where an item was &ldquo;approved&rdquo; and was assigned a handle, but it doesn&rsquo;t exist&hellip;</li>
<li>I sent another mail to the dspace-tech mailing list to ask for help</li>
<li>I reverted the <code>rollbackOnReturn</code> change in Tomcat&hellip;</li>
<li>I sent a message to Atmire to ask for urgent help</li>
</ul>
</li>
<li>Call with IWMI and Abenet about them potentially moving from InMagic to CGSpace
<ul>
<li>They have questions about the reporting on AReS</li>
<li>We told them that we can use collections to infer Strategic Priorities and Research Groups and WLE Flagships</li>
<li>It sounds like we will create this structure under the top-level IWMI community:
<ul>
<li>IWMI Strategic Priorities (sub-community)
<ul>
<li>Water, Food and Ecosystems (sub-community)
<ul>
<li>Sustainable and Resilient Food Production Systems (collection)</li>
<li>Sustainable Water infrastructure and Ecosystems (collection)</li>
<li>Integrated Basin and Aquifer Management</li>
</ul>
</li>
<li>Water, Climate Change and Resilience (sub-community)
<ul>
<li>Climate Change Adaptation and Resilience (collection)</li>
</ul>
</li>
<li>etc&hellip;</li>
</ul>
</li>
</ul>
</li>
<li>They will submit items to their normal output type collections and map to these</li>
</ul>
</li>
<li>In other news I finally finished processing the Solr statistics for UUIDs and re-indexed the stats with the dspace-statistics-api
<ul>
<li>I started the Atmire stats processing, notes in the dedicated <a href="/cgspace-notes/cgspace-dspace6-upgrade/">CGSpace DSpace 6 Upgrade section</a></li>
</ul>
</li>
<li>Peter got a strange message this evening when trying to update metadata:</li>
</ul>
<pre><code>2020-11-18 16:57:33,309 ERROR org.hibernate.engine.jdbc.batch.internal.BatchingBatch @ HHH000315: Exception executing batch [Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1]
2020-11-18 16:57:33,316 ERROR org.hibernate.engine.jdbc.batch.internal.BatchingBatch @ HHH000315: Exception executing batch [Batch update returned unexpected row count from update [13]; actual row count: 0; expected: 1]
2020-11-18 16:57:33,385 INFO org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl @ HHH000010: On release of batch it still contained JDBC statements
</code></pre><ul>
<li>Minor bug fixes to limit parameter in DSpace Statistics API
<ul>
<li>Release <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.2">version 1.3.2</a></li>
</ul>
</li>
<li>Send a list of potential ToRs for a next phase of OpenRXV development to Michael Victor for feedback:
<ul>
<li>Enable advanced reporting templates using &ldquo;Angular expressions&rdquo; in Docxtemplater (would be used immediately for IWMI and BioversityCIAT)</li>
<li>Enable embedding of charts like world map and word cloud in reports</li>
<li>Enable embedding of item thumbnails in reports, similar to the &ldquo;list of information products&rdquo;</li>
<li>Enable something like the &ldquo;Statistics&rdquo; Excel report Peter wanted in 2019 so we can get community and collection statistics reports</li>
<li>Add a new &ldquo;metrics&rdquo; block with statistics about top authors and items by number of views and downloads for the current search terms</li>
<li>Add ability to change the explorer UI to &ldquo;Usage Statistics&rdquo; mode where lists of authors, affiliations, sponsors, CRPs, communities, collections, etc are sorted according to the number of views or downloads for the current search results, rather than by number of occurrences of metadata values</li>
<li>Add ability to &ldquo;drill down&rdquo; or modify search filter terms by clicking on countries in the map</li>
<li>Enable date-based usage statistics (currently only &ldquo;all time&rdquo; statistics are available)</li>
<li>Fixing minor bugs for all issues filed on GitHub</li>
</ul>
</li>
<li>I also added GitHub issues for each of them</li>
</ul>
<!-- raw HTML omitted -->

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>

View File

@ -10,7 +10,7 @@
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/cgspace-dspace6-upgrade/" />
<meta property="article:published_time" content="2020-11-15T13:27:35+02:00" />
<meta property="article:modified_time" content="2020-11-15T13:27:35+02:00" />
<meta property="article:modified_time" content="2020-11-17T22:14:56+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace DSpace 6 Upgrade"/>
@ -25,9 +25,9 @@
"@type": "BlogPosting",
"headline": "CGSpace DSpace 6 Upgrade",
"url": "https://alanorth.github.io/cgspace-notes/cgspace-dspace6-upgrade/",
"wordCount": "878",
"wordCount": "1281",
"datePublished": "2020-11-15T13:27:35+02:00",
"dateModified": "2020-11-15T13:27:35+02:00",
"dateModified": "2020-11-17T22:14:56+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -106,7 +106,8 @@
</header>
<p>Notes about the DSpace 6 upgrade on CGSpace in 2020-11.</p>
<ul>
<li><a href="#processing-solr-statistics-with-solr-upgrade-statistics-6x">Processing Solr Statistics With solr-upgrade-statistics-6x</a>
<li><a href="#re-import-oai-with-clean-index">Re-import OAI with clean index</a></li>
<li><a href="#processing-solr-statistics-with-solr-upgrade-statistics-6x">Processing Solr statistics with solr-upgrade-statistics-6x</a>
<ul>
<li><a href="#statistics">Current year&rsquo;s statistics core</a></li>
<li><a href="#statistics-2019">statistics-2019 core</a></li>
@ -116,12 +117,21 @@
<li><a href="#statistics-2015">statistics-2015 core</a></li>
<li><a href="#statistics-2014">statistics-2014 core</a></li>
<li><a href="#statistics-2013">statistics-2013 core</a></li>
<li><a href="#statistics-2012">statistics-2012 core</a></li>
<li><a href="#statistics-2011">statistics-2011 core</a></li>
<li><a href="#statistics-2010">statistics-2010 core</a></li>
</ul>
</li>
<li><a href="#processing-solr-statistics-with-atomicstatisticsupdatecli">Processing Solr statistics with AtomicStatisticsUpdateCLI</a></li>
</ul>
<h2 id="processing-solr-statistics-with-solr-upgrade-statistics-6x">Processing Solr Statistics With solr-upgrade-statistics-6x</h2>
<h3 id="re-import-oai-with-clean-index">Re-import OAI with clean index</h3>
<p>After the upgrade is complete, re-index all items into OAI with a clean index:</p>
<pre><code class="language-console" data-lang="console">$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx2048m&quot;
$ dspace oai -c import
</code></pre><p>The process ran out of memory several times so I had to keep trying again with more JVM heap memory.</p>
<h3 id="processing-solr-statistics-with-solr-upgrade-statistics-6x">Processing Solr Statistics With solr-upgrade-statistics-6x</h3>
<p>After the main upgrade process was finished and DSpace was running I started processing the Solr statistics with <code>solr-upgrade-statistics-6x</code> to migrate all IDs to UUIDs.</p>
<h3 id="statistics">statistics</h3>
<h2 id="statistics">statistics</h2>
<p>First process the current year&rsquo;s statistics core:</p>
<pre><code class="language-console" data-lang="console">$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics
@ -147,7 +157,7 @@ $ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics
<li>Majority are <code>type: 5</code> (aka SITE, according to <code>Constants.java</code>) so we can purge them:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8081/solr/statistics/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:* NOT id:/.{36}/&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><h3 id="statistics-2019">statistics-2019</h3>
</code></pre><h2 id="statistics-2019">statistics-2019</h2>
<p>Processing the statistics-2019 core:</p>
<pre><code class="language-console" data-lang="console">$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics
...
@ -172,7 +182,7 @@ $ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics
<li>4,172,929 are <code>type: 5</code> (aka SITE) so we can purge them:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8081/solr/statistics-2019/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:* NOT id:/.{36}/&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><h3 id="statistics-2018">statistics-2018</h3>
</code></pre><h2 id="statistics-2018">statistics-2018</h2>
<p>Processing the statistics-2018 core:</p>
<pre><code class="language-console" data-lang="console">$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2018
...
@ -225,7 +235,7 @@ $ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2018
<li>1,660,524 are <code>type: 5</code> (SITE) so we can purge them:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8081/solr/statistics-2017/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:* NOT id:/.{36}/&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><h3 id="statistics-2016">statistics-2016</h3>
</code></pre><h2 id="statistics-2016">statistics-2016</h2>
<p>Processing the statistics-2016 core:</p>
<pre><code class="language-console" data-lang="console">$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2016
...
@ -249,7 +259,7 @@ $ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2018
<li>1,469,706 are <code>type: 5</code> (SITE) so we can purge them:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8081/solr/statistics-2016/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:* NOT id:/.{36}/&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><h3 id="statistics-2015">statistics-2015</h3>
</code></pre><h2 id="statistics-2015">statistics-2015</h2>
<p>Processing the statistics-2015 core:</p>
<pre><code class="language-console" data-lang="console">$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2015
...
@ -326,6 +336,75 @@ $ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2018
<li>15,691 are <code>type: 5</code> (SITE) so we can purge them:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8081/solr/statistics-2013/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:* NOT id:/.{36}/&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><h2 id="statistics-2012">statistics-2012</h2>
<p>Processing the statistics-2012 core:</p>
<pre><code class="language-console" data-lang="console">$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2012
...
=================================================================
*** Statistics Records with Legacy Id ***
2,229,332 Item View
913,577 Bistream View
215,577 Collection View
104,734 Community View
--------------------------------------
3,463,220 TOTAL
=================================================================
</code></pre><p>Summary of unmigrated docs after processing:</p>
<ul>
<li>0: <code>(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)</code></li>
<li>33,161: <code>id:/.+-unmigrated/</code></li>
<li>33,161: <code>*:* NOT id:/.{36}/</code></li>
<li>33,161 are <code>type: 3</code> (COLLECTION), which is different than I&rsquo;ve seen previously&hellip; but I suppose I still have to purge them because there will be errors in the Atmire modules otherwise:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8081/solr/statistics-2012/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:* NOT id:/.{36}/&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><h2 id="statistics-2011">statistics-2011</h2>
<p>Processing the statistics-2011 core:</p>
<pre><code class="language-console" data-lang="console">$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2011
...
=================================================================
*** Statistics Records with Legacy Id ***
904,896 Item View
385,789 Bistream View
154,356 Collection View
62,978 Community View
--------------------------------------
1,508,019 TOTAL
=================================================================
</code></pre><p>Summary of unmigrated docs after processing:</p>
<ul>
<li>0: <code>(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)</code></li>
<li>17,551: <code>id:/.+-unmigrated/</code></li>
<li>17,551: <code>*:* NOT id:/.{36}/</code></li>
<li>12,116 are <code>type: 3</code> (COLLECTION), which is different than I&rsquo;ve seen previously&hellip; but I suppose I still have to purge them because there will be errors in the Atmire modules otherwise:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8081/solr/statistics-2011/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:* NOT id:/.{36}/&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><h2 id="statistics-2010">statistics-2010</h2>
<p>Processing the statistics-2010 core:</p>
<pre><code class="language-console" data-lang="console">$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2010
...
=================================================================
*** Statistics Records with Legacy Id ***
26,067 Item View
15,615 Bistream View
4,116 Collection View
1,094 Community View
--------------------------------------
46,892 TOTAL
=================================================================
</code></pre><p>Summary of unmigrated docs after processing:</p>
<ul>
<li>0: <code>(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)</code></li>
<li>1,012: <code>id:/.+-unmigrated/</code></li>
<li>1,012: <code>*:* NOT id:/.{36}/</code></li>
<li>654 are <code>type: 3</code> (COLLECTION), which is different than I&rsquo;ve seen previously&hellip; but I suppose I still have to purge them because there will be errors in the Atmire modules otherwise:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8081/solr/statistics-2010/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;*:* NOT id:/.{36}/&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><h3 id="processing-solr-statistics-with-atomicstatisticsupdatecli">Processing Solr statistics with AtomicStatisticsUpdateCLI</h3>
<p>On 2020-11-18 I finished processing the Solr statistics with solr-upgrade-statistics-6x and I started processing them with AtomicStatisticsUpdateCLI:</p>
<pre><code>$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics
</code></pre>
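<p>The three purge commands above differ only in the core name, so they could be generated rather than typed by hand. The following is a minimal sketch, not the procedure actually used in these notes: the function names (<code>solr_update_url</code>, <code>solr_delete_body</code>, <code>print_purge_commands</code>) are hypothetical, and the script only prints the <code>curl</code> commands as a dry run instead of executing them:</p>

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the delete-by-query command for each statistics core.
# Assumption: Solr is reachable at the same base URL used in the notes above.
SOLR_BASE="http://localhost:8081/solr"

# Same query as in the notes: every doc whose id is not a 36-character UUID
NON_UUID_QUERY='*:* NOT id:/.{36}/'

# Print the update URL (with a soft commit) for the core named in $1
solr_update_url() {
    printf '%s/%s/update?softCommit=true\n' "$SOLR_BASE" "$1"
}

# Print the XML body of the delete-by-query request
solr_delete_body() {
    printf '<delete><query>%s</query></delete>\n' "$NON_UUID_QUERY"
}

# Print (but do not run) one curl command per core given as an argument
print_purge_commands() {
    local core
    for core in "$@"; do
        printf "curl -s '%s' -H 'Content-Type: text/xml' --data-binary '%s'\n" \
            "$(solr_update_url "$core")" "$(solr_delete_body)"
    done
}

print_purge_commands statistics-2012 statistics-2011 statistics-2010
```

<p>Printing the commands first leaves a chance to review each one, and to compare the unmigrated counts per core, before anything is deleted.</p>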

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-11-16T10:53:45+02:00" />
<meta property="og:updated_time" content="2020-11-18T17:15:23+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -4,42 +4,42 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2020-11-16T10:53:45+02:00</lastmod>
<lastmod>2020-11-18T17:15:23+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/cgspace-dspace6-upgrade/</loc>
<lastmod>2020-11-15T13:27:35+02:00</lastmod>
<lastmod>2020-11-17T22:14:56+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2020-11-16T10:53:45+02:00</lastmod>
<lastmod>2020-11-18T17:15:23+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/migration/</loc>
<lastmod>2020-11-15T13:27:35+02:00</lastmod>
<lastmod>2020-11-17T22:14:56+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2020-11-16T10:53:45+02:00</lastmod>
<lastmod>2020-11-18T17:15:23+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2020-11-16T10:53:45+02:00</lastmod>
<lastmod>2020-11-18T17:15:23+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2020-11-15T13:27:35+02:00</lastmod>
<lastmod>2020-11-17T22:14:56+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2020-11/</loc>
<lastmod>2020-11-16T10:53:45+02:00</lastmod>
<lastmod>2020-11-17T22:14:56+02:00</lastmod>
</url>
<url>
@ -209,7 +209,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-02/</loc>
<lastmod>2019-10-28T13:39:25+02:00</lastmod>
<lastmod>2020-11-18T17:15:23+02:00</lastmod>
</url>
<url>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" />
<meta property="og:updated_time" content="2020-11-15T13:27:35+02:00" />
<meta property="og:updated_time" content="2020-11-17T22:14:56+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/migration/" />
<meta property="og:updated_time" content="2020-11-15T13:27:35+02:00" />
<meta property="og:updated_time" content="2020-11-17T22:14:56+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Migration"/>