+++
title = "CGSpace DSpace 6 Upgrade"
date = 2020-11-15T13:27:35+02:00
description = "Documenting the DSpace 6 upgrade."
categories = ["Notes"]
tags = ["Migration"]
url = "cgspace-dspace6-upgrade"
+++
Notes about the DSpace 6 upgrade on CGSpace in 2020-11.
<!--more-->
- [Processing Solr Statistics With solr-upgrade-statistics-6x](#processing-solr-statistics-with-solr-upgrade-statistics-6x)
  - [Current year's statistics core](#statistics)
  - [statistics-2019 core](#statistics-2019)
  - [statistics-2018 core](#statistics-2018)
  - [statistics-2017 core](#statistics-2017)
  - [statistics-2016 core](#statistics-2016)
  - [statistics-2015 core](#statistics-2015)
  - [statistics-2014 core](#statistics-2014)
  - [statistics-2013 core](#statistics-2013)
## Processing Solr Statistics With solr-upgrade-statistics-6x
After the main upgrade process was finished and DSpace was running, I started processing the Solr statistics with `solr-upgrade-statistics-6x` to migrate all IDs to UUIDs.
### statistics
First process the current year's statistics core:
```console
$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics
...
=================================================================
*** Statistics Records with Legacy Id ***
3,817,407 Bistream View
1,693,443 Item View
105,974 Collection View
62,383 Community View
163,192 Community Search
162,581 Collection Search
470,288 Unexpected Type & Full Site
--------------------------------------
6,475,268 TOTAL
=================================================================
```
After several rounds of processing it finished. Here are some statistics about unmigrated documents:
- 227,000: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 471,000: `id:/.+-unmigrated/`
- 698,000: `*:* NOT id:/.{36}/`
- The majority are `type: 5` (aka SITE, according to `Constants.java`) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
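One way to obtain counts like the ones listed above is to query the core's `select` handler with `rows=0` and read `numFound` from the response. This is a minimal sketch, assuming Solr is listening on `localhost:8081` as in the commands in these notes. The helper names `extract_num_found` and `solr_count` are hypothetical and not part of DSpace or Solr:

```shell
# Sketch only: helper names are hypothetical, and the host and port are
# assumed to match the curl commands used elsewhere in these notes.

# Read a Solr JSON response on stdin and print the value of numFound.
extract_num_found() {
    grep -o '"numFound": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Print the number of documents in core $1 that match query $2.
solr_count() {
    curl -s -G "http://localhost:8081/solr/$1/select" \
        --data-urlencode "q=$2" \
        --data-urlencode "rows=0" \
        --data-urlencode "wt=json" | extract_num_found
}

# Example (requires a running Solr):
#   solr_count statistics '*:* NOT id:/.{36}/'
```

Running the count before and after the delete is a cheap way to confirm that the purge matched what was expected.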
### statistics-2019
Processing the statistics-2019 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2019
...
=================================================================
*** Statistics Records with Legacy Id ***
5,569,344 Bistream View
2,179,105 Item View
117,194 Community View
104,091 Collection View
774,138 Community Search
568,347 Collection Search
1,482,620 Unexpected Type & Full Site
--------------------------------------
10,794,839 TOTAL
=================================================================
```
After several rounds of processing it finished. Here are some statistics about unmigrated documents:
- 2,690,309: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 1,494,587: `id:/.+-unmigrated/`
- 4,184,896: `*:* NOT id:/.{36}/`
- 4,172,929 are `type: 5` (aka SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2019/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
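In the figures above, the first two counts add up to the third: documents with a non-UUID id and no `-unmigrated` suffix, plus documents with the suffix, give the total number of documents whose id is not 36 characters long. A small sketch of that arithmetic check follows. The helper name `sum_counts` is hypothetical:

```shell
# Sketch only: sum_counts is a hypothetical helper name.

# Add two counts written with thousands separators, e.g. "2,690,309".
sum_counts() {
    a=$(printf '%s' "$1" | tr -d ,)
    b=$(printf '%s' "$2" | tr -d ,)
    echo $((a + b))
}

# statistics-2019: non-UUID ids without the suffix, plus "-unmigrated" ids
sum_counts "2,690,309" "1,494,587"   # prints 4184896
```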
### statistics-2018
Processing the statistics-2018 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2018
...
=================================================================
*** Statistics Records with Legacy Id ***
3,561,532 Bistream View
1,129,326 Item View
97,401 Community View
63,508 Collection View
207,827 Community Search
43,752 Collection Search
457,820 Unexpected Type & Full Site
--------------------------------------
5,561,166 TOTAL
=================================================================
```
After some time I got an error about Java heap space so I increased the JVM memory and restarted processing:
```console
$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx4096m'
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2018
```
Eventually the processing finished. Here are some statistics about unmigrated documents:
- 365,473: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 546,955: `id:/.+-unmigrated/`
- 923,158: `*:* NOT id:/.{36}/`
- 823,293 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2018/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2017
Processing the statistics-2017 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2017
...
=================================================================
*** Statistics Records with Legacy Id ***
2,529,208 Bistream View
1,618,717 Item View
144,945 Community View
74,249 Collection View
479,647 Community Search
114,658 Collection Search
852,215 Unexpected Type & Full Site
--------------------------------------
5,813,639 TOTAL
=================================================================
```
Eventually the processing finished. Here are some statistics about unmigrated documents:
- 808,309: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 893,868: `id:/.+-unmigrated/`
- 1,702,177: `*:* NOT id:/.{36}/`
- 1,660,524 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2017/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2016
Processing the statistics-2016 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2016
...
=================================================================
*** Statistics Records with Legacy Id ***
1,765,924 Bistream View
1,151,575 Item View
187,110 Community View
51,204 Collection View
347,382 Community Search
66,605 Collection Search
620,298 Unexpected Type & Full Site
--------------------------------------
4,190,098 TOTAL
=================================================================
```
Summary of unmigrated documents after processing:
- 849,408: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 627,747: `id:/.+-unmigrated/`
- 1,477,155: `*:* NOT id:/.{36}/`
- 1,469,706 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2016/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2015
Processing the statistics-2015 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2015
...
=================================================================
*** Statistics Records with Legacy Id ***
990,916 Bistream View
506,070 Item View
116,153 Community View
33,282 Collection View
21,062 Community Search
10,788 Collection Search
52,107 Unexpected Type & Full Site
--------------------------------------
1,730,378 TOTAL
=================================================================
```
Summary of stats after processing:
- 195,293: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 67,146: `id:/.+-unmigrated/`
- 262,439: `*:* NOT id:/.{36}/`
- 247,400 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2015/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2014
Processing the statistics-2014 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2014
...
=================================================================
*** Statistics Records with Legacy Id ***
2,381,603 Item View
1,323,357 Bistream View
501,545 Community View
247,805 Collection View
250 Collection Search
188 Community Search
50 Item Search
10,918 Unexpected Type & Full Site
--------------------------------------
4,465,716 TOTAL
=================================================================
```
Summary of unmigrated documents after processing:
- 182,131: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 39,947: `id:/.+-unmigrated/`
- 222,078: `*:* NOT id:/.{36}/`
- 188,791 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2014/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2013
Processing the statistics-2013 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2013
...
=================================================================
*** Statistics Records with Legacy Id ***
2,352,124 Item View
1,117,676 Bistream View
575,711 Community View
171,639 Collection View
248 Item Search
7 Collection Search
5 Community Search
1,452 Unexpected Type & Full Site
--------------------------------------
4,218,862 TOTAL
=================================================================
```
Summary of unmigrated docs after processing:
- 2,548: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 29,772: `id:/.+-unmigrated/`
- 32,320: `*:* NOT id:/.{36}/`
- 15,691 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2013/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
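The purge command is identical for every core apart from the core name, so it could also be expressed as a loop. This is a minimal sketch, not what was actually run here. The function names and the `DRY_RUN` switch are hypothetical. The host, port, and delete query are copied from the commands above. By default it only prints what it would send, and it should only be run for real once a core has finished processing:

```shell
# Sketch only: function names and the DRY_RUN switch are hypothetical; the
# host, port, and delete query are copied from the commands in these notes.

SOLR_BASE="http://localhost:8081/solr"
CORES="statistics statistics-2019 statistics-2018 statistics-2017 statistics-2016 statistics-2015 statistics-2014 statistics-2013"

# Print the update URL for core $1.
purge_url() {
    echo "${SOLR_BASE}/$1/update?softCommit=true"
}

# Delete every document in core $1 whose id is not 36 characters long.
# With DRY_RUN=1 (the default here) only print what would be sent.
purge_core() {
    payload='<delete><query>*:* NOT id:/.{36}/</query></delete>'
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "POST $(purge_url "$1") ${payload}"
    else
        curl -s "$(purge_url "$1")" -H "Content-Type: text/xml" --data-binary "${payload}"
    fi
}

for core in $CORES; do
    purge_core "$core"
done
```

Setting `DRY_RUN=0` in the environment switches the loop from printing to actually sending the delete requests.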