Add notes for 2020-11-17

2020-11-17 22:14:56 +02:00
parent a50ad3ac76
commit d7a6467475
106 changed files with 1824 additions and 664 deletions

at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
```
## 2020-11-16
- Users are having issues submitting items to CGSpace
- Looking at the data, I see that PostgreSQL connections have skyrocketed since the DSpace 6 upgrade yesterday, and they are all in the "waiting for lock" state:
![PostgreSQL connections week](/cgspace-notes/2020/11/postgres_connections_ALL-week.png)
![PostgreSQL locks week](/cgspace-notes/2020/11/postgres_locks_ALL-week.png)
- There are almost 1,500 locks:
```
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
1494
```
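The `wc -l` figure above counts psql's output lines, including the header, separator and row-count footer, so it slightly overstates the number of locks and says nothing about what kind they are. The shell functions below are a minimal sketch for digging further. The function names are hypothetical, they assume `psql` can reach the DSpace database (any extra arguments, for example `-d dspace` if that is the database name, are passed through to psql), and `pg_blocking_pids()` requires PostgreSQL 9.6 or newer.

```shell
# Hypothetical helpers for inspecting PostgreSQL locks and sessions.
# Extra arguments (e.g. -d dspace -U dspace) are passed straight to psql.

# Count locks grouped by lock mode and by whether they have been granted.
lock_modes() {
  psql "$@" -c "SELECT mode, granted, count(*)
                FROM pg_locks
                GROUP BY mode, granted
                ORDER BY count(*) DESC;"
}

# List sessions that are waiting on other sessions, and the PIDs blocking them.
# pg_blocking_pids() is only available in PostgreSQL 9.6 and later.
blocked_sessions() {
  psql "$@" -c "SELECT pid,
                       pg_blocking_pids(pid) AS blocked_by,
                       state,
                       now() - xact_start AS xact_age,
                       left(query, 60) AS query
                FROM pg_stat_activity
                WHERE cardinality(pg_blocking_pids(pid)) > 0
                ORDER BY xact_age DESC NULLS LAST;"
}

# Count connections per state (active, idle, idle in transaction, ...).
sessions_by_state() {
  psql "$@" -c "SELECT state, count(*)
                FROM pg_stat_activity
                GROUP BY state
                ORDER BY count(*) DESC;"
}
```

If the PIDs in the `blocked_by` column belong to sessions that `sessions_by_state` reports as `idle in transaction`, those long-open transactions would be a reasonable first place to look.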
- I sent a mail to the dspace-tech mailing list to ask for help...
- For now I just restarted PostgreSQL and a few users were able to complete submissions...
- While processing the statistics-2018 Solr core I got the *same* memory error that I have gotten every time I processed this core in testing:
```
Exception: Java heap space
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuffer.append(StringBuffer.java:270)
at java.io.StringWriter.write(StringWriter.java:101)
at org.apache.solr.common.util.XML.writeXML(XML.java:133)
at org.apache.solr.client.solrj.util.ClientUtils.writeVal(SourceFile:160)
at org.apache.solr.client.solrj.util.ClientUtils.writeXML(SourceFile:128)
at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:365)
at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:281)
at org.apache.solr.client.solrj.request.RequestWriter.getContentStream(RequestWriter.java:67)
at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getDelegate(RequestWriter.java:95)
at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getName(RequestWriter.java:105)
at org.apache.solr.client.solrj.impl.HttpSolrServer.createMethod(HttpSolrServer.java:302)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
at org.dspace.util.SolrUpgradePre6xStatistics.batchUpdateStats(SolrUpgradePre6xStatistics.java:161)
at org.dspace.util.SolrUpgradePre6xStatistics.run(SolrUpgradePre6xStatistics.java:456)
at org.dspace.util.SolrUpgradePre6xStatistics.main(SolrUpgradePre6xStatistics.java:365)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
```
- I increased the Java heap memory to 4096MB and restarted the processing
- After a few hours I got the following error, which I have gotten several times over the last few months:
```
Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
at org.dspace.util.SolrUpgradePre6xStatistics.batchUpdateStats(SolrUpgradePre6xStatistics.java:161)
at org.dspace.util.SolrUpgradePre6xStatistics.run(SolrUpgradePre6xStatistics.java:456)
at org.dspace.util.SolrUpgradePre6xStatistics.main(SolrUpgradePre6xStatistics.java:365)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
```
## 2020-11-17
- Chat with Peter about using some remaining CRP Livestock open access money to fund more work on OpenRXV / AReS
- I will create GitHub issues for each of the things we talked about and then create ToRs to send to CodeObia for a quote
- Continue migrating Solr statistics to DSpace 6 UUID format after the upgrade on Sunday
- Regarding the IWMI issue about flagships and strategic priorities, we can use CRP Livestock as an example because all their [flagships are mapped to collections](https://cgspace.cgiar.org/handle/10568/80102)
- Database issues are worse today...
![PostgreSQL connections week](/cgspace-notes/2020/11/postgres_connections_ALL-week2.png)
![PostgreSQL locks week](/cgspace-notes/2020/11/postgres_locks_ALL-week2.png)
- There are over 2,000 locks:
```
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2071
```
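Since the lock count keeps growing from one day to the next (almost 1,500 output lines yesterday, over 2,000 today), sampling it at regular intervals could show whether it climbs steadily or spikes. This is a hedged sketch: `sample_locks` is a hypothetical name, it assumes the same psql access as the command above, and it uses `count(*)` so psql prints only the number of lock rows, without header or footer lines.

```shell
# Hypothetical helper: print a timestamped lock count every INTERVAL seconds,
# COUNT times (defaults: every 60 s, 10 samples). Remaining args go to psql.
sample_locks() {
  local interval="${1:-60}" count="${2:-10}" i=0
  if [ "$#" -ge 2 ]; then shift 2; else shift "$#"; fi
  while [ "$i" -lt "$count" ]; do
    # -A: unaligned output, -t: tuples only, so psql prints just the number
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" \
      "$(psql "$@" -At -c 'SELECT count(*) FROM pg_locks;')"
    i=$((i + 1))
    # Do not sleep after the last sample
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}

# Usage: one sample per minute for an hour, saved for later comparison
#   sample_locks 60 60 | tee locks.log
```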
<!-- vim: set sw=2 ts=2: -->

+++
title = "CGSpace DSpace 6 Upgrade"
date = 2020-11-15T13:27:35+02:00
description = "Documenting the DSpace 6 upgrade."
categories = ["Notes"]
tags = ["Migration"]
url = "cgspace-dspace6-upgrade"
+++
Notes about the DSpace 6 upgrade on CGSpace in 2020-11.
<!--more-->
- [Processing Solr Statistics With solr-upgrade-statistics-6x](#processing-solr-statistics-with-solr-upgrade-statistics-6x)
  - [Current year's statistics core](#statistics)
  - [statistics-2019 core](#statistics-2019)
  - [statistics-2018 core](#statistics-2018)
  - [statistics-2017 core](#statistics-2017)
  - [statistics-2016 core](#statistics-2016)
  - [statistics-2015 core](#statistics-2015)
  - [statistics-2014 core](#statistics-2014)
  - [statistics-2013 core](#statistics-2013)
## Processing Solr Statistics With solr-upgrade-statistics-6x
After the main upgrade process was finished and DSpace was running, I started processing the Solr statistics with `solr-upgrade-statistics-6x` to migrate all legacy IDs to UUIDs.
### statistics
First process the current year's statistics core:
```console
$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics
...
=================================================================
*** Statistics Records with Legacy Id ***
3,817,407 Bistream View
1,693,443 Item View
105,974 Collection View
62,383 Community View
163,192 Community Search
162,581 Collection Search
470,288 Unexpected Type & Full Site
--------------------------------------
6,475,268 TOTAL
=================================================================
```
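Each invocation appears to handle at most `-n` records (hence "several rounds" below), so the command has to be re-run until it stops making progress. The sketch below automates that under a few assumptions: the function names are hypothetical, `JAVA_OPTS` is already exported as above, the tool prints an `N TOTAL` summary line like the one shown, and the DSpace launcher exits non-zero when the run throws (as with the Java heap space error on statistics-2018).

```shell
# Hypothetical helper: pull the number from the last "N TOTAL" summary line,
# dropping thousands separators (e.g. "6,475,268 TOTAL" -> 6475268).
legacy_total() {
  awk '$2 == "TOTAL" { gsub(/,/, "", $1); last = $1 } END { if (last != "") print last }'
}

# Hypothetical helper: re-run solr-upgrade-statistics-6x on one core until the
# reported TOTAL stops changing, a round fails, or MAX_ROUNDS is reached.
upgrade_core() {
  local core="$1" max_rounds="${2:-20}" prev="" cur="" round=1 log
  log=$(mktemp) || return 1
  while [ "$round" -le "$max_rounds" ]; do
    # Same invocation as above: SCHED_BATCH policy, 2.5 million records per run
    if ! chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i "$core" >"$log" 2>&1; then
      echo "round ${round} on ${core} failed, output kept in ${log}" >&2
      return 1
    fi
    cur=$(legacy_total <"$log")
    echo "round ${round} on ${core}: ${cur:-no} legacy records reported" >&2
    # Stop when no summary was printed or the count did not change
    if [ -z "$cur" ] || [ "$cur" = "$prev" ]; then
      break
    fi
    prev="$cur"
    round=$((round + 1))
  done
  rm -f "$log"
}

# Usage:
#   upgrade_core statistics
```

Comparing consecutive totals rather than waiting for zero matters here, because some documents (for example the `type: 5` ones) remain unmigrated no matter how often the tool runs.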
After several rounds of processing it finished. Here are some statistics about unmigrated documents:
- 227,000: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 471,000: `id:/.+-unmigrated/`
- 698,000: `*:* NOT id:/.{36}/`
- The majority are `type: 5` (aka SITE, according to `Constants.java`), so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
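Counts like the ones above can be obtained with ordinary Solr queries that request `rows=0` and read `numFound` from the response. A minimal sketch, assuming the same Solr base URL as the purge command (`http://localhost:8081/solr`) and Solr's JSON response writer; `solr_count` is a hypothetical name.

```shell
# Hypothetical helper: print how many documents in a Solr core match a query.
# rows=0 means Solr returns no documents, only the numFound count.
solr_count() {
  local core="$1" query="$2"
  # -G turns the --data-urlencode pairs into a URL-encoded GET query string
  curl -s -G "http://localhost:8081/solr/${core}/select" \
    --data-urlencode "q=${query}" \
    --data-urlencode 'rows=0' \
    --data-urlencode 'wt=json' |
    sed -n 's/.*"numFound":[[:space:]]*\([0-9]*\).*/\1/p'
}

# Usage, with the three queries listed above:
#   solr_count statistics '(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)'
#   solr_count statistics 'id:/.+-unmigrated/'
#   solr_count statistics '*:* NOT id:/.{36}/'
```

Letting curl URL-encode the query avoids hand-escaping the spaces, slashes and braces in the regular expressions.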
### statistics-2019
Processing the statistics-2019 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2019
...
=================================================================
*** Statistics Records with Legacy Id ***
5,569,344 Bistream View
2,179,105 Item View
117,194 Community View
104,091 Collection View
774,138 Community Search
568,347 Collection Search
1,482,620 Unexpected Type & Full Site
--------------------------------------
10,794,839 TOTAL
=================================================================
```
After several rounds of processing it finished. Here are some statistics about unmigrated documents:
- 2,690,309: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 1,494,587: `id:/.+-unmigrated/`
- 4,184,896: `*:* NOT id:/.{36}/`
- 4,172,929 are `type: 5` (aka SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2019/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
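To see how many of the unmigrated documents are `type: 5` before purging, a facet on the `type` field returns the whole breakdown in one request. This is a sketch under stated assumptions: the Solr URL is the one used above, the response uses Solr's default flat JSON facet format (`["5",count,"2",count,...]`), the type numbers follow DSpace's `Constants.java` (0 BITSTREAM through 7 EPERSON), and the function names are hypothetical.

```shell
# Hypothetical helper: map a DSpace type number to its name (Constants.java).
dspace_type_name() {
  case "$1" in
    0) echo BITSTREAM ;; 1) echo BUNDLE ;; 2) echo ITEM ;; 3) echo COLLECTION ;;
    4) echo COMMUNITY ;; 5) echo SITE ;; 6) echo GROUP ;; 7) echo EPERSON ;;
    *) echo "unknown($1)" ;;
  esac
}

# Hypothetical helper: print "NAME<TAB>count" for each type among documents
# whose id is not a 36-character UUID.
type_breakdown() {
  local core="$1"
  curl -s -G "http://localhost:8081/solr/${core}/select" \
    --data-urlencode 'q=*:* NOT id:/.{36}/' \
    --data-urlencode 'rows=0' \
    --data-urlencode 'wt=json' \
    --data-urlencode 'facet=true' \
    --data-urlencode 'facet.field=type' \
    --data-urlencode 'facet.mincount=1' |
    grep -o '"type":\[[^]]*]' |                          # the type facet array
    sed -e 's/^"type":\[//' -e 's/]$//' -e 's/"//g' |     # strip to 5,123,2,45
    tr ',' '\n' |
    paste - - |                                          # pair value with count
    while read -r type count; do
      [ -n "$type" ] || continue
      printf '%s\t%s\n' "$(dspace_type_name "$type")" "$count"
    done
}

# Usage:
#   type_breakdown statistics-2019
```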
### statistics-2018
Processing the statistics-2018 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2018
...
=================================================================
*** Statistics Records with Legacy Id ***
3,561,532 Bistream View
1,129,326 Item View
97,401 Community View
63,508 Collection View
207,827 Community Search
43,752 Collection Search
457,820 Unexpected Type & Full Site
--------------------------------------
5,561,166 TOTAL
=================================================================
```
After some time I got an error about Java heap space so I increased the JVM memory and restarted processing:
```console
$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx4096m'
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2018
```
Eventually the processing finished. Here are some statistics about unmigrated documents:
- 365,473: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 546,955: `id:/.+-unmigrated/`
- 923,158: `*:* NOT id:/.{36}/`
- 823,293 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2018/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2017
Processing the statistics-2017 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2017
...
=================================================================
*** Statistics Records with Legacy Id ***
2,529,208 Bistream View
1,618,717 Item View
144,945 Community View
74,249 Collection View
479,647 Community Search
114,658 Collection Search
852,215 Unexpected Type & Full Site
--------------------------------------
5,813,639 TOTAL
=================================================================
```
Eventually the processing finished. Here are some statistics about unmigrated documents:
- 808,309: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 893,868: `id:/.+-unmigrated/`
- 1,702,177: `*:* NOT id:/.{36}/`
- 1,660,524 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2017/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2016
Processing the statistics-2016 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2016
...
=================================================================
*** Statistics Records with Legacy Id ***
1,765,924 Bistream View
1,151,575 Item View
187,110 Community View
51,204 Collection View
347,382 Community Search
66,605 Collection Search
620,298 Unexpected Type & Full Site
--------------------------------------
4,190,098 TOTAL
=================================================================
```
Summary of unmigrated documents after processing:
- 849,408: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 627,747: `id:/.+-unmigrated/`
- 1,477,155: `*:* NOT id:/.{36}/`
- 1,469,706 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2016/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2015
Processing the statistics-2015 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2015
...
=================================================================
*** Statistics Records with Legacy Id ***
990,916 Bistream View
506,070 Item View
116,153 Community View
33,282 Collection View
21,062 Community Search
10,788 Collection Search
52,107 Unexpected Type & Full Site
--------------------------------------
1,730,378 TOTAL
=================================================================
```
Summary of unmigrated documents after processing:
- 195,293: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 67,146: `id:/.+-unmigrated/`
- 262,439: `*:* NOT id:/.{36}/`
- 247,400 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2015/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2014
Processing the statistics-2014 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2014
...
=================================================================
*** Statistics Records with Legacy Id ***
2,381,603 Item View
1,323,357 Bistream View
501,545 Community View
247,805 Collection View
250 Collection Search
188 Community Search
50 Item Search
10,918 Unexpected Type & Full Site
--------------------------------------
4,465,716 TOTAL
=================================================================
```
Summary of unmigrated documents after processing:
- 182,131: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 39,947: `id:/.+-unmigrated/`
- 222,078: `*:* NOT id:/.{36}/`
- 188,791 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2014/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
### statistics-2013
Processing the statistics-2013 core:
```console
$ chrt -b 0 dspace solr-upgrade-statistics-6x -n 2500000 -i statistics-2013
...
=================================================================
*** Statistics Records with Legacy Id ***
2,352,124 Item View
1,117,676 Bistream View
575,711 Community View
171,639 Collection View
248 Item Search
7 Collection Search
5 Community Search
1,452 Unexpected Type & Full Site
--------------------------------------
4,218,862 TOTAL
=================================================================
```
Summary of unmigrated docs after processing:
- 2,548: `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`
- 29,772: `id:/.+-unmigrated/`
- 32,320: `*:* NOT id:/.{36}/`
- 15,691 are `type: 5` (SITE) so we can purge them:
```console
$ curl -s "http://localhost:8081/solr/statistics-2013/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:* NOT id:/.{36}/</query></delete>"
```
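The same count-then-purge step was repeated for every core, so it could be scripted. A hedged sketch that reuses the Solr URL and the delete query from the commands above; `purge_unmigrated`, `SOLR_URL`, `PURGE_QUERY` and `DRY_RUN` are hypothetical names, and defaulting to a dry run is a safety choice for the sketch, not part of the process described here.

```shell
# Hypothetical helper: for each statistics core, report how many documents lack
# a 36-character (UUID) id and, only when DRY_RUN=0, delete them with the same
# request used above. DRY_RUN defaults to 1 so nothing is deleted by accident.
SOLR_URL="${SOLR_URL:-http://localhost:8081/solr}"
PURGE_QUERY='*:* NOT id:/.{36}/'

purge_unmigrated() {
  local core count
  for core in "$@"; do
    count=$(curl -s -G "${SOLR_URL}/${core}/select" \
      --data-urlencode "q=${PURGE_QUERY}" \
      --data-urlencode 'rows=0' \
      --data-urlencode 'wt=json' |
      sed -n 's/.*"numFound":[[:space:]]*\([0-9]*\).*/\1/p')
    echo "${core}: ${count:-?} documents without a UUID id"
    if [ "${DRY_RUN:-1}" = "0" ]; then
      curl -s "${SOLR_URL}/${core}/update?softCommit=true" \
        -H "Content-Type: text/xml" \
        --data-binary "<delete><query>${PURGE_QUERY}</query></delete>" >/dev/null
    fi
  done
}

# Usage (bash brace expansion): preview first, then delete for real
#   purge_unmigrated statistics statistics-{2013..2019}
#   DRY_RUN=0 purge_unmigrated statistics statistics-{2013..2019}
```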