diff --git a/content/posts/2020-05.md b/content/posts/2020-05.md
index 38529d37e..2554f4a65 100644
--- a/content/posts/2020-05.md
+++ b/content/posts/2020-05.md
@@ -216,10 +216,53 @@ $ ant update
- Database migrations take 10:18.287s during the first startup...
- perhaps when we do the production CGSpace migration I can do this in advance and tell users not to make any submissions?
- I had a mistake in my Solr internal URL parameter so DSpace couldn't find it, but once I fixed that DSpace starts up OK!
-- Once the initial Discovery reindexing is completed I started the Solr statistics UUID migration:
+- Once the initial Discovery reindexing was completed (after three hours or so!) I started the Solr statistics UUID migration:
```
$ export JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8"
+$ dspace solr-upgrade-statistics-6x -i statistics -n 250000
+$ dspace solr-upgrade-statistics-6x -i statistics -n 1000000
+$ dspace solr-upgrade-statistics-6x -i statistics -n 1000000
+...
+```
+
+- It's taking about 35 minutes for 1,000,000 records...
+- Some issues towards the end of this core:
+
+```
+Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
+org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
+ at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
+ at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
+ at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
+ at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
+ at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
+ at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
+ at org.dspace.util.SolrUpgradePre6xStatistics.batchUpdateStats(SolrUpgradePre6xStatistics.java:161)
+ at org.dspace.util.SolrUpgradePre6xStatistics.run(SolrUpgradePre6xStatistics.java:456)
+ at org.dspace.util.SolrUpgradePre6xStatistics.main(SolrUpgradePre6xStatistics.java:365)
+ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+ at java.lang.reflect.Method.invoke(Method.java:498)
+ at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
+ at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
+```
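- The exception above is Solr refusing to store the legacy integer group ID `10` in a field of type `uuid`. A minimal Python sketch of the same kind of check, assuming a "legacy" value is anything Python's `uuid.UUID` cannot parse; the helper name `is_uuid` and the sample UUID are made up for illustration, and Solr's own parser may differ in edge cases:

```python
import uuid


def is_uuid(value: str) -> bool:
    """Return True if value parses as a UUID (hypothetical helper, not part of DSpace)."""
    try:
        uuid.UUID(value)
    except ValueError:
        # Raised for strings such as "10" that are not 32 hex digits
        return False
    return True


# A legacy integer ID like the one in the stack trace
print(is_uuid("10"))
# An arbitrary, made-up UUID for comparison
print(is_uuid("f0c5d3a1-8c52-4d7e-9b0a-2f6e1c3a4b5d"))
```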
+
+- So basically there are some documents that have IDs that have *not* been converted to UUID, and have *not* been labeled as "unmigrated" either...
+ - Of these 101,257 documents, 90,000 are of type 5 (search), 9,000 are type storage, and 800 are type view, but it's weird because if I look at their type/statistics_type using a facet the storage ones disappear...
+ - For now I will export these documents from the statistics core and then delete them:
+
+```
+$ ./run.sh -s http://localhost:8081/solr/statistics -a export -o statistics-unmigrated.json -k uid -f '(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)'
+$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)</query></delete>"
+```
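- A rough Python sketch of what that filter selects, assuming Solr's regular expression queries match the whole `id` term (so `/.{36}/` means "exactly 36 characters"). The sample IDs and the helper names `is_leftover` and `delete_payload` are hypothetical; the second helper only builds the `<delete><query>` body that Solr's XML update handler expects for a delete-by-query and does not contact Solr:

```python
import re
from xml.sax.saxutils import escape

# Python equivalents of the two regular expressions used in the Solr filter
UUID_LENGTH = re.compile(r".{36}")         # any 36-character id, e.g. a UUID
UNMIGRATED = re.compile(r".+-unmigrated")  # ids the upgrade tool tagged as unmigrated

# The same filter string as in the export command
QUERY = "(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)"


def is_leftover(doc_id: str) -> bool:
    """True if the id is neither 36 characters long nor tagged '-unmigrated'."""
    return not UUID_LENGTH.fullmatch(doc_id) and not UNMIGRATED.fullmatch(doc_id)


def delete_payload(query: str) -> str:
    """Wrap a query in the XML body used for a Solr delete-by-query."""
    return f"<delete><query>{escape(query)}</query></delete>"


# Made-up ids: a legacy integer, a UUID, and a tagged legacy id
for sample in ("10", "f0c5d3a1-8c52-4d7e-9b0a-2f6e1c3a4b5d", "10-unmigrated"):
    print(sample, is_leftover(sample))

print(delete_payload(QUERY))
```

- Only the first sample is selected, which matches the intent of the filter: documents that were neither converted nor labeled.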
+
+- Now the UUID conversion script says there is nothing left to convert, so I can try to run the Atmire CUA conversion utility:
+
+```
+$ export JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8"
+$ dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 1
```
- Experiment a bit with the Python [country-converter](https://pypi.org/project/country-converter/) library as it can convert between different formats (like ISO 3166 and UN m49)
diff --git a/docs/2020-04/index.html b/docs/2020-04/index.html
index 31afebc8c..4e6a9f742 100644
--- a/docs/2020-04/index.html
+++ b/docs/2020-04/index.html
@@ -25,7 +25,7 @@ On the same note, the one item Abenet pointed out last week now has a donut with
-
+
@@ -57,7 +57,7 @@ On the same note, the one item Abenet pointed out last week now has a donut with
"url": "https://alanorth.github.io/cgspace-notes/2020-04/",
"wordCount": "3406",
"datePublished": "2020-04-02T10:53:24+03:00",
- "dateModified": "2020-04-30T14:49:46+03:00",
+ "dateModified": "2020-05-31T20:15:08+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -454,7 +454,7 @@ atmire-cua.version.number=${cua.version.number}
-
I manually editied the CUA version variable and was then able to run the com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI
script
+I manually edited the CUA version variable and was then able to run the com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI
script
- On the first run it took one hour to process 100,000 records on my local test instance…
- On the second run it took one hour to process 140,000 records
diff --git a/docs/2020-05/index.html b/docs/2020-05/index.html
index ab31f2feb..149e67f4a 100644
--- a/docs/2020-05/index.html
+++ b/docs/2020-05/index.html
@@ -18,7 +18,7 @@ I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2
-
+
@@ -41,9 +41,9 @@ I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2
"@type": "BlogPosting",
"headline": "May, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-05/",
- "wordCount": "1861",
+ "wordCount": "2094",
"datePublished": "2020-05-02T09:52:04+03:00",
- "dateModified": "2020-05-30T18:38:16+03:00",
+ "dateModified": "2020-05-31T16:04:18+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -386,9 +386,49 @@ $ ant update
I had a mistake in my Solr internal URL parameter so DSpace couldn’t find it, but once I fixed that DSpace starts up OK!
-Once the initial Discovery reindexing is completed I started the Solr statistics UUID migration:
+Once the initial Discovery reindexing was completed (after three hours or so!) I started the Solr statistics UUID migration:
$ export JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8"
+$ dspace solr-upgrade-statistics-6x -i statistics -n 250000
+$ dspace solr-upgrade-statistics-6x -i statistics -n 1000000
+$ dspace solr-upgrade-statistics-6x -i statistics -n 1000000
+...
+
+- It’s taking about 35 minutes for 1,000,000 records…
+- Some issues towards the end of this core:
+
+Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
+org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
+ at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
+ at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
+ at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
+ at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
+ at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
+ at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
+ at org.dspace.util.SolrUpgradePre6xStatistics.batchUpdateStats(SolrUpgradePre6xStatistics.java:161)
+ at org.dspace.util.SolrUpgradePre6xStatistics.run(SolrUpgradePre6xStatistics.java:456)
+ at org.dspace.util.SolrUpgradePre6xStatistics.main(SolrUpgradePre6xStatistics.java:365)
+ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+ at java.lang.reflect.Method.invoke(Method.java:498)
+ at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
+ at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
+
+- So basically there are some documents that have IDs that have not been converted to UUID, and have not been labeled as “unmigrated” either…
+
+- Of these 101,257 documents, 90,000 are of type 5 (search), 9,000 are type storage, and 800 are type view, but it’s weird because if I look at their type/statistics_type using a facet the storage ones disappear…
+- For now I will export these documents from the statistics core and then delete them:
+
+
+
+$ ./run.sh -s http://localhost:8081/solr/statistics -a export -o statistics-unmigrated.json -k uid -f '(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)'
+$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)</query></delete>"
+
+- Now the UUID conversion script says there is nothing left to convert, so I can try to run the Atmire CUA conversion utility:
+
+$ export JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8"
+$ dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 1
- Experiment a bit with the Python country-converter library as it can convert between different formats (like ISO 3166 and UN m49)
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 3e6cc03c4..aefb79ead 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index f28c8f802..194e8a271 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 3f276bf93..4d615cae8 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index d0a2ea765..1c47f7071 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 4c072da04..38ce2633f 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/page/2/index.html b/docs/categories/page/2/index.html
index 5e30db1b5..5aa8c5fc5 100644
--- a/docs/categories/page/2/index.html
+++ b/docs/categories/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/page/3/index.html b/docs/categories/page/3/index.html
index 98cdc21ac..4ce9ec9e4 100644
--- a/docs/categories/page/3/index.html
+++ b/docs/categories/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/page/4/index.html b/docs/categories/page/4/index.html
index 8deca296d..19d68bbd8 100644
--- a/docs/categories/page/4/index.html
+++ b/docs/categories/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/page/5/index.html b/docs/categories/page/5/index.html
index 603c85821..540b79c44 100644
--- a/docs/categories/page/5/index.html
+++ b/docs/categories/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/page/6/index.html b/docs/categories/page/6/index.html
index 8a96d11a6..5b3669e82 100644
--- a/docs/categories/page/6/index.html
+++ b/docs/categories/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index c27c3989f..40004e739 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 842e99e56..3f632134e 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 9454522da..f97acebb1 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 378ffe6d0..884d10955 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 205bef9ad..8979ec2db 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 1c8905698..5ed8bb621 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index e7216d1f1..c6e24dc37 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index de6b875c1..c7e18a553 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index cf33be82a..6bcbb0549 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 9fd294df2..f446e3acb 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 1d269d755..027946bb3 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index e06bcbf01..938a47bac 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 448c663bd..9bde7f865 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,32 +4,32 @@
https://alanorth.github.io/cgspace-notes/categories/
- 2020-05-30T18:38:16+03:00
+ 2020-05-31T20:15:08+03:00
https://alanorth.github.io/cgspace-notes/
- 2020-05-30T18:38:16+03:00
+ 2020-05-31T20:15:08+03:00
https://alanorth.github.io/cgspace-notes/2020-05/
- 2020-05-30T18:38:16+03:00
+ 2020-05-31T16:04:18+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2020-05-30T18:38:16+03:00
+ 2020-05-31T20:15:08+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2020-05-30T18:38:16+03:00
+ 2020-05-31T20:15:08+03:00
https://alanorth.github.io/cgspace-notes/2020-04/
- 2020-04-30T14:49:46+03:00
+ 2020-05-31T20:15:08+03:00