diff --git a/content/posts/2020-05.md b/content/posts/2020-05.md index 38529d37e..2554f4a65 100644 --- a/content/posts/2020-05.md +++ b/content/posts/2020-05.md @@ -216,10 +216,53 @@ $ ant update - Database migrations take 10:18.287s during the first startup... - perhaps when we do the production CGSpace migration I can do this in advance and tell users not to make any submissions? - I had a mistake in my Solr internal URL parameter so DSpace couldn't find it, but once I fixed that DSpace starts up OK! -- Once the initial Discovery reindexing is completed I started the Solr statistics UUID migration: +- Once the initial Discovery reindexing was completed (after three hours or so!) I started the Solr statistics UUID migration: ``` $ export JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" +$ dspace solr-upgrade-statistics-6x -i statistics -n 250000 +$ dspace solr-upgrade-statistics-6x -i statistics -n 1000000 +$ dspace solr-upgrade-statistics-6x -i statistics -n 1000000 +... +``` + +- It's taking about 35 minutes for 1,000,000 records... 
+- Some issues towards the end of this core: + +``` +Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10' +org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10' + at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552) + at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) + at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) + at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) + at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68) + at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54) + at org.dspace.util.SolrUpgradePre6xStatistics.batchUpdateStats(SolrUpgradePre6xStatistics.java:161) + at org.dspace.util.SolrUpgradePre6xStatistics.run(SolrUpgradePre6xStatistics.java:456) + at org.dspace.util.SolrUpgradePre6xStatistics.main(SolrUpgradePre6xStatistics.java:365) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) + at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) + at java.lang.reflect.Method.invoke(Method.java:498) + at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229) + at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81) +``` + +- So basically there are some documents that have IDs that have *not* been converted to UUID, and have *not* been labeled as "unmigrated" either... + - Of these 101,257 documents, 90,000 are of type 5 (search), 9,000 are type storage, and 800 are type view, but it's weird because if I look at their type/statistics_type using a facet the storage ones disappear... 
+ - For now I will export these documents from the statistics core and then delete them: + +``` +$ ./run.sh -s http://localhost:8081/solr/statistics -a export -o statistics-unmigrated.json -k uid -f '(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)' +$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)</query></delete>" +``` + +- Now the UUID conversion script says there is nothing left to convert, so I can try to run the Atmire CUA conversion utility: + +``` +$ export JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" +$ dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 1 ``` - Experiment a bit with the Python [country-converter](https://pypi.org/project/country-converter/) library as it can convert between different formats (like ISO 3166 and UN m49) diff --git a/docs/2020-04/index.html b/docs/2020-04/index.html index 31afebc8c..4e6a9f742 100644 --- a/docs/2020-04/index.html +++ b/docs/2020-04/index.html @@ -25,7 +25,7 @@ On the same note, the one item Abenet pointed out last week now has a donut with - + @@ -57,7 +57,7 @@ On the same note, the one item Abenet pointed out last week now has a donut with "url": "https://alanorth.github.io/cgspace-notes/2020-04/", "wordCount": "3406", "datePublished": "2020-04-02T10:53:24+03:00", - "dateModified": "2020-04-30T14:49:46+03:00", + "dateModified": "2020-05-31T20:15:08+03:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -454,7 +454,7 @@ atmire-cua.version.number=${cua.version.number} -
  • I manually editied the CUA version variable and was then able to run the com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI script +
  • I manually edited the CUA version variable and was then able to run the com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI script
  • I had a mistake in my Solr internal URL parameter so DSpace couldn’t find it, but once I fixed that DSpace starts up OK!
  • -
  • Once the initial Discovery reindexing is completed I started the Solr statistics UUID migration:
  • +
  • Once the initial Discovery reindexing was completed (after three hours or so!) I started the Solr statistics UUID migration:
  • $ export JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8"
    +$ dspace solr-upgrade-statistics-6x -i statistics -n 250000
    +$ dspace solr-upgrade-statistics-6x -i statistics -n 1000000
    +$ dspace solr-upgrade-statistics-6x -i statistics -n 1000000
    +...
    +
    +
    Exception: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
    +org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error while creating field 'p_group_id{type=uuid,properties=indexed,stored,multiValued}' from value '10'
    +        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
    +        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
    +        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
    +        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    +        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    +        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    +        at org.dspace.util.SolrUpgradePre6xStatistics.batchUpdateStats(SolrUpgradePre6xStatistics.java:161)
    +        at org.dspace.util.SolrUpgradePre6xStatistics.run(SolrUpgradePre6xStatistics.java:456)
    +        at org.dspace.util.SolrUpgradePre6xStatistics.main(SolrUpgradePre6xStatistics.java:365)
    +        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    +        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    +        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    +        at java.lang.reflect.Method.invoke(Method.java:498)
    +        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
    +        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
    +
    +
    $ ./run.sh -s http://localhost:8081/solr/statistics -a export -o statistics-unmigrated.json -k uid -f '(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)'
    +$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)</query></delete>"
    +
    +
    $ export JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8"
    +$ dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 1
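
The filter used for the export and delete above, `(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)`, selects documents whose `id` is neither 36 characters long (UUID-shaped) nor suffixed with `-unmigrated`. Below is a rough local sketch of the same test, assuming that Lucene regular expressions match the whole term (hence the explicit `^...$` anchors for `grep`). The function name `is_unconverted` and the sample IDs are hypothetical and are not part of DSpace or Solr.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when an id would be selected by the
# filter '(*:* NOT id:/.{36}/) AND (*:* NOT id:/.+-unmigrated/)'.
is_unconverted() {
    # Exactly 36 characters: treated as an already converted UUID.
    if printf '%s' "$1" | grep -Eq '^.{36}$'; then
        return 1
    fi
    # Ends in "-unmigrated": already labeled by the upgrade script.
    if printf '%s' "$1" | grep -Eq '^.+-unmigrated$'; then
        return 1
    fi
    # Neither converted nor labeled: this is the leftover case.
    return 0
}

# Made-up sample ids, for illustration only.
for id in "10" "10-unmigrated" "c3910974-c781-4a7a-b2f5-0b2f0c1a8e1b"; do
    if is_unconverted "$id"; then
        echo "$id: selected by the filter"
    else
        echo "$id: skipped by the filter"
    fi
done
```

Note that the 36-character test only checks length, just like the Solr query, so any 36-character non-UUID id would also be treated as converted.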