diff --git a/content/posts/2020-08.md b/content/posts/2020-08.md
index 37fd18e04..d9a376330 100644
--- a/content/posts/2020-08.md
+++ b/content/posts/2020-08.md
@@ -398,4 +398,56 @@ dspace=# SELECT count(text_value) FROM metadatavalue WHERE metadata_field_id = 2
 - I purged 150,000 hits from 2020 and 2020 from these user agents and hosts
+## 2020-08-14
+
+- Last night I started the processing of the statistics-2016 core with the Atmire stats util and I see some errors like this:
+
+```
+Record uid: f6b288d7-d60d-4df9-b311-1696b88552a0 couldn't be processed
+com.atmire.statistics.util.update.atomic.ProcessingException: something went wrong while processing record uid: f6b288d7-d60d-4df9-b311-1696b88552a0, an error occured in the com.atmire.statistics.util.update.atomic.processor.ContainerOwnerDBProcessor
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(SourceFile:304)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(SourceFile:176)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:161)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
+        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+        at java.lang.reflect.Method.invoke(Method.java:498)
+        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
+        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
+Caused by: java.lang.NullPointerException
+```
+
+- I see it has `id: 980-unmigrated` and `type: 0`...
+- The 2016 core has 629,983 unmigrated docs, mostly:
+  - `type: 5`: 620,311
+  - `type: 0`: 7,255
+  - `type: 3`: 1,333
+- I purged the unmigrated docs and continued processing:
+
+```
+$ curl -s "http://localhost:8081/solr/statistics-2016/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:/.*unmigrated.*/</query></delete>'
+$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
+$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics-2016
+```
+
+- Then I see there are 849,000 docs with `id: -1` and `type: 5`, so I should probably purge those too:
+
+```
+$ curl -s "http://localhost:8081/solr/statistics-2017/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:\-1</query></delete>'
+```
+
+- Altmetric asked for a dump of CGSpace's OAI "sets" so they can update their affiliation mappings
+  - I did it in a quick and dirty way:
+
+```
+$ http 'https://cgspace.cgiar.org/oai/request?verb=ListSets' > /tmp/0.xml
+$ for num in {100..1300..100}; do http "https://cgspace.cgiar.org/oai/request?verb=ListSets&resumptionToken=////$num" > /tmp/$num.xml; sleep 2; done
+$ for num in {0..1300..100}; do cat /tmp/$num.xml >> /tmp/cgspace-oai-sets.xml; done
+```
+
+- This produces one file that has all the sets, albeit with 14 pages of responses concatenated into one document, but that's how theirs was in the first place...
+- Help Bizu with a restricted item for CIAT
+
diff --git a/docs/2020-08/index.html b/docs/2020-08/index.html
index d837597df..0aff276c3 100644
--- a/docs/2020-08/index.html
+++ b/docs/2020-08/index.html
@@ -19,7 +19,7 @@ It is class based so I can easily add support for other vocabularies, and the te
-
+
@@ -43,9 +43,9 @@ It is class based so I can easily add support for other vocabularies, and the te
   "@type": "BlogPosting",
   "headline": "August, 2020",
   "url": "https://alanorth.github.io/cgspace-notes/2020-08/",
-  "wordCount": "2554",
+  "wordCount": "2800",
   "datePublished": "2020-08-02T15:35:54+03:00",
-  "dateModified": "2020-08-11T11:35:05+03:00",
+  "dateModified": "2020-08-13T17:56:39+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -566,6 +566,56 @@ $ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=tru
+

2020-08-14

+ +
Record uid: f6b288d7-d60d-4df9-b311-1696b88552a0 couldn't be processed
+com.atmire.statistics.util.update.atomic.ProcessingException: something went wrong while processing record uid: f6b288d7-d60d-4df9-b311-1696b88552a0, an error occured in the com.atmire.statistics.util.update.atomic.processor.ContainerOwnerDBProcessor
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(SourceFile:304)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(SourceFile:176)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:161)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
+        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+        at java.lang.reflect.Method.invoke(Method.java:498)
+        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
+        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
+Caused by: java.lang.NullPointerException
+
+
$ curl -s "http://localhost:8081/solr/statistics-2016/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:/.*unmigrated.*/</query></delete>'
+$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
+$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics-2016
+
+
$ curl -s "http://localhost:8081/solr/statistics-2017/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:\-1</query></delete>'
+
+
$ http 'https://cgspace.cgiar.org/oai/request?verb=ListSets' > /tmp/0.xml
+$ for num in {100..1300..100}; do http "https://cgspace.cgiar.org/oai/request?verb=ListSets&resumptionToken=////$num" > /tmp/$num.xml; sleep 2; done
+$ for num in {0..1300..100}; do cat /tmp/$num.xml >> /tmp/cgspace-oai-sets.xml; done
+
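The OAI harvest above hard-codes the page offsets in two separate loops. A minimal sketch of the same paging follows, assuming (as the commands above do) that CGSpace's ListSets resumption tokens take the form `////<offset>` in steps of 100 up to 1300. The function names `oai_listsets_urls` and `fetch_oai_sets` are hypothetical, and `curl` stands in for the `http` client used in the notes:

```shell
# Print one OAI-PMH ListSets URL per page. The first request carries no
# resumptionToken; later pages use the "////<offset>" tokens from the notes.
# Arguments (base URL, last offset, step) are illustrative assumptions.
oai_listsets_urls() (
    base="$1" last="$2" step="$3"
    echo "${base}?verb=ListSets"
    offset="$step"
    while [ "$offset" -le "$last" ]; do
        echo "${base}?verb=ListSets&resumptionToken=////${offset}"
        offset=$((offset + step))
    done
)

# Fetch every page into one combined file, pausing between requests.
# Needs curl and network access; the output path is the caller's choice.
fetch_oai_sets() (
    out="$1"
    : > "$out"
    oai_listsets_urls "https://cgspace.cgiar.org/oai/request" 1300 100 |
        while IFS= read -r url; do
            curl -s "$url" >> "$out"
            sleep 2
        done
)
```

Calling `fetch_oai_sets /tmp/cgspace-oai-sets.xml` would reproduce the concatenated file in one pass. A more general harvester would read the `resumptionToken` element from each response instead of assuming offsets, since OAI-PMH leaves the token format up to the repository.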
diff --git a/docs/categories/index.html b/docs/categories/index.html index 00b82285d..868b8ecd1 100644 --- a/docs/categories/index.html +++ b/docs/categories/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html index ee7849083..379864045 100644 --- a/docs/categories/notes/index.html +++ b/docs/categories/notes/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html index e456a8f40..9c9d430bc 100644 --- a/docs/categories/notes/page/2/index.html +++ b/docs/categories/notes/page/2/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html index 86f2338be..aa2ae38cd 100644 --- a/docs/categories/notes/page/3/index.html +++ b/docs/categories/notes/page/3/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html index 7c40fc2ee..0e015d57f 100644 --- a/docs/categories/notes/page/4/index.html +++ b/docs/categories/notes/page/4/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/index.html b/docs/index.html index a56b81956..3422e15c7 100644 --- a/docs/index.html +++ b/docs/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/page/2/index.html b/docs/page/2/index.html index b95c9f969..05f126e0a 100644 --- a/docs/page/2/index.html +++ b/docs/page/2/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/page/3/index.html b/docs/page/3/index.html index a9d854ae8..2653183af 100644 --- a/docs/page/3/index.html +++ b/docs/page/3/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/page/4/index.html b/docs/page/4/index.html index 32628af38..fa686af45 100644 --- a/docs/page/4/index.html +++ b/docs/page/4/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/page/5/index.html b/docs/page/5/index.html index dca9a6059..45f09b99a 100644 --- a/docs/page/5/index.html +++ b/docs/page/5/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/page/6/index.html 
b/docs/page/6/index.html index 6db475464..0ddf4df9e 100644 --- a/docs/page/6/index.html +++ b/docs/page/6/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/posts/index.html b/docs/posts/index.html index b4a799a38..9a5687bcb 100644 --- a/docs/posts/index.html +++ b/docs/posts/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html index 23990c11d..3192d2e4b 100644 --- a/docs/posts/page/2/index.html +++ b/docs/posts/page/2/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html index f450e0005..4ddaa76ab 100644 --- a/docs/posts/page/3/index.html +++ b/docs/posts/page/3/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html index c5e9ec501..26ae59ad0 100644 --- a/docs/posts/page/4/index.html +++ b/docs/posts/page/4/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html index 74321b930..fe654a54f 100644 --- a/docs/posts/page/5/index.html +++ b/docs/posts/page/5/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html index 1f238c2b2..fb7a87e57 100644 --- a/docs/posts/page/6/index.html +++ b/docs/posts/page/6/index.html @@ -9,7 +9,7 @@ - + diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 9f5a2d25b..ec11f05a2 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -4,27 +4,27 @@ https://alanorth.github.io/cgspace-notes/2020-08/ - 2020-08-11T11:35:05+03:00 + 2020-08-13T17:56:39+03:00 https://alanorth.github.io/cgspace-notes/categories/ - 2020-08-11T11:35:05+03:00 + 2020-08-13T17:56:39+03:00 https://alanorth.github.io/cgspace-notes/ - 2020-08-11T11:35:05+03:00 + 2020-08-13T17:56:39+03:00 https://alanorth.github.io/cgspace-notes/categories/notes/ - 2020-08-11T11:35:05+03:00 + 2020-08-13T17:56:39+03:00 https://alanorth.github.io/cgspace-notes/posts/ - 2020-08-11T11:35:05+03:00 + 2020-08-13T17:56:39+03:00
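The two Solr purges recorded in this change post a `<delete><query>` XML body to the core's update handler. A minimal sketch of building that body follows, assuming the same `localhost:8081` Solr endpoint as in the notes. The helpers `xml_escape`, `solr_delete_xml`, and `solr_delete_by_query` are hypothetical names for illustration, not part of DSpace or Solr:

```shell
# Escape the characters that are special inside XML text content.
# The ampersand must be handled first so later entities are not re-escaped.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Wrap a Solr query string in a delete-by-query XML body.
solr_delete_xml() {
    printf '<delete><query>%s</query></delete>' "$(xml_escape "$1")"
}

# Post the delete to one statistics core with a soft commit, using the same
# URL and header as the curl commands in the notes. Needs curl and a running
# Solr; the host and port are taken from the notes, not verified here.
solr_delete_by_query() (
    core="$1" query="$2"
    curl -s "http://localhost:8081/solr/${core}/update?softCommit=true" \
        -H "Content-Type: text/xml" \
        --data-binary "$(solr_delete_xml "$query")"
)
```

For example, `solr_delete_by_query statistics-2016 'id:/.*unmigrated.*/'` would send the same request as the first purge. Keeping the body builder separate makes it possible to print and inspect the XML before anything is deleted.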