diff --git a/content/posts/2020-08.md b/content/posts/2020-08.md
index 97170138f..b9e59284c 100644
--- a/content/posts/2020-08.md
+++ b/content/posts/2020-08.md
@@ -209,4 +209,145 @@ on_id=[A-Z0-9]{32}' | sort | uniq | wc -l
 - I ran it on CGSpace and it cleaned up 3,769 thumbnails!
 - Afterwards I ran `dspace cleanup -v` to remove the deleted thumbnails
+## 2020-08-08
+
+- The Atmire stats processing for the statistics-2018 Solr core keeps stopping with this error:
+
+```
+Exception: 50 consecutive records couldn't be saved. There's most likely an issue with the connection to the solr server. Shutting down.
+java.lang.RuntimeException: 50 consecutive records couldn't be saved. There's most likely an issue with the connection to the solr server. Shutting down.
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.storeOnServer(SourceFile:317)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(SourceFile:177)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:161)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
+        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+        at java.lang.reflect.Method.invoke(Method.java:498)
+        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
+        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
+```
+
+- It lists a few of the records that it is having issues with and they all have integer IDs
+  - When I checked Solr I saw 8,000 of them, some of which have type 0 and some with no type...
+  - I purged them and then the process continued:
+
+```
+$ curl -s "http://localhost:8081/solr/statistics-2018/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:/[0-9]+/</query></delete>'
+```
+
+## 2020-08-09
+
+- The Atmire script did something to the server and created 132GB of log files so the root partition ran out of space...
+- I removed the log file and tried to re-run the process but it seems to be looping over 11,000 records and failing, creating millions of lines in the logs again:
+
+```
+# grep -oE "Record uid: ([a-f0-9\\-]*){1} couldn't be processed" /home/dspacetest.cgiar.org/log/dspace.log.2020-08-09 > /tmp/not-processed-errors.txt
+# wc -l /tmp/not-processed-errors.txt
+2202973 /tmp/not-processed-errors.txt
+# sort /tmp/not-processed-errors.txt | uniq -c | tail -n 10
+    220 Record uid: ffe52878-ba23-44fb-8df7-a261bb358abc couldn't be processed
+    220 Record uid: ffecb2b0-944d-4629-afdf-5ad995facaf9 couldn't be processed
+    220 Record uid: ffedde6b-0782-4d9f-93ff-d1ba1a737585 couldn't be processed
+    220 Record uid: ffedfb13-e929-4909-b600-a18295520a97 couldn't be processed
+    220 Record uid: fff116fb-a1a0-40d0-b0fb-b71e9bb898e5 couldn't be processed
+    221 Record uid: fff1349d-79d5-4ceb-89a1-ce78107d982d couldn't be processed
+    220 Record uid: fff13ddb-b2a2-410a-9baa-97e333118c74 couldn't be processed
+    220 Record uid: fff232a6-a008-47d0-ad83-6e209bb6cdf9 couldn't be processed
+    221 Record uid: fff75243-c3be-48a0-98f8-a656f925cb68 couldn't be processed
+    221 Record uid: fff88af8-88d4-4f79-ba1a-79853973c872 couldn't be processed
+```
+
+- I looked at some of those records and saw strange objects in their `containerCommunity`, `containerCollection`, etc...
+
+```
+{
+  "responseHeader": {
+    "status": 0,
+    "QTime": 0,
+    "params": {
+      "q": "uid:fff1349d-79d5-4ceb-89a1-ce78107d982d",
+      "indent": "true",
+      "wt": "json",
+      "_": "1596957629970"
+    }
+  },
+  "response": {
+    "numFound": 1,
+    "start": 0,
+    "docs": [
+      {
+        "containerCommunity": [
+          "155",
+          "155",
+          "{set=null}"
+        ],
+        "uid": "fff1349d-79d5-4ceb-89a1-ce78107d982d",
+        "containerCollection": [
+          "1099",
+          "830",
+          "{set=830}"
+        ],
+        "owningComm": [
+          "155",
+          "155",
+          "{set=null}"
+        ],
+        "isInternal": false,
+        "isBot": false,
+        "statistics_type": "view",
+        "time": "2018-05-08T23:17:00.157Z",
+        "owningColl": [
+          "1099",
+          "830",
+          "{set=830}"
+        ],
+        "_version_": 1621500445042147300
+      }
+    ]
+  }
+}
+```
+
+- I deleted those 11,724 records with the strange "set" object in the collections and communities, as well as 360,000 records with `id: -1`
+
+```
+$ curl -s "http://localhost:8081/solr/statistics-2018/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>owningColl:/.*set.*/</query></delete>'
+$ curl -s "http://localhost:8081/solr/statistics-2018/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:\-1</query></delete>'
+```
+
+- I was going to compare the CUA stats for 2018 and 2019 on CGSpace and DSpace Test, but after Linode rebooted CGSpace (linode18) for maintenance yesterday the Solr cores didn't all come back up OK
+  - I had to restart Tomcat five times before they all came up!
+  - After that I generated a report for 2018 and 2019 on each server and found that the difference is about 10,000–20,000 per month, which is much less than I was expecting
+- I noticed some authors that should have ORCID identifiers, but didn't (perhaps older items before we were tagging ORCID metadata)
+  - With the simple list below I added 1,341 identifiers!
+
+```
+$ cat 2020-08-09-add-ILRI-orcids.csv
+dc.contributor.author,cg.creator.id
+"Grace, Delia","Delia Grace: 0000-0002-0195-9489"
+"Delia Grace","Delia Grace: 0000-0002-0195-9489"
+"Baker, Derek","Derek Baker: 0000-0001-6020-6973"
+"Ngan Tran Thi","Tran Thi Ngan: 0000-0002-7184-3086"
+"Dang Xuan Sinh","Sinh Dang-Xuan: 0000-0002-0522-7808"
+"Hung Nguyen-Viet","Hung Nguyen-Viet: 0000-0001-9877-0596"
+"Pham Van Hung","Pham Anh Hung: 0000-0001-9366-0259"
+"Lindahl, Johanna F.","Johanna Lindahl: 0000-0002-1175-0398"
+"Teufel, Nils","Nils Teufel: 0000-0001-5305-6620"
+"Duncan, Alan J.","Alan Duncan: 0000-0002-3954-3067"
+"Moodley, Arshnee","Arshnee Moodley: 0000-0002-6469-3948"
+```
+
+- That got me curious, so I generated a list of all the unique ORCID identifiers we have in the database:
+
+```
+dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=240) TO /tmp/2020-08-09-orcid-identifiers.csv;
+COPY 2095
+dspace=# \q
+$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' /tmp/2020-08-09-orcid-identifiers.csv | sort | uniq > /tmp/2020-08-09-orcid-identifiers-uniq.csv
+$ wc -l /tmp/2020-08-09-orcid-identifiers-uniq.csv
+1949 /tmp/2020-08-09-orcid-identifiers-uniq.csv
+```
+
diff --git a/docs/2020-08/index.html b/docs/2020-08/index.html
index fd0540c52..fab61838d 100644
--- a/docs/2020-08/index.html
+++ b/docs/2020-08/index.html
@@ -19,7 +19,7 @@ It is class based so I can easily add support for other vocabularies, and the te
-
+
@@ -43,9 +43,9 @@ It is class based so I can easily add support for other vocabularies, and the te
 "@type": "BlogPosting",
 "headline": "August, 2020",
 "url": "https://alanorth.github.io/cgspace-notes/2020-08/",
- "wordCount": "1421",
+ "wordCount": "2049",
 "datePublished": "2020-08-02T15:35:54+03:00",
- "dateModified": "2020-08-06T16:24:01+03:00",
+ "dateModified": "2020-08-07T19:55:21+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -370,7 +370,141 @@
on_id=[A-Z0-9]{32}' | sort | uniq | wc -l - +

2020-08-08

+ +
Exception: 50 consecutive records couldn't be saved. There's most likely an issue with the connection to the solr server. Shutting down.
+java.lang.RuntimeException: 50 consecutive records couldn't be saved. There's most likely an issue with the connection to the solr server. Shutting down.
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.storeOnServer(SourceFile:317)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(SourceFile:177)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:161)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
+        at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
+        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+        at java.lang.reflect.Method.invoke(Method.java:498)
+        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
+        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
+
+
$ curl -s "http://localhost:8081/solr/statistics-2018/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:/[0-9]+/</query></delete>'
+
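The purge above relies on Solr's XML delete-by-query envelope. As a minimal sketch, assuming a hypothetical helper named `solr_delete_payload` (it is not part of the original notes), the payload can be built and inspected before anything is sent to the server:

```shell
#!/bin/sh
# Hypothetical helper, not from the original notes: wrap a Solr query
# in the <delete><query>...</query></delete> envelope used with the
# /update handler in the curl command above.
solr_delete_payload() {
    # Escape the XML special characters &, < and > in the query text
    escaped=$(printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
    printf '<delete><query>%s</query></delete>\n' "$escaped"
}

# Print the payload for the integer-ID purge so it can be reviewed first
solr_delete_payload 'id:/[0-9]+/'
```

The result can then be passed to curl as `--data-binary "$(solr_delete_payload 'id:/[0-9]+/')"`. Before deleting, it may be worth running the same query against the core's `select` handler with `rows=0` and checking `numFound`, so the number of matching documents is known in advance.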

2020-08-09

+ +
# grep -oE "Record uid: ([a-f0-9\\-]*){1} couldn't be processed" /home/dspacetest.cgiar.org/log/dspace.log.2020-08-09 > /tmp/not-processed-errors.txt
+# wc -l /tmp/not-processed-errors.txt
+2202973 /tmp/not-processed-errors.txt
+# sort /tmp/not-processed-errors.txt | uniq -c | tail -n 10
+    220 Record uid: ffe52878-ba23-44fb-8df7-a261bb358abc couldn't be processed
+    220 Record uid: ffecb2b0-944d-4629-afdf-5ad995facaf9 couldn't be processed
+    220 Record uid: ffedde6b-0782-4d9f-93ff-d1ba1a737585 couldn't be processed
+    220 Record uid: ffedfb13-e929-4909-b600-a18295520a97 couldn't be processed
+    220 Record uid: fff116fb-a1a0-40d0-b0fb-b71e9bb898e5 couldn't be processed
+    221 Record uid: fff1349d-79d5-4ceb-89a1-ce78107d982d couldn't be processed
+    220 Record uid: fff13ddb-b2a2-410a-9baa-97e333118c74 couldn't be processed
+    220 Record uid: fff232a6-a008-47d0-ad83-6e209bb6cdf9 couldn't be processed
+    221 Record uid: fff75243-c3be-48a0-98f8-a656f925cb68 couldn't be processed
+    221 Record uid: fff88af8-88d4-4f79-ba1a-79853973c872 couldn't be processed
+
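The `uniq -c` output above shows each uid repeated a couple of hundred times, so the number of distinct failing records is far smaller than the line count. A rough sketch of summarising both numbers at once follows; the function name `summarize_failures` and the sample data are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper, not from the original notes: report how many
# error lines a file contains and how many distinct lines (records).
summarize_failures() {
    total=$(wc -l < "$1" | tr -d ' ')
    unique=$(sort -u "$1" | wc -l | tr -d ' ')
    echo "total=$total unique=$unique"
}

# Demonstrate on a small made-up sample instead of the real log extract
sample=$(mktemp)
printf '%s\n' \
    "Record uid: aaa couldn't be processed" \
    "Record uid: aaa couldn't be processed" \
    "Record uid: bbb couldn't be processed" > "$sample"
summarize_failures "$sample"
rm -f "$sample"
```

On the real data the argument would be `/tmp/not-processed-errors.txt`.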
+
{
+  "responseHeader": {
+    "status": 0,
+    "QTime": 0,
+    "params": {
+      "q": "uid:fff1349d-79d5-4ceb-89a1-ce78107d982d",
+      "indent": "true",
+      "wt": "json",
+      "_": "1596957629970"
+    }
+  },
+  "response": {
+    "numFound": 1,
+    "start": 0,
+    "docs": [
+      {
+        "containerCommunity": [
+          "155",
+          "155",
+          "{set=null}"
+        ],
+        "uid": "fff1349d-79d5-4ceb-89a1-ce78107d982d",
+        "containerCollection": [
+          "1099",
+          "830",
+          "{set=830}"
+        ],
+        "owningComm": [
+          "155",
+          "155",
+          "{set=null}"
+        ],
+        "isInternal": false,
+        "isBot": false,
+        "statistics_type": "view",
+        "time": "2018-05-08T23:17:00.157Z",
+        "owningColl": [
+          "1099",
+          "830",
+          "{set=830}"
+        ],
+        "_version_": 1621500445042147300
+      }
+    ]
+  }
+}
+
+
$ curl -s "http://localhost:8081/solr/statistics-2018/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>owningColl:/.*set.*/</query></delete>'
+$ curl -s "http://localhost:8081/solr/statistics-2018/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:\-1</query></delete>'
+
+
$ cat 2020-08-09-add-ILRI-orcids.csv
+dc.contributor.author,cg.creator.id
+"Grace, Delia","Delia Grace: 0000-0002-0195-9489"
+"Delia Grace","Delia Grace: 0000-0002-0195-9489"
+"Baker, Derek","Derek Baker: 0000-0001-6020-6973"
+"Ngan Tran Thi","Tran Thi Ngan: 0000-0002-7184-3086"
+"Dang Xuan Sinh","Sinh Dang-Xuan: 0000-0002-0522-7808"
+"Hung Nguyen-Viet","Hung Nguyen-Viet: 0000-0001-9877-0596"
+"Pham Van Hung","Pham Anh Hung: 0000-0001-9366-0259"
+"Lindahl, Johanna F.","Johanna Lindahl: 0000-0002-1175-0398"
+"Teufel, Nils","Nils Teufel: 0000-0001-5305-6620"
+"Duncan, Alan J.","Alan Duncan: 0000-0002-3954-3067"
+"Moodley, Arshnee","Arshnee Moodley: 0000-0002-6469-3948"
+
+
dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=240) TO /tmp/2020-08-09-orcid-identifiers.csv;
+COPY 2095
+dspace=# \q
+$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' /tmp/2020-08-09-orcid-identifiers.csv | sort | uniq > /tmp/2020-08-09-orcid-identifiers-uniq.csv
+$ wc -l /tmp/2020-08-09-orcid-identifiers-uniq.csv
+1949 /tmp/2020-08-09-orcid-identifiers-uniq.csv
+
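The grep pattern above accepts any uppercase letter in any position, while an ORCID iD is fifteen digits followed by a check character (a digit or `X`) computed with ISO 7064 MOD 11-2. A minimal sketch of a stricter check, using hypothetical function names `orcid_check_digit` and `orcid_is_valid` that are not part of the original notes:

```shell
#!/bin/sh
# Hypothetical helpers, not from the original notes.

# Compute the ISO 7064 MOD 11-2 check character from the first 15 digits
orcid_check_digit() {
    digits=$(printf '%s' "$1" | tr -d '-' | cut -c1-15)
    total=0
    while [ -n "$digits" ]; do
        rest=${digits#?}            # everything after the first character
        digit=${digits%"$rest"}     # the first character itself
        total=$(( (total + $digit) * 2 ))
        digits=$rest
    done
    result=$(( (12 - total % 11) % 11 ))
    if [ "$result" -eq 10 ]; then echo X; else echo "$result"; fi
}

# Succeed only if the identifier has the right shape and check character
orcid_is_valid() {
    printf '%s\n' "$1" | grep -qE '^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$' || return 1
    last=$(printf '%s' "$1" | cut -c19)
    [ "$(orcid_check_digit "$1")" = "$last" ]
}

# The first iD comes from the CSV above; the second has its last digit altered
for id in 0000-0002-0195-9489 0000-0002-0195-9480; do
    if orcid_is_valid "$id"; then echo "$id valid"; else echo "$id invalid"; fi
done
```

Running each line of `/tmp/2020-08-09-orcid-identifiers-uniq.csv` through `orcid_is_valid` would flag mistyped identifiers that the loose pattern lets through.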
diff --git a/docs/categories/index.html b/docs/categories/index.html
index a898e832c..74a2297bd 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index e785deea7..71bced4a0 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index a46729cad..3c194c0ed 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 340c2dff9..737076049 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 3ca6a693f..7687aaccb 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index f9e8b347a..5504db17a 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index ceea736ea..8d34a3020 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 537dca946..c615ff8bb 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index bd175d7d8..de66a7f9d 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index cf405ac4d..a13a03435 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index ede77718d..e44d3e974 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index a701be457..3a08c6c98 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 11563debd..5956eec7d 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index a17f1147b..671a9c0a0 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index c11669e9c..37b6cafba 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 230c2b9c6..53bbb03bd 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index c63d300b7..9fd833436 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 02dec6ed5..e6eea3447 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
 https://alanorth.github.io/cgspace-notes/2020-08/
- 2020-08-06T16:24:01+03:00
+ 2020-08-07T19:55:21+03:00
 https://alanorth.github.io/cgspace-notes/categories/
- 2020-08-06T16:24:01+03:00
+ 2020-08-07T19:55:21+03:00
 https://alanorth.github.io/cgspace-notes/
- 2020-08-06T16:24:01+03:00
+ 2020-08-07T19:55:21+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2020-08-06T16:24:01+03:00
+ 2020-08-07T19:55:21+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2020-08-06T16:24:01+03:00
+ 2020-08-07T19:55:21+03:00