diff --git a/content/posts/2023-02.md b/content/posts/2023-02.md index 9dd2d3450..2e710ffec 100644 --- a/content/posts/2023-02.md +++ b/content/posts/2023-02.md @@ -202,4 +202,130 @@ Done. - I think that particular error is because I applied the [indexes in this unmerged DSpace 6 patch](https://github.com/DSpace/DSpace/pull/1792), so I don't need to report this as an error in DSpace 7 +## 2023-02-16 + +- I found a suspicious number of PostgreSQL locks on CGSpace and decided to investigate: + +```console +$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c + 44 dspaceApi + 372 dspaceCli + 446 dspaceWeb +``` + +- This started happening yesterday and I killed a few locks that were several hours old after inspecting the `locks-age.sql` output +- I also checked the `locks.sql` output, which helpfully lists the blocked PID and the blocking PID, to find one blocking PID that was idle in transaction + - I killed that process and then all other locks were instantly processed +- I filed [a GitHub issue](https://github.com/DSpace/dspace-angular/issues/2103) on dspace-angular requesting the item view to use the bitstream description instead of the file name if present +- Weekly CG Core types meeting + - I need to go through the actions and remove those items that are only for CGSpace internal use, ie: + - CD-ROM + - Manuscript-unpublished + - Photo Report + - Questionnaire + - Wiki +- Weekly CGIAR Repository Working Group meeting +- I did some experiments with Crossref dates for about 20,000 DOIs in CGSpace using my `crossref-doi-lookup.py` script +- Some things I noted from reading the [Crossref API docs](https://github.com/CrossRef/rest-api-doc/blob/master/api_format.md) and inspecting the records for a few dozen DOIs manually: + - `["created"]["date-parts"]` → Date on which the DOI was first registered (not useful for us) + - `["published-print"]["date-parts"]` → Date on 
which the work was published in print + - `["journal-issue"]["published-print"]["date-parts"]` → When present, is 99% the same as the above + - `["published-online"]["date-parts"]` → Date on which the work was published online + - `["journal-issue"]["published-online"]["date-parts"]` → Much more rare, and only 50% the same as the above, so unreliable + - `["issued"]["date-parts"]` → Earliest of published-print and published-online (not useful to us) +- After checking the DOIs manually I decided that when the `published-print` date exists, it is usually more accurate than our issued dates + - I set 12,300 issue dates to those from Crossref +- I also decided that, when `published-online` exists, it is usually accurate when I check the publisher page (we don't have many online dates to compare) + - I set the available date for ~7,000 items to the published-online date as long as: + - There was no `dcterms.available` date already + - It was different from the issued date, because for now I only want online dates that are different, in case this is an online-only journal, in which case that can be the issue date... maybe I'll re-visit that later + +## 2023-02-17 + +- It seems some (all?) of the changes I applied to dates last night didn't get saved... + - I don't know what happened, so I will run them again after some investigation + - I submitted the first batch of ~7,600 changes and it took twelve hours! + - I almost cancelled it because after applying the changes there was a lock blocking everything for two hours, and it seemed to be stuck, but I kept checking it and saw that the `query_start` and `state_change` were being updated despite it being in state "idle in transaction": + +```console +$ psql -c 'SELECT * FROM pg_stat_activity WHERE pid=1025176' | less -S +``` + +- I will apply the other changes in smaller batches... 
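- A minimal sketch of the Crossref date rules and the smaller batches mentioned above, not the actual logic in `crossref-doi-lookup.py`. The `published-print` and `published-online` keys come from the Crossref records; the function names, the `issued`/`available` arguments, and the idea of comparing the online date against the updated issue date are assumptions made for illustration:

```python
from itertools import islice


def date_parts_to_iso(date_parts):
    """Convert Crossref date-parts such as [[2023, 2, 16]] to an ISO 8601 string.

    Crossref may give only a year, or a year and a month, so this keeps
    whatever precision is present instead of padding with fake values.
    """
    parts = date_parts[0]
    return "-".join(
        str(part).zfill(4 if index == 0 else 2) for index, part in enumerate(parts)
    )


def choose_dates(work, issued=None, available=None):
    """Return (issued, available) for one item after applying the rules.

    `work` is one Crossref work record as a dict. `issued` and `available`
    are the item's existing date values (hypothetical inputs for this sketch).
    """
    new_issued = issued
    new_available = available

    # Rule 1: when published-print exists, prefer it as the issue date.
    print_parts = work.get("published-print", {}).get("date-parts")
    if print_parts:
        new_issued = date_parts_to_iso(print_parts)

    # Rule 2: only set the available date when the item has none already and
    # the online date differs from the issue date (assumption: the issue date
    # after rule 1 has been applied).
    online_parts = work.get("published-online", {}).get("date-parts")
    if online_parts and available is None:
        online = date_parts_to_iso(online_parts)
        if online != new_issued:
            new_available = online

    return new_issued, new_available


def batched(items, size):
    """Yield lists of at most `size` items so changes can be applied in chunks."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch
```

Splitting the changes with something like `batched(changes, 500)` (the batch size is arbitrary here) would mean a stuck transaction only holds up one small chunk instead of thousands of items.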
+- Lately I've noticed a lot of activity from the country code tagger curation task + - Looking in the logs I see items being tagged that are very old and should have already been tagged years ago + - Also, I see a ton of these errors whenever the task is updating an item: + +```console +2023-02-17 08:01:00,252 INFO org.dspace.curate.Curator @ Curation task: countrycodetagger performed on: 10568/89020 with status: 0. Result: '10568/89020: added 1 alpha2 country code(s)' +2023-02-17 08:01:00,467 ERROR com.atmire.versioning.ModificationLogger @ Error while writing item to versioning index: a0fe9d9a-6ac1-4b6a-8fcb-dae07a6bbf58 message:missing required field: epersonID +org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: missing required field: epersonID + at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552) + at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) + at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) + at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) + at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) + at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) + at com.atmire.versioning.ModificationLogger.indexItem(ModificationLogger.java:263) + at com.atmire.versioning.ModificationConsumer.end(ModificationConsumer.java:134) + at org.dspace.event.BasicDispatcher.dispatch(BasicDispatcher.java:157) + at org.dspace.core.Context.dispatchEvents(Context.java:455) + at org.dspace.curate.Curator.visit(Curator.java:541) + at org.dspace.curate.Curator$TaskRunner.run(Curator.java:568) + at org.dspace.curate.Curator.doCollection(Curator.java:515) + at org.dspace.curate.Curator.doCommunity(Curator.java:487) + at org.dspace.curate.Curator.doSite(Curator.java:451) + at org.dspace.curate.Curator.curate(Curator.java:269) + at 
org.dspace.curate.Curator.curate(Curator.java:203) + at org.dspace.curate.CurationCli.main(CurationCli.java:220) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) + at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) + at java.lang.reflect.Method.invoke(Method.java:498) + at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229) + at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81) +``` + +- This must be related... + +## 2023-02-18 + +- I realized why the country-code-tagger was tagging everything: I had overridden the `force` parameter last week! +- Start a harvest on AReS + +## 2023-02-20 + +- IWMI is concerned that some of their items with top Altmetric attention scores don't show up in the AReS Explorer + - I looked into it for one and found that AReS is using the Handle, but Altmetric hasn't associated the Handle with the DOI +- Looking into country and region issues for the PRMS team + - Last week they had some questions about some invalid countries that ended up being typos + - I realized my cgspace-java-helpers country-code-tagger curation task is not using the latest version, so it was missing Türkiye + - I compiled the new version and ran it manually, but I have to upload a new version to Maven Central and then update the dependency in `dspace/modules/additions/pom.xml` ughhhhhh + - I tagged version 6.2 with the change for Türkiye and uploaded it to Maven Central with `mvn clean deploy` +- I'm having second thoughts about switching to UN M.49 for countries because there are just too many tradeoffs + - I want to find a way to keep our existing list, and codify some rules for it + - There are several discussions related to the shortcomings of ISO themselves and the iso-codes project, for example: + - [Inconsistency with articles in ISO-3166-1 English short 
names](https://salsa.debian.org/iso-codes-team/iso-codes/-/issues/33) (this one was filed by me two years ago!) + - [ISO 3166-1: What's the policy for `common_name`?](https://salsa.debian.org/iso-codes-team/iso-codes/-/issues/44) + - I almost want to say fuck it, let's just use iso-codes and tell everyone to deal with it, but make sure we handle ISO 3166-1 Alpha2 or probably Alpha3 in the future + - Something like: + - Prefer `common_name` if it exists + - Otherwise prefer the shorter of `name` and `official_name` + +## 2023-02-21 + +- Continue working on my `parse-iso-codes.py` script to parse the iso-codes JSON for ISO 3166-1 + - I also started a spreadsheet to track current CGSpace country names, proposed new names using the compromise above, and UN M.49 names + - I proposed this to Peter but he wasn't happy because there are still some stupidly long and political names there +- I bumped the version of cgspace-java-helpers to 6.2-SNAPSHOT and pushed it to Maven Central because I can't figure out how to get non-snapshot releases to go there +- Ouch, grunt 1.6.0 was released a few weeks ago, which relies on Node.js v16, thus breaking the Mirage 2 build in DSpace 6 + - I filed [an issue in DSpace](https://github.com/DSpace/DSpace/issues/8676) +- Help Moises from CIP troubleshoot harvesting issues on their WordPress site + - I see about 2,000 requests with the user agent "RTB website BOT" today and they are all HTTP 200 + +```console +# grep 'RTB website BOT' /var/log/nginx/rest.log | awk '{print $9}' | sort | uniq -c | sort -h + 2023 200 +``` + diff --git a/docs/2023-02/index.html b/docs/2023-02/index.html index a2f3dcac2..355689546 100644 --- a/docs/2023-02/index.html +++ b/docs/2023-02/index.html @@ -18,7 +18,7 @@ I want to try to expand my use of their data to journals, publishers, volumes, i - + @@ -42,9 +42,9 @@ I want to try to expand my use of their data to journals, publishers, volumes, i "@type": "BlogPosting", "headline": "February, 2023", "url": 
"https://alanorth.github.io/cgspace-notes/2023-02/", - "wordCount": "1245", + "wordCount": "2333", "datePublished": "2023-02-01T10:57:36+03:00", - "dateModified": "2023-02-14T23:13:35+03:00", + "dateModified": "2023-02-15T19:47:13+03:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -340,7 +340,175 @@ I want to try to expand my use of their data to journals, publishers, volumes, i - +

diff --git a/docs/categories/index.html b/docs/categories/index.html index f1671b2cf..912d07caf 100644 --- a/docs/categories/index.html +++ b/docs/categories/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html index b813552a6..ce23dabd3 100644 --- a/docs/categories/notes/index.html +++ b/docs/categories/notes/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html index cd6607640..d46a2d839 100644 --- a/docs/categories/notes/page/2/index.html +++ b/docs/categories/notes/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html index ff679c8e9..15ac92603 100644 --- a/docs/categories/notes/page/3/index.html +++ b/docs/categories/notes/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html index dcbd6cc37..64f00b47e 100644 --- a/docs/categories/notes/page/4/index.html +++ b/docs/categories/notes/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html index 61cc19d1b..4cf1c6e57 100644 --- a/docs/categories/notes/page/5/index.html +++ b/docs/categories/notes/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html index dd33f75ff..5270c1ac1 100644 --- a/docs/categories/notes/page/6/index.html +++ b/docs/categories/notes/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html index adfb68ff2..fd22b8025 100644 --- a/docs/categories/notes/page/7/index.html +++ b/docs/categories/notes/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/index.html b/docs/index.html index 5c5f3d351..4616f1616 100644 --- a/docs/index.html +++ b/docs/index.html @@ -10,7 
+10,7 @@ - + diff --git a/docs/page/10/index.html b/docs/page/10/index.html index 3f4465936..7c5c9ff30 100644 --- a/docs/page/10/index.html +++ b/docs/page/10/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/2/index.html b/docs/page/2/index.html index 32a14fec3..0f3391a81 100644 --- a/docs/page/2/index.html +++ b/docs/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/3/index.html b/docs/page/3/index.html index f78cad368..f6f19e485 100644 --- a/docs/page/3/index.html +++ b/docs/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/4/index.html b/docs/page/4/index.html index 2e70a9af4..4f1670e89 100644 --- a/docs/page/4/index.html +++ b/docs/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/5/index.html b/docs/page/5/index.html index ceb1fce6d..c3b13112f 100644 --- a/docs/page/5/index.html +++ b/docs/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/6/index.html b/docs/page/6/index.html index df3e5f1cb..e20b29065 100644 --- a/docs/page/6/index.html +++ b/docs/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/7/index.html b/docs/page/7/index.html index 1daed6e1d..9f70064d4 100644 --- a/docs/page/7/index.html +++ b/docs/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/8/index.html b/docs/page/8/index.html index c2a809210..6a7eae01b 100644 --- a/docs/page/8/index.html +++ b/docs/page/8/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/9/index.html b/docs/page/9/index.html index 021eee323..fba1bed8c 100644 --- a/docs/page/9/index.html +++ b/docs/page/9/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/index.html b/docs/posts/index.html index e0efb1fff..c4c1130e4 100644 --- a/docs/posts/index.html +++ b/docs/posts/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/10/index.html b/docs/posts/page/10/index.html index edf8710e9..3c05be58b 100644 --- a/docs/posts/page/10/index.html +++ b/docs/posts/page/10/index.html @@ -10,7 +10,7 @@ - + diff --git 
a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html index 4815b7ce0..d7d1ea6ea 100644 --- a/docs/posts/page/2/index.html +++ b/docs/posts/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html index 0a49a4e2f..b2b78ac0b 100644 --- a/docs/posts/page/3/index.html +++ b/docs/posts/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html index 6ae6f5e22..07e56d283 100644 --- a/docs/posts/page/4/index.html +++ b/docs/posts/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html index d70362054..14ada67a9 100644 --- a/docs/posts/page/5/index.html +++ b/docs/posts/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html index 7a5803aa9..c1be48781 100644 --- a/docs/posts/page/6/index.html +++ b/docs/posts/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html index 189933819..9e98df96c 100644 --- a/docs/posts/page/7/index.html +++ b/docs/posts/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html index d8c34d347..c1deaa208 100644 --- a/docs/posts/page/8/index.html +++ b/docs/posts/page/8/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html index 302ee1fcf..f5df103c8 100644 --- a/docs/posts/page/9/index.html +++ b/docs/posts/page/9/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 47ca30b24..779ffd1d6 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -3,19 +3,19 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml"> https://alanorth.github.io/cgspace-notes/categories/ - 2023-02-14T23:13:35+03:00 + 2023-02-15T19:47:13+03:00 https://alanorth.github.io/cgspace-notes/ - 2023-02-14T23:13:35+03:00 + 2023-02-15T19:47:13+03:00 
https://alanorth.github.io/cgspace-notes/2023-02/ - 2023-02-14T23:13:35+03:00 + 2023-02-15T19:47:13+03:00 https://alanorth.github.io/cgspace-notes/categories/notes/ - 2023-02-14T23:13:35+03:00 + 2023-02-15T19:47:13+03:00 https://alanorth.github.io/cgspace-notes/posts/ - 2023-02-14T23:13:35+03:00 + 2023-02-15T19:47:13+03:00 https://alanorth.github.io/cgspace-notes/2023-01/ 2023-01-31T22:20:38+03:00