diff --git a/content/posts/2020-12.md b/content/posts/2020-12.md
index 9f1d7d66e..8b7cbe205 100644
--- a/content/posts/2020-12.md
+++ b/content/posts/2020-12.md
@@ -527,6 +527,7 @@ $ cat 2020-12-17-update-ILRI-author.csv
 dc.contributor.author,correct
 "Padmakumar, V.P.","Varijakshapanicker, Padmakumar"
 $ ./fix-metadata-values.py -i 2020-12-17-update-ILRI-author.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t 'correct' -m 3
+```
 
 - Abenet needed a list of all 2020 outputs from the Livestock CRP that were Limited Access
   - I exported the community from CGSpace and used `csvcut` and `csvgrep` to get a list:
diff --git a/content/posts/2021-01.md b/content/posts/2021-01.md
index 4c2ea4677..c61591be1 100644
--- a/content/posts/2021-01.md
+++ b/content/posts/2021-01.md
@@ -10,7 +10,53 @@ categories: ["Notes"]
 - Peter notified me that some filters on AReS were broken again
   - It's the same issue with the field names getting `.keyword` appended to the end that I already [filed an issue on OpenRXV about last month](https://github.com/ilri/OpenRXV/issues/66)
   - I fixed the broken filters (careful to not edit any others, lest they break too!)
+- Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
+  - The start page had been "1" in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
+  - I adjusted it to default to 0 and added a note to the admin screen
+  - I realized that this issue was actually causing the first page of 100 statistics to be missing...
+  - For example, [this item](https://cgspace.cgiar.org/handle/10568/66839) has 51 views on CGSpace, but 0 on AReS
+- Start a re-index on AReS
+  - First delete the old Elasticsearch temp index:
+
+```console
+$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
+# start indexing in AReS
+```
+
+- Then, the next morning when it's done, check the results of the harvesting, backup the current `openrxv-items` index, and clone the `openrxv-items-temp` index to `openrxv-items`:
+
+```console
+$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
+{
+  "count" : 100278,
+  "_shards" : {
+    "total" : 1,
+    "successful" : 1,
+    "skipped" : 0,
+    "failed" : 0
+  }
+}
+$ curl -X PUT "localhost:9200/openrxv-items/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
+$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2021-01-04
+$ curl -XDELETE 'http://localhost:9200/openrxv-items'
+$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
+$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
+$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
+$ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-01-04'
+```
+
+## 2021-01-04
+
+- There is one item that appears twice in AReS: [10568/66839](https://cgspace.cgiar.org/handle/10568/66839)
+  - If I use the Handle filter I see it twice... whereas other items don't appear twice
+  - I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/67
+- Help Peter troubleshoot an issue with Altmetric badges on AReS
+  - He generated a report of our repository from Altmetric and noticed that many were missing scores despite having scores on CGSpace item pages
+  - AReS harvest Altmetric scores using the Handle prefix (10568) in batch, while CGSpace uses the DOI if it is found, and falls back to using the Handle
+  - I think it's due to the fact that some items were never tweeted, so Altmetric never made the link between the DOI and the Handle
+  - I did some tweets of five items and within an hour or so the DOI API link registers the associated Handle, and within an hour or so the Handle API link is live with the same score
+
diff --git a/docs/2020-12/index.html b/docs/2020-12/index.html
index 2eb6a8684..87716c736 100644
--- a/docs/2020-12/index.html
+++ b/docs/2020-12/index.html
@@ -20,7 +20,7 @@ I started processing those (about 411,000 records):
-
+
@@ -46,9 +46,9 @@ I started processing those (about 411,000 records):
"@type": "BlogPosting",
"headline": "December, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-12/",
- "wordCount": "3785",
+ "wordCount": "3772",
"datePublished": "2020-12-01T11:32:54+02:00",
- "dateModified": "2020-12-30T09:44:45+02:00",
+ "dateModified": "2021-01-04T14:09:58+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -631,7 +631,7 @@ java.lang.UnsupportedOperationException: Multiple update components target the s
$ csvcut -c ‘dc.identifier.citation[en_US],dc.identifier.uri,dc.identifier.uri[],dc.identifier.uri[en_US],dc.date.issued,dc.date.issued[],dc.date.issued[en_US],cg.identifier.status[en_US]’ ~/Downloads/10568-80099.csv | csvgrep -c ‘cg.identifier.status[en_US]’ -m ‘Limited Access’ | csvgrep -c ‘dc.date.issued’ -m 2020 -c ‘dc.date.issued[]’ -m 2020 -c ‘dc.date.issued[en_US]’ -m 2020 > /tmp/limited-2020.csv
-
-## 2020-12-18
-
-- I added support for indexing community views and downloads to [dspace-statistics-api](https://github.com/ilri/dspace-statistics-api)
- - I still have to add the API endpoints to make the stats available
- - Also, I played a little bit with Swagger via [falcon-swagger-ui](https://github.com/rdidyk/falcon-swagger-ui) and I think I can get that working for better API documentation / testing
-- Atmire sent some feedback on the DeduplicateValuesProcessor
- - They confirm that it should process _all_ duplicates, not just those in `owningComm` and `owningColl`
- - They asked me to try it again on DSpace Test now that I've resync'd the Solr statistics cores from production
- - I started processing the statistics core on DSpace Test
-
-## 2020-12-20
-
-- The DeduplicateValuesProcessor has been running on DSpace Test since two days ago and it almost completed its second twelve-hour run, but crashed near the end:
-
-```console
-...
+
csvcut and csvgrep to get a list:
$ csvcut -c 'dc.identifier.citation[en_US],dc.identifier.uri,dc.identifier.uri[],dc.identifier.uri[en_US],dc.date.issued,dc.date.issued[],dc.date.issued[en_US],cg.identifier.status[en_US]' ~/Downloads/10568-80099.csv | csvgrep -c 'cg.identifier.status[en_US]' -m 'Limited Access' | csvgrep -c 'dc.date.issued' -m 2020 -c 'dc.date.issued[]' -m 2020 -c 'dc.date.issued[en_US]' -m 2020 > /tmp/limited-2020.csv
+
owningComm and owningColl
...
Run 1 — 100% — 8,230,000/8,239,228 docs — 39s — 9h 8m 31s
Exception: Java heap space
java.lang.OutOfMemoryError: Java heap space
diff --git a/docs/2021-01/index.html b/docs/2021-01/index.html
index 841ce9706..8c8bddfb0 100644
--- a/docs/2021-01/index.html
+++ b/docs/2021-01/index.html
@@ -15,6 +15,14 @@ It’s the same issue with the field names getting .keyword appended to the
I fixed the broken filters (careful to not edit any others, lest they break too!)
+Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
+
+The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
+I adjusted it to default to 0 and added a note to the admin screen
+I realized that this issue was actually causing the first page of 100 statistics to be missing…
+For example, this item has 51 views on CGSpace, but 0 on AReS
+
+
" />
@@ -33,6 +41,14 @@ It’s the same issue with the field names getting .keyword appended to the
I fixed the broken filters (careful to not edit any others, lest they break too!)
+Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
+
+The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
+I adjusted it to default to 0 and added a note to the admin screen
+I realized that this issue was actually causing the first page of 100 statistics to be missing…
+For example, this item has 51 views on CGSpace, but 0 on AReS
+
+
"/>
@@ -44,7 +60,7 @@ I fixed the broken filters (careful to not edit any others, lest they break too!
"@type": "BlogPosting",
"headline": "January, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-01/",
- "wordCount": "52",
+ "wordCount": "420",
"datePublished": "2021-01-03T10:13:54+02:00",
"dateModified": "2021-01-03T10:15:07+02:00",
"author": {
@@ -128,6 +144,60 @@ I fixed the broken filters (careful to not edit any others, lest they break too!
I fixed the broken filters (careful to not edit any others, lest they break too!)
+Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
+
+- The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
+- I adjusted it to default to 0 and added a note to the admin screen
+- I realized that this issue was actually causing the first page of 100 statistics to be missing…
+- For example, this item has 51 views on CGSpace, but 0 on AReS
+
+
+
+
+- Start a re-index on AReS
+
+- First delete the old Elasticsearch temp index:
+
+
+
+$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
+# start indexing in AReS
+
+- Then, the next morning when it’s done, check the results of the harvesting, backup the current openrxv-items index, and clone the openrxv-items-temp index to openrxv-items:
+
+$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
+{
+ "count" : 100278,
+ "_shards" : {
+ "total" : 1,
+ "successful" : 1,
+ "skipped" : 0,
+ "failed" : 0
+ }
+}
+$ curl -X PUT "localhost:9200/openrxv-items/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
+$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2021-01-04
+$ curl -XDELETE 'http://localhost:9200/openrxv-items'
+$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
+$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
+$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
+$ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-01-04'
+
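The order of the curl calls above matters, since each clone needs a write-blocked source and each delete must come after the clone that depends on it. As a rough sketch only, the same sequence can be written down as a plan of requests in Python. The function name `clone_swap_plan` and its parameters are hypothetical and not part of OpenRXV or Elasticsearch; the code only builds the list of requests and sends nothing.

```python
import json


def clone_swap_plan(base_url, live, temp, backup):
    """Return the ordered (method, url, body) requests for swapping temp into live.

    Hypothetical helper: it mirrors the curl commands from the notes and
    performs no network I/O.
    """
    block_writes = json.dumps({"settings": {"index.blocks.write": True}})

    def url(path):
        return f"{base_url}/{path}"

    return [
        # Check how many documents the harvest produced
        ("GET", url(f"{temp}/_count?q=*&pretty"), None),
        # Block writes on the live index so it can be cloned to a backup
        ("PUT", url(f"{live}/_settings"), block_writes),
        ("POST", url(f"{live}/_clone/{backup}"), None),
        ("DELETE", url(live), None),
        # Block writes on the temp index, then clone it into the live name
        ("PUT", url(f"{temp}/_settings"), block_writes),
        ("POST", url(f"{temp}/_clone/{live}"), None),
        ("DELETE", url(temp), None),
        # The notes delete the backup as the final step
        ("DELETE", url(backup), None),
    ]


plan = clone_swap_plan(
    "http://localhost:9200",
    "openrxv-items",
    "openrxv-items-temp",
    "openrxv-items-2021-01-04",
)
for method, request_url, body in plan:
    print(method, request_url, body or "")
```

Keeping the plan as data makes it easy to review the order before running anything, and to stop before the last delete if the new index does not look right.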
2021-01-04
+
+- There is one item that appears twice in AReS: 10568/66839
+
+- If I use the Handle filter I see it twice… whereas other items don’t appear twice
+- I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/67
+
+
+- Help Peter troubleshoot an issue with Altmetric badges on AReS
+
+- He generated a report of our repository from Altmetric and noticed that many were missing scores despite having scores on CGSpace item pages
+- AReS harvest Altmetric scores using the Handle prefix (10568) in batch, while CGSpace uses the DOI if it is found, and falls back to using the Handle
+- I think it’s due to the fact that some items were never tweeted, so Altmetric never made the link between the DOI and the Handle
+- I did some tweets of five items and within an hour or so the DOI API link registers the associated Handle, and within an hour or so the Handle API link is live with the same score
+
+
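The difference between the two lookup strategies described above can be sketched in a few lines of Python. This is an illustration under assumptions, not code from CGSpace or AReS: the function name is hypothetical, the DOI in the usage example is a placeholder, and the `/v1/doi/` and `/v1/handle/` paths are assumed to be the "DOI API link" and "Handle API link" the notes refer to.

```python
ALTMETRIC_API = "https://api.altmetric.com/v1"  # assumed base URL


def altmetric_lookup_url(handle, doi=None):
    """Prefer the DOI endpoint when a DOI is known, otherwise use the Handle."""
    if doi:
        return f"{ALTMETRIC_API}/doi/{doi}"
    return f"{ALTMETRIC_API}/handle/{handle}"


# Item page style lookup: DOI first, Handle as the fallback
print(altmetric_lookup_url("10568/66839", doi="10.1234/placeholder"))
# Handle-only lookup, which finds nothing until Altmetric has linked the Handle
print(altmetric_lookup_url("10568/66839"))
```

If Altmetric has only ever seen the DOI, the first URL can return a score while the second does not, which matches the missing scores Peter noticed.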
diff --git a/docs/categories/index.html b/docs/categories/index.html
index adfe7163d..475109d04 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 461476b4f..1e83c828a 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
@@ -98,6 +98,14 @@
I fixed the broken filters (careful to not edit any others, lest they break too!)
+Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
+
+- The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
+- I adjusted it to default to 0 and added a note to the admin screen
+- I realized that this issue was actually causing the first page of 100 statistics to be missing…
+- For example, this item has 51 views on CGSpace, but 0 on AReS
+
+
Read more →
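The start page problem summarized above is an off-by-one in how a page number maps to a record offset. A minimal sketch in Python, assuming a page size of 100 as in the notes; `page_to_offset` is a hypothetical helper and not OpenRXV code.

```python
def page_to_offset(page, limit, first_page=0):
    """Translate a page number into a zero-based record offset."""
    if page < first_page:
        raise ValueError("page is before the first page")
    return (page - first_page) * limit


LIMIT = 100

# A zero-based API starting at page 0 begins with record 0
print(page_to_offset(0, LIMIT))
# Passing a UI start page of 1 to a zero-based API skips the first 100 records
print(page_to_offset(1, LIMIT))
# Treating 1 as the first page restores offset 0
print(page_to_offset(1, LIMIT, first_page=1))
```

With the start page left at 1 against a zero-based API, the harvest begins at record 100, which is consistent with the first page of 100 statistics going missing.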
diff --git a/docs/categories/notes/index.xml b/docs/categories/notes/index.xml
index e95909bc2..e614f052d 100644
--- a/docs/categories/notes/index.xml
+++ b/docs/categories/notes/index.xml
@@ -21,6 +21,14 @@
<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
</ul>
</li>
+<li>Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
+<ul>
+<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
+<li>I adjusted it to default to 0 and added a note to the admin screen</li>
+<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
+<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
+</ul>
+</li>
</ul>
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 27930c833..c7b288e8e 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 035987f08..2673d731d 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 995aaa972..b3477808b 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index ca5a3b479..06751fdab 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 48cf0767e..4e6b526c4 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
@@ -113,6 +113,14 @@
I fixed the broken filters (careful to not edit any others, lest they break too!)
+Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
+
+- The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
+- I adjusted it to default to 0 and added a note to the admin screen
+- I realized that this issue was actually causing the first page of 100 statistics to be missing…
+- For example, this item has 51 views on CGSpace, but 0 on AReS
+
+
Read more →
diff --git a/docs/index.xml b/docs/index.xml
index 022398f96..62f3cb90b 100644
--- a/docs/index.xml
+++ b/docs/index.xml
@@ -21,6 +21,14 @@
<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
</ul>
</li>
+<li>Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
+<ul>
+<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
+<li>I adjusted it to default to 0 and added a note to the admin screen</li>
+<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
+<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
+</ul>
+</li>
</ul>
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 0cc53b002..01a261d5c 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index c48a56081..d773da54d 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 11340fc0a..709dbb0f5 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 30e4bd0f8..c5193e232 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 1add13400..0ad01c239 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index b3aca4237..68a7d4f77 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index a7f160af8..f4ed8d2e2 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
@@ -113,6 +113,14 @@
I fixed the broken filters (careful to not edit any others, lest they break too!)
+Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
+
+- The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
+- I adjusted it to default to 0 and added a note to the admin screen
+- I realized that this issue was actually causing the first page of 100 statistics to be missing…
+- For example, this item has 51 views on CGSpace, but 0 on AReS
+
+
Read more →
diff --git a/docs/posts/index.xml b/docs/posts/index.xml
index fe7f8418b..da4b926b1 100644
--- a/docs/posts/index.xml
+++ b/docs/posts/index.xml
@@ -21,6 +21,14 @@
<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
</ul>
</li>
+<li>Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
+<ul>
+<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
+<li>I adjusted it to default to 0 and added a note to the admin screen</li>
+<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
+<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
+</ul>
+</li>
</ul>
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 3063d2438..ada0e5bc4 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 36c4fbf42..7c3b75a11 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 90cba647c..9cdbb2d67 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 4fe2f7e19..3b8d6c203 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index a9b2288e6..d947741b5 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 465414c73..7660fab54 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 9f3a48b44..b776f2075 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,12 +4,12 @@
https://alanorth.github.io/cgspace-notes/categories/
- 2021-01-03T10:15:07+02:00
+ 2021-01-04T14:09:58+02:00
https://alanorth.github.io/cgspace-notes/
- 2021-01-03T10:15:07+02:00
+ 2021-01-04T14:09:58+02:00
@@ -19,17 +19,17 @@
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2021-01-03T10:15:07+02:00
+ 2021-01-04T14:09:58+02:00
https://alanorth.github.io/cgspace-notes/posts/
- 2021-01-03T10:15:07+02:00
+ 2021-01-04T14:09:58+02:00
https://alanorth.github.io/cgspace-notes/2020-12/
- 2020-12-30T09:44:45+02:00
+ 2021-01-04T14:09:58+02:00