mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2020-12-14
@ -241,4 +241,114 @@ $ curl -XDELETE http://localhost:9200/openrxv-items-final
$ curl -XDELETE http://localhost:9200/openrxv-items-temp
```

- Peter asked me for a list of all submitters and approvers who were recently active on CGSpace
- I can probably extract that from the `dc.description.provenance` field, for example any values that contain a 2020 date:

```console
localhost/dspace63= > SELECT * FROM metadatavalue WHERE metadata_field_id=28 AND text_value ~ '^.*on 2020-[0-9]{2}-*';
```

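The query above only returns the raw provenance statements, so the names still have to be pulled out of the free text. A minimal sketch, assuming each statement begins with `Submitted by NAME (EMAIL) on DATE` or `Approved for entry into archive by NAME (EMAIL) on DATE` and arrives one per line on stdin. The function name `extract_provenance_actors` and the sample statements are hypothetical:

```shell
# Hypothetical helper: read provenance statements on stdin (one per line) and
# print "action|name|email|date" for those dated 2020, de-duplicated.
# Assumes the wording "<action> by NAME (EMAIL) on YYYY-MM-DD...".
extract_provenance_actors() {
  sed -nE 's/^(Submitted|Approved for entry into archive) by (.+) \(([^()]+)\) on (2020-[0-9]{2}-[0-9]{2}).*$/\1|\2|\3|\4/p' |
    sort -u
}

# Example with made-up statements:
printf '%s\n' \
  'Submitted by Jane Doe (jane@example.org) on 2020-11-02T10:00:00Z' \
  'Approved for entry into archive by John Roe (john@example.org) on 2020-11-03T08:00:00Z (GMT)' |
  extract_provenance_actors
```

For this to work the input would need to be the `text_value` column only, since the pattern is anchored to the start of each line.
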
## 2020-12-14

- The re-harvesting finished last night on AReS but there are no records in the `openrxv-items-final` index
- Strangely, there are 99,992 items in the temp index:

```console
$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*' | json_pp
{
   "count" : 99992,
   "_shards" : {
      "skipped" : 0,
      "total" : 1,
      "failed" : 0,
      "successful" : 1
   }
}
```

- I'm going to try to [clone](https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-clone-index.html) the temp index to the final one...
- First, set the `openrxv-items-temp` index to block writes (read only) and then clone it to `openrxv-items-final`:

```console
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-final
{"acknowledged":true,"shards_acknowledged":true,"index":"openrxv-items-final"}
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
```

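The block, clone, unblock sequence above could be wrapped in a small helper so the write block is always cleared again, even if the clone fails. This is only a sketch: the `ES_URL` and `CURL` variables and both function names are assumptions for illustration, and `CURL` exists so the commands can be dry-run against a stub instead of a live cluster:

```shell
# Sketch of a reusable clone helper. ES_URL and CURL are illustrative
# assumptions; CURL can be pointed at a stub function for a dry run.
ES_URL="${ES_URL:-http://localhost:9200}"
CURL="${CURL:-curl}"

# Set or clear the write block on an index: $1 = index, $2 = true|false
es_set_write_block() {
  "$CURL" -s -X PUT "$ES_URL/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d "{\"settings\": {\"index.blocks.write\": $2}}"
}

# Clone index $1 to $2, always clearing the write block on $1 afterwards
es_clone_index() {
  es_set_write_block "$1" true &&
    "$CURL" -s -X POST "$ES_URL/$1/_clone/$2"
  clone_status=$?
  es_set_write_block "$1" false
  return "$clone_status"
}
```

With this, the step above would become `es_clone_index openrxv-items-temp openrxv-items-final`. The cloned index may inherit the source's write block depending on the Elasticsearch version, so checking the target's `_settings` afterwards seems prudent.
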
- Now I see that the `openrxv-items-final` index has items, but there are still none in the AReS Explorer UI!

```console
$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*&pretty'
{
  "count" : 99992,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
```

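Both indexes now report 99992 documents. To compare counts without reading the JSON by eye, something like the following could be used. It is a rough sketch that assumes the response contains a single numeric `"count"` field, and the function name `es_count_from_json` is hypothetical:

```shell
# Hypothetical helper: pull the numeric "count" field out of an Elasticsearch
# _count response on stdin, without needing a JSON parser.
es_count_from_json() {
  tr -d '\n' | sed -nE 's/.*"count"[[:space:]]*:[[:space:]]*([0-9]+).*/\1/p'
}

# Possible use against a reachable Elasticsearch (left commented out here):
# temp=$(curl -s 'http://localhost:9200/openrxv-items-temp/_count' | es_count_from_json)
# final=$(curl -s 'http://localhost:9200/openrxv-items-final/_count' | es_count_from_json)
# [ "$temp" = "$final" ] && echo "counts match: $temp"
```
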
- The API logs show this from last night after the harvesting:

```console
[Nest] 92 - 12/13/2020, 1:58:52 PM [HarvesterService] Starting Harvest
[Nest] 92 - 12/13/2020, 10:50:20 PM [FetchConsumer] OnGlobalQueueDrained
[Nest] 92 - 12/13/2020, 11:00:20 PM [PluginsConsumer] OnGlobalQueueDrained
[Nest] 92 - 12/13/2020, 11:00:20 PM [HarvesterService] reindex function is called
(node:92) UnhandledPromiseRejectionWarning: ResponseError: index_not_found_exception
    at IncomingMessage.<anonymous> (/backend/node_modules/@elastic/elasticsearch/lib/Transport.js:232:25)
    at IncomingMessage.emit (events.js:326:22)
    at endReadableNT (_stream_readable.js:1223:12)
    at processTicksAndRejections (internal/process/task_queues.js:84:21)
```

- But I'm not sure why the frontend doesn't show any data despite there being documents in the index...
- I talked to Moayad and he reminded me that OpenRXV uses an alias to point to the temp and final indexes, but the UI actually uses the `openrxv-items` index
- I cloned the `openrxv-items-final` index to the `openrxv-items` index and now I see items in the explorer UI
- The PDF report was broken, so I looked in the API logs and saw this:

```console
(node:94) UnhandledPromiseRejectionWarning: Error: Error: Could not find soffice binary
    at ExportService.downloadFile (/backend/dist/export/services/export/export.service.js:51:19)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
```

- I installed `unoconv` in the backend API container and now it works... but I wonder why this changed...
- Skype with Abenet and Peter to discuss AReS, which will be shown to ILRI scientists this week
- Peter noticed that [this item](https://hdl.handle.net/10568/110133) from the [ILRI policy and research briefs](https://cgspace.cgiar.org/handle/10568/24450) collection is missing in AReS, despite it having been added to CGSpace one month ago and me harvesting on AReS last night
- The item appears fine in the REST API when I check the items in that collection
- Peter also noticed that [this item](https://hdl.handle.net/10568/110447) appears twice in AReS
- The item is _not_ duplicated on CGSpace or in the REST API
- We noticed that there are 136 items in the ILRI policy and research briefs collection according to AReS, yet on CGSpace there are only 132
- This is confirmed in the REST API (using [query-json](https://github.com/davesnx/query-json)):

```console
$ http --print b 'https://cgspace.cgiar.org/rest/collections/defee001-8cc8-4a6c-8ac8-21bb5adab2db?expand=all&limit=100&offset=0' | json_pp > /tmp/policy1.json
$ http --print b 'https://cgspace.cgiar.org/rest/collections/defee001-8cc8-4a6c-8ac8-21bb5adab2db?expand=all&limit=100&offset=100' | json_pp > /tmp/policy2.json
$ query-json '.items | length' /tmp/policy1.json
100
$ query-json '.items | length' /tmp/policy2.json
32
```

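The two page counts add up to 132, but a count alone cannot show whether one item was returned on both pages while another was skipped, which could happen with offset paging if the API does not return items in a stable order. A minimal sketch of such a check, assuming the identifiers from all pages have already been extracted one per line (for example each item's `uuid` field, which is an assumption about the JSON layout). Both function names are hypothetical:

```shell
# Hypothetical helper: read item identifiers on stdin (one per line, gathered
# from every page) and print each identifier that occurs more than once.
find_duplicate_ids() {
  sort | uniq -d
}

# Hypothetical helper: print "<total> <unique>" for the identifiers on stdin.
# A total larger than the unique count means some item was returned twice.
count_total_and_unique() {
  awk '{ total++; if (!seen[$0]++) unique++ } END { print total + 0, unique + 0 }'
}
```

If the total matches the collection size but the unique count is lower, some items were duplicated across pages and the same number of others must have been missed.
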
- I realized that the issue of missing/duplicate items in AReS might be because of this [REST API bug that causes /items to return items in non-deterministic order](https://jira.lyrasis.org/browse/DS-3849)
- I decided to cherry-pick the following two patches from DSpace 6.4 into our `6_x-prod` (6.3) branch:
  - High CPU usage when calling the collection_id/items REST endpoint
    - Jira: https://jira.lyrasis.org/browse/DS-4342
    - c2e6719fa763e291b81b2d61da2f8c758fe38ff3
  - REST API items resource returns items in non-deterministic order
    - Jira: https://jira.lyrasis.org/browse/DS-3849
    - 2a2ea0cb5d03e6da9355a2eff12aad667e465433
- After deploying the REST API fixes I decided to harvest from AReS again to see if the missing and duplicate items get fixed
- I made a backup of the current `openrxv-items-temp` index just in case:

```console
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-2020-12-14
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
```

<!-- vim: set sw=2 ts=2: -->