mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-03-22
- I also made some minor optimizations in the Pandas code
- I [tagged version 0.4.7 of csv-metadata-quality on GitHub](https://github.com/ilri/csv-metadata-quality/releases/tag/v0.4.7)

## 2021-03-18

- I added the ability to check for, and fix, "mojibake" characters in csv-metadata-quality

## 2021-03-21

- Last week Atmire asked me which browser I was using to test the duplicate checker, which I had [reported](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=934) as not loading
- I tried to load it in Chrome and it works... hmm
- Back up the current `openrxv-items-final` index before starting a fresh AReS harvest:

```console
$ curl -X PUT "localhost:9200/openrxv-items-final/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-final/_clone/openrxv-items-final-2021-03-21
$ curl -X PUT "localhost:9200/openrxv-items-final/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
```
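The write-block, clone, and unblock sequence can also be scripted. Below is a minimal Python sketch that uses only the standard library. It assumes Elasticsearch listens on `localhost:9200`, as in the commands above. The helper names (`es_request`, `write_block`, `backup_index`) are hypothetical. The `try`/`finally` is a design choice: writes are re-enabled on the source index even if the clone fails.

```python
import json
import urllib.request
from typing import Optional

# Assumption: Elasticsearch is reachable here, as in the curl commands above
ES = "http://localhost:9200"


def es_request(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send a (possibly empty) JSON request and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def write_block(blocked: bool) -> dict:
    """Index settings body that turns index.blocks.write on or off."""
    return {"settings": {"index.blocks.write": blocked}}


def backup_index(index: str, backup: str) -> dict:
    """Clone `index` into `backup`; writes are re-enabled even if the clone fails."""
    es_request("PUT", f"/{index}/_settings", write_block(True))
    try:
        return es_request("POST", f"/{index}/_clone/{backup}")
    finally:
        es_request("PUT", f"/{index}/_settings", write_block(False))
```

Calling `backup_index("openrxv-items-final", "openrxv-items-final-2021-03-21")` would correspond to the three curl commands above.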
- Then start harvesting in the AReS Explorer admin UI

## 2021-03-22

- The AReS harvest from yesterday completed, but somehow I have roughly twice the number of items:

```console
$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*&pretty'
{
  "count" : 206204,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
```
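Counts like these can be compared across indexes without reading the pretty-printed JSON. Below is a small Python sketch that parses the `_count` reply. It assumes the same `localhost:9200` endpoint, and `parse_count` and `index_count` are hypothetical helpers. The design choice is to refuse a count when any shard failed, because that number would be incomplete.

```python
import json
import urllib.request

# Assumption: the same local Elasticsearch as in the notes
ES = "http://localhost:9200"


def parse_count(payload: str) -> int:
    """Return the count from a _count reply, refusing partial (failed-shard) results."""
    reply = json.loads(payload)
    failed = reply["_shards"]["failed"]
    if failed:
        raise RuntimeError(f"{failed} shard(s) failed, count is incomplete")
    return reply["count"]


def index_count(index: str) -> int:
    """Count the documents in `index`; without a query, _count matches everything."""
    with urllib.request.urlopen(f"{ES}/{index}/_count") as resp:
        return parse_count(resp.read().decode("utf-8"))
```

For example, comparing `index_count("openrxv-items-final")` with `index_count("openrxv-items-final-2021-03-21")` would surface the kind of mismatch seen here.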
- Hmmm and even my backup index has a strange number of items:

```console
$ curl -s 'http://localhost:9200/openrxv-items-final-2021-03-21/_count?q=*&pretty'
{
  "count" : 844,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
```
- I deleted all indexes and re-created the `openrxv-items` alias:

```console
$ curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'
$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool | less
...
    "openrxv-items-temp": {
        "aliases": {}
    },
    "openrxv-items-final": {
        "aliases": {
            "openrxv-items": {}
        }
    }
```
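Paging through the `_alias` output in `less` works, but a script can also answer the question "which index does `openrxv-items` point to?". Below is a minimal Python sketch, assuming the same local endpoint. `fetch_aliases` and `indexes_for_alias` are hypothetical names.

```python
import json
import urllib.request

# Assumption: the same local Elasticsearch as in the notes
ES = "http://localhost:9200"


def fetch_aliases() -> dict:
    """Return the decoded reply of GET /_alias/ (every index and its aliases)."""
    with urllib.request.urlopen(f"{ES}/_alias/") as resp:
        return json.load(resp)


def indexes_for_alias(aliases: dict, alias: str) -> list:
    """Return a sorted list of the indexes that currently carry `alias`."""
    return sorted(
        index
        for index, info in aliases.items()
        if alias in info.get("aliases", {})
    )
```

In a healthy setup, `indexes_for_alias(fetch_aliases(), "openrxv-items")` should return exactly one index, `openrxv-items-final`. Any other result suggests the alias needs fixing.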
- Then I started a new harvest
- I switched Node.js in the [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public) to v12, since v10 will reach end of life soon
- I re-deployed DSpace Test (linode26) with Node.js 12 and restarted the server
- The AReS harvest finally finished, with 1047 pages of items, but the `openrxv-items-final` index is empty and the `openrxv-items-temp` index has about 103,000 items:

```console
$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
{
  "count" : 103162,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
```
- I tried to clone the temp index to the final one, but got an error:

```console
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-final
{"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"index [openrxv-items-final/LmxH-rQsTRmTyWex2d8jxw] already exists","index_uuid":"LmxH-rQsTRmTyWex2d8jxw","index":"openrxv-items-final"}],"type":"resource_already_exists_exception","reason":"index [openrxv-items-final/LmxH-rQsTRmTyWex2d8jxw] already exists","index_uuid":"LmxH-rQsTRmTyWex2d8jxw","index":"openrxv-items-final"},"status":400}
```
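The clone API will not write into an index that already exists, which is what the `resource_already_exists_exception` above reports. Below is a hedged Python sketch for checking this before cloning, assuming the same local endpoint:

- `index_exists` sends a `HEAD` request, which returns 200 if the index exists and 404 if not.
- `error_type` extracts the exception type from an error reply like the one above.

Both helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Assumption: the same local Elasticsearch as in the notes
ES = "http://localhost:9200"


def index_exists(index: str) -> bool:
    """HEAD /<index> answers 200 when the index exists and 404 when it does not."""
    req = urllib.request.Request(f"{ES}/{index}", method="HEAD")
    try:
        with urllib.request.urlopen(req):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def error_type(payload: str) -> Optional[str]:
    """Return error.type from an Elasticsearch error reply, or None if there is none."""
    error = json.loads(payload).get("error")
    return error.get("type") if isinstance(error, dict) else None
```

If `index_exists("openrxv-items-final")` is true, the target has to be deleted, or a new name chosen, before `_clone` can succeed. The clone API also expects the source index to be write-blocked, as in the backup on 2021-03-21.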
- I looked in the Docker logs for Elasticsearch and saw a few memory errors:

```console
java.lang.OutOfMemoryError: Java heap space
```
- According to `/usr/share/elasticsearch/config/jvm.options` in the Elasticsearch container, the default JVM heap is 1g
- I see the running Java process has `-Xms1g -Xmx1g` in its command line, so it must indeed be using 1g
- We can [change the heap size with the `ES_JAVA_OPTS` environment variable](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html)
- Or, perhaps better, we should [use a `jvm.options.d` file](https://www.elastic.co/guide/en/elasticsearch/reference/master/jvm-options.html), because setting the environment variable overrides all other JVM options from the default `jvm.options`
- I tried to set the heap to 1536m by bind-mounting an options file into the container and restarting it, but it didn't seem to take effect
- Nevertheless, after restarting I see 103,000 items in the Explorer...
- But the indexes are still kinda messed up... the `openrxv-items` alias points to the wrong index!

```console
    "openrxv-items-final": {
        "aliases": {}
    },
    "openrxv-items-temp": {
        "aliases": {
            "openrxv-items": {}
        }
    },
```
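One way to repair this is to move the alias with a single `_aliases` request. That request removes the alias from the wrong index and adds it to the right one. Elasticsearch applies all actions of one `_aliases` request atomically, so `openrxv-items` never points at nothing in between.

Below is a minimal Python sketch, assuming the same local endpoint. `alias_move_body` and `move_alias` are hypothetical names. This is not necessarily the exact fix used on 2021-03-23.

```python
import json
import urllib.request

# Assumption: the same local Elasticsearch as in the notes
ES = "http://localhost:9200"


def alias_move_body(alias: str, target: str, current: list) -> dict:
    """Build one _aliases body: drop `alias` from `current` indexes, add it to `target`."""
    actions = [
        {"remove": {"index": index, "alias": alias}}
        for index in current
        if index != target
    ]
    actions.append({"add": {"index": target, "alias": alias}})
    return {"actions": actions}


def move_alias(alias: str, target: str, current: list) -> dict:
    """POST the actions; Elasticsearch applies them together as one atomic change."""
    req = urllib.request.Request(
        f"{ES}/_aliases",
        data=json.dumps(alias_move_body(alias, target, current)).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For the state shown above, the call would be `move_alias("openrxv-items", "openrxv-items-final", ["openrxv-items-temp"])`. `current` should list only indexes that actually carry the alias, because removing an alias that is not there may make the whole request fail.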
## 2021-03-23

- For reference you can also get the Elasticsearch JVM stats from the API:

```console
$ curl -s 'http://localhost:9200/_nodes/jvm?human' | python -m json.tool
```
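To check the effective heap without reading the whole dump, a small Python sketch could pull `jvm.mem.heap_max_in_bytes` for each node from the same `_nodes/jvm` reply. It assumes the local endpoint as above; `fetch_nodes_jvm` and `heap_max_by_node` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: the same local Elasticsearch as in the notes
ES = "http://localhost:9200"


def fetch_nodes_jvm() -> dict:
    """Return the decoded reply of GET /_nodes/jvm."""
    with urllib.request.urlopen(f"{ES}/_nodes/jvm") as resp:
        return json.load(resp)


def heap_max_by_node(nodes_jvm: dict) -> dict:
    """Map each node's name (or its id, if unnamed) to its maximum heap in bytes."""
    return {
        node.get("name", node_id): node["jvm"]["mem"]["heap_max_in_bytes"]
        for node_id, node in nodes_jvm["nodes"].items()
    }
```

Assume the heap was set with something like `-Xmx1536m` (the exact option string is an assumption). In that case, the expected value is about 1536 × 1024² = 1,610,612,736 bytes. The JVM may report a maximum slightly below `-Xmx`, so compare approximately.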
- I re-deployed AReS with 1.5GB of heap using the `ES_JAVA_OPTS` environment variable
- It turns out that this *is* the recommended way to set the heap: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/jvm-options.html
- Then I fixed the aliases to make sure `openrxv-items` was an alias of `openrxv-items-final`, similar to how I did a few weeks ago
- I re-created the temp index:

```console
$ curl -XPUT 'http://localhost:9200/openrxv-items-temp'
```
<!-- vim: set sw=2 ts=2: -->