mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-02-23
@ -528,4 +528,129 @@ $ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
# start indexing in AReS
```

## 2021-02-22

- Start looking at splitting the series name and number in `dcterms.isPartOf` now that we have migrated to CG Core v2
- The numbers will go to `cg.number`
- I notice there are about 100 series without a number, but they still have a semicolon, for example `Esporo 72;`
- I think I will replace those like this:

```console
localhost/dspace63= > UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, '^(.+?);$','\1', 'g') WHERE metadata_field_id=166 AND dspace_object_id IN (SELECT uuid FROM item) AND text_value ~ ';$';
UPDATE 104
```
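
The pattern can be tried outside the database before running the `UPDATE`. This is a minimal sketch in Python, where `re.sub` stands in for PostgreSQL's `REGEXP_REPLACE` (the two regex engines are not identical in general). The function name and the sample values are made up for illustration, apart from `Esporo 72;`:

```python
import re

# Same pattern as in the REGEXP_REPLACE call: a lazy capture of everything
# before a single trailing semicolon.
TRAILING_SEMICOLON = re.compile(r"^(.+?);$")


def strip_trailing_semicolon(value: str) -> str:
    """Return value without one trailing semicolon; other values pass through."""
    return TRAILING_SEMICOLON.sub(r"\1", value)


# Hypothetical sample values, apart from "Esporo 72;" which is from the notes
samples = ["Esporo 72;", "Spore;23", "Agritrade"]
cleaned = [strip_trailing_semicolon(v) for v in samples]
print(cleaned)
```

Values that have a number after the semicolon do not end in `;`, so they are left alone, which matches the `text_value ~ ';$'` condition in the query.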

- As for splitting the other values, I think I can export the `dspace_object_id` and `text_value` and then upload it as a CSV rather than writing a Python script to create the new metadata values

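For the remaining values, the split itself is simple enough to sketch. The Python below only illustrates splitting a `name; number` value, assuming the first semicolon is the separator. The function name and the sample values are hypothetical:

```python
from typing import Tuple


def split_series(value: str) -> Tuple[str, str]:
    """Split 'Series name; number' on the first semicolon.

    Returns (name, number); number is empty when there is no semicolon.
    """
    name, _, number = value.partition(";")
    return name.strip(), number.strip()


# Hypothetical values for illustration only
for value in ["Esporo; 72", "Agritrade"]:
    print(split_series(value))
```

The name would stay in `dcterms.isPartOf` and the number would go to `cg.number`.
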
## 2021-02-22

- Check the results of the AReS harvesting from last night:

```console
$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
{
  "count" : 101380,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
```

- Set the current items index to read only and make a backup:

```console
$ curl -X PUT "localhost:9200/openrxv-items/_settings" -H 'Content-Type: application/json' -d' {"settings": {"index.blocks.write":true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items/_clone/openrxv-items-2021-02-22
```

- Delete the current items index and clone the temp one to it:

```console
$ curl -XDELETE 'http://localhost:9200/openrxv-items'
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items
```

- Then delete the temp and backup:

```console
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
{"acknowledged":true}%
$ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-02-22'
```

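The index swap in this entry is a fixed sequence of Elasticsearch calls, so it can be written down as data before anything is sent. This is a rough sketch in Python that only builds the request plan and does not contact a server. The function name `swap_plan` and its parameters are made up for this example, while the endpoints are the ones from the curl commands:

```python
import json
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[str]]  # (HTTP method, URL, JSON body or None)


def swap_plan(base: str, live: str, temp: str, backup: str) -> List[Step]:
    """Ordered requests to replace the live index with the temp one."""
    read_only = json.dumps({"settings": {"index.blocks.write": True}})
    return [
        # Block writes on the live index so it can be cloned, then back it up
        ("PUT", f"{base}/{live}/_settings", read_only),
        ("POST", f"{base}/{live}/_clone/{backup}", None),
        # Drop the live index and clone the freshly harvested temp index to it
        ("DELETE", f"{base}/{live}", None),
        ("PUT", f"{base}/{temp}/_settings", read_only),
        ("POST", f"{base}/{temp}/_clone/{live}", None),
        # Clean up the temp index and the backup
        ("DELETE", f"{base}/{temp}", None),
        ("DELETE", f"{base}/{backup}", None),
    ]


plan = swap_plan(
    "http://localhost:9200",
    live="openrxv-items",
    temp="openrxv-items-temp",
    backup="openrxv-items-2021-02-22",
)
for method, url, body in plan:
    print(method, url, body or "")
```

Printing the plan first makes it easy to check the order of the destructive steps before running them by hand.
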
## 2021-02-23

- CodeObia sent a [pull request for clickable countries on AReS](https://github.com/ilri/OpenRXV/pull/75)
- I deployed it and it seems to work, so I asked Abenet and Peter to test it so we can get feedback
- Remove semicolons from series names without numbers:

```console
dspace=# BEGIN;
dspace=# UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, '^(.+?);$','\1', 'g') WHERE metadata_field_id=166 AND dspace_object_id IN (SELECT uuid FROM item) AND text_value ~ ';$';
UPDATE 104
dspace=# COMMIT;
```

- Set all `text_lang` values on CGSpace to `en_US` to make the series replacements easier (this didn't work, read below):

```console
dspace=# BEGIN;
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE text_lang !='en_US' AND dspace_object_id IN (SELECT uuid FROM item);
UPDATE 911
cgspace=# COMMIT;
```

- Then export all series with their IDs to CSV:

```console
dspace=# \COPY (SELECT dspace_object_id, text_value as "dcterms.isPartOf[en_US]" FROM metadatavalue WHERE metadata_field_id=166 AND dspace_object_id IN (SELECT uuid FROM item)) TO /tmp/2021-02-23-series.csv WITH CSV HEADER;
```

- In OpenRefine I trimmed and consolidated whitespace, then made some quick cleanups to normalize the fields based on a sanity check
- For example many Spore items are like "Spore, Spore 23"
- Also, "Agritrade, August 2002"
- Then I copied the column to a new one called `cg.number[en_US]` and split the values for each on the semicolon using `value.split(';')[0]` and `value.split(';')[1]`
- I tried to upload some of the series data to DSpace Test but I'm having an issue where some fields change that shouldn't
- It seems not all fields get updated when I set the `text_lang` globally, but if I update it manually like this it works:

```console
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE metadata_value_id=5355845;
UPDATE 1
```

- This also seems to work, using the id for just that one item:

```console
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE dspace_object_id='9840d19b-a6ae-4352-a087-6d74d2629322';
UPDATE 37
```

- This seems to work better for some reason:

```console
dspacetest=# UPDATE metadatavalue SET text_lang='en_US' WHERE metadata_field_id=166 AND dspace_object_id IN (SELECT uuid FROM item);
UPDATE 18659
```

- I split the CSV file in batches of 5,000 using xsv, then imported them one by one in CGSpace:

```console
$ dspace metadata-import -f /tmp/0.csv
```
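
For reference, the batching step amounts to roughly what the following Python does: it splits the data rows of a CSV into chunks of a fixed size and repeats the header in each chunk so every file can be imported on its own. This is a sketch of the idea, not of how xsv works internally. The function name and the tiny in-memory CSV are assumptions for illustration:

```python
import csv
import io
from typing import Iterator


def csv_batches(text: str, size: int) -> Iterator[str]:
    """Yield CSV documents of at most `size` data rows, each with the header.

    Assumes the input has at least a header row.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    for start in range(0, len(data), size):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(data[start:start + size])
        yield out.getvalue()


# Hypothetical three-row export; the real import used batches of 5,000
sample = "id,dcterms.isPartOf[en_US]\n1,Series A\n2,Series B\n3,Series C\n"
batches = list(csv_batches(sample, size=2))
print(len(batches))
```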

- It took FOREVER to import each file... like several hours. MY GOD DSpace 6 is slow.
- Help Dominique Perera debug some issues with the WordPress DSpace importer plugin from Macaroni Bros
- She is not seeing the community list for CGSpace, and I see weird requests like this in the logs:

```console
104.198.97.97 - - [23/Feb/2021:11:41:17 +0100] "GET /rest/communities?limit=1000 HTTP/1.1" 200 188779 "https://cgspace.cgiar.org/rest /communities?limit=1000" "RTB website BOT"
104.198.97.97 - - [23/Feb/2021:11:41:18 +0100] "GET /rest/communities//communities HTTP/1.1" 404 714 "https://cgspace.cgiar.org/rest/communities//communities" "RTB website BOT"
```

- The first request is OK, but the second one is malformed for sure

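One way to spot requests like the second one is to look for an empty path segment (a doubled slash) in the request path. This is a small sketch in Python, assuming the access log format shown in the excerpt. The regex and the helper names are illustrative only, and the sample lines are the two log entries shortened to drop the referer and user agent:

```python
import re
from typing import Optional

# Request line inside double quotes: METHOD PATH PROTOCOL
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def request_path(line: str) -> Optional[str]:
    """Return the request path from an access log line, or None."""
    match = REQUEST.search(line)
    return match.group("path") if match else None


def has_empty_segment(path: str) -> bool:
    """True when the path (ignoring the query string) contains '//'."""
    return "//" in path.split("?", 1)[0]


lines = [
    '104.198.97.97 - - [23/Feb/2021:11:41:17 +0100] "GET /rest/communities?limit=1000 HTTP/1.1" 200 188779',
    '104.198.97.97 - - [23/Feb/2021:11:41:18 +0100] "GET /rest/communities//communities HTTP/1.1" 404 714',
]
flags = []
for line in lines:
    path = request_path(line)
    # Lines without a recognizable request are treated as not flagged
    flags.append(path is not None and has_empty_segment(path))
print(flags)
```
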
<!-- vim: set sw=2 ts=2: -->