mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-05-25
@ -378,4 +378,105 @@ $ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2021-05-18-add-orcids.csv -db dspa
- This included the IWMI changes, so I also migrated the `cg.subject.iwmi` metadata to `dcterms.subject` and deleted the subject term
- Then I started a full Discovery reindex

## 2021-05-19

- I realized that I need to lowercase the IWMI subjects that I just moved to AGROVOC because they were probably mostly uppercase
- To my surprise, when I checked, `dcterms.subject` had 47,000 metadata values that were upper or mixed case!

```console
dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
UPDATE 47405
```

- That's interesting because we lowercased them all a few months ago, so these must all be new... wow
- We have 405,000 total AGROVOC terms, with 20,600 of them being unique
- I will have to start another Discovery re-indexing to pick up these new changes

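
The effect of the `UPDATE` above can be previewed outside the database. This is a minimal Python sketch, assuming the values have already been exported to a list. The function name and the sample values are made up for illustration, and Python's per-character `str.isupper` only approximates PostgreSQL's locale-dependent `[[:upper:]]` class:

```python
def lowercase_mixed_case(values):
    """Return (changed_count, new_values), lowercasing only the values that
    contain at least one uppercase character, like the WHERE clause does."""
    changed = 0
    result = []
    for value in values:
        if any(ch.isupper() for ch in value):
            result.append(value.lower())
            changed += 1
        else:
            # Already lowercase: left untouched, so it is not counted
            result.append(value)
    return changed, result


# Hypothetical sample values, not taken from the real database
sample = ["MAIZE", "Climate Change", "livestock", "food security"]
changed, cleaned = lowercase_mixed_case(sample)
print(changed, cleaned)
```

The returned count plays the same role as the `UPDATE 47405` row count: it only includes values that actually needed changing.
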

## 2021-05-20

- Export the top 5,000 AGROVOC terms to validate them:

```console
localhost/dspace63= > \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 187 GROUP BY text_value ORDER BY count DESC LIMIT 5000) to /tmp/2021-05-20-agrovoc.csv WITH CSV HEADER;
COPY 5000
$ csvcut -c 1 /tmp/2021-05-20-agrovoc.csv | sed 1d > /tmp/2021-05-20-agrovoc.txt
$ ./ilri/agrovoc-lookup.py -i /tmp/2021-05-20-agrovoc.txt -o /tmp/2021-05-20-agrovoc-results.csv
$ csvgrep -c "number of matches" -m 0 /tmp/2021-05-20-agrovoc-results.csv > /tmp/2021-05-20-agrovoc-rejected.csv
```
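
The last step of the pipeline above keeps only the terms that the lookup could not match. A rough Python equivalent, as a sketch only: the `number of matches` column name comes from the `csvgrep` command, while the `term` column, the function name, and the sample rows are assumptions. It compares the whole cell to `0`, which is stricter than `csvgrep -m`, which appears to match substrings and so could also keep counts such as `10`:

```python
import csv
import io


def rejected_terms(csv_text, match_column="number of matches"):
    """Return the rows whose match count is exactly zero."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[match_column].strip() == "0"]


# Hypothetical results file; only the "number of matches" header is from the notes
results = "term,number of matches\nmaize,1\nlivestok,0\nsoil,10\n"
for row in rejected_terms(results):
    print(row["term"])
```

With a real results file, the text would be read from `/tmp/2021-05-20-agrovoc-results.csv` instead of an inline string.
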

- Meeting with Medha and Pythagoras about the FAIR Workflow tool
  - Discussed the need for such a tool, other tools being developed, etc
  - I stressed the importance of controlled vocabularies
  - No real outcome, except that they will keep us posted and let us know if they need help testing on DSpace
- Meeting with Hector Tobon to discuss issues with CLARISA
  - They pushed back a bit, saying they were more focused on the needs of the CG
  - They are not against the idea of aligning closer to ROR, but lack the manpower
  - They pointed out that their countries come directly from the [ISO 3166 online browsing platform on the ISO website](https://www.iso.org/iso-3166-country-codes.html)
  - Indeed the text value for Russia is "Russian Federation (the)" there... I find that strange
  - I filed [an issue](https://salsa.debian.org/iso-codes-team/iso-codes/-/issues/33) on the iso-codes GitLab repository

## 2021-05-24

- Add ORCID identifiers for missing ILRI authors and tag 550 others, based on a few authors I noticed were missing them:

```console
$ cat 2021-05-24-add-orcids.csv
dc.contributor.author,cg.creator.identifier
"Patel, Ekta","Ekta Patel: 0000-0001-9400-6988"
"Dessie, Tadelle","Tadelle Dessie: 0000-0002-1630-0417"
"Tadelle, D.","Tadelle Dessie: 0000-0002-1630-0417"
"Dione, Michel M.","Michel Dione: 0000-0001-7812-5776"
"Kiara, Henry K.","Henry Kiara: 0000-0001-9578-1636"
"Naessens, Jan","Jan Naessens: 0000-0002-7075-9915"
"Steinaa, Lucilla","Lucilla Steinaa: 0000-0003-3691-3971"
"Wieland, Barbara","Barbara Wieland: 0000-0003-4020-9186"
"Grace, Delia","Delia Grace: 0000-0002-0195-9489"
"Rao, Idupulapati M.","Idupulapati M. Rao: 0000-0002-8381-9358"
"Cardoso Arango, Juan Andrés","Juan Andrés Cardoso Arango: 0000-0002-0252-4655"
$ ./ilri/add-orcid-identifiers-csv.py -i 2021-05-24-add-orcids.csv -db dspace -u dspace -p 'fuuu'
```
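
Before running the tagging script, a CSV like the one above could be sanity-checked for malformed `Name: ORCID` values. The following is a hypothetical pre-flight check, not a description of how `add-orcid-identifiers-csv.py` works internally. The regular expression and the function name are assumptions, and the two sample rows are copied from the file above:

```python
import csv
import io
import re

# "Display Name: 0000-0000-0000-000X" as used in the cg.creator.identifier column
ORCID_VALUE = re.compile(r"^(?P<name>.+): (?P<orcid>\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def parse_orcid_rows(csv_text):
    """Map each author name variant to (display name, ORCID iD)."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row["cg.creator.identifier"]
        match = ORCID_VALUE.match(value)
        if match is None:
            raise ValueError(f"Malformed identifier: {value!r}")
        mapping[row["dc.contributor.author"]] = (match["name"], match["orcid"])
    return mapping


sample = '''dc.contributor.author,cg.creator.identifier
"Dessie, Tadelle","Tadelle Dessie: 0000-0002-1630-0417"
"Tadelle, D.","Tadelle Dessie: 0000-0002-1630-0417"
'''
print(parse_orcid_rows(sample))
```

Both name variants map to the same identifier, which is the point of listing "Dessie, Tadelle" and "Tadelle, D." on separate rows.
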

- A few days ago I took a backup of the Elasticsearch indexes on AReS using elasticdump:

```console
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_data.json --type=data --limit=1000
$ elasticdump --input=http://localhost:9200/openrxv-items --output=/home/aorth/openrxv-items_mapping.json --type=mapping
```

- The indexes look OK so I started a harvest on AReS

## 2021-05-25

- The AReS harvest got messed up somehow, as the number of items in the indexes is the same as before the harvest:

```console
$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp o3ijJLcyTtGMOPeWpAJiVA 1 1 104373 106455 491.5mb 491.5mb
yellow open openrxv-items-final soEzAnp3TDClIGZbmVyEIw 1 1 953 0 2.3mb 2.3mb
```
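
The seventh column of that output is the document count, which is easy to misread next to the deleted-documents column. A small Python sketch for pulling it out, assuming the default `_cat/indices` column order for the Elasticsearch version in use (the `COLUMNS` list and function name are illustrative):

```python
# Assumed default column order of Elasticsearch's _cat/indices output
COLUMNS = [
    "health", "status", "index", "uuid", "pri", "rep",
    "docs.count", "docs.deleted", "store.size", "pri.store.size",
]


def parse_cat_indices(text):
    """Return {index name: document count} from plain _cat/indices output."""
    counts = {}
    for line in text.strip().splitlines():
        fields = dict(zip(COLUMNS, line.split()))
        counts[fields["index"]] = int(fields["docs.count"])
    return counts


# The two lines shown in the console output above
output = """\
yellow open openrxv-items-temp o3ijJLcyTtGMOPeWpAJiVA 1 1 104373 106455 491.5mb 491.5mb
yellow open openrxv-items-final soEzAnp3TDClIGZbmVyEIw 1 1 953 0 2.3mb 2.3mb
"""
print(parse_cat_indices(output))
```

Saving these counts before and after a harvest would make it simpler to compare the two states later.
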

- Update all Docker images on the AReS server (linode20):

```console
$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
$ docker-compose -f docker/docker-compose.yml down
$ docker-compose -f docker/docker-compose.yml build
```
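
The first command turns the `docker images` table into `repository:tag` references for `docker pull`. The same text transformation can be sketched in Python. The function name and the sample listing are placeholders, and skipping `<none>` entries is an addition that the shell pipeline does not make:

```python
def image_refs(docker_images_output):
    """Turn `docker images` table output into repository:tag references."""
    refs = []
    for line in docker_images_output.splitlines():
        if not line.strip() or line.startswith("REPO"):
            continue  # skip blank lines and the header, like `grep -v ^REPO`
        repository, tag = line.split()[:2]
        if "<none>" in (repository, tag):
            continue  # dangling images cannot be pulled by name
        refs.append(f"{repository}:{tag}")
    return refs


# Hypothetical listing; the image names are placeholders
listing = """\
REPOSITORY   TAG      IMAGE ID       CREATED       SIZE
example/app  latest   0123456789ab   2 weeks ago   120MB
<none>       <none>   ba9876543210   3 weeks ago   95MB
"""
print(image_refs(listing))
```

Splitting on whitespace rather than on `:` also keeps the tag for repositories whose name contains a registry port, a case where `cut -d: -f1,2` would drop it.
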

- Then run all system updates on the server and reboot it
- Oh crap, I deleted everything on AReS and restored the backup and the total items are now 104317... so it was actually correct before!
- For reference, this is how I re-created everything:

```console
$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
$ curl -XPUT 'http://localhost:9200/openrxv-items-final'
$ curl -XPUT 'http://localhost:9200/openrxv-items-temp'
$ curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'
$ elasticdump --input=/home/aorth/openrxv-items_mapping.json --output=http://localhost:9200/openrxv-items-final --type=mapping
$ elasticdump --input=/home/aorth/openrxv-items_data.json --output=http://localhost:9200/openrxv-items-final --type=data --limit=1000
```
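
The order of the requests matters here: both indexes are deleted and recreated, the alias is pointed at the final index, and only then are the mapping and the data restored with elasticdump. As a sketch, the Elasticsearch part can be written down as a dry-run plan in Python that builds the requests without sending anything. The function name `recreate_plan` is hypothetical, and the URL and index names are the ones from the commands above:

```python
import json

ES_URL = "http://localhost:9200"


def recreate_plan(final="openrxv-items-final", temp="openrxv-items-temp",
                  alias="openrxv-items"):
    """Return the ordered (method, url, body) requests without sending them."""
    alias_body = json.dumps(
        {"actions": [{"add": {"index": final, "alias": alias}}]}
    )
    return [
        ("DELETE", f"{ES_URL}/{final}", None),
        ("DELETE", f"{ES_URL}/{temp}", None),
        ("PUT", f"{ES_URL}/{final}", None),
        ("PUT", f"{ES_URL}/{temp}", None),
        ("POST", f"{ES_URL}/_aliases", alias_body),
    ]


for method, url, body in recreate_plan():
    print(method, url, body or "")
```

Printing the plan first gives a chance to review the destructive `DELETE` requests before anything is actually run.
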

- I will just start a new harvest... sigh

<!-- vim: set sw=2 ts=2: -->