Update notes for 2016-11-10

2016-11-10 14:49:09 +02:00
parent b7d9b1e86b
commit 9d06d39752
5 changed files with 473 additions and 0 deletions

@ -101,3 +101,92 @@ dspace=# \copy (select distinct text_value, count(*) from metadatavalue where me
- CGSpace crashed so I quickly ran system updates, applied one or two of the waiting changes from the `5_x-prod` branch, and rebooted the server
- The error was `Timeout waiting for idle object` but I haven't looked into the Tomcat logs to see what happened
- Also, I ran the corrections for CRPs from earlier this week
## 2016-11-10
- Helping Megan Zandstra and CIAT with some questions about the REST API
- Playing with `find-by-metadata-field`, this works:
```
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS"}'
```
- But the results are deceiving, because each metadata value can have a text language (`text_lang`) and the query must match it exactly!
```
dspace=# select distinct text_value, text_lang from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS';
text_value | text_lang
------------+-----------
SEEDS |
SEEDS |
SEEDS | en_US
(3 rows)
```
- So the text language for this value can be null, an empty string, or `en_US`
- To query each of these variants, vary the `language` key in the request (the count for the `en_US` query was not recorded):
```
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS"}' | jq length
55
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS", "language":""}' | jq length
34
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS", "language":"en_US"}' | jq length
```
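- The three requests above differ only in the `language` key of the JSON body, so they can be generated from one helper. This is a minimal sketch, not something from the original session: the function names `build_payload` and `count_items` are made up for illustration, and it assumes the key, value and language contain no double quotes or backslashes.

```shell
#!/bin/sh
# Build the JSON body for find-by-metadata-field.
# With two arguments the "language" key is omitted entirely;
# with a third argument (even an empty string) it is included.
# ASSUMPTION: the arguments contain no double quotes or backslashes,
# because no JSON escaping is done here.
build_payload() {
    if [ "$#" -ge 3 ]; then
        printf '{"key": "%s","value": "%s", "language":"%s"}\n' "$1" "$2" "$3"
    else
        printf '{"key": "%s","value": "%s"}\n' "$1" "$2"
    fi
}

# POST a payload to the endpoint and print the number of items returned.
# Needs curl, jq and a DSpace REST API listening on localhost:8080.
count_items() {
    curl -s -H "accept: application/json" -H "Content-Type: application/json" \
        -X POST "http://localhost:8080/rest/items/find-by-metadata-field" \
        -d "$1" | jq length
}
```

- For example, `count_items "$(build_payload cg.subject.ilri SEEDS en_US)"` should send the same request as the third `curl` command above.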
- The results (55+34=89) don't seem to match those from the database:
```
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang is null;
count
-------
15
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang='';
count
-------
4
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang='en_US';
count
-------
66
```
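- As a quick check of the arithmetic, the counts can be summed in the shell. The numbers are copied from the outputs above; the variable names are only illustrative. Note that the API total covers just the first two requests, since no count was recorded for the `en_US` one.

```shell
# Row counts copied from the three psql queries above.
null_count=15
blank_count=4
en_us_count=66
db_total=$((null_count + blank_count + en_us_count))

# Item counts returned by the first two REST API requests.
api_total=$((55 + 34))

echo "database: $db_total, API: $api_total, difference: $((api_total - db_total))"
```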
- So the API returns 55 + 34 = 89 results from just the first two queries, while the database only has 85 matching rows
- And the `find-by-metadata-field` endpoint doesn't seem to have a way to get all items that have the field regardless of value, or to match a wildcard value
- I'll ask a question on the dspace-tech mailing list
- And speaking of `text_lang`, this is interesting:
```
dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
text_lang
-----------
ethnob
en
spa
EN
es
frn
en_
en_US
EN_US
eng
en_U
fr
(14 rows)
```
- Generate a list of all of these, with counts, so I can fix them in batch:
```
dspace=# \copy (select distinct text_lang, count(*) from metadatavalue where resource_type_id=2 group by text_lang order by count desc) to /tmp/text-langs.csv with csv;
COPY 14
```
- Perhaps we need to fix them all in batch, or experiment with fixing only certain metadata values, for example all `SEEDS` values in this field:
```
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS';
UPDATE 85
```
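- One way the batch fix might be prepared is to map each variant to a canonical value and print the `UPDATE` statements for review before running anything. This is only a sketch: the target values in `normalize_lang` are assumptions and not decisions recorded in these notes, both function names are hypothetical, and null or empty `text_lang` values are not handled.

```shell
#!/bin/sh
# Map a text_lang variant to a canonical value.
# ASSUMPTION: these target values are illustrative only.
normalize_lang() {
    case "$1" in
        en|EN|en_|en_U|EN_US|eng) echo "en_US" ;;
        es|spa)                   echo "es" ;;
        fr|frn)                   echo "fr" ;;
        *)                        echo "$1" ;;  # anything else (e.g. ethnob) is left unchanged
    esac
}

# Print one UPDATE statement for each variant that would actually change.
generate_updates() {
    for lang in "$@"; do
        fixed=$(normalize_lang "$lang")
        if [ "$fixed" != "$lang" ]; then
            printf "update metadatavalue set text_lang='%s' where resource_type_id=2 and text_lang='%s';\n" "$fixed" "$lang"
        fi
    done
}
```

- Running `generate_updates ethnob en spa EN es frn en_ en_US EN_US eng en_U fr` prints the statements for the variants listed above, which can then be checked by hand before being pasted into `psql`.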