Update notes for 2019-04-24

This commit is contained in:
2019-04-24 18:49:55 +03:00
parent ae9d8cfef5
commit 03ac5b9b07
3 changed files with 150 additions and 8 deletions

View File

@ -880,5 +880,72 @@ $ csvcut -c id,dc.identifier.uri,'dc.identifier.uri[]' ~/Downloads/2019-04-24-II
- Carlos Tejo from the Land Portal had been emailing me this week to ask about the old REST API that Tsega was building in 2017
- I told him we never finished it, and that he should try to use the `/items/find-by-metadata-field` endpoint, with the caveat that you need to match the language attribute exactly (ie "en", "en_US", null, etc)
- I asked him how many terms they are interested in, as we could probably make it easier by normalizing the language attributes of these fields (it would help us anyways)
- He says he's getting HTTP 401 errors when trying to search for CPWF subject terms, which I can reproduce:
```
$ curl -f -H "accept: application/json" -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": "en_US"}'
curl: (22) The requested URL returned error: 401
```
- Note that curl only shows the HTTP 401 error if you use `-f` (fail), and only then if you *don't* include `-s`
- I see there are about 1,000 items using CPWF subject "WATER MANAGEMENT" in the database, so there should definitely be results
- The breakdown of `text_lang` fields used in those items is 942:
```
dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT' AND text_lang='en_US';
count
-------
376
(1 row)
dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT' AND text_lang='';
count
-------
149
(1 row)
dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT' AND text_lang IS NULL;
count
-------
417
(1 row)
```
- I see that the HTTP 401 issue seems to be a bug due to an item that the user doesn't have permission to access... from the DSpace log:
```
2019-04-24 08:11:51,129 INFO org.dspace.rest.ItemsResource @ Looking for item with metadata(key=cg.subject.cpwf,value=WATER MANAGEMENT, language=en_US).
2019-04-24 08:11:51,231 INFO org.dspace.usage.LoggerUsageEventListener @ anonymous::view_item:handle=10568/72448
2019-04-24 08:11:51,238 INFO org.dspace.usage.LoggerUsageEventListener @ anonymous::view_item:handle=10568/72491
2019-04-24 08:11:51,243 INFO org.dspace.usage.LoggerUsageEventListener @ anonymous::view_item:handle=10568/75703
2019-04-24 08:11:51,252 ERROR org.dspace.rest.ItemsResource @ User(anonymous) has not permission to read item!
```
- Nevertheless, if I request using the `null` language I get 1020 results, plus 179 for a blank language attribute:
```
$ curl -s -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": null}' | jq length
1020
$ curl -s -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": ""}' | jq length
179
```
- This is weird because I see 9421156 items with "WATER MANAGEMENT" (depending on wildcard matching for errors in subject spelling):
```
dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT';
count
-------
942
(1 row)
dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value LIKE '%WATER MANAGEMENT%';
count
-------
1156
(1 row)
```
- I sent a message to the dspace-tech mailing list to ask for help
<!-- vim: set sw=2 ts=2: -->