From 03ac5b9b0782d7cfcbc601827c6421225814b39d Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Wed, 24 Apr 2019 18:49:55 +0300
Subject: [PATCH] Update notes for 2019-04-24

---
 content/posts/2019-04.md | 67 +++++++++++++++++++++++++++++++++
 docs/2019-04/index.html  | 81 ++++++++++++++++++++++++++++++++++++++--
 docs/sitemap.xml         | 10 ++---
 3 files changed, 150 insertions(+), 8 deletions(-)

diff --git a/content/posts/2019-04.md b/content/posts/2019-04.md
index 3ca3d505b..59783a852 100644
--- a/content/posts/2019-04.md
+++ b/content/posts/2019-04.md
@@ -880,5 +880,72 @@ $ csvcut -c id,dc.identifier.uri,'dc.identifier.uri[]' ~/Downloads/2019-04-24-II
 - Carlos Tejo from the Land Portal had been emailing me this week to ask about the old REST API that Tsega was building in 2017
   - I told him we never finished it, and that he should try to use the `/items/find-by-metadata-field` endpoint, with the caveat that you need to match the language attribute exactly (ie "en", "en_US", null, etc)
   - I asked him how many terms they are interested in, as we could probably make it easier by normalizing the language attributes of these fields (it would help us anyways)
+  - He says he's getting HTTP 401 errors when trying to search for CPWF subject terms, which I can reproduce:
+
+```
+$ curl -f -H "accept: application/json" -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": "en_US"}'
+curl: (22) The requested URL returned error: 401
+```
+
+- Note that curl only shows the HTTP 401 error if you use `-f` (fail), and only then if you *don't* include `-s`
+  - I see there are about 1,000 items using CPWF subject "WATER MANAGEMENT" in the database, so there should definitely be results
+  - The `text_lang` values used in those items break down as follows (942 in total):
+
+```
+dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT' AND text_lang='en_US';
+ count
+-------
+   376
+(1 row)
+
+dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT' AND text_lang='';
+ count
+-------
+   149
+(1 row)
+
+dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT' AND text_lang IS NULL;
+ count
+-------
+   417
+(1 row)
+```
+
+- The HTTP 401 seems to be a bug triggered by an item that the user doesn't have permission to access, judging from the DSpace log:
+
+```
+2019-04-24 08:11:51,129 INFO  org.dspace.rest.ItemsResource @ Looking for item with metadata(key=cg.subject.cpwf,value=WATER MANAGEMENT, language=en_US).
+2019-04-24 08:11:51,231 INFO  org.dspace.usage.LoggerUsageEventListener @ anonymous::view_item:handle=10568/72448
+2019-04-24 08:11:51,238 INFO  org.dspace.usage.LoggerUsageEventListener @ anonymous::view_item:handle=10568/72491
+2019-04-24 08:11:51,243 INFO  org.dspace.usage.LoggerUsageEventListener @ anonymous::view_item:handle=10568/75703
+2019-04-24 08:11:51,252 ERROR org.dspace.rest.ItemsResource @ User(anonymous) has not permission to read item!
+```
+
+- Nevertheless, if I request using the `null` language I get 1020 results, plus 179 for a blank language attribute:
+
+```
+$ curl -s -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": null}' | jq length
+1020
+$ curl -s -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": ""}' | jq length
+179
+```
+
+- This is weird because I see 942–1156 items with "WATER MANAGEMENT" (depending on wildcard matching for errors in subject spelling):
+
+```
+dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT';
+ count
+-------
+   942
+(1 row)
+
+dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value LIKE '%WATER MANAGEMENT%';
+ count
+-------
+  1156
+(1 row)
+```
+
+- I sent a message to the dspace-tech mailing list to ask for help
diff --git a/docs/2019-04/index.html b/docs/2019-04/index.html
index 35cf32962..26565c772 100644
--- a/docs/2019-04/index.html
+++ b/docs/2019-04/index.html
@@ -38,7 +38,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
-
+
@@ -81,9 +81,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
   "@type": "BlogPosting",
   "headline": "April, 2019",
   "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-04\/",
-  "wordCount": "5667",
+  "wordCount": "6018",
   "datePublished": "2019-04-01T09:00:43\x2b03:00",
-  "dateModified": "2019-04-24T16:50:24\x2b03:00",
+  "dateModified": "2019-04-24T17:15:13\x2b03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -1250,9 +1250,84 @@ dspace.log.2019-04-20:1515
+
$ curl -f -H "accept: application/json" -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": "en_US"}'
+curl: (22) The requested URL returned error: 401
+
+ + + +
dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT' AND text_lang='en_US';
+ count 
+-------
+   376
+(1 row)
+
+dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT' AND text_lang='';
+ count 
+-------
+   149
+(1 row)
+
+dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT' AND text_lang IS NULL;
+ count 
+-------
+   417
+(1 row)
+
+ + + +
2019-04-24 08:11:51,129 INFO  org.dspace.rest.ItemsResource @ Looking for item with metadata(key=cg.subject.cpwf,value=WATER MANAGEMENT, language=en_US).
+2019-04-24 08:11:51,231 INFO  org.dspace.usage.LoggerUsageEventListener @ anonymous::view_item:handle=10568/72448
+2019-04-24 08:11:51,238 INFO  org.dspace.usage.LoggerUsageEventListener @ anonymous::view_item:handle=10568/72491
+2019-04-24 08:11:51,243 INFO  org.dspace.usage.LoggerUsageEventListener @ anonymous::view_item:handle=10568/75703
+2019-04-24 08:11:51,252 ERROR org.dspace.rest.ItemsResource @ User(anonymous) has not permission to read item!
+
+ + + +
$ curl -s -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": null}' | jq length
+1020
+$ curl -s -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": ""}' | jq length
+179
+
+ + + +
dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value='WATER MANAGEMENT';
+ count 
+-------
+   942
+(1 row)
+
+dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=208 AND text_value LIKE '%WATER MANAGEMENT%';
+ count 
+-------
+  1156
+(1 row)
+
+
+
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 5b2cf316c..3adc8d085 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,30 +4,30 @@
 https://alanorth.github.io/cgspace-notes/2019-04/
-2019-04-24T16:50:24+03:00
+2019-04-24T17:15:13+03:00
 https://alanorth.github.io/cgspace-notes/
-2019-04-24T16:50:24+03:00
+2019-04-24T17:15:13+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2019-04-24T16:50:24+03:00
+2019-04-24T17:15:13+03:00
 0
 https://alanorth.github.io/cgspace-notes/posts/
-2019-04-24T16:50:24+03:00
+2019-04-24T17:15:13+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2019-04-24T16:50:24+03:00
+2019-04-24T17:15:13+03:00
 0