cgspace-notes/content/2016-02.md
Alan Orth 6103f22edb
Update notes for 2016-02-06
Signed-off-by: Alan Orth <alan.orth@gmail.com>
2016-02-06 21:01:30 +02:00


+++
date = "2016-02-05T13:18:00+03:00"
author = "Alan Orth"
title = "February, 2016"
tags = ["notes"]
image = "../images/bg.jpg"

+++

2016-02-05

  • Looking at some DAGRIS data for Abenet Yabowork
  • Lots of issues with spaces, newlines, etc. causing the import to fail
  • I noticed we have a very interesting list of countries on CGSpace:

CGSpace country list

  • Not only are there 49,000 countries, we have some blanks (25)...
  • Also, lots of things like "COTE D`LVOIRE" and "COTE D IVOIRE"
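A rough sketch of how the whitespace problems and the near-duplicate country spellings could be surfaced before an import. This is only an illustration, not the cleanup that was actually run: the function names `clean_value`, `country_key`, and `summarize` are made up here, and apart from the two spellings quoted above the sample values are hypothetical.

```python
import re
import unicodedata
from collections import Counter


def clean_value(value):
    """Collapse runs of whitespace (spaces, tabs, newlines) and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def country_key(value):
    """Build a rough comparison key: accents removed, upper-cased, A-Z only."""
    decomposed = unicodedata.normalize("NFKD", clean_value(value))
    return "".join(ch for ch in decomposed.upper() if "A" <= ch <= "Z")


def summarize(values):
    """Count blank values and group the remaining spellings by comparison key."""
    blanks = 0
    variants = {}
    for raw in values:
        cleaned = clean_value(raw)
        if not cleaned:
            blanks += 1
            continue
        variants.setdefault(country_key(cleaned), Counter())[cleaned] += 1
    return blanks, variants


if __name__ == "__main__":
    # Hypothetical sample; only the two COTE D... spellings come from the notes.
    sample = ["COTE D IVOIRE", "Cote d'Ivoire\n", " ", "", "COTE D`LVOIRE", "KENYA"]
    blanks, variants = summarize(sample)
    print("blank values:", blanks)
    for key, spellings in sorted(variants.items()):
        if len(spellings) > 1:
            print(key, dict(spellings))
```

A key like this only merges spellings that differ in case, spacing, punctuation, or accents. An actual typo such as "COTE D`LVOIRE" (L instead of I) still ends up under its own key and has to be fixed by hand.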

2016-02-06

  • Found a way to get items with null/empty metadata values from SQL
  • First, find the metadata_field_id for the field you want from the metadatafieldregistry table:
dspacetest=# select * from metadatafieldregistry;
  • In this case our country field is 78
  • Now find all resources with type 2 (item) that have null/empty values for that field:
dspacetest=# select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=78 and (text_value='' OR text_value IS NULL);
  • Then you can find the handle that owns it from its resource_id:
dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '22678';
  • It's 25 items, so editing in the web UI is annoying. Let's try SQL!
dspacetest=# delete from metadatavalue where metadata_field_id=78 and text_value='';
DELETE 25
  • After that perhaps a regular dspace index-discovery (no -b) should suffice...
  • Hmm, I indexed, cleared the Cocoon cache, and restarted Tomcat but the 25 "|||" countries are still there
  • Maybe I need to do a full re-index...
  • Yep! The full re-index seems to work.
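For reference, the find-then-delete steps above can be sketched end to end against a throwaway database. This is a minimal sketch under several assumptions: it uses Python's built-in `sqlite3` with an in-memory toy copy of `metadatavalue` and `handle` (only the columns the queries touch), the handles and the helper names `build_toy_db`, `empty_value_handles`, and `delete_empty_values` are hypothetical, and it assumes the `handle` table carries a `resource_type_id` column and that type 3 is a collection. Check the real schema before adapting any of it.

```python
import sqlite3

ITEM = 2            # resource_type_id for items, as used in the notes
COUNTRY_FIELD = 78  # metadata_field_id of the country field, as found in the notes


def build_toy_db():
    """Create an in-memory stand-in with only the columns the queries touch."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE metadatavalue (
            resource_id INTEGER, resource_type_id INTEGER,
            metadata_field_id INTEGER, text_value TEXT);
        CREATE TABLE handle (
            handle TEXT, resource_type_id INTEGER, resource_id INTEGER);
        """
    )
    conn.executemany("INSERT INTO metadatavalue VALUES (?, ?, ?, ?)", [
        (22678, ITEM, COUNTRY_FIELD, ""),       # empty string
        (22679, ITEM, COUNTRY_FIELD, None),     # NULL
        (22680, ITEM, COUNTRY_FIELD, "KENYA"),  # real value, must survive
    ])
    conn.executemany("INSERT INTO handle VALUES (?, ?, ?)", [
        ("123456789/1", ITEM, 22678),
        ("123456789/2", ITEM, 22679),
        ("123456789/3", ITEM, 22680),
        ("123456789/99", 3, 22678),  # assumed collection sharing an id with an item
    ])
    conn.commit()
    return conn


def empty_value_handles(conn, field_id):
    """Return handles of items whose value for field_id is '' or NULL."""
    rows = conn.execute(
        """
        SELECT DISTINCT h.handle
        FROM metadatavalue m
        JOIN handle h
          ON h.resource_id = m.resource_id
         AND h.resource_type_id = m.resource_type_id
        WHERE m.resource_type_id = ?
          AND m.metadata_field_id = ?
          AND (m.text_value = '' OR m.text_value IS NULL)
        ORDER BY h.handle
        """,
        (ITEM, field_id),
    )
    return [row[0] for row in rows]


def delete_empty_values(conn, field_id):
    """Delete '' and NULL values for field_id on items; return the row count."""
    with conn:  # commits on success, rolls back if the statement raises
        cur = conn.execute(
            """
            DELETE FROM metadatavalue
            WHERE resource_type_id = ?
              AND metadata_field_id = ?
              AND (text_value = '' OR text_value IS NULL)
            """,
            (ITEM, field_id),
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = build_toy_db()
    print(empty_value_handles(conn, COUNTRY_FIELD))
    print("deleted:", delete_empty_values(conn, COUNTRY_FIELD))
```

Two differences from the statements in the notes: the join also matches on `resource_type_id`, so a community or collection that happens to share an id with an item is not picked up, and the delete covers NULL values as well as empty strings. Neither changes the indexing side of things. On the real server the Discovery index still has to be rebuilt afterwards.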