---
title: "July, 2022"
date: 2022-07-02T14:07:36+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2022-07-02

- I learned how to use the Levenshtein functions in PostgreSQL
  - These functions only accept strings of up to 255 characters, so longer values need to be truncated before comparing
  - Also, the trgm functions I've used before are case insensitive, but Levenshtein is not, so both strings need to be lower cased first
  - A working query to check for duplicates in the recent AfricaRice items is:

```console
localhost/dspace= ☘ SELECT text_value FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=64 AND levenshtein_less_equal(LOWER('International Trade and Exotic Pests: The Risks for Biodiversity and African Economies'), LEFT(LOWER(text_value), 255), 3) <= 3;
                                       text_value                                       
────────────────────────────────────────────────────────────────────────────────────────
 International trade and exotic pests: the risks for biodiversity and African economies
(1 row)

Time: 399.751 ms
```

- There is a great [blog post discussing Soundex with Levenshtein](https://www.crunchydata.com/blog/fuzzy-name-matching-in-postgresql) and creating indexes to make them faster
- I want to do some proper checks of accuracy and speed against my trigram method

## 2022-07-03

- Start a harvest on AReS
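
- A minimal Python sketch of the normalization used in the 2022-07-02 Levenshtein query (lower case, then truncate to 255 characters, then compare against a distance threshold of 3). This is only an illustration outside PostgreSQL: `normalize`, `levenshtein`, and `is_probable_duplicate` are hypothetical names, the edit distance is a plain textbook implementation rather than PostgreSQL's, and unlike the query it truncates both strings, not just `text_value`:

```python
MAX_LEN = 255  # assumed to match the 255-character limit noted above


def normalize(s: str) -> str:
    """Lower-case, then truncate, mirroring LEFT(LOWER(text_value), 255)."""
    return s.lower()[:MAX_LEN]


def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance with unit insert/delete/substitute costs."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def is_probable_duplicate(title: str, candidate: str, max_distance: int = 3) -> bool:
    """True when the normalized strings are within max_distance edits."""
    return levenshtein(normalize(title), normalize(candidate)) <= max_distance


if __name__ == "__main__":
    new_title = ("International Trade and Exotic Pests: "
                 "The Risks for Biodiversity and African Economies")
    existing = ("International trade and exotic pests: "
                "the risks for biodiversity and African economies")
    print(is_probable_duplicate(new_title, existing))
```

  The two titles differ only in capitalization, so they match once both are lower cased. Python's `str.lower()` may not agree with PostgreSQL's `LOWER()` for every locale, so results on non-ASCII titles could differ.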