Duplicate Search Ranking Algorithm

Hello,

I'm using the endpoint ConstituentApi.DuplicateSearchResultCollection to search for potential matches based on constituent info and I'm keen to understand how the scores returned in the rank field are calculated.

I've had one case where a match scored 0.99 because it matched on first name, last name and gender. However, the search also included email, phone and postal address, and none of these were present on the record returned, so I would not want this record to be treated as a match. By contrast, another record that matched on all of the above had a rank of 1.01.

Is it possible to either amend the search so that it only returns records that match all of the filters passed, or customise the ranking algorithm to change how matches are weighted?

Thanks for your help,

Alex

Comments

  • @Tim McVicker @Sam Dickenson could either of you help with this? Thanks very much.

  • Tim McVicker (Blackbaud Employee)

    @Alex Feuchtwanger Hey!

    I assume you are talking about the “GetDuplicateSearchResults” endpoint located here? The goal of that endpoint is to expose the operation that RENXT uses to perform the duplicate matching function, so the ranking algorithm is what is used by RENXT itself. I would expect any duplicate matching you do to use that same ranking algorithm, so there's no way to customize it.

    The rank in the results is meant to convey the confidence level that a given constituent is a duplicate of the one you are searching for. By giving a lower rank to the constituent that doesn't match the email and phone, I think it correctly conveys that message - RENXT is more confident that the constituent with the matching email & phone is a match.

    This is a “search” endpoint, not a “filter” endpoint (if that makes sense) - it isn't meant to only supply 100% correct matches for all of your filters; it is meant to provide some fuzzy matches to help the user identify duplicate constituents (which may each only have partial information). The more fields you supply, the more confident the duplicate search will be.

  • @Tim McVicker
    Hi Tim, thanks very much for your reply, that's very helpful.

    Yes, that’s the endpoint I’m looking at. Is there documentation I can access about the ranking algorithm so I can better understand how it scores and what an appropriate threshold might be? So far I’ve not had a chance to run enough tests to get a sense of what ‘good’ and ‘poor’ scores look like.

    I will also explore whether a filter endpoint may actually meet our needs better. Would that be this one or is there another I should look at?
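
Since the duplicate search is described above as a fuzzy "search" rather than a "filter", one possible workaround is to post-filter its results on the client side. The Python sketch below is only an illustration under assumptions: it assumes each result has already been flattened into a dict, and the field names (first_name, email, phone and so on), the helper strict_matches and the min_rank parameter are all hypothetical, not part of the API. Only the rank field and the two scores (0.99 and 1.01) come from this thread.

```python
from typing import Any, Dict, List, Optional


def _norm(value: Any) -> str:
    """Stringify, trim and lowercase a value; None becomes an empty string."""
    return "" if value is None else str(value).strip().lower()


def strict_matches(
    criteria: Dict[str, Any],
    results: List[Dict[str, Any]],
    min_rank: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Keep only results that agree with every non-empty search criterion."""
    # Only compare on fields the caller actually supplied.
    supplied = {k: _norm(v) for k, v in criteria.items() if _norm(v)}
    kept = []
    for record in results:
        # Optional rank cut-off; a record without a rank is treated as 0.
        if min_rank is not None and record.get("rank", 0.0) < min_rank:
            continue
        if all(_norm(record.get(f)) == want for f, want in supplied.items()):
            kept.append(record)
    return kept


# Illustrative data only: the ranks echo the two cases described in the
# question; the ids, field names and values are made up (hypothetical).
criteria = {"first_name": "Sam", "last_name": "Example", "gender": "Female",
            "email": "sam@example.org", "phone": "555-0100"}
results = [
    {"id": "A", "rank": 0.99, "first_name": "Sam", "last_name": "Example",
     "gender": "Female"},
    {"id": "B", "rank": 1.01, "first_name": "Sam", "last_name": "Example",
     "gender": "Female", "email": "sam@example.org", "phone": "555-0100"},
]
kept = strict_matches(criteria, results)
print([r["id"] for r in kept])  # prints ['B']
```

Exact equality after trimming and lowercasing is deliberately strict. Phone numbers and postal addresses would probably need extra normalisation before comparing, and any min_rank value would have to come from your own testing, since no threshold has been documented in this thread.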