I have a 32,000-record table of businesses that includes most of the usual fields (business name, address, phone, city, etc). My objective is to return the most relative information to the user as possible when querying the business name. However, I want to know how you GURU's go about returning relative results for a simple text search. I'm too novice to write really complicated procedures with a percentage of relevance so I crafted my own rules. Here are two of the 15 or so statements that I use in my procedure.----- 90% (BizName, @Querystring + space + ANYTHING)INSERT INTO #SR(<column names removed for brevity>, Score, Pattern)SELECT <column names removed for brevity>, '90' AS Score, '-->' + @QueryString + ' ' + '%' + '<--' AS PatternFROM DIRECTORYWHERE BizName LIKE @QueryString + ' ' + '%'ANDDIRECTORY.RowID NOT IN (SELECT BizRowID FROM #SR)
----- 60% (BizName, ANYTHING + space + @Querystring + space + ANYTHING)INSERT INTO #SR(<column names removed for brevity>, Score, Pattern)SELECT <column names removed for brevity>, '60' AS Score, '-->' + '%' + ' ' + @QueryString + ' ' + '%' + '<--' AS PatternFROM DIRECTORYWHERE BizName LIKE '%' + ' ' + @QueryString + ' ' + '%'ANDDIRECTORY.RowID NOT IN (SELECT BizRowID FROM #SR)
I have something like 15 different "INSERT INTO #temp_table SELECT ... " statements like this. When the statements match against a record in my business table ("DIRECTORY"), I get results where the patterns that matched look something like this:(matched pattern is between the arrows)Pattern BizName -------------------------------- -------------------------------- -->ed %<-- Ed Dale Properties-->ed[^a-zA-Z0-9][a-zA-Z0-9]%<-- Ed's Auto Body Inc-->% ed<-- Catholic Department of Ed-->% ed %<-- Buttner, Ed CPA-->ed%<-- Eddie's Catering & Social Hall-->% ed%<-- A G Edwards & Sons Inc-->%ed%<-- AAA Lederman Bail Bonds
At the end of my procedure I select the temp table, order by my predefined score, and spit it out the door. Where I see some trouble is 1)I'm only querying against a single column (two additional relevant columns in this table), 2)the patterns I have identified as "relevant" seem to me to be only the tip of the iceberg, 3)worried about performance if the procedure gets too "table scan happy".Those of you who have authored "wicked" search engines, could you give me a few pointers or nuggets of wisdom as I move forward?Thanks-Brian