| Author |
Topic |
|
nmg196
Yak Posting Veteran
70 Posts |
Posted - 2003-01-15 : 13:00:18
|
| Hi,I have a problem... I have a website which uses Full Text Search and wraps each word in a "FORMS OF(INFLECTIONAL, x)" statement so it handles plurals properly.The problem is, if *any* of the words is a "noise word" the search now fails with the error message:"Execution of a full-text operation failed. A clause of the query contained only ignored words"eg: a search for "buy to let" becomes:SELECT * FROM CategoryTreeWHERE CONTAINS(CategoryTree.Title, 'FORMSOF(INFLECTIONAL, buy) AND FORMSOF(INFLECTIONAL, to) AND FORMSOF(INFLECTIONAL, let)')This fails...Note that only the word "to" was a noise word - "buy" and "let" were not - so this query shouldn't have failed.How can I allow users to search a field using either singular or plural terms, without this problem occuring? eg, if user searches for "mortages" it must match "mortgage". But if they search for "DOS" it should not simply strip the "s" and try and match "do" in any string.Any ideas how I can fix this? It's really starting to annoy me...Nick... |
|
|
AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2003-01-15 : 20:17:11
|
| Maybe you should analyze the search term and if it's only one word, use the FORMS OF syntax, but if it's more than one word, use the CONTAINS...NEAR syntaxFrom BOL:WHERE CONTAINS (notes, ' hardware ~ softward ~ computer ')------------------------------------------------------The more you know, the more you know you don't know. |
 |
|
|
jasper_smith
SQL Server MVP & SQLTeam MVY
846 Posts |
Posted - 2003-01-15 : 20:39:34
|
Sorry about the formatting of this post !! You can either strip these out on the client or the server. The last search system I did , I did this on the server. Basically a table of noise words, split the input into words in a table variable, join to the noise words table to delete the noise words and FTS if any non noise words are left. I used a UDF to do this parsing and it was pretty quick. I haven't got the exact code but somerthing like this should do the trick. It needs a table called noise and fn_Split[url]http://msdn.microsoft.com/library/en-us/dnsqlmag01/html/treatyourselfListing_01.txt[/url]CREATE TABLE [dbo].[noise] ( [word] [varchar] (30) NOT NULL , PRIMARY KEY CLUSTERED ( [word] ))GO--load it with your relavent noise file e.g. noise.eng--e.g. BULK INSERT noise FROM 'c:\windows\system32\noise.eng'--you need to remove the last line with all the letters on it--not from the live file !!!create function fn_noise(@str varchar(200))returns varchar(400)asbegindeclare @txt varchar(400)declare @words table(word varchar(30))declare @cntin int ; set @cntin = 0declare @cntout int ; set @cntout = 0--Split search text into words - space separatorinsert @words(word)select value from dbo.fn_Split(@str,' ')select @cntin=@@ROWCOUNT--remove noise words by joining to noise tabledelete from @words from @words t join noise n (nolock)on t.word=n.wordselect @cntout=@@ROWCOUNTIf (@cntin-@cntout)>0begin--reassemble stringSELECT @txt = COALESCE(@txt+' FORMSOF(INFLECTIONAL, ','FORMSOF(INFLECTIONAL, ')+word+') AND' FROM @words--SELECT @txt = '"'+LEFT(@txt,(LEN(@txt)-4))+'"'SELECT @txt =LEFT(@txt,(LEN(@txt)-4))endelseSELECT @txt =''return @txtendgo--then use it something likedeclare @input_string varchar(200)declare @parsed_string varchar(400)select @input_string = 'buy to let'select @parsed_string = dbo.fn_noise(@input_string)IF @parsed_string<>''SELECT * FROM CategoryTree WHERE CONTAINS(Title,@parsed_string) An example of a client side parser (this one by Hilary Cotter) would be <HTML><HEAD><TITLE>Bedwetters Anonymous</TITLE><STYLE TYPE="text/css">BODY {font-family:Tahoma,Arial,sans-serif; font-size:10pt}.heading {font-family:Tahoma,Arial,sans-serif; font-size:14pt; font-weight:bold}.cite {font-family:Tahoma,Arial,sans-serif; font-size:8pt}.document {font-size:10pt; font-weight:bold; background-color:lightgrey; width:100%}</STYLE><SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript"><!--var letterexp = /[a-z09]/i;function hasLetter(str) {return letterexp.test(str)};function checkform(form){ if (!hasLetter(form.target.value)) { alert("Invalid search string"); form.target.focus(); return false; } else { test=remove_noise(form.target.value); if(!test.length) { alert("invalid string - too many noise words"); form.target.focus(); return false; } else { form.target.value=test; return true; } }}function remove_noise(phrase){//dimensionalizing variablesnoisearray=new Array();array_phrase=new Array();array_pruned=new Array();alert(phrase);var noisearray=new Array ("0","1","2","3","4","5","6","7","8","9","$","a","about","after","all","also","an","and","another","any","are","as","at","b","be","because","been","before","being","between","both","but","by","c","came","can","come","could","d","did","do","does","e","each","else","f","for","from","g","get","got","h","had","has","have","he","her","here","him","himself","his","how","i","if","in","into","is","it","its","j","just","k","l","like","m","make","many","me","might","more","most","much","must","my","n","never","now","o","of","on","only","or","other","our","out","over","p","q","r","re","s","said","same","see","should","since","so","some","still","such","t","take","Test","than","that","the","their","them","then","there","these","they","this","those","through","to","too","u","under","up","use","v","very","w","want","was","way","we","well","were","what","when","where","which","while","who","will","with","would","x","y","you","your","z");var array_pruned_length=0;var counter=0;var counter1=0;var counter2=0;var phrase1;var length=0;var noise_array_counter=0;var search_phrase="";var phrase1;phrase1=phrase;//spliting the search phrase into an arrayarray_pruned=phrase.split(" ");array_pruned_length=array_pruned.length;for (counter=0;counter <array_pruned_length;counter++){ counter2=0; counter1=0; for (counter1=0;counter1<noisearray.length;counter1++) { if(array_pruned[counter]==noisearray[counter1]) { search_phrase= " " + array_pruned[counter] + " "; search_phrase_first_position= array_pruned[counter] + " "; counter2 = phrase.indexOf(search_phrase,0); counter3 = phrase.indexOf(search_phrase_first_position,0); if(counter3<counter2) counter2=counter3; while (counter2>= 0) { temp_string=array_pruned[counter]; length=temp_string.length; phrase = phrase.substr(0,counter2) + phrase.substr(counter2+length+1); counter2=phrase.indexOf(search_phrase,0); } } }}//-- comment the below in production from heredocument.write("parsed<BR><P>");document.write(phrase)document.write("<BR><P>");document.write("original<BR><P>");document.write(phrase1)alert("click on OK to submit search");//-- to herereturn phrase;}//--></SCRIPT></HEAD><BODY BGCOLOR="LIME"><form name="searchForm" action="search.asp" method=post><b>Search:</b><INPUT TYPE="text" NAME="target" SIZE="45" MAXLENGTH="100" VALUE=""><INPUT TYPE="submit" NAME="goButton" VALUE=" Search " onclick = "return checkform(this.form)"><input type="hidden" name="navEvent" value="Top"></form><br></html> HTHJasper SmithEdited by - jasper_smith on 01/15/2003 20:44:00 |
 |
|
|
nmg196
Yak Posting Veteran
70 Posts |
Posted - 2003-01-17 : 11:48:19
|
| Thanks Jasper - that's really useful.Is there a way to query the location of the noise works file from SQL Server, or does it have to be hard coded?Which noise file does it use? If i do this:BULK INSERT #noise FROM 'noise.eng'..it seems to use one from C:\winnt\system32\noise.eng (129 lines) but there's also another one on the drive where SQL Server was installed to (119 lines).Nick... |
 |
|
|
jasper_smith
SQL Server MVP & SQLTeam MVY
846 Posts |
Posted - 2003-01-17 : 15:08:34
|
You will generally find the noise files in two locations, in system32 and also in C:\Program Files\Common Files\System\MSSearch\Data\Config . If you modify the noise words file (a common technique to improve the efficency of FTS) then both need to be modified. If neither has been modifiedthen it doesn't matter which one you use. The file used depends on the language you specify for the word breaker when setting up FTS. If none was specified then it defaults to US I believe (noise.enu) if British English is used thenit's noise.eng. You can see what the FT catalog was set up to use by running the following /* to find the default fulltext language */create table #config(name sysname, minimum int, maximum int, config_value int, run_value int)insert #configexec master.dbo.sp_configure 'default full-text language'select case run_value when 0 then 'Neutral' when 2052 then 'Chinese_Simplified' when 1028 then 'Chinese_Traditional' when 1043 then 'Dutch' when 2057 then 'English_UK' when 1033 then 'English_US' when 1036 then 'French' when 1031 then 'German' when 1040 then 'Italian' when 1041 then 'Japanese' when 1042 then 'Korean' when 3082 then 'Spanish_Modern' when 1053 then 'Swedish_Default' else 'Unknown' end as 'DEFAULT_FTS_LANGUAGE'from #configdrop table #config/* to find the language used as the word breaker for a column you can use a modified version of sp_help_fulltext_columns */use mastergocreate procedure sp_help_fulltext_columns_ex @table_name nvarchar(517) = NULL, -- table name @column_name sysname = NULL -- column nameas -- FULLTEXT MUST BE ACTIVE IN DATABASE -- if DatabaseProperty(db_name(), 'IsFulltextEnabled') = 0 begin raiserror(15601,-1,-1) return 1 end if @table_name is not null begin -- VALIDATE TABLE NAME -- -- (1) Must exist in current database declare @objid int select @objid = object_id(@table_name, 'local') if @objid is null begin declare @curdbname sysname select @curdbname = db_name() raiserror(15009,-1,-1 ,@table_name, @curdbname) return 1 end -- (2) Must be a user table (and not a temp table) if ObjectProperty(@objid, 'IsUserTable') = 0 OR substring(parsename(@table_name,1),1,1) = '#' begin raiserror(15218,-1,-1 ,@table_name) return 1 end -- VALIDATE COLUMN NAME (CANNOT BE COMPUTED) -- if @column_name is not null begin declare @typename sysname select @typename = type_name(ColumnProperty(@objid, @column_name, 'SystemType')) if @typename is null OR ColumnProperty(@objid, @column_name, 'IsComputed') = 1 begin raiserror(15104,-1,-1,@table_name,@column_name) return 1 end end end select distinct susr.name as TABLE_OWNER, sobj.id as TABLE_ID, sobj.name as TABLE_NAME, scol.name as FULLTEXT_COLUMN_NAME, scol.colid as FULLTEXT_COLID, b.FT_BLOBTPNAME as FULLTEXT_BLOBTP_COLNAME, a.FT_BLOBTPCOLID as FULLTEXT_BLOBTP_COLID, case scol.language when 0 then 'Neutral' when 2052 then 'Chinese_Simplified' when 1028 then 'Chinese_Traditional' when 1043 then 'Dutch' when 2057 then 'English_UK' when 1033 then 'English_US' when 1036 then 'French' when 1031 then 'German' when 1040 then 'Italian' when 1041 then 'Japanese' when 1042 then 'Korean' when 3082 then 'Spanish_Modern' when 1053 then 'Swedish_Default' else 'Unknown' end as FULLTEXT_LANGUAGE from sysobjects as sobj, sysusers as susr, syscolumns as scol left outer join ( select sdep.id as TABLE_ID, sdep.number as FULLTEXT_COLID, sdep.depnumber as FT_BLOBTPCOLID from sysdepends as sdep, syscolumns as scol where scol.colid = sdep.number and scol.id = sdep.id and sdep.deptype = 1 and ColumnProperty(scol.id, scol.name, 'IsFullTextIndexed') = 1 ) as a on (scol.colid = a.FULLTEXT_COLID and scol.id = a.TABLE_ID) left outer join ( select sdep.id as TABLE_ID, sdep.depnumber as FT_BLOBTPCOLID, scol.name as FT_BLOBTPNAME, sdep.number as FULLTEXT_COLID from syscolumns as scol, sysdepends as sdep where scol.colid = sdep.depnumber and ColumnProperty(sdep.id, scol.name, 'IsTypeForFullTextBlob') = 1 ) as b on (a.FULLTEXT_COLID = b.FULLTEXT_COLID and a.TABLE_ID = b.TABLE_ID) where ( @table_name is null or sobj.id = @objid ) and scol.id = sobj.id and sobj.uid = susr.uid and ( @column_name is null or scol.name = @column_name ) and ColumnProperty(sobj.id, scol.name, 'IsFullTextIndexed') = 1 order by TABLE_OWNER, TABLE_NAME, FULLTEXT_COLID -- SUCCESS -- return 0 -- sp_help_fulltext_columns_exGOgrant exec on sp_help_fulltext_columns_ex to publicGO HTHJasper SmithEdited by - jasper_smith on 01/17/2003 15:12:05 |
 |
|
|
|
|
|