Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Noise words interfere with normal search operation

Author  Topic 

nmg196
Yak Posting Veteran

70 Posts

Posted - 2003-01-15 : 13:00:18
Hi,

I have a problem... I have a website which uses Full Text Search and wraps each word in a "FORMS OF(INFLECTIONAL, x)" statement so it handles plurals properly.

The problem is, if *any* of the words is a "noise word" the search now fails with the error message:

"Execution of a full-text operation failed. A clause of the query contained only ignored words"

eg: a search for "buy to let" becomes:

SELECT * FROM CategoryTree
WHERE CONTAINS(CategoryTree.Title, 'FORMSOF(INFLECTIONAL, buy) AND FORMSOF(INFLECTIONAL, to) AND FORMSOF(INFLECTIONAL, let)')

This fails...

Note that only the word "to" was a noise word - "buy" and "let" were not - so this query shouldn't have failed.

How can I allow users to search a field using either singular or plural terms, without this problem occuring?

eg, if user searches for "mortages" it must match "mortgage". But if they search for "DOS" it should not simply strip the "s" and try and match "do" in any string.

Any ideas how I can fix this? It's really starting to annoy me...

Nick...

AjarnMark
SQL Slashing Gunting Master

3246 Posts

Posted - 2003-01-15 : 20:17:11
Maybe you should analyze the search term and if it's only one word, use the FORMS OF syntax, but if it's more than one word, use the CONTAINS...NEAR syntax

From BOL:
WHERE CONTAINS (notes, ' hardware ~ softward ~ computer ')



------------------------------------------------------
The more you know, the more you know you don't know.
Go to Top of Page

jasper_smith
SQL Server MVP & SQLTeam MVY

846 Posts

Posted - 2003-01-15 : 20:39:34
Sorry about the formatting of this post !!
You can either strip these out on the client or the server. The last search system I did , I did this on the server. Basically a
table of noise words, split the input into words in a table variable, join to the noise words table to delete the noise words
and FTS if any non noise words are left. I used a UDF to do this parsing and it was pretty quick. I haven't got the exact code
but somerthing like this should do the trick. It needs a table called noise and fn_Split
[url]http://msdn.microsoft.com/library/en-us/dnsqlmag01/html/treatyourselfListing_01.txt[/url]


CREATE TABLE [dbo].[noise] (
[word] [varchar] (30) NOT NULL ,
PRIMARY KEY CLUSTERED
(
[word]
)
)
GO

--load it with your relavent noise file e.g. noise.eng
--e.g. BULK INSERT noise FROM 'c:\windows\system32\noise.eng'
--you need to remove the last line with all the letters on it
--not from the live file !!!

create function fn_noise(@str varchar(200))
returns varchar(400)
as
begin

declare @txt varchar(400)
declare @words table(word varchar(30))
declare @cntin int ; set @cntin = 0
declare @cntout int ; set @cntout = 0

--Split search text into words - space separator
insert @words(word)
select value from dbo.fn_Split(@str,' ')
select @cntin=@@ROWCOUNT

--remove noise words by joining to noise table
delete from @words
from @words t join noise n (nolock)
on t.word=n.word
select @cntout=@@ROWCOUNT

If (@cntin-@cntout)>0
begin
--reassemble string
SELECT @txt = COALESCE(@txt+' FORMSOF(INFLECTIONAL, ','FORMSOF(INFLECTIONAL, ')+word+') AND' FROM @words
--SELECT @txt = '"'+LEFT(@txt,(LEN(@txt)-4))+'"'
SELECT @txt =LEFT(@txt,(LEN(@txt)-4))
end
else
SELECT @txt =''

return @txt
end
go


--then use it something like

declare @input_string varchar(200)
declare @parsed_string varchar(400)

select @input_string = 'buy to let'
select @parsed_string = dbo.fn_noise(@input_string)

IF @parsed_string<>''
SELECT * FROM CategoryTree
WHERE CONTAINS(Title,@parsed_string)



An example of a client side parser (this one by Hilary Cotter) would be
 
<HTML>
<HEAD>
<TITLE>Bedwetters Anonymous</TITLE>
<STYLE TYPE="text/css">
BODY {font-family:Tahoma,Arial,sans-serif; font-size:10pt}
.heading {font-family:Tahoma,Arial,sans-serif; font-size:14pt; font-weight:bold}
.cite {font-family:Tahoma,Arial,sans-serif; font-size:8pt}
.document {font-size:10pt; font-weight:bold; background-color:lightgrey; width:100%}
</STYLE>
<SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript">
<!--

var letterexp = /[a-z09]/i;
function hasLetter(str) {return letterexp.test(str)};

function checkform(form)
{

if (!hasLetter(form.target.value))
{
alert("Invalid search string");
form.target.focus();
return false;
}
else
{
test=remove_noise(form.target.value);
if(!test.length)
{
alert("invalid string - too many noise words");
form.target.focus();
return false;
}
else
{
form.target.value=test;
return true;
}
}
}

function remove_noise(phrase)
{
//dimensionalizing variables
noisearray=new Array();
array_phrase=new Array();
array_pruned=new Array();
alert(phrase);

var noisearray=new Array ("0","1","2","3","4","5","6","7","8","9","$","a","about","after","all","also","an","and","another","any","are","as","at","b","be","because","been","before","being","between","both","but","by","c","came","can","come","could","d","did","do","does","e","each","else","f","for","from","g","get","got","h","had","has","have","he","her","here","him","himself","his","how","i","if","in","into","is","it","its","j","just","k","l","like","m","make","many","me","might","more","most","much","must","my","n","never","now","o","of","on","only","or","other","our","out","over","p","q","r","re","s","said","same","see","should","since","so","some","still","such","t","take","Test","than","that","the","their","them","then","there","these","they","this","those","through","to","too","u","under","up","use","v","very","w","want","was","way","we","well","were","what","when","where","which","while","who","will","with","would","x","y","you","your","z");
var array_pruned_length=0;
var counter=0;
var counter1=0;
var counter2=0;
var phrase1;
var length=0;
var noise_array_counter=0;
var search_phrase="";
var phrase1;
phrase1=phrase;

//spliting the search phrase into an array
array_pruned=phrase.split(" ");
array_pruned_length=array_pruned.length;

for (counter=0;counter <array_pruned_length;counter++)
{
counter2=0;
counter1=0;

for (counter1=0;counter1<noisearray.length;counter1++)
{

if(array_pruned[counter]==noisearray[counter1])
{
search_phrase= " " + array_pruned[counter] + " ";
search_phrase_first_position= array_pruned[counter] + " ";
counter2 = phrase.indexOf(search_phrase,0);
counter3 = phrase.indexOf(search_phrase_first_position,0);
if(counter3<counter2)
counter2=counter3;
while (counter2>= 0)
{
temp_string=array_pruned[counter];
length=temp_string.length;
phrase = phrase.substr(0,counter2) + phrase.substr(counter2+length+1);
counter2=phrase.indexOf(search_phrase,0);
}
}
}
}
//-- comment the below in production from here
document.write("parsed<BR><P>");
document.write(phrase)
document.write("<BR><P>");
document.write("original<BR><P>");
document.write(phrase1)
alert("click on OK to submit search");
//-- to here
return phrase;
}
//-->
</SCRIPT>
</HEAD>
<BODY BGCOLOR="LIME">
<form name="searchForm" action="search.asp" method=post>
<b>Search:</b>
<INPUT TYPE="text" NAME="target" SIZE="45" MAXLENGTH="100" VALUE="">
<INPUT TYPE="submit" NAME="goButton" VALUE=" Search " onclick = "return checkform(this.form)">
<input type="hidden" name="navEvent" value="Top">
</form>
<br>
</html>





HTH
Jasper Smith


Edited by - jasper_smith on 01/15/2003 20:44:00
Go to Top of Page

nmg196
Yak Posting Veteran

70 Posts

Posted - 2003-01-17 : 11:48:19
Thanks Jasper - that's really useful.

Is there a way to query the location of the noise works file from SQL Server, or does it have to be hard coded?

Which noise file does it use? If i do this:

BULK INSERT #noise FROM 'noise.eng'

..it seems to use one from C:\winnt\system32\noise.eng (129 lines) but there's also another one on the drive where SQL Server was installed to (119 lines).

Nick...

Go to Top of Page

jasper_smith
SQL Server MVP &amp; SQLTeam MVY

846 Posts

Posted - 2003-01-17 : 15:08:34
You will generally find the noise files in two locations, in system32 and also in
C:\Program Files\Common Files\System\MSSearch\Data\Config . If you modify the noise words file
(a common technique to improve the efficency of FTS) then both need to be modified. If neither has been modified
then it doesn't matter which one you use. The file used depends on the language you specify for the word breaker
when setting up FTS. If none was specified then it defaults to US I believe (noise.enu) if British English is used then
it's noise.eng. You can see what the FT catalog was set up to use by running the following


/* to find the default fulltext language */

create table #config(name sysname,
minimum int,
maximum int,
config_value int,
run_value int)
insert #config
exec master.dbo.sp_configure 'default full-text language'

select case run_value
when 0 then 'Neutral'
when 2052 then 'Chinese_Simplified'
when 1028 then 'Chinese_Traditional'
when 1043 then 'Dutch'
when 2057 then 'English_UK'
when 1033 then 'English_US'
when 1036 then 'French'
when 1031 then 'German'
when 1040 then 'Italian'
when 1041 then 'Japanese'
when 1042 then 'Korean'
when 3082 then 'Spanish_Modern'
when 1053 then 'Swedish_Default'
else 'Unknown' end as 'DEFAULT_FTS_LANGUAGE'
from #config
drop table #config

/* to find the language used as the word breaker for a column
you can use a modified version of sp_help_fulltext_columns */

use master
go
create procedure sp_help_fulltext_columns_ex
@table_name nvarchar(517) = NULL, -- table name
@column_name sysname = NULL -- column name
as
-- FULLTEXT MUST BE ACTIVE IN DATABASE --
if DatabaseProperty(db_name(), 'IsFulltextEnabled') = 0
begin
raiserror(15601,-1,-1)
return 1
end

if @table_name is not null
begin
-- VALIDATE TABLE NAME --
-- (1) Must exist in current database
declare @objid int
select @objid = object_id(@table_name, 'local')
if @objid is null
begin
declare @curdbname sysname
select @curdbname = db_name()
raiserror(15009,-1,-1 ,@table_name, @curdbname)
return 1
end
-- (2) Must be a user table (and not a temp table)
if ObjectProperty(@objid, 'IsUserTable') = 0
OR substring(parsename(@table_name,1),1,1) = '#'
begin
raiserror(15218,-1,-1 ,@table_name)
return 1
end

-- VALIDATE COLUMN NAME (CANNOT BE COMPUTED) --
if @column_name is not null
begin
declare @typename sysname
select @typename = type_name(ColumnProperty(@objid, @column_name, 'SystemType'))
if @typename is null OR ColumnProperty(@objid, @column_name, 'IsComputed') = 1
begin
raiserror(15104,-1,-1,@table_name,@column_name)
return 1
end
end

end

select distinct
susr.name as TABLE_OWNER,
sobj.id as TABLE_ID,
sobj.name as TABLE_NAME,
scol.name as FULLTEXT_COLUMN_NAME,
scol.colid as FULLTEXT_COLID,
b.FT_BLOBTPNAME as FULLTEXT_BLOBTP_COLNAME,
a.FT_BLOBTPCOLID as FULLTEXT_BLOBTP_COLID,
case scol.language
when 0 then 'Neutral'
when 2052 then 'Chinese_Simplified'
when 1028 then 'Chinese_Traditional'
when 1043 then 'Dutch'
when 2057 then 'English_UK'
when 1033 then 'English_US'
when 1036 then 'French'
when 1031 then 'German'
when 1040 then 'Italian'
when 1041 then 'Japanese'
when 1042 then 'Korean'
when 3082 then 'Spanish_Modern'
when 1053 then 'Swedish_Default'
else 'Unknown' end as FULLTEXT_LANGUAGE
from
sysobjects as sobj,
sysusers as susr,
syscolumns as scol
left outer join
(
select sdep.id as TABLE_ID,
sdep.number as FULLTEXT_COLID,
sdep.depnumber as FT_BLOBTPCOLID
from
sysdepends as sdep,
syscolumns as scol
where
scol.colid = sdep.number
and scol.id = sdep.id
and sdep.deptype = 1
and ColumnProperty(scol.id, scol.name, 'IsFullTextIndexed') = 1
) as a
on (scol.colid = a.FULLTEXT_COLID and scol.id = a.TABLE_ID)
left outer join
(
select sdep.id as TABLE_ID,
sdep.depnumber as FT_BLOBTPCOLID,
scol.name as FT_BLOBTPNAME,
sdep.number as FULLTEXT_COLID
from
syscolumns as scol,
sysdepends as sdep
where
scol.colid = sdep.depnumber
and ColumnProperty(sdep.id, scol.name, 'IsTypeForFullTextBlob') = 1
) as b
on (a.FULLTEXT_COLID = b.FULLTEXT_COLID and a.TABLE_ID = b.TABLE_ID)
where (
@table_name is null or
sobj.id = @objid
) and
scol.id = sobj.id and
sobj.uid = susr.uid and
(
@column_name is null or
scol.name = @column_name
) and
ColumnProperty(sobj.id, scol.name, 'IsFullTextIndexed') = 1
order by TABLE_OWNER, TABLE_NAME, FULLTEXT_COLID

-- SUCCESS --
return 0 -- sp_help_fulltext_columns_ex

GO

grant exec on sp_help_fulltext_columns_ex to public
GO



HTH
Jasper Smith

Edited by - jasper_smith on 01/17/2003 15:12:05
Go to Top of Page
   

- Advertisement -