Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 keyword search... performance tweaking

Author  Topic 

cDc
Starting Member

30 Posts

Posted - 2004-05-08 : 16:00:23
Hi There
I have been experimenting with full text search and I just cannot squeeze enough performance out of it to make it work for my application, for the largest of results sets the response times im seeing are in the area of 5-10 seconds for 60,000 results. My data is approx 1 million rows of varchar(400) data. So I decided to experiment with something that would cover my needs using sql server to see if it would work with acceptable performance.

Using snippets of code from this very site I made a stored procedure that pulls out unique words from a string (after removing junk characters) and then populates two tables a keywords table and a keyworditem table. I ended up with about 5 million item entries and 200k unique keywords. This took about 15 mins of processing time (1.4ghz cpu)

The I hashed together another one to perform the search based on a comma seperated list of keywords. The code is below and im wondering if anyone can take a look and see if you can get better performance out of my code. Im seeing a response time of about 2 seconds for my largest keywords which is acceptable so im quite happy so far with the results if there are less than 10000 results its pretty much sub second. Im also a bit curious how this would perform when 200 users are using it at once.... any input greatly appreciated! As you can probbaly work out the matching doesnt really speed up as you add more terms in fact it can get slower depending on the terms. I would also love a UDF that can "reverse" words in a string so "the,amazing,spiderman" becomes "spiderman,amazing,the" !

and im sure im not the first to waste my time on this so if this has been done before and im missing some major flaw would love to know!!

Cheers!

CREATE TABLE [dbo].[keyworditem] (
[itemid] [int] NOT NULL ,
[keywordid] [int] NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[keywords] (
[id] [int] IDENTITY (1, 1) NOT NULL ,
[keyword] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL
) ON [PRIMARY]
GO

CREATE PROCEDURE [dbo].[TestSearch]
@searchterms varchar(250)
AS

create table #tempt (itemid int)

set @searchterms = @searchterms + ','

while patindex('%,%' , @searchterms) <> 0
begin

declare @separator char(1)
declare @separator_position int
declare @array_value varchar(1000)

select @separator_position = patindex('%,%' , @searchterms)
select @array_value = left(@searchterms, @separator_position - 1)
select @searchterms = stuff(@searchterms, 1, @separator_position, '')

if not exists (select itemid from #tempt with(nolock))
begin
insert into #tempt (itemid) select distinct(itemid) from keyworditem with(nolock)
inner join keywords on
keywords.id = keyworditem.keywordid and keywords.keyword = @array_value
end
else
begin
create table #placeholder (itemid int)
insert into #placeholder
select #tempt.itemid from #tempt
INNER JOIN keyworditem k ON
#tempt.itemid = k.itemid
INNER JOIN keywords s ON
s.id = k.keywordid and s.keyword = @array_value
truncate table #tempt
insert into #tempt select itemid from #placeholder
drop table #placeholder
end
end

select distinct(itemid) from #tempt
drop table #tempt
GO

i would also love to expand this to consider "noise" words

cDc
Starting Member

30 Posts

Posted - 2004-05-18 : 19:28:34
thought I would give this a wee bump as i have modified the original sproc. Im getting good performance using this.. but still trying to get it to work as fast as possible any thoughts? Curious as to how it will handle 300 potential users simultaneously with the use of temp tables. Am I chasing a lost cause! does anyone else implement a custom search solution if so how did you do it? Ive been testing some third party apps like sql turbo aswell which looks ok & performs well.

as you can see ive been experimenting with soundex matching also, this seems to work great to improve chances of matching if theres a typo in a keyword.



CREATE PROCEDURE [dbo].[TestSearch]
@searchterms varchar(200)
AS
set nocount on
create table #tempt (itemid int PRIMARY KEY)

set @searchterms = @searchterms + ','

declare @count int
set @count = 0

while patindex('%,%' , @searchterms) <> 0
begin

declare @separator char(1)
declare @separator_position int
declare @array_value varchar(1000)

select @separator_position = patindex('%,%' , @searchterms)
select @array_value = left(@searchterms, @separator_position - 1)
select @searchterms = stuff(@searchterms, 1, @separator_position, '')

--if exists (select * from keywords where keyword = @array_value)
--begin
if not exists (select * from #tempt)
begin
insert into #tempt (itemid) select distinct(itemid) from keyworditem
inner join keywords on
keywords.id = keyworditem.keywordid and keyword = @array_value
end
else
begin
create table #placeholder (itemid int PRIMARY KEY)

-- delete from #tempt where #tempt.itemid not in (
if not isnumeric(@array_value) = 1
insert into #placeholder
select distinct(#tempt.itemid) from #tempt
INNER JOIN keyworditem k ON
#tempt.itemid = k.itemid
INNER JOIN keywords s ON
s.id = k.keywordid and s.[soundex] = soundex(@array_value)
else
insert into #placeholder
select distinct(#tempt.itemid) from #tempt
INNER JOIN keyworditem k ON
#tempt.itemid = k.itemid
INNER JOIN keywords s ON
s.id = k.keywordid and s.keyword = @array_value
if exists (select * from #placeholder)
begin
truncate table #tempt
insert into #tempt select itemid from #placeholder
end
drop table #placeholder
end
end
--end
set nocount off
select itemid from #tempt
drop table #tempt
GO
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2004-05-19 : 06:06:38
Producing (displaying here) the execution plan would give you/us a good hint as to the likely performance.


also...(a big) maybe (indexed) views as suggested towards the latter part of this link would be worth investigating...
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=19638&whichpage=2&SearchTerms=sargable%2Cindex

as could/would index hints and/or readlock hints....(given that your keyword table would be read-only??)


You could always invest time/money in a stress-testing exercise/tool....to prove the concept.
Go to Top of Page
   

- Advertisement -