Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
cDc
Starting Member
30 Posts |
Posted - 2004-05-08 : 16:00:23
|
| Hi There I have been experimenting with full text search and I just cannot squeeze enough performance out of it to make it work for my application, for the largest of results sets the response times im seeing are in the area of 5-10 seconds for 60,000 results. My data is approx 1 million rows of varchar(400) data. So I decided to experiment with something that would cover my needs using sql server to see if it would work with acceptable performance.Using snippets of code from this very site I made a stored procedure that pulls out unique words from a string (after removing junk characters) and then populates two tables a keywords table and a keyworditem table. I ended up with about 5 million item entries and 200k unique keywords. This took about 15 mins of processing time (1.4ghz cpu)The I hashed together another one to perform the search based on a comma seperated list of keywords. The code is below and im wondering if anyone can take a look and see if you can get better performance out of my code. Im seeing a response time of about 2 seconds for my largest keywords which is acceptable so im quite happy so far with the results if there are less than 10000 results its pretty much sub second. Im also a bit curious how this would perform when 200 users are using it at once.... any input greatly appreciated! As you can probbaly work out the matching doesnt really speed up as you add more terms in fact it can get slower depending on the terms. I would also love a UDF that can "reverse" words in a string so "the,amazing,spiderman" becomes "spiderman,amazing,the" !and im sure im not the first to waste my time on this so if this has been done before and im missing some major flaw would love to know!!Cheers!CREATE TABLE [dbo].[keyworditem] ( [itemid] [int] NOT NULL , [keywordid] [int] NOT NULL ) ON [PRIMARY]GOCREATE TABLE [dbo].[keywords] ( [id] [int] IDENTITY (1, 1) NOT NULL , [keyword] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ) ON [PRIMARY]GOCREATE PROCEDURE [dbo].[TestSearch] @searchterms varchar(250)AScreate table #tempt (itemid int)set @searchterms = @searchterms + ','while patindex('%,%' , @searchterms) <> 0 begindeclare @separator char(1)declare @separator_position int declare @array_value varchar(1000) select @separator_position = patindex('%,%' , @searchterms)select @array_value = left(@searchterms, @separator_position - 1)select @searchterms = stuff(@searchterms, 1, @separator_position, '') if not exists (select itemid from #tempt with(nolock)) begin insert into #tempt (itemid) select distinct(itemid) from keyworditem with(nolock) inner join keywords on keywords.id = keyworditem.keywordid and keywords.keyword = @array_value end else begin create table #placeholder (itemid int) insert into #placeholder select #tempt.itemid from #tempt INNER JOIN keyworditem k ON #tempt.itemid = k.itemid INNER JOIN keywords s ON s.id = k.keywordid and s.keyword = @array_value truncate table #tempt insert into #tempt select itemid from #placeholder drop table #placeholder endendselect distinct(itemid) from #temptdrop table #temptGOi would also love to expand this to consider "noise" words |
|
|
cDc
Starting Member
30 Posts |
Posted - 2004-05-18 : 19:28:34
|
| thought I would give this a wee bump as i have modified the original sproc. Im getting good performance using this.. but still trying to get it to work as fast as possible any thoughts? Curious as to how it will handle 300 potential users simultaneously with the use of temp tables. Am I chasing a lost cause! does anyone else implement a custom search solution if so how did you do it? Ive been testing some third party apps like sql turbo aswell which looks ok & performs well.as you can see ive been experimenting with soundex matching also, this seems to work great to improve chances of matching if theres a typo in a keyword.CREATE PROCEDURE [dbo].[TestSearch] @searchterms varchar(200)ASset nocount oncreate table #tempt (itemid int PRIMARY KEY)set @searchterms = @searchterms + ','declare @count int set @count = 0while patindex('%,%' , @searchterms) <> 0 begindeclare @separator char(1)declare @separator_position int declare @array_value varchar(1000) select @separator_position = patindex('%,%' , @searchterms)select @array_value = left(@searchterms, @separator_position - 1)select @searchterms = stuff(@searchterms, 1, @separator_position, '')--if exists (select * from keywords where keyword = @array_value)--begin if not exists (select * from #tempt) begin insert into #tempt (itemid) select distinct(itemid) from keyworditem inner join keywords on keywords.id = keyworditem.keywordid and keyword = @array_value end else begin create table #placeholder (itemid int PRIMARY KEY)-- delete from #tempt where #tempt.itemid not in ( if not isnumeric(@array_value) = 1 insert into #placeholder select distinct(#tempt.itemid) from #tempt INNER JOIN keyworditem k ON #tempt.itemid = k.itemid INNER JOIN keywords s ON s.id = k.keywordid and s.[soundex] = soundex(@array_value) else insert into #placeholder select distinct(#tempt.itemid) from #tempt INNER JOIN keyworditem k ON #tempt.itemid = k.itemid INNER JOIN keywords s ON s.id = k.keywordid and s.keyword = @array_value if exists (select * from #placeholder) begin truncate table #tempt insert into #tempt select itemid from #placeholder end drop table #placeholder endend--endset nocount offselect itemid from #temptdrop table #temptGO |
 |
|
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2004-05-19 : 06:06:38
|
| Producing (displaying here) the execution plan would give you/us a good hint as to the likely performance.also...(a big) maybe (indexed) views as suggested towards the latter part of this link would be worth investigating...http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=19638&whichpage=2&SearchTerms=sargable%2Cindexas could/would index hints and/or readlock hints....(given that your keyword table would be read-only??)You could always invest time/money in a stress-testing exercise/tool....to prove the concept. |
 |
|
|
|
|
|
|
|