Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Lopaka
Starting Member
48 Posts |
Posted - 2006-03-27 : 16:11:36
Problem: I have a table containing 500K records, "and growing". Table (a) has a field containing several words that I would like to compare against keywords in another table (b). Table (b) contains either a single word or multiple words separated by a semicolon. Table (b) will continue to grow as long as table (a) grows; currently table (b) has about 500 records.

Question: I am currently reading about cursors... Before I head down this path, would it be better to loop over the select statements and compare against table B, or would it be better to use a cursor and loop over the select?

EXAMPLE:

Table A:
FIELD1
AAA CCC EEE

Table B:
FIELD1, FIELD2
1, AAA
2, BBB
3, CCC
4, DDD
5, EEE
6, FFF

Output:
@VARIABLE = 135

AGAIN, remember the number of records will continue to grow...

Thank you
Robert R. Barnes
jhermiz
3564 Posts |
Posted - 2006-03-27 : 16:14:10
You could do it all without a loop, using the LIKE predicate.

Keeping the web experience alive -- [url]http://www.web-impulse.com[/url]
RS Blog -- [url]http://weblogs.sqlteam.com/jhermiz[/url]
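For illustration, a sketch of that set-based LIKE approach against the sample tables from the question (assumes one keyword per TableB row; semicolon-delimited rows would need splitting first):

```sql
-- One row per keyword hit: for TableA.FIELD1 = 'AAA CCC EEE'
-- this matches TableB rows 1, 3, and 5.
SELECT a.FIELD1, b.FIELD1 AS KeywordId, b.FIELD2 AS Keyword
FROM TableA AS a
INNER JOIN TableB AS b
    ON ' ' + a.FIELD1 + ' ' LIKE '% ' + b.FIELD2 + ' %';
```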
Lopaka
Starting Member
48 Posts |
Posted - 2006-03-27 : 16:18:25
Due to the number of records and the size of the field in table A, I was wondering which would be faster, a loop or a cursor? I currently have a basic loop that I inherited from a SAS programmer... :( A completely different story...

Robert R. Barnes
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2006-03-27 : 17:53:10
quote: Originally posted by Lopaka
Problem: I have a table containing 500K records, "and growing". Table (a) has a field containing several words that I would like to compare against keywords in another table (b). Table (b) contains either a single word or multiple words separated by a semicolon.

So both table (a) and table (b) may store several words in a single field? This is like a double-barreled shotgun blast to any concept of normalization. How's your résumé looking? Up to date?

Whatever. Any LIKE() solution you come up with is going to be faster than a cursor, but maybe not by much. Throw any concept of indexes out the window.

You REALLY need to fix this schema, or you are going to be writing code to cover its flaws indefinitely.
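A sketch of the kind of repair being suggested, with hypothetical table and column names: store one word per row so the join can use an index instead of LIKE scans.

```sql
-- Hypothetical normalized layout: one word per row instead of
-- space- or semicolon-delimited lists in a single field.
CREATE TABLE TitleWord (
    TitleId int         NOT NULL,
    Word    varchar(50) NOT NULL,
    PRIMARY KEY (TitleId, Word)
);

CREATE TABLE Keyword (
    KeywordId int         NOT NULL PRIMARY KEY,
    Word      varchar(50) NOT NULL
);

-- Keyword matching then becomes an indexable equi-join, no LIKE needed:
SELECT tw.TitleId, k.KeywordId
FROM TitleWord AS tw
INNER JOIN Keyword AS k
    ON k.Word = tw.Word;
```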
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2006-03-27 : 17:54:47
To answer your question: a cursor IS a loop, so you will not get any appreciable performance improvement from using one (I predict). It's possible the database engine is somewhat optimized to handle cursors over programmatic loops.
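For reference, a T-SQL cursor over the titles would look roughly like this (table and column names assumed from the question); written out, it is plainly just a row-by-row loop:

```sql
DECLARE @Field1 varchar(255);

DECLARE title_cur CURSOR FAST_FORWARD FOR
    SELECT FIELD1 FROM TableA;

OPEN title_cur;
FETCH NEXT FROM title_cur INTO @Field1;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- per-row keyword comparison would go here
    FETCH NEXT FROM title_cur INTO @Field1;
END

CLOSE title_cur;
DEALLOCATE title_cur;
```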
Lopaka
Starting Member
48 Posts |
Posted - 2006-03-31 : 11:55:22
I will try to explain in better detail...

TableA.ColA contains "Senior Systems Analyst" or "Certified Public Accountant". It now contains 1.2 million distinct titles.

TableB.ColA contains a list of single words, such as "Senior", "Systems", "Analyst", "Certified", "Public", "Accountant", etc... There are 25K distinct words...

My algorithm would compare each word in TableA with the list of words in TableB, then figure out that the first title belongs to IT and the second title belongs to the Financial Department.

Robert R. Barnes
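If the titles do need to be split into individual words first (to compare word-by-word as described), the usual set-based trick in T-SQL of this era is a numbers table; a sketch, assuming a hypothetical Numbers(n) table of integers 1..8000 exists:

```sql
-- Emits one row per word in each title: a space is prepended so every
-- word is preceded by a delimiter, and a space is appended so
-- CHARINDEX always finds a terminator after the last word.
SELECT t.ColA AS Title,
       SUBSTRING(t.ColA + ' ', n.n,
                 CHARINDEX(' ', t.ColA + ' ', n.n) - n.n) AS Word
FROM TableA AS t
INNER JOIN Numbers AS n
    ON n.n <= LEN(t.ColA)
   AND SUBSTRING(' ' + t.ColA, n.n, 1) = ' ';
```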
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2006-03-31 : 14:19:03
Is there a reason a simple join won't work? Why do you need a loop *or* a cursor?

Assuming that you have a column in your "Words" table that identifies a category, like "I.T." for example, a simple LIKE join does the trick:

select Titles.Title, min(Words.Category) as Category
from Titles
inner join Words
on ' ' + Titles.Title + ' ' like '%[^a-z]' + Words.Word + '[^a-z]%'
group by Titles.Title

That returns one category per title from the matching words; it just picks the MIN(), i.e. the one with the lowest alphabetical sort. You can expand upon this to find the best match by using COUNT(*) and other techniques.

Efficiency should not matter -- something like this should not be run "on the fly". You should categorize your titles once and store the values in your table, and then do it only for new titles going forward.
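One way to expand that query as suggested, ranking categories by COUNT(*) of matching words instead of taking the alphabetical MIN() (a sketch assuming the same hypothetical Titles/Words schema, and SQL Server 2005+ for ROW_NUMBER):

```sql
SELECT Title, Category
FROM (
    SELECT t.Title, w.Category,
           ROW_NUMBER() OVER (PARTITION BY t.Title
                              ORDER BY COUNT(*) DESC) AS rn
    FROM Titles AS t
    INNER JOIN Words AS w
        ON ' ' + t.Title + ' ' LIKE '%[^a-z]' + w.Word + '[^a-z]%'
    GROUP BY t.Title, w.Category
) AS ranked
WHERE rn = 1;  -- keep the category with the most keyword hits per title
```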
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2006-03-31 : 14:23:04
quote: Originally posted by Lopaka
I will try to explain in better detail... TableA.ColA contains "Senior Systems Analyst" or "Certified Public Accountant". It now contains 1.2 million distinct titles. TableB.ColA contains a list of single words, such as "Senior", "Systems", "Analyst", "Certified", "Public", "Accountant", etc... There are 25K distinct words... My algorithm would compare each word in TableA with the list of words in TableB, then figure out that the first title belongs to IT and the second title belongs to the Financial Department. Robert R. Barnes

I just read this more carefully... you did realize that words like Senior, Analyst, Certified, and so on are extremely common in all kinds of titles, right? Once again, it will be best if you give us actual specifics about what you are doing, with actual table and column names and a little representative sample data, so that we can see what your goal is. Your question is way too broad, and at this point alarms are going off indicating to me that what you are trying to do is probably not a good idea.