Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
trivialusername
Starting Member
6 Posts |
Posted - 2005-03-03 : 15:35:15
|
Hi all,I want to limit and sort the result set at the lowest level in a group by. Basically, the following:select cat,subcat,top 5 subsubcat,count(*) cntfrom topline_Datagroup by cat,subcat,subsubcatorder by cat,subcat,count(*) desc Resulting in:cat subcat subsubcat cnta a d 50a a x 12a a r 8a a s 8a a g 3a b m 112a b t 83a b f 80a b c 12a b y 11b a w 53b a n 50b a o 33b a p 30b a q 1... and so on I have found a way around this, but it is hugely wasteful and time intensive, and would rather use a more efficient solution. The table I am pulling from has over a million records. The real query I am writing as about 6 levels to it versus this example's 3. Using an intermediate table is wasteful because it returns about 200k rows which I then have to cycle through.First time posting here...thanks in advance for any help someone can toss my way...Rob |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2005-03-03 : 15:59:00
|
| worse thanselect cat,subcat,subsubcat,count(*) cntfrom topline_Data twhere subsubcat in (select subsubcat from (select top 5 subsubcat, count(*) from topline_Data t2 where t2.cat = t.cat and t,subcat = t2.subcat group by t2.subsubcat order by count(*) desc) a)group by cat,subcat,subsubcatorder by cat,subcat,count(*) desc==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2005-03-03 : 16:02:50
|
(I just saw nr's post, as I'm posting this. His is probably better but I'll post mine anyway)Wow, this is an ugly thing to have to execute even for just 3 levels.Here is my first cut at it. Since you didn't post a DDL for thetable or add any sample data, I didn't test this code. Does it work?select td.cat ,td.subcat ,td.subsubcat ,cnt = count(*)From ( Select cat ,subcat ,subsubcat ,subsubcatSeq = (Select count(*) from topline_data where cat = a.cat and subcat = a.subcat) From topline_data a Group by cat ,subcat ,subsubCat ) as sscJOIN topline_data td ON td.cat = ssc.cat AND td.subcat = ssc.subcat AND td.subsubcat = ssc.subsubcatWhere subsubcatSeq <= 5Group by td.cat ,td.subcat ,td.subsubcat Be One with the OptimizerTG |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2005-03-03 : 16:11:49
|
| subsubcatSeq = (Select count(*) from topline_data where cat = a.cat and subcat = a.subcat)Where subsubcatSeq <= 5That gives all those with more than 5 recs not the 5 biggest (unless that's what's wanted).==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2005-03-03 : 16:29:57
|
| DOH!!, Man, I'm getting old...Be One with the OptimizerTG |
 |
|
|
trivialusername
Starting Member
6 Posts |
Posted - 2005-03-03 : 16:36:34
|
I didn't think of using subqueries. Think I've been using tablevariables so long I have tunnel vision. Here's the real query using nr's method:select region, modelyear, segment, vehicle, trademodel, count(*) cntfrom topline_data t1where trademodel in (select top 5 trademodel from topline_data t2 where t2.region = t1.region and t2.modelyear = t1.modelyear and t2.segment = t1.segment and t2.vehicle = t1.vehicle group by region,modelyear,segment,vehicle,trademodel order by region,modelyear,segment,vehicle,count(*) desc )group by region, modelyear, segment, vehicle, trademodelorder by region, modelyear, segment, vehicle, count(*) desc nr: not really sure why you have two selects in a row here (i just went with the one): quote: subsubcat in (select subsubcat from (select top 5 subsubcat, count(*)
My original version of the query using an intermediate step with a tablevariable took 1h45m to run this morning on a moderately loaded server. That load is about the same right now, and I just started running the new query above. We'll see if the subquery makes an impact.This is great. You guys responded in less than 10 minutes with good advice. Nice there are places like this for developers.Thanks for your help!Rob |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2005-03-03 : 16:46:25
|
| Shouldn't make much difference but you only need to group by trademodel in the subquery as the other columns will hold the same value each time the subquery is run.Interested to see if this is faster but wouldn't be surprised if it's not.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
trivialusername
Starting Member
6 Posts |
Posted - 2005-03-03 : 17:06:15
|
| Good point about the group by.Thx ;)Rob |
 |
|
|
trivialusername
Starting Member
6 Posts |
Posted - 2005-03-03 : 22:23:30
|
| I stopped the query 4.5 hours in. I guess I'll just live with the temp table. Thanks much though, was worth a shot. Much simpler, but killer on the resources.Rob |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2005-03-03 : 22:58:57
|
you might try this, using a helper view:create View Helper ASselect cat,subcat,subsubcat,count(*) cntfrom topline_Datagroup by cat,subcat,subsubcat then, your final select is:Select * from( select A.*, (select count(*) from HelperView B where A.cat = B. and A.subcat = B.Subcat AND A.subsubcat = B.SubSubCat AND B.Cnt <= A.Cnt) as Rank from HelperViewer A) aWhere Rank <=5 but I think in this case a temp table would probably best the most efficient. - Jeff |
 |
|
|
|
|
|
|
|