Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Group By Limiting at Lower Level Only

Author  Topic 

trivialusername
Starting Member

6 Posts

Posted - 2005-03-03 : 15:35:15
Hi all,

I want to limit and sort the result set at the lowest level in a group by. Basically, the following:

select cat,subcat,top 5 subsubcat,count(*) cnt
from topline_Data
group by cat,subcat,subsubcat
order by cat,subcat,count(*) desc


Resulting in:
cat  subcat  subsubcat  cnt
a a d 50
a a x 12
a a r 8
a a s 8
a a g 3
a b m 112
a b t 83
a b f 80
a b c 12
a b y 11
b a w 53
b a n 50
b a o 33
b a p 30
b a q 1
... and so on

I have found a way around this, but it is hugely wasteful and time intensive, and would rather use a more efficient solution. The table I am pulling from has over a million records. The real query I am writing as about 6 levels to it versus this example's 3. Using an intermediate table is wasteful because it returns about 200k rows which I then have to cycle through.

First time posting here...thanks in advance for any help someone can toss my way...

Rob

nr
SQLTeam MVY

12543 Posts

Posted - 2005-03-03 : 15:59:00
worse than

select cat,subcat,subsubcat,count(*) cnt
from topline_Data t
where subsubcat in (select subsubcat from (select top 5 subsubcat, count(*) from topline_Data t2 where t2.cat = t.cat and t,subcat = t2.subcat group by t2.subsubcat order by count(*) desc) a)
group by cat,subcat,subsubcat
order by cat,subcat,count(*) desc


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2005-03-03 : 16:02:50
(I just saw nr's post, as I'm posting this. His is probably better but I'll post mine anyway)

Wow, this is an ugly thing to have to execute even for just 3 levels.
Here is my first cut at it. Since you didn't post a DDL for the
table or add any sample data, I didn't test this code. Does it work?

select td.cat
,td.subcat
,td.subsubcat
,cnt = count(*)
From (
Select cat
,subcat
,subsubcat
,subsubcatSeq = (Select count(*) from topline_data
where cat = a.cat and subcat = a.subcat)
From topline_data a
Group by cat
,subcat
,subsubCat
) as ssc
JOIN topline_data td
ON td.cat = ssc.cat
AND td.subcat = ssc.subcat
AND td.subsubcat = ssc.subsubcat
Where subsubcatSeq <= 5
Group by
td.cat
,td.subcat
,td.subsubcat


Be One with the Optimizer
TG
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2005-03-03 : 16:11:49
subsubcatSeq = (Select count(*) from topline_data
where cat = a.cat and subcat = a.subcat)


Where subsubcatSeq <= 5

That gives all those with more than 5 recs not the 5 biggest (unless that's what's wanted).

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2005-03-03 : 16:29:57
DOH!!, Man, I'm getting old...

Be One with the Optimizer
TG
Go to Top of Page

trivialusername
Starting Member

6 Posts

Posted - 2005-03-03 : 16:36:34
I didn't think of using subqueries. Think I've been using tablevariables so long I have tunnel vision. Here's the real query using nr's method:
select	region,
modelyear,
segment,
vehicle,
trademodel,
count(*) cnt
from topline_data t1
where trademodel in (select top 5 trademodel from topline_data t2
where t2.region = t1.region and
t2.modelyear = t1.modelyear and
t2.segment = t1.segment and
t2.vehicle = t1.vehicle
group by region,modelyear,segment,vehicle,trademodel
order by region,modelyear,segment,vehicle,count(*) desc
)
group by region,
modelyear,
segment,
vehicle,
trademodel
order by region,
modelyear,
segment,
vehicle,
count(*) desc


nr: not really sure why you have two selects in a row here (i just went with the one):

quote:
subsubcat in (select subsubcat from (select top 5 subsubcat, count(*)


My original version of the query using an intermediate step with a tablevariable took 1h45m to run this morning on a moderately loaded server. That load is about the same right now, and I just started running the new query above. We'll see if the subquery makes an impact.

This is great. You guys responded in less than 10 minutes with good advice. Nice there are places like this for developers.

Thanks for your help!
Rob
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2005-03-03 : 16:46:25
Shouldn't make much difference but you only need to group by trademodel in the subquery as the other columns will hold the same value each time the subquery is run.

Interested to see if this is faster but wouldn't be surprised if it's not.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

trivialusername
Starting Member

6 Posts

Posted - 2005-03-03 : 17:06:15
Good point about the group by.

Thx ;)
Rob
Go to Top of Page

trivialusername
Starting Member

6 Posts

Posted - 2005-03-03 : 22:23:30
I stopped the query 4.5 hours in. I guess I'll just live with the temp table. Thanks much though, was worth a shot. Much simpler, but killer on the resources.

Rob
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2005-03-03 : 22:58:57
you might try this, using a helper view:


create View Helper AS
select cat,subcat,subsubcat,count(*) cnt
from topline_Data
group by cat,subcat,subsubcat


then, your final select is:


Select * from
(
select A.*,
(select count(*)
from HelperView B
where A.cat = B. and
A.subcat = B.Subcat AND
A.subsubcat = B.SubSubCat AND
B.Cnt <= A.Cnt) as Rank
from
HelperViewer A
) a
Where Rank <=5


but I think in this case a temp table would probably best the most efficient.

- Jeff
Go to Top of Page
   

- Advertisement -