Group By Limiting at Lower Level Only

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Group By Limiting at Lower Level Only

Author

Topic

trivialusername
Starting Member

6 Posts

Posted - 2005-03-03 : 15:35:15

Hi all,

I want to limit and sort the result set at the lowest level in a group by. Basically, the following:

select cat,subcat,top 5 subsubcat,count(*) cnt
from topline_Data
group by cat,subcat,subsubcat
order by cat,subcat,count(*) desc

Resulting in:

cat  subcat  subsubcat  cnt
a    a       d          50
a    a       x          12
a    a       r          8
a    a       s          8
a    a       g          3
a    b       m          112
a    b       t          83
a    b       f          80
a    b       c          12
a    b       y          11
b    a       w          53
b    a       n          50
b    a       o          33
b    a       p          30
b    a       q          1
... and so on

I have found a way around this, but it is hugely wasteful and time intensive, and would rather use a more efficient solution. The table I am pulling from has over a million records. The real query I am writing as about 6 levels to it versus this example's 3. Using an intermediate table is wasteful because it returns about 200k rows which I then have to cycle through.

First time posting here...thanks in advance for any help someone can toss my way...

Rob

nr
SQLTeam MVY

12543 Posts

Posted - 2005-03-03 : 15:59:00

worse than

select cat,subcat,subsubcat,count(*) cnt
from topline_Data t
where subsubcat in (select subsubcat from (select top 5 subsubcat, count(*) from topline_Data t2 where t2.cat = t.cat and t,subcat = t2.subcat group by t2.subsubcat order by count(*) desc) a)
group by cat,subcat,subsubcat
order by cat,subcat,count(*) desc

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2005-03-03 : 16:02:50

(I just saw nr's post, as I'm posting this. His is probably better but I'll post mine anyway)

Wow, this is an ugly thing to have to execute even for just 3 levels.
Here is my first cut at it. Since you didn't post a DDL for the
table or add any sample data, I didn't test this code. Does it work?


select	td.cat
	,td.subcat
	,td.subsubcat
	,cnt = count(*)
From	(
	Select	cat
		,subcat
		,subsubcat
		,subsubcatSeq = (Select count(*) from topline_data 
				where cat = a.cat and subcat = a.subcat)
	From	topline_data a
	Group by cat
		,subcat
		,subsubCat
	) as ssc
JOIN	topline_data td
		ON td.cat = ssc.cat
		AND td.subcat = ssc.subcat
		AND td.subsubcat = ssc.subsubcat
Where	subsubcatSeq <= 5
Group by
	td.cat
	,td.subcat
	,td.subsubcat

Be One with the Optimizer
TG

nr
SQLTeam MVY

12543 Posts

Posted - 2005-03-03 : 16:11:49

subsubcatSeq = (Select count(*) from topline_data
where cat = a.cat and subcat = a.subcat)

Where subsubcatSeq <= 5

That gives all those with more than 5 recs not the 5 biggest (unless that's what's wanted).

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2005-03-03 : 16:29:57

DOH!!, Man, I'm getting old...

Be One with the Optimizer
TG

trivialusername
Starting Member

6 Posts

Posted - 2005-03-03 : 16:36:34

I didn't think of using subqueries. Think I've been using tablevariables so long I have tunnel vision. Here's the real query using nr's method:

select	region,
	modelyear,
	segment,
	vehicle,
	trademodel,
	count(*) cnt
from topline_data t1
where trademodel in (select top 5 trademodel from topline_data t2
			where t2.region = t1.region and
			t2.modelyear = t1.modelyear and
			t2.segment = t1.segment and
			t2.vehicle = t1.vehicle
		     group by region,modelyear,segment,vehicle,trademodel
		     order by region,modelyear,segment,vehicle,count(*) desc
		    )
group by region,
	 modelyear,
	 segment,
	 vehicle,
	 trademodel
order by region,
	 modelyear,
	 segment,
	 vehicle,
	 count(*) desc

nr: not really sure why you have two selects in a row here (i just went with the one):

quote:
subsubcat in (select subsubcat from (select top 5 subsubcat, count(*)

My original version of the query using an intermediate step with a tablevariable took 1h45m to run this morning on a moderately loaded server. That load is about the same right now, and I just started running the new query above. We'll see if the subquery makes an impact.

This is great. You guys responded in less than 10 minutes with good advice. Nice there are places like this for developers.

Thanks for your help!
Rob

nr
SQLTeam MVY

12543 Posts

Posted - 2005-03-03 : 16:46:25

Shouldn't make much difference but you only need to group by trademodel in the subquery as the other columns will hold the same value each time the subquery is run.

Interested to see if this is faster but wouldn't be surprised if it's not.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

trivialusername
Starting Member

6 Posts

Posted - 2005-03-03 : 17:06:15

Good point about the group by.

Thx ;)
Rob

trivialusername
Starting Member

6 Posts

Posted - 2005-03-03 : 22:23:30

I stopped the query 4.5 hours in. I guess I'll just live with the temp table. Thanks much though, was worth a shot. Much simpler, but killer on the resources.

Rob

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2005-03-03 : 22:58:57

you might try this, using a helper view:


create View Helper AS
select cat,subcat,subsubcat,count(*) cnt
from topline_Data
group by cat,subcat,subsubcat

then, your final select is:



Select * from
(
  select A.*, 
    (select count(*) 
     from HelperView B 
     where A.cat = B. and 
         A.subcat = B.Subcat AND 
         A.subsubcat = B.SubSubCat AND
         B.Cnt <= A.Cnt) as Rank
  from
  HelperViewer A
) a
Where Rank <=5

but I think in this case a temp table would probably best the most efficient.

- Jeff

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources