Iterating sorted data - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Iterating sorted data

Author

Topic

djkane
Starting Member

5 Posts

Posted - 2011-03-25 : 13:41:10

Hi guys,

I've edited this comment and the problem is descibed much better in my reply later in the thread after jimf's comment.

---

Can anyone tell me the best way to do this?

I came up with a way by using a cursor but it's pretty ugly. I'd much prefer not to go down that route as a solution if possible.

Many thanks for helping/reading.

jimf
Master Smack Fu Yak Hacker

2875 Posts

Posted - 2011-03-25 : 15:10:04

Have you looked at quirky updates? To get a better answer you'd
need to provide DDL for the tables as well as DML to populate them
with sample values.

Jim

Everyday I learn something that somebody else already knew

djkane
Starting Member

5 Posts

Posted - 2011-03-25 : 18:59:56

Thanks jimf. Ok, let me try and show examples:

Here's what the data would look like at the beginning:


Account  Type  Vol  Limit  Packet
=======  ====  ===  =====  ======
1111111     1  313  10000    null
2222222     1  314  10000    null
1111111    99  315  10000    null
1111111     1  316  10000    null
2222222    88  317  10000    null
3333333     3  318  10000    null
4444444     4  319  10000    null
1111111     1  320  10000    null
1111111    99  321  10000    null
1111111     1  322  10000    null
1111111     1  323  10000    null
2222222    88  324  10000    null
2222222     1  325  10000    null
2222222     1  326  10000    null
2222222     1  327  10000    null

In a proc, I want to sort this data by Account and Type, so it would become for example:


Account  Type  Vol  Limit  Packet
=======  ====  ===  =====  ======   -------
1111111     1  313  10000    null   GROUP A
1111111     1  316  10000    null
1111111     1  320  10000    null   
1111111     1  322  10000    null
1111111     1  323  10000    null     
---------------------------------   -------
1111111    99  315  10000    null   GROUP B

1111111    99  321  10000    null
---------------------------------   -------
2222222     1  314  10000    null   GROUP C
2222222     1  325  10000    null
2222222     1  326  10000    null
2222222     1  327  10000    null
---------------------------------   -------
2222222    88  317  10000    null   GROUP D
2222222    88  324  10000    null
---------------------------------   -------
3333333     3  318  10000    null   GROUP E
---------------------------------   -------
4444444     4  319  10000    null   GROUP F
---------------------------------   -------

Then I want to update the column called packet. I want to put the groups of accounts + type into packets, and for examples sake, I don't want the packet to include more than 3 items. Also a packet can NOT include items from two groups. So i need to split GROUP A into a packet of size 3 and a packet of size 2.

So my end goal would be to have this:



Account  Type  Vol  Limit  Packet
=======  ====  ===  =====  ======   -------
1111111     1  313  10000  pckt-1   GROUP A
1111111     1  316  10000  pckt-1
1111111     1  320  10000  pckt-1   (Split into 3/2)
1111111     1  322  10000  pckt-2
1111111     1  323  10000  pckt-2     
---------------------------------   -------
1111111    99  315  10000  pckt-3   GROUP B
1111111    99  321  10000  pckt-3
---------------------------------   -------
2222222     1  314  10000  pckt-4   GROUP C
2222222     1  325  10000  pckt-4
2222222     1  326  10000  pckt-4
2222222     1  327  10000  pckt-5
---------------------------------   -------
2222222    88  317  10000  pckt-6   GROUP D
2222222    88  324  10000  pckt-6
---------------------------------   -------
3333333     3  318  10000  pckt-7   GROUP E
---------------------------------   -------
4444444     4  319  10000  pckt-8   GROUP F
---------------------------------   -------

Hope this clarifies my goal and helps people to help me!

Any advice greatly appreciated. Maybe this is simple and I'm over-complicating it.

Thoughts?

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-03-26 : 08:25:36

Very nice sample data and description, djkane. That makes it very clear what you want to accomplish.

The bad news is that when I looked at it I thought, "this is easy". But I have been at it for 20 40 minutes now, and I still don't have a clean solution, so I am giving up. (SQL makes me feel humbled each day :--) What I have here is correct, it works, and you can use it. But I feel like there is a simpler solution that is eluding me. May be someone else will post an elegant solution.


create table #tmp (Account int, [type] int, vol int, limit int, packet int);

insert into #tmp values (1111111,1,313,10000,null);
insert into #tmp values (2222222,1,314,10000,null);
insert into #tmp values (1111111,99,315,10000,null);
insert into #tmp values (1111111,1,316,10000,null);
insert into #tmp values (2222222,88,317,10000,null);
insert into #tmp values (3333333,3,318,10000,null);
insert into #tmp values (4444444,4,319,10000,null);
insert into #tmp values (1111111,1,320,10000,null);
insert into #tmp values (1111111,99,321,10000,null);
insert into #tmp values (1111111,1,322,10000,null);
insert into #tmp values (1111111,1,323,10000,null);
insert into #tmp values (2222222,88,324,10000,null);
insert into #tmp values (2222222,1,325,10000,null);
insert into #tmp values (2222222,1,326,10000,null);
insert into #tmp values (2222222,1,327,10000,null);

with cte as
(
   select
      *,
      row_number() over (order by account,type) as N1,
      row_number() over (partition by account,type order by account,type) as N2
   from
      #tmp
)
select
   a.*,
   b.PacketNumber,
   b.GroupNumber
from
   cte a
   cross apply
   (
      select
         count(case when b.N2%3 = 1 then 1 end) as PacketNumber,
         count(case when N2 = 1 then 1 end) as GroupNumber 
      from 
         cte b 
      where 
         b.N1 <= a.N1 
   ) b;
drop table #tmp;

djkane
Starting Member

5 Posts

Posted - 2011-03-26 : 09:12:32

Great attempt sunita. Trust me, I've spent a lot more than 20 minutes on this!

I actually didn't know about Common Table expressions and cross-apply. I'm a JEE developer so I'm a bit fudgy when it comes to SQL.

While this almost gives me what I want, the performance isn't great. It's taking approx 2 minutes to sort a dataset of 2500 rows in my test. Not sure if the bottleneck is caused by the cross-apply?

I'd be interested if anyone else has any ideas. Performance is critical for this operation. What seems like a simple problem is definitely no gimme!

Again, thanks Sunita for your help so far. I learned a lot from your post already.

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-03-26 : 09:59:33

I specialize in writing unoptimized queries :--).

Sometimes, for reasons not clear to me, I have seen that I get better performance using a temp table than a CTE. You might try splitting the query into 2 pieces as in:

   select
      *,
      row_number() over (order by account,type) as N1,
      row_number() over (partition by account,type order by account,type) as N2
   into #tmp2
   from
      #tmp

and then

select
   a.*,
   b.PacketNumber,
   b.GroupNumber
from
   #temp2 a
   cross apply
   (
      select
         count(case when b.N2%3 = 1 then 1 end) as PacketNumber,
         count(case when N2 = 1 then 1 end) as GroupNumber 
      from 
         #temp2 b 
      where 
         b.N1 <= a.N1 
   ) b;

If that does not do anything much, you could try adding an index on column N1 in #temp2.

PS: Yes, yes, you guessed it right. I AM shooting from the hip, trying to take you on a wild goose chase *grins*. So I will say no more, and let one of the people who know what they are talking about help you.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-03-26 : 11:08:00

I too also tried for a good 45 minutes - 1 hour before I totally gave up. I had something similar to sunita's code except for the cross apply part, lots of row_number() trickery to no avail.

Here's a method using quirky update than Jimf mentioned, and borrowing sunita's setup:

create table #tmp (Account int, [type] int, vol int, limit int, packet int);
create clustered index ix_#tmp on #tmp(Account, Type, Vol)

insert into #tmp values (1111111,1,313,10000,null);
insert into #tmp values (2222222,1,314,10000,null);
insert into #tmp values (1111111,99,315,10000,null);
insert into #tmp values (1111111,1,316,10000,null);
insert into #tmp values (2222222,88,317,10000,null);
insert into #tmp values (3333333,3,318,10000,null);
insert into #tmp values (4444444,4,319,10000,null);
insert into #tmp values (1111111,1,320,10000,null);
insert into #tmp values (1111111,99,321,10000,null);
insert into #tmp values (1111111,1,322,10000,null);
insert into #tmp values (1111111,1,323,10000,null);
insert into #tmp values (2222222,88,324,10000,null);
insert into #tmp values (2222222,1,325,10000,null);
insert into #tmp values (2222222,1,326,10000,null);
insert into #tmp values (2222222,1,327,10000,null);

declare @c int, @group int, @account int, @type int
select @c=0, @group=0

update #tmp set 
@c=case when @account=Account and @type=Type and @c<=3 then @c + 1 else 1 end,
@group=case when @account=Account and @type=type and @c<=3 then @group else @group + 1 end,
packet=@group, @account=Account, @type=Type

select * from #tmp

I'll leave it to your testing to see which performs better.

The green part has the actual UPDATE. The red parts are critical features in order for this to work. The table must have a clustered index on Account, Type and Vol columns, otherwise the packet numbers will not have the correct order. If you table is not already clustered this way, and you can't change it, you can put the data in a temp table as shown, generate the packet numbers, and do an update like this:

UPDATE A SET packet=B.Packet
FROM myTable A
INNER JOIN #tmp B ON A.Account=B.Account AND A.Type=B.Type AND A.Vol=B.Vol

This also assumes that Account, Type and Vol are unique combinations in your table. If not then this won't give you the expected results.

A few comments:

- Quirky update is non-standard behavior and not guaranteed to be consistent, reliable, or even supported in the future.
- Inserting/deleting/updating rows could cause existing packets to be invalid, and force packet recalculation on the entire table.
- This seems more like a reporting feature than actual data.

The last two items are the most important, to my mind. Relational data structures do not have an "order", and the physical ordering of rows has no bearing on their values. As you've described the value of a packet is determined solely by the data's order, and if that changed then the packet is no longer valid.

If this formula must hold and packet must be in your data, you need to have additional criteria for deriving it that is based only on the row's contents, not their position. Otherwise it's too easily broken by normal database activities.

djkane
Starting Member

5 Posts

Posted - 2011-03-26 : 14:19:44

Guys, thanks for the replies. Seems I might need to take a step back and rethink if this really what I want to do. The quirky update solution seems to work rob, but I don't want to go down any route that may not be consistent or lose support in future.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources