Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Iterating sorted data

Author  Topic 

djkane
Starting Member

5 Posts

Posted - 2011-03-25 : 13:41:10
Hi guys,

I've edited this comment and the problem is descibed much better in my reply later in the thread after jimf's comment.

---

Can anyone tell me the best way to do this?

I came up with a way by using a cursor but it's pretty ugly. I'd much prefer not to go down that route as a solution if possible.

Many thanks for helping/reading.

jimf
Master Smack Fu Yak Hacker

2875 Posts

Posted - 2011-03-25 : 15:10:04
Have you looked at quirky updates? To get a better answer you'd
need to provide DDL for the tables as well as DML to populate them
with sample values.


Jim

Everyday I learn something that somebody else already knew
Go to Top of Page

djkane
Starting Member

5 Posts

Posted - 2011-03-25 : 18:59:56
Thanks jimf. Ok, let me try and show examples:

Here's what the data would look like at the beginning:


Account Type Vol Limit Packet
======= ==== === ===== ======
1111111 1 313 10000 null
2222222 1 314 10000 null
1111111 99 315 10000 null
1111111 1 316 10000 null
2222222 88 317 10000 null
3333333 3 318 10000 null
4444444 4 319 10000 null
1111111 1 320 10000 null
1111111 99 321 10000 null
1111111 1 322 10000 null
1111111 1 323 10000 null
2222222 88 324 10000 null
2222222 1 325 10000 null
2222222 1 326 10000 null
2222222 1 327 10000 null


In a proc, I want to sort this data by Account and Type, so it would become for example:



Account Type Vol Limit Packet
======= ==== === ===== ====== -------
1111111 1 313 10000 null GROUP A
1111111 1 316 10000 null
1111111 1 320 10000 null
1111111 1 322 10000 null
1111111 1 323 10000 null
--------------------------------- -------
1111111 99 315 10000 null GROUP B

1111111 99 321 10000 null
--------------------------------- -------
2222222 1 314 10000 null GROUP C
2222222 1 325 10000 null
2222222 1 326 10000 null
2222222 1 327 10000 null
--------------------------------- -------
2222222 88 317 10000 null GROUP D
2222222 88 324 10000 null
--------------------------------- -------
3333333 3 318 10000 null GROUP E
--------------------------------- -------
4444444 4 319 10000 null GROUP F
--------------------------------- -------


Then I want to update the column called packet. I want to put the groups of accounts + type into packets, and for examples sake, I don't want the packet to include more than 3 items. Also a packet can NOT include items from two groups. So i need to split GROUP A into a packet of size 3 and a packet of size 2.

So my end goal would be to have this:



Account Type Vol Limit Packet
======= ==== === ===== ====== -------
1111111 1 313 10000 pckt-1 GROUP A
1111111 1 316 10000 pckt-1
1111111 1 320 10000 pckt-1 (Split into 3/2)
1111111 1 322 10000 pckt-2
1111111 1 323 10000 pckt-2
--------------------------------- -------
1111111 99 315 10000 pckt-3 GROUP B
1111111 99 321 10000 pckt-3
--------------------------------- -------
2222222 1 314 10000 pckt-4 GROUP C
2222222 1 325 10000 pckt-4
2222222 1 326 10000 pckt-4
2222222 1 327 10000 pckt-5
--------------------------------- -------
2222222 88 317 10000 pckt-6 GROUP D
2222222 88 324 10000 pckt-6
--------------------------------- -------
3333333 3 318 10000 pckt-7 GROUP E
--------------------------------- -------
4444444 4 319 10000 pckt-8 GROUP F
--------------------------------- -------




Hope this clarifies my goal and helps people to help me!

Any advice greatly appreciated. Maybe this is simple and I'm over-complicating it.

Thoughts?
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-03-26 : 08:25:36
Very nice sample data and description, djkane. That makes it very clear what you want to accomplish.

The bad news is that when I looked at it I thought, "this is easy". But I have been at it for 20 40 minutes now, and I still don't have a clean solution, so I am giving up. (SQL makes me feel humbled each day :--) What I have here is correct, it works, and you can use it. But I feel like there is a simpler solution that is eluding me. May be someone else will post an elegant solution.

create table #tmp (Account int, [type] int, vol int, limit int, packet int);

insert into #tmp values (1111111,1,313,10000,null);
insert into #tmp values (2222222,1,314,10000,null);
insert into #tmp values (1111111,99,315,10000,null);
insert into #tmp values (1111111,1,316,10000,null);
insert into #tmp values (2222222,88,317,10000,null);
insert into #tmp values (3333333,3,318,10000,null);
insert into #tmp values (4444444,4,319,10000,null);
insert into #tmp values (1111111,1,320,10000,null);
insert into #tmp values (1111111,99,321,10000,null);
insert into #tmp values (1111111,1,322,10000,null);
insert into #tmp values (1111111,1,323,10000,null);
insert into #tmp values (2222222,88,324,10000,null);
insert into #tmp values (2222222,1,325,10000,null);
insert into #tmp values (2222222,1,326,10000,null);
insert into #tmp values (2222222,1,327,10000,null);

with cte as
(
select
*,
row_number() over (order by account,type) as N1,
row_number() over (partition by account,type order by account,type) as N2
from
#tmp
)
select
a.*,
b.PacketNumber,
b.GroupNumber
from
cte a
cross apply
(
select
count(case when b.N2%3 = 1 then 1 end) as PacketNumber,
count(case when N2 = 1 then 1 end) as GroupNumber
from
cte b
where
b.N1 <= a.N1
) b;
drop table #tmp;
Go to Top of Page

djkane
Starting Member

5 Posts

Posted - 2011-03-26 : 09:12:32
Great attempt sunita. Trust me, I've spent a lot more than 20 minutes on this!

I actually didn't know about Common Table expressions and cross-apply. I'm a JEE developer so I'm a bit fudgy when it comes to SQL.

While this almost gives me what I want, the performance isn't great. It's taking approx 2 minutes to sort a dataset of 2500 rows in my test. Not sure if the bottleneck is caused by the cross-apply?

I'd be interested if anyone else has any ideas. Performance is critical for this operation. What seems like a simple problem is definitely no gimme!

Again, thanks Sunita for your help so far. I learned a lot from your post already.
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2011-03-26 : 09:59:33
I specialize in writing unoptimized queries :--).

Sometimes, for reasons not clear to me, I have seen that I get better performance using a temp table than a CTE. You might try splitting the query into 2 pieces as in:
   select
*,
row_number() over (order by account,type) as N1,
row_number() over (partition by account,type order by account,type) as N2
into #tmp2
from
#tmp
and then
select
a.*,
b.PacketNumber,
b.GroupNumber
from
#temp2 a
cross apply
(
select
count(case when b.N2%3 = 1 then 1 end) as PacketNumber,
count(case when N2 = 1 then 1 end) as GroupNumber
from
#temp2 b
where
b.N1 <= a.N1
) b;
If that does not do anything much, you could try adding an index on column N1 in #temp2.

PS: Yes, yes, you guessed it right. I AM shooting from the hip, trying to take you on a wild goose chase *grins*. So I will say no more, and let one of the people who know what they are talking about help you.
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-03-26 : 11:08:00
I too also tried for a good 45 minutes - 1 hour before I totally gave up. I had something similar to sunita's code except for the cross apply part, lots of row_number() trickery to no avail.

Here's a method using quirky update than Jimf mentioned, and borrowing sunita's setup:
create table #tmp (Account int, [type] int, vol int, limit int, packet int);
create clustered index ix_#tmp on #tmp(Account, Type, Vol)

insert into #tmp values (1111111,1,313,10000,null);
insert into #tmp values (2222222,1,314,10000,null);
insert into #tmp values (1111111,99,315,10000,null);
insert into #tmp values (1111111,1,316,10000,null);
insert into #tmp values (2222222,88,317,10000,null);
insert into #tmp values (3333333,3,318,10000,null);
insert into #tmp values (4444444,4,319,10000,null);
insert into #tmp values (1111111,1,320,10000,null);
insert into #tmp values (1111111,99,321,10000,null);
insert into #tmp values (1111111,1,322,10000,null);
insert into #tmp values (1111111,1,323,10000,null);
insert into #tmp values (2222222,88,324,10000,null);
insert into #tmp values (2222222,1,325,10000,null);
insert into #tmp values (2222222,1,326,10000,null);
insert into #tmp values (2222222,1,327,10000,null);

declare @c int, @group int, @account int, @type int
select @c=0, @group=0


update #tmp set
@c=case when @account=Account and @type=Type and @c<=3 then @c + 1 else 1 end,
@group=case when @account=Account and @type=type and @c<=3 then @group else @group + 1 end,
packet=@group, @account=Account, @type=Type


select * from #tmp
I'll leave it to your testing to see which performs better.

The green part has the actual UPDATE. The red parts are critical features in order for this to work. The table must have a clustered index on Account, Type and Vol columns, otherwise the packet numbers will not have the correct order. If you table is not already clustered this way, and you can't change it, you can put the data in a temp table as shown, generate the packet numbers, and do an update like this:
UPDATE A SET packet=B.Packet
FROM myTable A
INNER JOIN #tmp B ON A.Account=B.Account AND A.Type=B.Type AND A.Vol=B.Vol
This also assumes that Account, Type and Vol are unique combinations in your table. If not then this won't give you the expected results.

A few comments:

- Quirky update is non-standard behavior and not guaranteed to be consistent, reliable, or even supported in the future.
- Inserting/deleting/updating rows could cause existing packets to be invalid, and force packet recalculation on the entire table.
- This seems more like a reporting feature than actual data.

The last two items are the most important, to my mind. Relational data structures do not have an "order", and the physical ordering of rows has no bearing on their values. As you've described the value of a packet is determined solely by the data's order, and if that changed then the packet is no longer valid.

If this formula must hold and packet must be in your data, you need to have additional criteria for deriving it that is based only on the row's contents, not their position. Otherwise it's too easily broken by normal database activities.
Go to Top of Page

djkane
Starting Member

5 Posts

Posted - 2011-03-26 : 14:19:44
Guys, thanks for the replies. Seems I might need to take a step back and rethink if this really what I want to do. The quirky update solution seems to work rob, but I don't want to go down any route that may not be consistent or lose support in future.

Go to Top of Page
   

- Advertisement -