Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Deleting rows that push a sum() too high?

Author  Topic 

aiken
Aged Yak Warrior

525 Posts

Posted - 2002-07-26 : 18:05:30
I'm looking for the most efficient way to delete old records from a table until the a computed sum is below a threshold. A test schema is attached here, which will probably make more sense.

What I've got works, but it seems terribly inefficient. For each operation, quite a few queries are needed. Am I missing something here? I do not have the option of adding a "cumulative size" column to the table.

Any ideas?

Thanks
-b

Test schema for deleting old records until sum(size)<70.
create table test (i int identity primary key clustered,
size int)

insert into test (size) VALUES (25)
insert into test (size) VALUES (20)
insert into test (size) VALUES (15)
insert into test (size) VALUES (31)
insert into test (size) VALUES (87)
insert into test (size) VALUES (43)
insert into test (size) VALUES (22)
insert into test (size) VALUES (65)
insert into test (size) VALUES (17)
insert into test (size) VALUES (8)
insert into test (size) VALUES (22)
select * from test order by i desc

delete from test where i<(select min(i) from test t2 where (select sum(size) from test t3 where t3.i>t2.i)<70)

select * from test order by i desc



robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-07-26 : 18:44:54
Are you looking for a "best-fit" sum, or just getting rid of the largest values until you fall below the threshold?

The following will do the latter:

DECLARE @maxsum int
SET @maxsum=70
DELETE test WHERE size>@maxsum
WHILE (SELECT Sum(size) FROM test)>@maxsum
BEGIN
DELETE test WHERE size=(SELECT max(size) FROM test)
END


Doing repetitive DELETEs is far easier, and probably just as or more efficient that trying to find a single query that does it, unless you're looking for a best-fit solution.

Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2002-07-26 : 18:49:54
Thanks... I am definitely not going for a best-fit, but I am going for a "delete oldest first" approach (in my example, using the identity column which is guaranteed to be sequential). Your approach is definitely cool, but by my read it is a "delete biggest first" approach, which isn't going to fly for me.

Cheers, and thanks!
-b

Go to Top of Page

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2002-07-28 : 13:24:38
quote:

What I've got works, but it seems terribly inefficient.



Not to be picky but neither of the above assertions is true, really.

Your query to return the largest i which falls under the summation threshold:

delete from test
where i<(select min(i) from test t2 where (select sum(size) from test t3 where t3.i>t2.i)<70)
 
isn't correct; with the sample ddl and dml provided (btw that is incredibly helpful, so many posters omit it) the query returns 8. the correct max(i) is 3 from your data - the running sum of the first three rows is 60; the fourth i is 31 which puts you over.

what you want is something like:

select max(i)
from test t
where exists (
select 1
from test
where i <= t.i
having sum(size) < 70 )
 

and deleting all i > the max(i) above. As far as efficiency I think this approach is among the fastest you'll find.

Jonathan Boott, MCDBA
Go to Top of Page
   

- Advertisement -