Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Best way to delete row regularly?

Author  Topic 

aiken
Aged Yak Warrior

525 Posts

Posted - 2004-09-08 : 17:34:44
This has to be a common issue. I've got a DB server that does nothing more than logging for a web app. Every night at midnight, the previous day's logs are compiled into aggregate statistics, and are then useless.

My problem is this: we're talking about 1.5-2 million rows/day, or an average of about 20/second. Right now, I've got a batch job doing a rolling delete of records >25 hours old once an hour, but that job itself is starting to cause excessive locking of the log table and therefore slower performance in the web app.

Clearly, I could move that job to running once a minute, or just once a day (ouch). But I'm really not sure how to forecast the impact either choice would have, and ultimately how to come up with the most efficient schedule for this job.

Any advice on how to manage this balancing act? I don't think truncate table is an option, since at 12:00am I need to run the aggregations, which happen in a few jobs that finish at about 12:10am, and I don't want to lose those 10 minutes of data.

Thanks
-b

nr
SQLTeam MVY

12543 Posts

Posted - 2004-09-08 : 17:38:14
Do you have a quiet time? If so, run the delete then.
Otherwise maybe have two or three tables combined via a partitioned view. You can use convert(int,getdate())%3 or something like that to split the data between the tables; deleting a day's worth then becomes just truncating a table.
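A minimal sketch of what that might look like (table and column names here are made up, not from the original post; datediff(dd,0,getdate()) avoids the round-to-nearest-day behaviour of convert(int, getdate())):

-- Three identical log tables; the CHECK constraint on the slot column
-- lets the view act as a partitioned view over them.
CREATE TABLE LogSlot0 (Slot int NOT NULL CHECK (Slot = 0), LoggedAt smalldatetime NOT NULL, Msg varchar(500))
CREATE TABLE LogSlot1 (Slot int NOT NULL CHECK (Slot = 1), LoggedAt smalldatetime NOT NULL, Msg varchar(500))
CREATE TABLE LogSlot2 (Slot int NOT NULL CHECK (Slot = 2), LoggedAt smalldatetime NOT NULL, Msg varchar(500))
GO
CREATE VIEW LogAll AS
    SELECT * FROM LogSlot0
    UNION ALL SELECT * FROM LogSlot1
    UNION ALL SELECT * FROM LogSlot2
GO
-- The app inserts with Slot = datediff(dd, 0, getdate()) % 3.
-- Purging a whole day is then TRUNCATE TABLE on the slot that is
-- neither today's nor yesterday's -- no row-by-row delete at all.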

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2004-09-08 : 23:59:36
how about scheduling the job on a per-hour basis? you will just have one hour of data to delete then.

actually 1.5 million after the end of the day isn't that bad. we're doing 20 million a day and it takes 2 hours max to complete, scheduled during off-peak hours.

it might also help if you have an index on the table

timmy
Master Smack Fu Yak Hacker

1242 Posts

Posted - 2004-09-09 : 00:34:01
You can break up the DELETEs like this:
DELETE FROM logTable
WHERE logID IN (SELECT TOP 10000 logID FROM logTable WHERE {insert delete rule here})

You can run this statement repeatedly until there is nothing left to delete. I find this greatly increases the speed at which I can delete lots of records.
And, to take up Nigel's suggestion, create a SQL Agent job to run this script when the db is reasonably idle.
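The repeated call can be sketched as a loop (a sketch only: the logTable/logID names come from the post above, but the LoggedAt column and the 25-hour rule are assumptions based on the original question):

-- Delete in batches of 10000 until a pass removes nothing.
WHILE 1 = 1
BEGIN
    DELETE FROM logTable
    WHERE logID IN (SELECT TOP 10000 logID FROM logTable
                    WHERE LoggedAt < dateadd(hour, -25, getdate()))
    -- @@ROWCOUNT reflects the DELETE just executed
    IF @@ROWCOUNT = 0 BREAK
END

Each small batch commits and releases its locks before the next one starts, which is what keeps the web app responsive compared with one giant delete.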

aiken
Aged Yak Warrior

525 Posts

Posted - 2004-09-09 : 13:24:04
Thanks for the excellent ideas. There really isn't a quiet time; there are slower times, but the slowest time of day sees about 70% of the traffic level of the busiest time of day, so it's still not OK to do big slow queries then. There is indeed an index on the smalldatetime column that determines a row's deletability.

timmy, wouldn't that be even easier on the DB if you did:

set rowcount 10000
delete from table where ....

My problem is that I can't estimate that 10000 number; if I set it too low, the job will delete rows more slowly than they pile up, and if I get the number right, I might as well just do a plain old delete.

I think nr's on to something here, but I'm not sure how to manage it. It seems like I could just switch back and forth between two tables based on some sort of dateserial mod 2, and then combine them with the partitioned view for reporting purposes. I'll take a look at that.

Cheers
-b

X002548
Not Just a Number

15586 Posts

Posted - 2004-09-09 : 17:18:48
I guess this is pure heap so defragmentation shouldn't be a problem....

How about a magnet...they're good on deletes...no logs..no worries...

I think the answer lies in a different config...like a partitioned view, with each underlying table on a separate physical drive...

Don't you get any maint window?

How do you do releases?

And I like the more frequently scheduled deletes if it's a heap....say every 10 minutes?



Brett

8-)

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2004-09-10 : 06:56:08
Why not create a new table every day, and when the aggregation on the previous day's data is finished just truncate it and drop the whole thing...?

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

nr
SQLTeam MVY

12543 Posts

Posted - 2004-09-10 : 07:50:21
Because creating and dropping tables is not something that should be done lightly.


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2004-09-10 : 07:57:19
Hm, why is that? I have never done it and probably won't for a while, but it would be good to know. An alternative could be to have one table for each day of the week or month. Then no dropping would be necessary (da**, I can never spell that word!) but you could still truncate the table when you are done...

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2004-09-11 : 07:25:40
"why not"....because a load of dependancies currently in place may not get recreated!!
also...linked tables....via constraints....would have to be taken-into account during/before the drop....dropping 1 table, might mean affecting 5-10-20....a simple problem, could spiral out of control fast

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2004-09-11 : 07:38:39
Not to be rude or anything but you gotta read the question: "I've got a DB server that does nothing more than logging for a web app. Every night at midnight, the previous day's logs are compiled into aggregate statistics, and are then useless". If creating/dropping tables was a part of your application you would have to be a total moron not to handle dependencies at the same time. I didn't even take this into account as I considered it to be obvious. Logging for a web application requires one single table, and with an average of 20 inserts/sec my guess is that there aren't too many constraints/dependencies...

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

nr
SQLTeam MVY

12543 Posts

Posted - 2004-09-11 : 10:56:53
Because sql server will probably allocate a new object id and force recompilation of the queries involved.
Or it could reuse the same object id, not force recompilation, and possibly end up with a corrupt database.

This was very common in v6.5 if you didn't restart the server after an object change - not as common now.
I certainly wouldn't try it on a system that has as frequent updates as this.

It worries me when I see people writing dts packages that drop and recreate tables on every run - often followed by an error when they try to change the structure.

One table per day or whatever is what I suggested earlier.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Kristen
Test

22859 Posts

Posted - 2004-09-12 : 04:44:24
"set rowcount 10000
delete from table where ....
"

I reckon that's the right approach, but if there is a risk that even that could be too invasive then perhaps

DECLARE @MyLoopCounter int
SELECT @MyLoopCounter = 10
WHILE @MyLoopCounter > 0
BEGIN
    SET ROWCOUNT 1000
    delete from table where ....
    SET ROWCOUNT 0
    WAITFOR DELAY '000:00:05'
    SELECT @MyLoopCounter = @MyLoopCounter - 1
END

might be less stress on the system

I have never tried nr's suggestion of tables for different days, but that sounds like a good idea too. You could just TRUNCATE yesterday's table after the stats have been built - so I think I would have two tables, and something that recorded which was "Current" and which was "Reserve". Switch the Current to the other table at midnight, produce the stats from the Reserve, then truncate the Reserve ready for tomorrow ...
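One way to sketch that switch on SQL 2000 is to have the web app always write through a view, and repoint the view at midnight (all the names here - LogCurrent, LogA, LogB - are illustrative, not from the thread):

-- App inserts go through LogCurrent, currently pointing at LogA.
-- At midnight the nightly job flips it to LogB:
ALTER VIEW LogCurrent AS SELECT * FROM LogB
GO
-- LogA is now idle; run the aggregation jobs against it, then:
TRUNCATE TABLE LogA
-- Tomorrow night, flip back to LogA and truncate LogB.

INSERTs through a single-table view like this are legal in SQL 2000, so the app never needs to know which physical table is "Current".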

Kristen
   
