Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
ghastings
Starting Member
7 Posts |
Posted - 2009-06-11 : 08:58:37
|
A brief overview of what I'm trying to do first:I have a database table that stores customer call information. I have a reporting database with a table that stores how many inbound/outbound calls each customer had at every second of every day. The reason for this table is so we can produce graphs showing how many concurrent calls a particular customer had at any point in time.When I first looked at the query that generated the reporting information it had been written using temporary tables and cursors. It was horrible and took hours and hours to run.I have re-written it so it uses no temp tables or cursors. The query now runs for a single day in about 30 minutes. This to me still seems horribly slow.The main points regarding this query are1. A call starting at 09:00:00 and finishing at 09:01:00 will have 60 entries in the reporting table because the table stores 1 row for each second of a call. Why? Well if you query the reporting table for a single second, say 09:00:30 we can see that there was 1 call live.2. A call can't last longer than 24 hours. This means if we're checking between time periods then we should start looking at calls 24 hours prior to the start date.EG. Start date is 2009-06-11 09:00:00. It's possible a call started at 2009-06-11 07:30:00 and didn't end until 2009-06-11 11:00:00Currently I can NOT change the call information database however I would be able to change the reporting database if needed. Ultimately I'd love to be able to ditch the overnight reporting data generation and be able to generate the data on the fly!Table DefsI have simplified the call information table, it has quite a few rows that are not relavent to this queryCall tableCREATE TABLE [dbo].[calls]( [cl_customer] [int] NOT NULL, [cl_datetime] [datetime] NOT NULL, [cl_type] [nvarchar](1) NOT NULL, [cl_length] [int] NOT NULL) ON [PRIMARY]CREATE CLUSTERED INDEX [CX_DATETIME] ON [dbo].[calls] ( [cl_datetime] ASC)CREATE TABLE [dbo].[report]( [rp_in] [int] NOT NULL CONSTRAINT [DF_report_rp_in] DEFAULT ((0)), [rp_out] [int] NOT NULL CONSTRAINT [DF_report_rp_out] DEFAULT ((0)), [rp_total] [int] NOT NULL CONSTRAINT [DF_report_rp_total] DEFAULT ((0)), [rp_customer] [int] NOT NULL CONSTRAINT [DF_report_rp_customer] DEFAULT ((0)), [rp_period] [datetime] NOT NULL) ON [PRIMARY] Very simply the calls table doesn't have an end time field, it has to be calculated using dateadd(s, cl_length, cl_datetime).The report table just stores 1 rowA call type of 'O' is an outbound call, I'll just use this one for now.The reporting table is supposed to contain at minumum 1 row for every second of the day. More rows per time slice per call per customerTest DataNow for some randomish data for the calls table.declare @i as intdeclare @currenttime as datetimedeclare @secs as intselect @i = 0, @currenttime = '2009-06-11 07:00:00'while @i < 43200begin if round((rand() * 20) + 1, 0) = 20 begin insert into calls values (1, @currenttime, 'o', round((rand() * 1800) + 1, 0)) end set @currenttime = dateadd(s, 1, @currenttime) set @i = @i + 1end For each second between 7am and 7pm there is a 5% chance a call is being made. The call duration is random from 1-1800 seconds.Current procedure - the issue!Now the procedure that imports this data into the reporting database table isdeclare @startdate as datetime, @enddate as datetimeselect @startdate = '2009-06-11 00:00:00', @enddate = '2009-06-11 23:59:59.997'declare @bits as intselect @bits = datediff(s, @startdate, @enddate) / 1insert into reportselect isnull(sum([in]),0) as [in], isnull(sum(out),0) as [out], isnull(sum(total),0) as total, isnull(cl_customer, 0) as [customer], numberfrom ( select number, [in], [out], ([in] + [out]) as total, cl_customer from ( select dateadd(s, number, @startdate) as number from dbo.F_TABLE_NUMBER_RANGE(1,@bits) ) as tints left join ( select case when cl_type = 'i' then 1 else 0 end as [in], case when cl_type = 'o' then 1 else 0 end as [out], cl_datetime as start, cl_dateend as finish, cl_customer from ( select cl_customer, cl_datetime, dateadd(s, cl_length, cl_datetime) as cl_dateend, cl_type from calls where cl_datetime between dateadd(d, -1, @startdate) and @enddate ) as calls where cl_datetime between @startdate and @enddate or cl_dateend between @startdate and @enddate ) tt on (tints.number between start and finish) ) tttgroup by number, cl_customerorder by number Now I've been looking at this for hours and I can't think of any other way to write it. Can anyone give me any hints or suggestions please? |
|
ghastings
Starting Member
7 Posts |
Posted - 2009-06-11 : 09:01:57
|
Actually I have just managed to cut the execution time down by ~60%. Instead of using dbo.F_TABLE_NUMBER_RANGE I am using a pre-generated numbers table. That line now looks like thisselect dateadd(s, number, @startdate) as number from number where number between 1 and @bits The execution time for my random dataset is now around 1 minute 8 seconds. Down from 3 mins |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-06-11 : 09:44:38
|
First of all, create a permanent TallyNumbers table with 86400 records, ranging from 0 to 86399.CREATE TABLE TallyNumbers ( Number INT PRIMARY KEY CLUSTERED )INSERT TallyNumbers ( Number )SELECT NumberFROM dbo.F_TABLE_NUMBER_RANGE(0, 86399)ORDER BY Number With this permanent tally table in place, use this code for faster and better accuracy!DECLARE @StartDate DATETIME, @EndDate DATETIMESELECT @StartDate = '20090611', @EndDate = '20090612'SELECT DATEADD(SECOND, s.Number, @StartDate) AS theMinute, w.cl_Customer, SUM(CASE WHEN w.cl_Type = 'i' THEN 1 ELSE 0 END) AS [in], SUM(CASE WHEN w.cl_Type = 'o' THEN 1 ELSE 0 END) AS [out], COUNT(w.theSecond) AS TotalFROM TallyNumbers AS sLEFT JOIN ( SELECT cl_Customer, cl_Type, DATEDIFF(SECOND, DATEDIFF(DAY, 0, cl_DateTime), cl_DateTime) AS theSecond, cl_Length FROM Calls WHERE cl_DateTime < @EndDate AND DATEADD(SECOND, cl_Length - 1, cl_DateTime) >= @StartDate ) AS w ON w.theSecond <= s.Number AND w.theSecond + w.cl_Length > s.NumberGROUP BY DATEADD(SECOND, s.Number, @StartDate), w.cl_Customer Your suggestion runs in 2 minutes and 43 seconds on my laptop, and my suggestion runs in 1 minute and 22 seconds.About half the time.This is mainly because your suggestion includes all calls starting up to a day before @StartDate even if they end before @StartDate. E 12°55'05.63"N 56°04'39.26" |
|
|
ghastings
Starting Member
7 Posts |
Posted - 2009-06-12 : 04:37:48
|
Thanks Peso,Running your query on my dataset knocks off another 20 seconds! |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2009-06-12 : 04:46:18
|
20 seconds reduction from sample data posted above (1 min 8 sec ~ 30% reduction), or production data (30+ minutes ~ 1% reduction)? E 12°55'05.63"N 56°04'39.26" |
|
|
|
|
|
|
|