Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Super slow query, need help please

Author  Topic 

ghastings
Starting Member

7 Posts

Posted - 2009-06-11 : 08:58:37
A brief overview of what I'm trying to do first:

I have a database table that stores customer call information.
I have a reporting database with a table that stores how many inbound/outbound calls each customer had at every second of every day. The reason for this table is so we can produce graphs showing how many concurrent calls a particular customer had at any point in time.

When I first looked at the query that generated the reporting information it had been written using temporary tables and cursors. It was horrible and took hours and hours to run.

I have re-written it so it uses no temp tables or cursors. The query now runs for a single day in about 30 minutes. This to me still seems horribly slow.

The main points regarding this query are

1. A call starting at 09:00:00 and finishing at 09:01:00 will have 60 entries in the reporting table because the table stores 1 row for each second of a call. Why? Well if you query the reporting table for a single second, say 09:00:30 we can see that there was 1 call live.

2. A call can't last longer than 24 hours. This means if we're checking between time periods then we should start looking at calls 24 hours prior to the start date.

EG. Start date is 2009-06-11 09:00:00. It's possible a call started at 2009-06-11 07:30:00 and didn't end until 2009-06-11 11:00:00

Currently I can NOT change the call information database however I would be able to change the reporting database if needed. Ultimately I'd love to be able to ditch the overnight reporting data generation and be able to generate the data on the fly!

Table Defs

I have simplified the call information table, it has quite a few rows that are not relavent to this query

Call table

CREATE TABLE [dbo].[calls](
[cl_customer] [int] NOT NULL,
[cl_datetime] [datetime] NOT NULL,
[cl_type] [nvarchar](1) NOT NULL,
[cl_length] [int] NOT NULL
) ON [PRIMARY]

CREATE CLUSTERED INDEX [CX_DATETIME] ON [dbo].[calls]
(
[cl_datetime] ASC
)

CREATE TABLE [dbo].[report](
[rp_in] [int] NOT NULL CONSTRAINT [DF_report_rp_in] DEFAULT ((0)),
[rp_out] [int] NOT NULL CONSTRAINT [DF_report_rp_out] DEFAULT ((0)),
[rp_total] [int] NOT NULL CONSTRAINT [DF_report_rp_total] DEFAULT ((0)),
[rp_customer] [int] NOT NULL CONSTRAINT [DF_report_rp_customer] DEFAULT ((0)),
[rp_period] [datetime] NOT NULL
) ON [PRIMARY]


Very simply the calls table doesn't have an end time field, it has to be calculated using dateadd(s, cl_length, cl_datetime).
The report table just stores 1 row

A call type of 'O' is an outbound call, I'll just use this one for now.

The reporting table is supposed to contain at minumum 1 row for every second of the day. More rows per time slice per call per customer

Test Data


Now for some randomish data for the calls table.


declare @i as int
declare @currenttime as datetime
declare @secs as int

select @i = 0,
@currenttime = '2009-06-11 07:00:00'

while @i < 43200
begin
if round((rand() * 20) + 1, 0) = 20
begin
insert into calls values (1, @currenttime, 'o', round((rand() * 1800) + 1, 0))
end

set @currenttime = dateadd(s, 1, @currenttime)
set @i = @i + 1
end


For each second between 7am and 7pm there is a 5% chance a call is being made. The call duration is random from 1-1800 seconds.

Current procedure - the issue!
Now the procedure that imports this data into the reporting database table is


declare @startdate as datetime, @enddate as datetime
select @startdate = '2009-06-11 00:00:00',
@enddate = '2009-06-11 23:59:59.997'

declare @bits as int
select @bits = datediff(s, @startdate, @enddate) / 1

insert into report
select
isnull(sum([in]),0) as [in],
isnull(sum(out),0) as [out],
isnull(sum(total),0) as total,
isnull(cl_customer, 0) as [customer],
number
from
(
select
number,
[in],
[out],
([in] + [out]) as total,
cl_customer
from
(
select dateadd(s, number, @startdate) as number from dbo.F_TABLE_NUMBER_RANGE(1,@bits)
) as tints
left join
(
select
case when cl_type = 'i' then 1 else 0 end as [in],
case when cl_type = 'o' then 1 else 0 end as [out],
cl_datetime as start,
cl_dateend as finish,
cl_customer
from
(
select
cl_customer,
cl_datetime,
dateadd(s, cl_length, cl_datetime) as cl_dateend,
cl_type
from calls
where
cl_datetime between dateadd(d, -1, @startdate) and @enddate
) as calls
where
cl_datetime between @startdate and @enddate or
cl_dateend between @startdate and @enddate
) tt on (tints.number between start and finish)
) ttt
group by
number,
cl_customer
order by
number


Now I've been looking at this for hours and I can't think of any other way to write it. Can anyone give me any hints or suggestions please?

ghastings
Starting Member

7 Posts

Posted - 2009-06-11 : 09:01:57
Actually I have just managed to cut the execution time down by ~60%. Instead of using dbo.F_TABLE_NUMBER_RANGE I am using a pre-generated numbers table. That line now looks like this


select dateadd(s, number, @startdate) as number from number where number between 1 and @bits


The execution time for my random dataset is now around 1 minute 8 seconds. Down from 3 mins
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-11 : 09:15:13
You should have read this blog post
http://weblogs.sqlteam.com/peterl/archive/2009/01/30/Alternative-approach-to-calculate-most-used-time-interval.aspx



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-11 : 09:44:38
First of all, create a permanent TallyNumbers table with 86400 records, ranging from 0 to 86399.
CREATE TABLE	TallyNumbers
(
Number INT PRIMARY KEY CLUSTERED
)

INSERT TallyNumbers
(
Number
)
SELECT Number
FROM dbo.F_TABLE_NUMBER_RANGE(0, 86399)
ORDER BY Number
With this permanent tally table in place, use this code for faster and better accuracy!
DECLARE	@StartDate DATETIME,
@EndDate DATETIME

SELECT @StartDate = '20090611',
@EndDate = '20090612'

SELECT DATEADD(SECOND, s.Number, @StartDate) AS theMinute,
w.cl_Customer,
SUM(CASE WHEN w.cl_Type = 'i' THEN 1 ELSE 0 END) AS [in],
SUM(CASE WHEN w.cl_Type = 'o' THEN 1 ELSE 0 END) AS [out],
COUNT(w.theSecond) AS Total
FROM TallyNumbers AS s
LEFT JOIN (
SELECT cl_Customer,
cl_Type,
DATEDIFF(SECOND, DATEDIFF(DAY, 0, cl_DateTime), cl_DateTime) AS theSecond,
cl_Length
FROM Calls
WHERE cl_DateTime < @EndDate
AND DATEADD(SECOND, cl_Length - 1, cl_DateTime) >= @StartDate
) AS w ON w.theSecond <= s.Number
AND w.theSecond + w.cl_Length > s.Number
GROUP BY DATEADD(SECOND, s.Number, @StartDate),
w.cl_Customer
Your suggestion runs in 2 minutes and 43 seconds on my laptop, and my suggestion runs in 1 minute and 22 seconds.
About half the time.

This is mainly because your suggestion includes all calls starting up to a day before @StartDate even if they end before @StartDate.


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

ghastings
Starting Member

7 Posts

Posted - 2009-06-12 : 04:37:48
Thanks Peso,

Running your query on my dataset knocks off another 20 seconds!
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2009-06-12 : 04:46:18
20 seconds reduction from sample data posted above (1 min 8 sec ~ 30% reduction), or production data (30+ minutes ~ 1% reduction)?


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page
   

- Advertisement -