Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 calculate moving averages

Author  Topic 

tm1177
Starting Member

8 Posts

Posted - 2006-03-21 : 11:56:55
Hi,

I have written a script that calculates the moving averages of volatility data from tbl A and inserts into tbl B. The data extends back around 10 years (approx 1million lines). I wish to process the moving averages of the vol data for a list of securities for each day over the time span.

For lack of a better way, I wrote a cursor to accomplish this. The workflow:

1. select stockID, date from tbl A into cursor
2. using stockID, date, select vol data (30 -180 days) using join on the same table, and calc averages
3. update tbl B with data from step 2.

Although I know cursors are not an efficient way of processing such a large set of data, I know of no other way to do this. The code:


CREATE PROCEDURE [IVolatilityUser].[CalcMovAvg_date]

AS

declare @date as smalldatetime
declare @stock_id as int
declare @avg90 as real
declare @avg180 as real
declare @avg60 as real
declare @avg30 as real

declare vol_cursor2 cursor
for
select stock_id, t_date
from dbo.RealizedVolatilities
where t_date between '2-1-2006' and '2-10-2006'

open vol_cursor2

fetch vol_cursor2 into @stock_id, @date

if (@@FETCH_STATUS = 2)
begin
print 'No rows fetched'
close vol_cursor2
return
end

while (@@FETCH_STATUS = 0)

begin
select @avg90 = avg(b.vol_90)
from dbo.RealizedVolatilities a , dbo.RealizedVolatilities b
where a.stock_id = b.stock_id
and b.t_date in (select top 90 c.t_date
from dbo.RealizedVolatilities c
where c.t_date <= @date
and c.stock_id = @stock_id
order by c.t_date desc)
and a.stock_id = @stock_id
and a.t_date = @date
group by a.stock_id, a.t_date
print @date
fetch vol_cursor2 into @stock_id,@date

select @avg180 = avg(b.vol_180)
from dbo.RealizedVolatilities a , dbo.RealizedVolatilities b
where a.stock_id = b.stock_id
and b.t_date in (select top 180 c.t_date
from dbo.RealizedVolatilities c
where c.t_date <= @date
and c.stock_id = @stock_id
order by c.t_date desc)
and a.stock_id = @stock_id
and a.t_date = @date
group by a.stock_id, a.t_date

select @avg60 = avg(b.vol_60)
from dbo.RealizedVolatilities a , dbo.RealizedVolatilities b
where a.stock_id = b.stock_id
and b.t_date in (select top 60 c.t_date
from dbo.RealizedVolatilities c
where c.t_date <= @date
and c.stock_id = @stock_id
order by c.t_date desc)
and a.stock_id = @stock_id
and a.t_date = @date
group by a.stock_id, a.t_date


select @avg30 = avg(b.vol_30)
from dbo.RealizedVolatilities a , dbo.RealizedVolatilities b
where a.stock_id = b.stock_id
and b.t_date in (select top 30 c.t_date
from dbo.RealizedVolatilities c
where c.t_date <= @date
and c.stock_id = @stock_id
order by c.t_date desc)
and a.stock_id = @stock_id
and a.t_date = @date
group by a.stock_id, a.t_date


update dbo.ImpVols set avg_180 = @avg180 where imp_stock_id = @stock_id and imp_t_date = @date
end
deallocate vol_cursor2
close vol_cursor2
GO


Using the above, it would take literally weeks to fully process the dataset!! How can I optimize this???

Many thanks in advance

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-03-21 : 12:05:16
You will use something called a theta-join to do this. Please post the DDL for your table.
Go to Top of Page

tm1177
Starting Member

8 Posts

Posted - 2006-03-21 : 13:25:36
DDL = table definitions?
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-03-21 : 13:29:27
Yup.
Go to Top of Page

tm1177
Starting Member

8 Posts

Posted - 2006-03-22 : 10:36:34
Tbl RealizedVolatilities:
vol_id int
stock_id int
t_date smalldatetime
price_close real
prior_day_close real
ln_return float
vol_30 real
vol_60 real
vol_90 real
vol_180 real

Tbl ImpVols:
imp_stock_id int
imp_t_date smalldatetime
ivol_30 real
avg_30 real
stdev_30 real
num_stdev_30 real
ivol_60 real
avg_60 real
stdev_60 real
num_stdev_60 real
ivol_90 real
avg_90 real
stdev_90 real
num_stdev_90 real
ivol_180 real
avg_180 real
stdev_180 real
num_stdev_180 real


Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-03-22 : 12:49:42
Here is a non-cursor method of updating avg_180 using a theta join:
update	ImpVols
set avg_180 = MovingAverages.avg_180
from ImpVols
inner join --MovingAverages
(select CurrentData.stock_id,
CurrentData.t_date,
avg(PriorData.vol_180) as avg_180
from RealizedVolatilities CurrentData
inner join RealizedVolatilities PriorData
on CurrentData.stock_id = PriorData.stock_id
and PriorData.t_date <= CurrentData.t_date
and PriorData.t_date > dateadd(day, -180, CurrentData.t_date)) MovingAverages
on ImpVols.imp_stock_id = MovingAverages.stock_id
and MovingAverages.t_date <= ImpVols.imp_t_date
and MovingAverages.t_date > dateadd(day, -180, ImpVols.imp_t_date)
No guarantees that this will be fast, though, considering the amount of data you are dealing with. You might want to do your updates in small chunks.
Go to Top of Page

tm1177
Starting Member

8 Posts

Posted - 2006-03-22 : 13:21:07
blindman, thanks for the help

are there any missing 'group by' clauses? I am getting an error associated with CurrentData.stock_id
"...invalid in the select list because it is not contained in an aggregate function and there is no GROUP BY clause"
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-03-22 : 17:15:00
Oops...
update	ImpVols
set avg_180 = MovingAverages.avg_180
from ImpVols
inner join --MovingAverages
(select CurrentData.stock_id,
CurrentData.t_date,
avg(PriorData.vol_180) as avg_180
from RealizedVolatilities CurrentData
inner join RealizedVolatilities PriorData
on CurrentData.stock_id = PriorData.stock_id
and PriorData.t_date <= CurrentData.t_date
and PriorData.t_date > dateadd(day, -180, CurrentData.t_date)
group by CurrentData.stock_id,
CurrentData.t_date) MovingAverages
on ImpVols.imp_stock_id = MovingAverages.stock_id
and MovingAverages.t_date <= ImpVols.imp_t_date
and MovingAverages.t_date > dateadd(day, -180, ImpVols.imp_t_date)
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2006-03-22 : 17:24:05
so why is this called a theta join??

Go with the flow & have fun! Else fight the flow
Go to Top of Page

tm1177
Starting Member

8 Posts

Posted - 2006-03-22 : 20:07:35
i think there is a problem with the dateadd(day,-30,xxx) returns a date value that is 30 calendar days less than the current date. What i need is the average of the 30 days' worth of data, which runs more than 30 calender days, since there are only 5 trading days per week. This was the reason I was using the nested (select top30...), to retrieve the 30 most recent vols.

Is there another way to do this with specifying calender days?
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-03-22 : 21:12:26
quote:
Originally posted by tm1177

i think there is a problem with the dateadd(day,-30,xxx) returns a date value that is 30 calendar days less than the current date. What i need is the average of the 30 days' worth of data, which runs more than 30 calender days, since there are only 5 trading days per week. This was the reason I was using the nested (select top30...), to retrieve the 30 most recent vols.

Is there another way to do this with specifying calender days?



You can get the last 30 weekdays like this.

The Date Table Function, F_TABLE_DATE, is available here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=61519


declare @start_date datetime
declare @end_date datetime
select @end_date = getdate()-1
select @start_date = @end_date-100

select top 30
Weekdays = [DATE]
from
dbo.F_TABLE_DATE ( @start_date,@end_date)
where
[DAY_OF_WEEK] between 2 and 6
order by
[Date] desc


CODO ERGO SUM
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-03-23 : 09:47:36
So, avg_180 means the last 180 datapoints, not the last 180 days? That is an odd business rule. You could return averages for two stocks, one of which comprises 180 data points in a single day, while the other may comprise 180 datapoints over a year. Obviously, not measuring the same thing.
Go to Top of Page

tm1177
Starting Member

8 Posts

Posted - 2006-03-23 : 09:54:36
Yeah, that's what i thought as well -- users requested that the moving avgs be of the datapoints...
Go to Top of Page

pootle_flump

1064 Posts

Posted - 2006-03-23 : 10:07:05
quote:
Originally posted by spirit1

so why is this called a theta join??

I hope that I am not about to fall into a trap and teach my grandma to suck eggs or have missed something you've spotted - if I have please go easy on me....

A Theta join is where a greater than or less than comparisson operator is used in the join rather than the usual equals. Dunno why it is called a Theta join specifically - it's all Greek to me I'm afraid
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2006-03-23 : 10:14:33
LOL!!

Thanx for explanation.


Go with the flow & have fun! Else fight the flow
Go to Top of Page

pootle_flump

1064 Posts

Posted - 2006-03-23 : 11:12:29
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-03-23 : 11:42:07
It's called a Theta Join so that non-dbas will think we have superior knowledge of the subjact matter. Just like any jargon. Throw "I used a Theta join to prepare the result set" into a conversation and watch people nod their heads and shut up.
Go to Top of Page

pootle_flump

1064 Posts

Posted - 2006-03-23 : 11:49:56
Lol - If you have the audacity you could try for a full outer self theta join...
Go to Top of Page

tm1177
Starting Member

8 Posts

Posted - 2006-03-23 : 13:39:47
Blindman, your code works very well - updated entire data set for 30 day vol quite quickly. Is there a way to modify the code to avg the last 30 datapoints? I wish to include both measures.
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2006-03-23 : 13:59:22
Hideous code...
update	ImpVols
set avg_180 = MovingAverages.avg_180
from ImpVols
inner join --MovingAverages
(select CurrentData.stock_id,
CurrentData.t_date,
avg(PriorData.vol_180) as avg_180
from RealizedVolatilities CurrentData
inner join RealizedVolatilities PriorData
on CurrentData.stock_id = PriorData.stock_id
where PriorData.t_date in
(select top 180 t_date
from RealizedVolatilities DataFilter
where DataFilter.stock_id = CurrentData.stock_id
and DataFilter.t_date <= CurrentData.t_date)
group by CurrentData.stock_id,
CurrentData.t_date) MovingAverages
on ImpVols.imp_stock_id = MovingAverages.stock_id
and MovingAverages.t_date <= ImpVols.imp_t_date
and MovingAverages.t_date > dateadd(day, -180, ImpVols.imp_t_date)
My advice? Save my original query, 'cause I'm betting when your users start viewing this output their response will be "Yeah that's what we asked for, but it's not what we want..."
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2006-03-23 : 16:04:16
quote:
Originally posted by spirit1

so why is this called a theta join??

Go with the flow & have fun! Else fight the flow



I usually use the term:
"non-equi join"

"theta join"
hmmm, like the taste of that

rockmoose
Go to Top of Page
    Next Page

- Advertisement -