| Author |
Topic |
|
tm1177
Starting Member
8 Posts |
Posted - 2006-03-21 : 11:56:55
|
Hi,I have written a script that calculates the moving averages of volatility data from tbl A and inserts into tbl B. The data extends back around 10 years (approx 1million lines). I wish to process the moving averages of the vol data for a list of securities for each day over the time span. For lack of a better way, I wrote a cursor to accomplish this. The workflow:1. select stockID, date from tbl A into cursor2. using stockID, date, select vol data (30 -180 days) using join on the same table, and calc averages3. update tbl B with data from step 2.Although I know cursors are not an efficient way of processing such a large set of data, I know of no other way to do this. The code:CREATE PROCEDURE [IVolatilityUser].[CalcMovAvg_date]ASdeclare @date as smalldatetimedeclare @stock_id as intdeclare @avg90 as realdeclare @avg180 as realdeclare @avg60 as realdeclare @avg30 as realdeclare vol_cursor2 cursorfor select stock_id, t_datefrom dbo.RealizedVolatilitieswhere t_date between '2-1-2006' and '2-10-2006'open vol_cursor2fetch vol_cursor2 into @stock_id, @dateif (@@FETCH_STATUS = 2)begin print 'No rows fetched' close vol_cursor2 returnend while (@@FETCH_STATUS = 0)begin select @avg90 = avg(b.vol_90) from dbo.RealizedVolatilities a , dbo.RealizedVolatilities b where a.stock_id = b.stock_id and b.t_date in (select top 90 c.t_date from dbo.RealizedVolatilities c where c.t_date <= @date and c.stock_id = @stock_id order by c.t_date desc) and a.stock_id = @stock_id and a.t_date = @date group by a.stock_id, a.t_date print @date fetch vol_cursor2 into @stock_id,@dateselect @avg180 = avg(b.vol_180) from dbo.RealizedVolatilities a , dbo.RealizedVolatilities b where a.stock_id = b.stock_id and b.t_date in (select top 180 c.t_date from dbo.RealizedVolatilities c where c.t_date <= @date and c.stock_id = @stock_id order by c.t_date desc) and a.stock_id = @stock_id and a.t_date = @date group by a.stock_id, a.t_dateselect @avg60 = avg(b.vol_60) from dbo.RealizedVolatilities a , dbo.RealizedVolatilities b where a.stock_id = b.stock_id and b.t_date in (select top 60 c.t_date from dbo.RealizedVolatilities c where c.t_date <= @date and c.stock_id = @stock_id order by c.t_date desc) and a.stock_id = @stock_id and a.t_date = @date group by a.stock_id, a.t_dateselect @avg30 = avg(b.vol_30) from dbo.RealizedVolatilities a , dbo.RealizedVolatilities b where a.stock_id = b.stock_id and b.t_date in (select top 30 c.t_date from dbo.RealizedVolatilities c where c.t_date <= @date and c.stock_id = @stock_id order by c.t_date desc) and a.stock_id = @stock_id and a.t_date = @date group by a.stock_id, a.t_dateupdate dbo.ImpVols set avg_180 = @avg180 where imp_stock_id = @stock_id and imp_t_date = @dateenddeallocate vol_cursor2close vol_cursor2GO Using the above, it would take literally weeks to fully process the dataset!! How can I optimize this???Many thanks in advance |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2006-03-21 : 12:05:16
|
| You will use something called a theta-join to do this. Please post the DDL for your table. |
 |
|
|
tm1177
Starting Member
8 Posts |
Posted - 2006-03-21 : 13:25:36
|
| DDL = table definitions? |
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2006-03-21 : 13:29:27
|
| Yup. |
 |
|
|
tm1177
Starting Member
8 Posts |
Posted - 2006-03-22 : 10:36:34
|
| Tbl RealizedVolatilities:vol_id intstock_id intt_date smalldatetimeprice_close realprior_day_close realln_return floatvol_30 realvol_60 realvol_90 realvol_180 realTbl ImpVols:imp_stock_id intimp_t_date smalldatetimeivol_30 realavg_30 realstdev_30 realnum_stdev_30 realivol_60 realavg_60 realstdev_60 realnum_stdev_60 realivol_90 realavg_90 realstdev_90 realnum_stdev_90 realivol_180 realavg_180 realstdev_180 realnum_stdev_180 real |
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2006-03-22 : 12:49:42
|
Here is a non-cursor method of updating avg_180 using a theta join:update ImpVolsset avg_180 = MovingAverages.avg_180from ImpVols inner join --MovingAverages (select CurrentData.stock_id, CurrentData.t_date, avg(PriorData.vol_180) as avg_180 from RealizedVolatilities CurrentData inner join RealizedVolatilities PriorData on CurrentData.stock_id = PriorData.stock_id and PriorData.t_date <= CurrentData.t_date and PriorData.t_date > dateadd(day, -180, CurrentData.t_date)) MovingAverages on ImpVols.imp_stock_id = MovingAverages.stock_id and MovingAverages.t_date <= ImpVols.imp_t_date and MovingAverages.t_date > dateadd(day, -180, ImpVols.imp_t_date) No guarantees that this will be fast, though, considering the amount of data you are dealing with. You might want to do your updates in small chunks. |
 |
|
|
tm1177
Starting Member
8 Posts |
Posted - 2006-03-22 : 13:21:07
|
| blindman, thanks for the helpare there any missing 'group by' clauses? I am getting an error associated with CurrentData.stock_id"...invalid in the select list because it is not contained in an aggregate function and there is no GROUP BY clause" |
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2006-03-22 : 17:15:00
|
Oops...update ImpVolsset avg_180 = MovingAverages.avg_180from ImpVols inner join --MovingAverages (select CurrentData.stock_id, CurrentData.t_date, avg(PriorData.vol_180) as avg_180 from RealizedVolatilities CurrentData inner join RealizedVolatilities PriorData on CurrentData.stock_id = PriorData.stock_id and PriorData.t_date <= CurrentData.t_date and PriorData.t_date > dateadd(day, -180, CurrentData.t_date) group by CurrentData.stock_id, CurrentData.t_date) MovingAverages on ImpVols.imp_stock_id = MovingAverages.stock_id and MovingAverages.t_date <= ImpVols.imp_t_date and MovingAverages.t_date > dateadd(day, -180, ImpVols.imp_t_date) |
 |
|
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2006-03-22 : 17:24:05
|
so why is this called a theta join??Go with the flow & have fun! Else fight the flow |
 |
|
|
tm1177
Starting Member
8 Posts |
Posted - 2006-03-22 : 20:07:35
|
| i think there is a problem with the dateadd(day,-30,xxx) returns a date value that is 30 calendar days less than the current date. What i need is the average of the 30 days' worth of data, which runs more than 30 calender days, since there are only 5 trading days per week. This was the reason I was using the nested (select top30...), to retrieve the 30 most recent vols. Is there another way to do this with specifying calender days? |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2006-03-22 : 21:12:26
|
quote: Originally posted by tm1177 i think there is a problem with the dateadd(day,-30,xxx) returns a date value that is 30 calendar days less than the current date. What i need is the average of the 30 days' worth of data, which runs more than 30 calender days, since there are only 5 trading days per week. This was the reason I was using the nested (select top30...), to retrieve the 30 most recent vols. Is there another way to do this with specifying calender days?
You can get the last 30 weekdays like this.The Date Table Function, F_TABLE_DATE, is available here:http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=61519declare @start_date datetimedeclare @end_date datetimeselect @end_date = getdate()-1select @start_date = @end_date-100select top 30 Weekdays = [DATE]from dbo.F_TABLE_DATE ( @start_date,@end_date)where [DAY_OF_WEEK] between 2 and 6order by [Date] desc CODO ERGO SUM |
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2006-03-23 : 09:47:36
|
| So, avg_180 means the last 180 datapoints, not the last 180 days? That is an odd business rule. You could return averages for two stocks, one of which comprises 180 data points in a single day, while the other may comprise 180 datapoints over a year. Obviously, not measuring the same thing. |
 |
|
|
tm1177
Starting Member
8 Posts |
Posted - 2006-03-23 : 09:54:36
|
| Yeah, that's what i thought as well -- users requested that the moving avgs be of the datapoints... |
 |
|
|
pootle_flump
1064 Posts |
Posted - 2006-03-23 : 10:07:05
|
quote: Originally posted by spirit1 so why is this called a theta join??
I hope that I am not about to fall into a trap and teach my grandma to suck eggs or have missed something you've spotted - if I have please go easy on me....A Theta join is where a greater than or less than comparisson operator is used in the join rather than the usual equals. Dunno why it is called a Theta join specifically - it's all Greek to me I'm afraid |
 |
|
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2006-03-23 : 10:14:33
|
LOL!! Thanx for explanation.Go with the flow & have fun! Else fight the flow |
 |
|
|
pootle_flump
1064 Posts |
Posted - 2006-03-23 : 11:12:29
|
|
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2006-03-23 : 11:42:07
|
| It's called a Theta Join so that non-dbas will think we have superior knowledge of the subjact matter. Just like any jargon. Throw "I used a Theta join to prepare the result set" into a conversation and watch people nod their heads and shut up. |
 |
|
|
pootle_flump
1064 Posts |
Posted - 2006-03-23 : 11:49:56
|
| Lol - If you have the audacity you could try for a full outer self theta join... |
 |
|
|
tm1177
Starting Member
8 Posts |
Posted - 2006-03-23 : 13:39:47
|
| Blindman, your code works very well - updated entire data set for 30 day vol quite quickly. Is there a way to modify the code to avg the last 30 datapoints? I wish to include both measures. |
 |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2006-03-23 : 13:59:22
|
Hideous code...update ImpVolsset avg_180 = MovingAverages.avg_180from ImpVols inner join --MovingAverages (select CurrentData.stock_id, CurrentData.t_date, avg(PriorData.vol_180) as avg_180 from RealizedVolatilities CurrentData inner join RealizedVolatilities PriorData on CurrentData.stock_id = PriorData.stock_id where PriorData.t_date in (select top 180 t_date from RealizedVolatilities DataFilter where DataFilter.stock_id = CurrentData.stock_id and DataFilter.t_date <= CurrentData.t_date) group by CurrentData.stock_id, CurrentData.t_date) MovingAverages on ImpVols.imp_stock_id = MovingAverages.stock_id and MovingAverages.t_date <= ImpVols.imp_t_date and MovingAverages.t_date > dateadd(day, -180, ImpVols.imp_t_date) My advice? Save my original query, 'cause I'm betting when your users start viewing this output their response will be "Yeah that's what we asked for, but it's not what we want..." |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2006-03-23 : 16:04:16
|
quote: Originally posted by spirit1 so why is this called a theta join??Go with the flow & have fun! Else fight the flow 
I usually use the term:"non-equi join""theta join"hmmm, like the taste of that rockmoose |
 |
|
|
Next Page
|