BP77
Starting Member
7 Posts
Posted - 2010-12-12 : 14:53:30
Hi, I don't know if this is possible, but here's what I'm trying to do. I want to scale values in my query for input to a neural network using a single function call. The formula for scaling is (x - min(x)) / (max(x) - min(x)), so if we have the data

50
30
10

our result would be

50 = (50-10) / (50-10) = 1
30 = (30-10) / (50-10) = 20/40 = .5
10 = (10-10) / (50-10) = 0/40 = 0

Simple enough so far, but I want to 'partition' this result within, let's say, customers, like this...

table cust
custid, data
1, 50
1, 30
1, 10
2, 4000
2, 600

So I want to write a function that I could use like...

select custid, MyScaleFunc(data) over (partition by custid)
from cust

It's important to have a neat function call like this because 'data' is often a complicated subquery or calculation. I've tried to write a user-defined aggregate function, but I couldn't get it right. I don't think it's possible with a UDAF because I'm trying to return a different value for each row.

Any help or suggestions are appreciated.
Thanks,
BP
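Working the same formula within each custid partition of the cust table above: for custid 1, min = 10 and max = 50, so the scaled values are 1, .5, and 0 exactly as before; for custid 2, min = 600 and max = 4000, so 4000 = (4000-600) / (4000-600) = 1 and 600 = (600-600) / (4000-600) = 0.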
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2010-12-12 : 16:35:53
I was first tempted to answer that you should write a SQLCLR aggregate, but you can do this with windowed functions too.

DECLARE @Sample TABLE
        (
            CustID INT,
            Data INT
        )

INSERT  @Sample
SELECT  1, 50 UNION ALL
SELECT  1, 30 UNION ALL
SELECT  1, 10 UNION ALL
SELECT  2, 4000 UNION ALL
SELECT  2, 600

-- Solution by Peso
SELECT  CustID,
        Data,
        1E * (Data - mn) / (mx - mn) AS Yak
FROM    (
            SELECT  CustID,
                    Data,
                    MAX(Data) OVER (PARTITION BY CustID) AS mx,
                    MIN(Data) OVER (PARTITION BY CustID) AS mn
            FROM    @Sample
        ) AS d

N 56°04'39.26" E 12°55'05.63"
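One detail worth noting in this query is the 1E factor: Data is an INT, so without it (Data - mn) / (mx - mn) would be integer division and every result would truncate to 0 or 1. Multiplying by the float literal 1E promotes the whole expression to floating point:

SELECT (30 - 10) / (50 - 10)        -- 0, integer division truncates
SELECT 1E * (30 - 10) / (50 - 10)   -- 0.5, float arithmetic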
BP77
Starting Member
7 Posts
Posted - 2010-12-12 : 17:52:01
Thanks for that, Peso, I appreciate the reply. However, it doesn't meet my main requirement of being a simple function call. In your example (and all my attempts) we still need to repeat 'data' three times. This is a problem because 'data' is often a subquery or complicated calculation. So to make my query maintainable I need to write it in the form...

select
    myScalingFunction(<big complicated subquery>) over (partition by customer) as data1,
    myScalingFunction(<big complicated calculation>) over (partition by customer) data2,
    ...
    myScalingFunction() data50
from ...

I will be experimenting over many iterations to find the best inputs for my data mining task, so these subqueries and calculations will be tweaked over and over. Rapid turnaround of each experiment is vital.

You mention SQLCLR; I tried to create a UDA but failed because I could not get it to return a different value for each row within the 'partition by'. If you could come up with a CLR solution or any pointers, I'd be very grateful.

Thanks again for your effort,
BP
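The UDA failure has a simple cause: an aggregate, whether built-in or SQLCLR, produces one value per group, so even when used with OVER (PARTITION BY ...) it hands the same value to every row of the partition; it can never return a different value per row. A minimal illustration against the @Sample table above:

-- every row of a partition receives the same aggregate value
SELECT  CustID,
        Data,
        SUM(Data) OVER (PARTITION BY CustID) AS PartitionSum  -- 90 for all of CustID 1, 4600 for CustID 2
FROM    @Sample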
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2010-12-12 : 23:51:26
I think you are confused about how and when "Data" is calculated.

SELECT  CustID,
        Data,
        1E * (Data - mn) / (mx - mn) AS Yak
FROM    (
            SELECT  CustID,
                    Data,
                    MAX(Data) OVER (PARTITION BY CustID) AS mx,
                    MIN(Data) OVER (PARTITION BY CustID) AS mn
            FROM    (
                        SELECT  dbo.MyFunction(Col1) AS Data,
                                CustID
                        FROM    @Sample
                    ) AS q
        ) AS d

N 56°04'39.26" E 12°55'05.63"
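The point of the nesting is that the complicated expression is written exactly once, in the innermost derived table, and referenced only by its alias Data after that. Any subquery or calculation can stand in for dbo.MyFunction(Col1); for instance (Orders and Amount are made-up names, purely for illustration):

-- this would replace the innermost SELECT above
SELECT  (SELECT SUM(o.Amount)
         FROM   Orders AS o
         WHERE  o.CustID = s.CustID) AS Data,  -- the "big complicated subquery", written once
        s.CustID
FROM    @Sample AS s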
BP77
Starting Member
7 Posts
Posted - 2010-12-13 : 16:13:48
Yes, I'm very confused. Where does "MyFunction" come from?
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2010-12-14 : 00:57:38
It is whatever complex calculation you have...

N 56°04'39.26" E 12°55'05.63"
BP77
Starting Member
7 Posts
Posted - 2010-12-14 : 17:11:00
I see. But if I want to have another column scaled, won't I have to do this...

SELECT  CustID,
        Data,
        1E * (Data - mn) / (mx - mn) AS Yak,
        1E * (Data2 - mn2) / (mx2 - mn2) AS Yak2
FROM    (
            SELECT  CustID,
                    Data,
                    MAX(Data) OVER (PARTITION BY CustID) AS mx,
                    MIN(Data) OVER (PARTITION BY CustID) AS mn,
                    Data2,
                    MAX(Data2) OVER (PARTITION BY CustID) AS mx2,
                    MIN(Data2) OVER (PARTITION BY CustID) AS mn2
            FROM    (
                        SELECT  dbo.MyFunction(Col1) AS Data,
                                dbo.MyOtherFunction(Col2) AS Data2,
                                CustID
                        FROM    @Sample
                    ) AS q
        ) AS d

If I have 50+ columns in my constantly changing query, this looks difficult to maintain.

Sorry I'm taking so long to reply, btw.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2010-12-15 : 01:53:26
I am sorry to tell you, but if you have 50+ columns, there is something wrong with the database design. Your design should be as normalized as possible.

And what do you mean by "constantly changing"?

N 56°04'39.26" E 12°55'05.63"
BP77
Starting Member
7 Posts
Posted - 2010-12-15 : 16:19:19
My design is normalized. As I said earlier, this query will be the input for a data mining task, which, as I'm sure you know, requires denormalized data. And like any data mining model, it will be an experimental, iterative process to find the best set of inputs for the model. That's what I mean by constantly changing.
TimSman
Posting Yak Master
127 Posts
Posted - 2010-12-15 : 16:42:19
Is Data a scalar value, or a table?
BP77
Starting Member
7 Posts
Posted - 2010-12-15 : 17:22:14
Scalar.
TimSman
Posting Yak Master
127 Posts
Posted - 2010-12-15 : 17:25:53
Couldn't you create a temp table populated with the ID and the Data value you want to use?
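A minimal sketch of this suggestion, with placeholder names (#ScaleInput is made up here; the dbo.* functions stand in for whatever the real calculations are):

-- evaluate each complicated calculation exactly once, into a temp table
SELECT  custid,
        dbo.MyFunction(col1)      AS col1,
        dbo.MyOtherFunction(col2) AS col2
INTO    #ScaleInput
FROM    cust

-- the scaling step then reads plain columns
SELECT  custid, col1, col2
FROM    #ScaleInput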
BP77
Starting Member
7 Posts
Posted - 2010-12-15 : 17:46:14
Yeah, I guess I could. That would simplify the scaling part of my task. So I could have

select  (col1 - min(col1) over (partition by custid))
        / (max(col1) over (partition by custid) - min(col1) over (partition by custid)) as datamineInput1,
        (col2 - min(col2) over (partition by custid))
        / (max(col2) over (partition by custid) - min(col2) over (partition by custid)) as datamineInput2,
        ...
        (colN - min(colN) over (partition by custid))
        / (max(colN) over (partition by custid) - min(colN) over (partition by custid)) as datamineInputN
from    tempTable

That's pretty close to an acceptable solution. Any ideas for making it a little more pleasant to work with?

Thanks, Tim
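One way to make this more pleasant, sketched here rather than taken from the thread, is to generate the fifty repetitive expressions instead of hand-writing them. Assuming the inputs have been materialized into a temp table as above (#ScaleInput is a placeholder name), something along these lines builds the scaling query from the table's own column list and runs it:

DECLARE @sql NVARCHAR(MAX)

SELECT  @sql = 'SELECT custid' +
        (
            -- one scaling expression per column, generated from the temp table's metadata
            SELECT  ', 1E * (' + name + ' - MIN(' + name + ') OVER (PARTITION BY custid))'
                  + ' / (MAX(' + name + ') OVER (PARTITION BY custid)'
                  + ' - MIN(' + name + ') OVER (PARTITION BY custid)) AS ' + name + '_scaled'
            FROM    tempdb.sys.columns
            WHERE   object_id = OBJECT_ID('tempdb..#ScaleInput')
                    AND name <> 'custid'
            FOR XML PATH('')
        ) + ' FROM #ScaleInput'

EXEC (@sql)

Note that, like every variant in this thread, the generated expression divides by zero whenever a customer has max = min for a column.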