user defined aggregate function? - SQL Server Forums

Home | Weblogs | Forums

Site Sponsored By: SQLDSC - SQL Server Desired State Configuration

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

user defined aggregate function?

Author

Topic

BP77
Starting Member

7 Posts

Posted - 2010-12-12 : 14:53:30

Hi,

I dont know if this possible but here's what im trying to do.

I want to scale values in my query for input to a nueral network using a single function call.

The formula for scaling is

(x - min(x)) / (max(x)-min(x))

so if we have data
50
30
10

our result would be
50 = (50-10) / (50-10) = 1
30 = (30-10) / (50-10) = 20/40 = .5
10 = (10-10) / (50-10) = 0/40 = 0

Simple enough so for, but I want to 'partition' this result within ,lets say customers, like this...

table cust
custid, data
1 , 50
1 , 30
1 , 10
2 , 4000
2 , 600

So I want to write a function that I could use like..

select custid, MyScaleFunc(data) over (partition by custid)
from
cust

its important to have a neat function call like this because 'data' is often a complicated subquery or calculation.

Iver tried to write a user defined aggregate function but I couldnt get it right. I dont think its possible with a UDAF because Im tring to return a different value for each row.

Any help or suggestions are appreciated,
Thanks,
BP

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-12-12 : 16:35:53

I was first tempted to answer you should write a SQLCLR, but you can do this with a windowed function too.


DECLARE	@Sample TABLE
	(
		CustID INT,
		Data INT
	)

INSERT	@Sample
SELECT	1,   50 UNION ALL
SELECT	1,   30 UNION ALL
SELECT	1,   10 UNION ALL
SELECT	2, 4000 UNION ALL
SELECT	2,  600

-- Solution by Peso
SELECT	CustID,
	Data,
	1E * (Data - mn) / (mx - mn) AS Yak
FROM	(
		SELECT	CustID,
			Data,
			MAX(Data) OVER (PARTITION BY CustID) AS mx,
			MIN(Data) OVER (PARTITION BY CustID) AS mn
		FROM	@Sample
	) AS d

N 56°04'39.26"
E 12°55'05.63"

BP77
Starting Member

7 Posts

Posted - 2010-12-12 : 17:52:01

Thanks for that Peso,I appreciate the reply.

However it doesnt meet my main requirement of being a simple function call. In your example (and all my attempts) we still need to repeat 'data' 3 times. This is a problem because 'data' is often a subquery or complicated calculation. So to make my query maintainable I need to write it in the form...

select myScalingFunction(<big complication subquery>) over (partition by customer) as data1,
myScalingFunction(<big complication calculation>) over (partition by customer) data2 ,
.
.
.
myScalingFunction() data50
from ...

I will be experimenting over many iterations to find the best inputs for my data mining task so these subqueries and calculations will be tweaked over and over. Rapid turnaround of each experiment is vital.

You mention sqlclr, I tried to create a UDA but failed because I could not get it to return a different value for each row within the 'partition by'.

If you could come up with a clr solution or any pointers Id be very grateful.

Thanks again for your effort,
BP

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-12-12 : 23:51:26

I think you are confused about how and when "Data" is calculated.

SELECT	CustID,
	Data,
	1E * (Data - mn) / (mx - mn) AS Yak
FROM	(
		SELECT	CustID,
			Data,
			MAX(Data) OVER (PARTITION BY CustID) AS mx,
			MIN(Data) OVER (PARTITION BY CustID) AS mn
		FROM	(
				SELECT	dbo.MyFunction(Col1) AS Data,
					CustID
				FROM	@Sample
			) AS q
	) AS d

N 56°04'39.26"
E 12°55'05.63"

BP77
Starting Member

7 Posts

Posted - 2010-12-13 : 16:13:48

yes, im very confused, where does "myFunction" come from?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-12-14 : 00:57:38

It is whatever complex calculation you have...

N 56°04'39.26"
E 12°55'05.63"

BP77
Starting Member

7 Posts

Posted - 2010-12-14 : 17:11:00

I see. But if I want to have another column scaled wont I have to do this...

SELECT CustID,
Data,
1E * (Data - mn) / (mx - mn) AS Yak
1E * (Data2 - mn2) / (mx2 - mn2) AS Yak2
FROM (
SELECT CustID,
Data,
MAX(Data) OVER (PARTITION BY CustID) AS mx,
MIN(Data) OVER (PARTITION BY CustID) AS mn,
Data2,
MAX(Data2) OVER (PARTITION BY CustID) AS mx2,
MIN(Data2) OVER (PARTITION BY CustID) AS mn2
FROM (
SELECT dbo.MyFunction(Col1) AS Data,
dbo.MyOtherFunction(Col2) AS Data2,
CustID
FROM @Sample
) AS q
) AS d

If I have 50+ columns in my constantly changing query this looks difficult to maintain.

Sorry Im taking so long to reply, btw.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-12-15 : 01:53:26

I am sorry to tell you, but if you have 50+ columns, there is something wrong with the database design.
Your design should be as normalized as possible.

And what do you mean with "constantly changing"?

N 56°04'39.26"
E 12°55'05.63"

BP77
Starting Member

7 Posts

Posted - 2010-12-15 : 16:19:19

My design is normalized. As I said earlier this query will be the input for a data mining task, which, as im sure you know, requires denormalized data. And like any data mining model it will be an experimental iterative process to find the best set of inputs for the model.Thats what I mean by constantly changing.

TimSman
Posting Yak Master

127 Posts

Posted - 2010-12-15 : 16:42:19

Is Data a scalar value, or a table?

BP77
Starting Member

7 Posts

Posted - 2010-12-15 : 17:22:14

scaler

TimSman
Posting Yak Master

127 Posts

Posted - 2010-12-15 : 17:25:53

Couldn't you create a temp table populated with the ID and the Data value you want to use?

BP77
Starting Member

7 Posts

Posted - 2010-12-15 : 17:46:14

Yeah I guess I could. That would simplify the scaling part of my task.

So could have

select
(col1 - min(col1) over (partition by custid)) / (max(col1) over (partition by custid) -min(col1) over (partition by custid)) datamineInput1 ,
(col2 - min(col2) over (partition by custid)) / (max(col2) over (partition by custid)-min(col2)over (partition by custid)) datamineInput2,
.
.
.
(colN - min(colN) over (partition by custid)) / (max(colN) over (partition by custid)-min(colN)over (partition by custid)) datamineInput2,
from tempTable

Thats pretty close to an acceptable solution. Any ideas for making it a little more pleasant to work with?

Thanks Tim

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources