Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Need suggestion on percentile-like calculation

Author  Topic 

Huligan
Yak Posting Veteran

66 Posts

Posted - 2005-07-28 : 22:43:50
Hey. Here's what I need to do and I would appreciate any suggestions. My users are selecting a date range. All the records that fall within that date range are pulled (but not displayed) and then those records are ordered by a float field. Then the top 25%, middle 50%, or bottom 25% of those records are isolated and an average is calculated on the float field for that set of isolated records.

Ideally, I would like to arrive at the final average I need with a stored procedure that would be called by an ASP page. I am no SQL expert and a bit rusty. I can easily call a full recordset of every record in the date range and do the calculations I described in VBScript, but I would like to utilize stored procedures for performance reasons. A shove in the right direction would be appreciated. Thanks for any help.

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2005-07-28 : 23:03:38
I believe the TOP keyword in sql can use percentile, so then you could just combine the results... got some sample data to play with??

Corey

Co-worker on The Wizard of Oz "...those three midgets that came out and danced, the freaked me out when I was little. But they are ok now."
Go to Top of Page

Huligan
Yak Posting Veteran

66 Posts

Posted - 2005-07-29 : 17:42:12
Hello Corey. Here's some sample data ordered by Date.

ID - Date - Miles
-----------------
1 - 1/2/2005 - 10
7 - 1/9/2005 - 17
2 - 2/4/2005 - 3
8 - 2/4/2005 - 11
11 - 2/7/2005 - 8
3 - 2/12/2005 - 12
10 - 3/1/2005 - 20
14 - 3/3/2005 - 40
5 - 4/1/2005 - 7
6 - 4/2/2005 - 25
9 - 4/3/2005 - 9
12 - 4/8/2005 - 18
4 - 5/5/2005 - 30
13 - 5/9/2005 - 21
15 - 6/1/2005 - 2
16 - 9/2/2005 - 20
17 - 9/3/2005 - 21

Here's an example of a request. I would like to pull all of the records that fall between 1/1/2005 and 3/15/2005. This pulls 8 records. Of those 8 records, I would like to calculate an average on the Miles fields for records that place in the top 25% in the Miles fields. Here are those 8 records now ordered by the Miles field.

ID - Date - Miles
-----------------
14 - 3/3/2005 - 40
10 - 3/1/2005 - 20
7 - 1/9/2005 - 17
3 - 2/12/2005 - 12
8 - 2/4/2005 - 11
1 - 1/2/2005 - 10
2 - 2/4/2005 - 3
11 - 2/7/2005 - 8

So the top 25% of 8 records would pull 2 records.

ID - Date - Miles
-----------------
14 - 3/3/2005 - 40
10 - 3/1/2005 - 20

The average for the top 25% of a date range search of 1/1/2005 and 3/15/2005 would be 30 miles. I'm going to be displaying the top 25%, middle 50%, and bottom 25% all on the same page so these queries can be grouped together if an easier solution is available when all 3 are needed. Thanks for the help.

Les
Go to Top of Page

nosepicker
Constraint Violating Yak Guru

366 Posts

Posted - 2005-07-29 : 22:09:33
To borrow/plagiarize/expand on Corey's idea:

--Top 25%
select top 25 percent * from YourTable
where convert(datetime, date) between convert(datetime, '1/1/2005') and convert(datetime, '3/15/2005')
order by miles desc

--Bottom 25%
select top 25 percent * from YourTable
where convert(datetime, date) between convert(datetime, '1/1/2005') and convert(datetime, '3/15/2005')
order by miles

--Everything else (middle 50%)
select * from YourTable
where convert(datetime, date) between convert(datetime, '1/1/2005') and convert(datetime, '3/15/2005')
and id not in
(select top 25 percent id from YourTable
where convert(datetime, date) between convert(datetime, '1/1/2005') and convert(datetime, '3/15/2005')
order by miles desc)
and id not in
(select top 25 percent id from YourTable
where convert(datetime, date) between convert(datetime, '1/1/2005') and convert(datetime, '3/15/2005')
order by miles)

There are probably better ways to get the middle 50%. I chose that way to ensure that no overlaps occur. If you change your date range so that for example 9 records are initially pulled, you will see that the top or bottom 25% actually selects 3 records each, leaving 3 records for the middle 50%, which may seem inaccurate. Also, the "not in" construct could probably be improved upon. It also assumes that the column "id" is a primary key.

And in the interest of being mindful of copyright violations:


I, nosepicker, wish to extend full and unequivocable credit to Corey (aka "Seventhnight") for the usage of his "TOP xx PERCENT" idea in the above post. Nosepicker is fully aware that the "TOP xx PERCENT" idea is the intellectual property of Seventhnight and therefore will accept no monetary benefit, public acclaim, or general pats on the back for the creation of the above post. Nosepicker also wishes to make it clear that he had nothing to do with that Ken Henderson plagiarism fiasco. So there.
Go to Top of Page

Huligan
Yak Posting Veteran

66 Posts

Posted - 2005-07-31 : 15:24:51
Hey nosepicker. I think your queries only go half way, but I may be misinterpreting them. Your queries only return the records that will be used to calculate the averages for each group (top 25%, middle 50%, bottom 25%). I need for one or more stored procedures to produce averages for each of the three groups. I don't want to pull all the records to the website if I can help it. Can you provide some insight as to how to calculate the averages for the records you have pulled in a stored procedure? Thanks for your help.

Les
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-07-31 : 15:56:38
Hi Huligan,

Post the sample data as insert statements, see:
(http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx)

That way, we don't have to do the work of generating test data ourselves, (sometimes we are lazy).

regards,

rockmoose
Go to Top of Page

Huligan
Yak Posting Veteran

66 Posts

Posted - 2005-07-31 : 16:16:50
Hey rockmoose. OK, here's my test table.

CREATE TABLE tblTrip(TripID int, TripDate datetime, TripMiles float)

And here's the test data for my table.

INSERT INTO tblTrip(TripID, TripDate, TripMiles)
SELECT 1, '1/2/2005', 10 UNION ALL
SELECT 7, '1/9/2005', 17 UNION ALL
SELECT 2, '2/4/2005', 3 UNION ALL
SELECT 8, '2/4/2005', 11 UNION ALL
SELECT 11, ' 2/7/2005', 8 UNION ALL
SELECT 3, '2/12/2005', 12 UNION ALL
SELECT 10, '3/1/2005', 20 UNION ALL
SELECT 14, '3/3/2005', 40 UNION ALL
SELECT 5, '4/1/2005', 7 UNION ALL
SELECT 6, '4/2/2005', 25 UNION ALL
SELECT 9, '4/3/2005', 9 UNION ALL
SELECT 12, '4/8/2005', 18 UNION ALL
SELECT 4, '5/5/2005', 30 UNION ALL
SELECT 13, '5/9/2005', 21 UNION ALL
SELECT 15, '6/1/2005', 2 UNION ALL
SELECT 16, '9/2/2005', 20 UNION ALL
SELECT 17, '9/3/2005', 21

With this data, I would like to return the averages for the top 25%, middle 50%, and bottom 25% within a date range. This data will be displayed from ASP so I would prefer to have SQL and one or more stored procedures to do the work versus call all of the records to the ASP page and do the calculations with VBScript. I would like to be able to push the date range to the stored procedure and have the 3 averages returned. Thanks for your help.

Les
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-07-31 : 16:55:32
To further piggyback on Corey's and nosepicker's work;

DECLARE @from DATETIME, @to DATETIME

SELECT @from = '20050101', @to = '20050315'

SELECT
CASE WHEN top25.TripId IS NOT NULL THEN 'top25'
WHEN bottom25.TripId IS NOT NULL THEN 'bottom25'
ELSE 'middle50' END AS Percentile
,AVG( tblTrip.TripMiles ) AS avgMiles
FROM
tblTrip
LEFT JOIN( SELECT TOP 25 PERCENT TripId
FROM tblTrip
WHERE TripDate BETWEEN @from AND @to
ORDER BY TripMiles DESC
) AS top25
ON tblTrip.TripId = top25.TripId
LEFT JOIN( SELECT TOP 25 PERCENT TripId
FROM tblTrip
WHERE TripDate BETWEEN @from AND @to
ORDER BY TripMiles ASC
) AS bottom25
ON tblTrip.TripId = bottom25.TripId
WHERE
TripDate BETWEEN @from AND @to
GROUP BY
CASE WHEN top25.TripId IS NOT NULL THEN 'top25'
WHEN bottom25.TripId IS NOT NULL THEN 'bottom25'
ELSE 'middle50' END


Edit: Be sure to look up the WITH TIES option for the TOP keyword.

rockmoose
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2005-07-31 : 18:58:12
quote:
Originally posted by nosepicker

To borrow/plagiarize/expand on Corey's idea:
...

I, nosepicker, wish to extend full and unequivocable credit to Corey (aka "Seventhnight") for the usage of his "TOP xx PERCENT" idea in the above post. Nosepicker is fully aware that the "TOP xx PERCENT" idea is the intellectual property of Seventhnight and therefore will accept no monetary benefit, public acclaim, or general pats on the back for the creation of the above post. Nosepicker also wishes to make it clear that he had nothing to do with that Ken Henderson plagiarism fiasco. So there.




LOL!!

I thoroughly enjoyed getting credit when I did almost no real work.



and to slightly simplify rocks suggestion:


DECLARE @from DATETIME, @to DATETIME

SELECT @from = '20050101', @to = '20050315'

SELECT
Coalesce(top25.Segment,Bottom25.Segment,'middle50') AS Percentile
,AVG( tblTrip.TripMiles ) AS avgMiles
FROM
tblTrip
LEFT JOIN( SELECT TOP 25 PERCENT Segment='Top25', TripId
FROM tblTrip
WHERE TripDate BETWEEN @from AND @to
ORDER BY TripMiles DESC
) AS top25
ON tblTrip.TripId = top25.TripId
LEFT JOIN( SELECT TOP 25 PERCENT Segment='Bottom25', TripId
FROM tblTrip
WHERE TripDate BETWEEN @from AND @to
ORDER BY TripMiles ASC
) AS bottom25
ON tblTrip.TripId = bottom25.TripId
WHERE
TripDate BETWEEN @from AND @to
GROUP BY
Coalesce(top25.Segment,Bottom25.Segment,'middle50')


Corey

Co-worker on The Wizard of Oz "...those three midgets that came out and danced, the freaked me out when I was little. But they are ok now."
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-07-31 : 20:04:33
Nice.
Darn, I started with coalesce, but ended up with the case.
Thanks for adjusting Corey!

rockmoose
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2005-08-01 : 07:23:10
I just did it so that I could say I actually contributed to the solution

It was already a nice go

Corey

Co-worker on The Wizard of Oz "...those three midgets that came out and danced, the freaked me out when I was little. But they are ok now."
Go to Top of Page

Huligan
Yak Posting Veteran

66 Posts

Posted - 2005-08-01 : 22:14:40
Thanks. Your stored procedure works well. I appreciate it.

Les
Go to Top of Page

Johnyalm
Starting Member

49 Posts

Posted - 2005-12-03 : 05:39:37
Hey,

how would that PROC look like if I needed to get the following percentiles?

Percentile: 10
Percentile: 25
Percentile: 40
Percentile: 50
Percentile: 40
Percentile: 75
Percentile: 90
Percentile: 100

regards,

Johny Alm


www.mirrorgate.com
Go to Top of Page
   

- Advertisement -