Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Calculate median of difference in days between rec

Author  Topic 

dbenoit64
Starting Member

36 Posts

Posted - 2006-01-24 : 08:23:12
I have a table of sample data

Samples(sample_no, sample_date..)

I have no idea how to do the following in sql server or if its even possible:

1. Calculate the difference in days between all samples.
2. Select the median result

Any trick to get this done would be really helpful

thanks,

DB

Srinika
Master Smack Fu Yak Hacker

1378 Posts

Posted - 2006-01-24 : 08:25:01
Some sample data please
Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2006-01-24 : 08:29:07
Also post expected result

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

dbenoit64
Starting Member

36 Posts

Posted - 2006-01-24 : 10:10:41
Okay here is some sample data:

Sample_date sample_no
4/29/1962 0:00 1962706086
5/1/1962 0:00 1962704968
5/3/1962 0:00 1962704891
5/4/1962 0:00 1962722248
5/15/1962 0:00 1962706641
5/18/1962 0:00 1962722249
5/18/1962 0:00 1962704969
5/22/1962 0:00 1962706344
6/5/1962 0:00 1962722690
6/7/1962 0:00 1962722250
6/8/1962 0:00 1962708068
6/9/1962 0:00 1962722842
6/12/1962 0:00 1962708069
6/12/1962 0:00 1962722928
6/13/1962 0:00 1962722691
6/13/1962 0:00 1962706345
6/13/1962 0:00 1962706427
6/16/1962 0:00 1962706561
6/16/1962 0:00 1962704970
6/16/1962 0:00 1962704971
6/16/1962 0:00 1962705794
6/17/1962 0:00 1962708070
6/17/1962 0:00 1962708071
6/17/1962 0:00 1962722929
6/17/1962 0:00 1962722930
6/18/1962 0:00 1962722692
6/18/1962 0:00 1962722693
6/18/1962 0:00 1962722694
6/18/1962 0:00 1962722695
6/18/1962 0:00 1962722696
6/18/1962 0:00 1962722697
6/18/1962 0:00 1962722698
6/18/1962 0:00 1962722843
6/23/1962 0:00 1962706562
6/23/1962 0:00 1962722251
6/23/1962 0:00 1962722252
6/29/1962 0:00 1962704892
6/29/1962 0:00 1962704893
7/1/1962 0:00 1962722931
7/2/1962 0:00 1962708072
7/2/1962 0:00 1962722844
7/4/1962 0:00 1962708073
7/6/1962 0:00 1962722932
7/7/1962 0:00 1962705881
7/7/1962 0:00 1962722253
7/11/1962 0:00 1962722933
7/15/1962 0:00 1962722254
7/18/1962 0:00 1962708074
7/18/1962 0:00 1962708075
7/18/1962 0:00 1962722934
7/22/1962 0:00 1962722255
7/22/1962 0:00 1962704894
7/25/1962 0:00 1962722935
7/27/1962 0:00 1962708076
7/27/1962 0:00 1962708077
7/28/1962 0:00 1962704895
7/28/1962 0:00 1962722256
8/1/1962 0:00 1962708078
8/1/1962 0:00 1962708079
8/1/1962 0:00 1962722699
8/3/1962 0:00 1962722936
8/3/1962 0:00 1962722937

When you look at the first 2 rows:
4/29/1962 0:00 1962706086
5/1/1962 0:00 1962704968

you see there is a difference of 3 Days.

When you look at row 2 and 3
5/1/1962 0:00 1962704968
5/3/1962 0:00 1962704891

you see there is a difference of 2 Days.

This difference can be from 1 - infinate days.

calculate all the differences between all the consecutive days in this pattern then get the median value (the number in the MIDDLE of the set of given numbers)
ie if you have 9 values
1. 12
2. 25
3. 44
4. 55
5. 89
6. 122
7. 155
8. 325
9. 8000

the median is 89. I dont want to have to calculate median manually like this, its just way to much processing although im not sure if its built into sql server.

so for each station there should be a single value comming out and:
if num between 1 - 7 then "Monthly"
if num between 7 - 14 then "Bi-Weekly"
if num between 14 - 30 then "Monthly"
if num between 30 - 360 then "Yearly"

... or something similar

To make matters worse this is only a small fraction of a larger "Station Level" query that will be including this query somehow. Groups of samples belong to different "Stations" which I have to seperate by. For example, all the data you see here is for 1 station. There are about 300 stations being returned in the higher level view that this query will belong to. For kicks I will include it:

you can see that Sampling_Frequency is there as "UNKNOWN" cause i dont know how to do it. also if anyone wants to suggest how i optimize this query feel free to add suggestions:


SELECT DISTINCT stat.station_number AS Site_ID, stat.station_number AS Site_Number, stat.station_name AS Site_Name, stat.station_description AS Site_Description,
stat.creation_date AS Site_Activation_Date, dbo.station_coordinates.lat_dec_deg AS Latitude,
dbo.station_coordinates.lat_dec_deg AS North_bounding_Latitude, dbo.station_coordinates.lat_dec_deg AS South_bounding_Latitude,
dbo.station_coordinates.lon_dec_deg AS Longitude, dbo.station_coordinates.lon_dec_deg AS West_bounding_Longitude,
dbo.station_coordinates.lon_dec_deg AS East_bounding_Longitude, 'N/A' AS Site_Photograph_link, 'N/A' AS Local_site_influences_link,
'http://map.ns.ec.gc.ca/NA' AS Collection_level_metadata_link, 'http://map.ns.ec.gc.ca/envirodat/' AS Other_info_link,
'Environment Canada, Atlantic Region, Environmental Conservation Branch' AS Organization_name, '5068512398' AS Contact_Phone_number,
'5068516608' AS Contact_Fax_number, 'PO BOX 23005' AS Contact_street_address, 'Moncton' AS Contact_city, 'NB' AS Contact_province,
'E1A 6S8' AS Contact_postal_code, 'cathy.cormier@ec.gc.ca' AS Contact_e_mail, 'http://map.ec.gc.ca/envirodat' AS Organization_web_site,
'0800-1600' AS Contact_hours_of_service, 'N/A' AS Contact_other_instructions, dbo.projects.project_no, dbo.projects.project_name,

(SELECT DISTINCT wshed
FROM envguest.stations_watersheds
WHERE station = stat.station_number) AS watershed,


case active_indicator WHEN 'Y' Then NULL else (select max(sample_date) from samples where station_no = stat.Station_number) END as Site_Final_Sampling_Date,

CASE active_indicator WHEN 'Y' THEN 'Active' ELSE 'Inactive' END AS Current_Status,

'UNKNOWN' as Jurisdiction, 'UNKNOWN' as Sampling_Frequency


FROM dbo.station_coordinates INNER JOIN
dbo.station stat ON dbo.station_coordinates.station_number = stat.station_number INNER JOIN
dbo.samples ON stat.station_number = dbo.samples.station_no INNER JOIN
dbo.projects ON dbo.samples.project_no = dbo.projects.project_no





Go to Top of Page

Norwich
Posting Yak Master

158 Posts

Posted - 2006-01-24 : 10:45:29
How do you want the mean calculated?
ie. Between all the units? or how?

If you want your computer to be faster then throw it out of the window.
Go to Top of Page

Srinika
Master Smack Fu Yak Hacker

1378 Posts

Posted - 2006-01-24 : 10:45:35
I think u might have to use Analysis Services in MS SQL Server (check it in BOL)
or more easily by something as SAS
or with some Excel skills.
It seems to be a complex task if u were to use queries / T-SQL to do this, unless, one of these Gurus has a pre written function.
Go to Top of Page

Norwich
Posting Yak Master

158 Posts

Posted - 2006-01-24 : 10:51:42
This does not calculate the MEAN just the Diff between the Sample_Dates. You might want to consider using Statistical\Analytical tools (SAS\Analysis Services)to calculate the things you want.
Try this for the date diff:
Alter Table xxxTable
Add xID int Identity

Alter Table xxxTable
Add Day_Diff Int
Go

Declare @Date_1 Datetime,
@Date_2 Datetime,
@xID int

Declare Date_Curs Cursor read_only for

Select xID, Sample_date
from xxxTable a
order by xiD

Open Date_Curs

Fetch Next from Date_Curs into
@xID,
@Date_1

While @@Fetch_Status=0
Begin

Select @date_2 = Sample_date
from xxxTable a
Where xId = @xiD + 1

Update xxxTable
Set Date_Diff = Datediff(Day, @Date_1, @Date_2)

Fetch Next from Date_Curs into
@xID,
@Date_1

End

Close Date_Curs
Deallocate Date_Curs


If you want your computer to be faster then throw it out of the window.
Go to Top of Page
   

- Advertisement -